Hallo Dan, thanks for your patience!
On 5/22/2020 1:57 PM, Dan van der Ster wrote:
The procedure to overwrite a corrupted osdmap on a given osd is described at
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html
I wouldn't do that type of low-level manipulation just yet -- better to
understand the root cause of the corruptions first, before potentially
making things worse.
Ok.
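(Just to make sure I understood the procedure in that link, my reading of it
is roughly the following -- a sketch only, with the OSD id and epoch taken
from my logs as examples, and assuming the get/set-osdmap ops of
ceph-objectstore-tool are available in 12.2.13. I will not run anything like
this until we understand the root cause.)

   # stop the broken OSD first
   systemctl stop ceph-osd@63
   # pull a good copy of the missing epoch from a healthy OSD
   # (or: ceph --cluster cephpa1 osd getmap 141282 -o /tmp/osdmap.141282)
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-<healthy-id> \
       --op get-osdmap --epoch 141282 --file /tmp/osdmap.141282
   # inject it into the local store of the broken OSD
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-63 \
       --op set-osdmap --file /tmp/osdmap.141282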
There is also this issue which seems related:
https://tracker.ceph.com/issues/24423
(It has a fix in mimic and nautilus).
Could you share some more logs, e.g. the full backtrace from when they
first crashed, and from the failure to start now? And maybe
/var/log/messages shows crc mismatches?
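Something like this on each OSD host should show them, I think:

   # scan the system log for bluestore checksum failures
   grep -iE 'verify_csum|bad crc' /var/log/messages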
Please find at https://pastebin.com/UjfmPT37 the first occurrence of the
problem on OSD 72, and at https://pastebin.com/paumyAZ1 what happens when
I start ceph-osd@72 after leaving it stopped for one minute (the same
open_db failure is the only thing I find in /var/log/messages).
On a different OSD (63), same kind as 72, I see in /var/log/messages:
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.087529
7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected
0x1d9fdf3, device location [0x3105041f000~1000], logical extent
0x3cf000~1000, object
#1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.088194
7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected
0x1d9fdf3, device location [0x3105041f000~1000], logical extent
0x3cf000~1000, object
#1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.088856
7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected
0x1d9fdf3, device location [0x3105041f000~1000], logical extent
0x3cf000~1000, object
#1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.089659
7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected
0x1d9fdf3, device location [0x3105041f000~1000], logical extent
0x3cf000~1000, object
#1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueStore.cc:
In function 'void BlueStore::_do_write_small(BlueStore::TransContext*,
BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t,
ceph::buffer::list::iterator&, BlueStore::WriteContext*)' thread
7fa90a000700 time 2020-05-21 13:20:40.089716
May 21 13:20:40 r1srv07 ceph-osd:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueStore.cc:
10176: FAILED assert(r >= 0 && r <= (int)head_read)
For this same OSD 63 I tried ceph-bluestore-tool:
[root(a)r1srv07.pa1 ceph]# time ceph-bluestore-tool --path
/var/lib/ceph/osd/cephpa1-63 --deep 1 --command repair ; date
2020-05-22 16:12:19.643443 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device
location [0x10000~1000], logical extent 0x0~1000, object
#-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.644037 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device
location [0x10000~1000], logical extent 0x0~1000, object
#-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.644542 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device
location [0x10000~1000], logical extent 0x0~1000, object
#-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.645019 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device
location [0x10000~1000], logical extent 0x0~1000, object
#-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.645050 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) fsck error:
#-1:7b3f43c4:::osd_superblock:0# error during read: 0~21a (5)
Input/output error
2020-05-22 16:20:05.454969 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07,
device location [0x304e857c000~1000], logical extent 0x36c000~1000,
object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.509771 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07,
device location [0x304e857c000~1000], logical extent 0x36c000~1000,
object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.577770 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07,
device location [0x304e857c000~1000], logical extent 0x36c000~1000,
object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.606838 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07,
device location [0x304e857c000~1000], logical extent 0x36c000~1000,
object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.607031 7ff93afaeec0 -1
bluestore(/var/lib/ceph/osd/cephpa1-63) fsck error:
#1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# error
during read: 0~400000 (5) Input/output error
.....
still going on, but I am not confident it will lead to anything good;
we'll see.
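For the other affected OSDs I will probably start with the read-only check
before attempting another repair, something along these lines (if I have the
syntax right):

   # read-only consistency check, should not modify the OSD
   ceph-bluestore-tool --path /var/lib/ceph/osd/cephpa1-72 --deep 1 --command fsck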
Thanks!
Fulvio
-- dan
On Fri, May 22, 2020 at 1:02 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
Hallo Dan, thanks for your reply! Very good to know about compression...
will not try to use it before upgrading to Nautilus.
Problem is, I did not activate it on this cluster (see below).
Moreover, that would only account for the issue on disks dedicated to
object storage, if I understand it correctly.
The symptoms of your "*Second problem*" look similar, failing to load
an osdmap. TBH though, I believe that lz4 bug might only corrupt an
osdmap when compression is enabled for the whole OSD, i.e. with
bluestore_compression_mode=aggressive, not when compression is enabled
just for specific pools.
Alas, if you didn't have that enabled, then your bug must be something else.
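For what it's worth, the OSD-level setting can be double-checked with
something like this (assuming the admin socket is at its default path;
<id> is any OSD that is still running):

   # ask a running OSD which compression mode it is actually using
   ceph --cluster cephpa1 daemon osd.<id> config show | grep bluestore_compression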
> Another important info which I forgot in the previous message: my SAN is
> actually composed of 6 independent chains, such that a glitch at the SAN
> level is hardly the culprit, while something along the lines of the bug
> you pointed me to sounds more reasonable.
>
> Also, how to deal with the "failed to load with the OSD map"?
>
> Thanks!
>
> Fulvio
>
>
> [root(a)r1srv05.pa1 ~]# grep compress /etc/ceph/cephpa1.conf
> [root(a)r1srv05.pa1 ~]#
> [root(a)r1srv05.pa1 ~]# ceph --cluster cephpa1 osd pool ls | sort | xargs
> -i ceph --cluster ceph osd pool get {} compression_mode
> Error ENOENT: option 'compression_mode' is not set on pool 'cephfs_data'
> Error ENOENT: option 'compression_mode' is not set on pool 'cephfs_metadata'
> Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-ec-pa1-cl1'
> Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-ec-pa1-cl1-cache'
> Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-pa1-cl1'
> Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-pa1-devel'
> Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-rr-pa1-cl1'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.buckets.data'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.buckets.index'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.control'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.data.root'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.gc'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.log'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.users.keys'
> Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.users.uid'
> Error ENOENT: option 'compression_mode' is not set on pool 'ec-pool-fg'
> Error ENOENT: option 'compression_mode' is not set on pool 'glance-ct1-cl1'
> Error ENOENT: option 'compression_mode' is not set on pool 'glance-pa1-cl1'
> Error ENOENT: option 'compression_mode' is not set on pool 'glance-pa1-devel'
> Error ENOENT: option 'compression_mode' is not set on pool 'gnocchi-pa1-cl1'
> Error ENOENT: option 'compression_mode' is not set on pool 'iscsivolumes'
> Error ENOENT: option 'compression_mode' is not set on pool 'k8s'
> Error ENOENT: option 'compression_mode' is not set on pool 'rbd'
> Error ENOENT: option 'compression_mode' is not set on pool '.rgw.root'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.intent-log'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.log'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets.extra'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets.index'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.control'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.gc'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.root'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.usage'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.email'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.swift'
> Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.uid'
>
>
> On 5/22/2020 8:50 AM, Dan van der Ster wrote:
>> Hi Fulvio,
>>
>> The symptom of several OSDs all asserting at the same time in
>> OSDMap::get_map really sounds like this bug:
>> https://tracker.ceph.com/issues/39525
>>
>> lz4 compression is buggy on CentOS 7 and Ubuntu 18.04 -- you need to
>> disable compression or use a different algorithm. Mimic and nautilus
>> will get a workaround, but it's not planned to be backported to
>> luminous.
>>
>> -- Dan
>>
>> On Thu, May 21, 2020 at 11:18 PM Fulvio Galeazzi
>> <fulvio.galeazzi(a)garr.it> wrote:
>>>
>>> Hallo all,
>>> hope you can help me with very strange problems which arose
>>> suddenly today. Tried to search, also in this mailing list, but could
>>> not find anything relevant.
>>>
>>> At some point today, without any action from my side, I noticed some
>>> OSDs in my production cluster would go down and never come up.
>>> I am on Luminous 12.2.13, CentOS7, kernel 3.10: my setup is non-standard
>>> as OSD disks are served off a SAN (which is for sure OK now, although I
>>> cannot exclude some glitch).
>>> Tried to reboot OSD servers a few times, ran "activate --all", added
>>> bluestore_ignore_data_csum=true in the [osd] section in ceph.conf...
>>> the number of "down" OSDs changed for a while but now seems rather stable.
>>>
>>>
>>> There are actually two classes of problems (a bit more detail right below):
>>> - ERROR: osd init failed: (5) Input/output error
>>> - failed to load OSD map for epoch 141282, got 0 bytes
>>>
>>>
>>> *First problem*
>>> This affects 50 OSDs (all disks of this kind, on all but one server):
>>> these OSDs are reserved for object storage but I am not yet using them
>>> so I may in principle recreate them. But would be interested in
>>> understanding what the problem is, and learn how to solve it for future
>>> reference.
>>> Here is what I see in logs:
>>> .....
>>> 2020-05-21 21:17:48.661348 7fa2e9a95ec0 1 bluefs add_block_device bdev
>>> 1 path /var/lib/ceph/osd/cephpa1-72/block size 14.5TiB
>>> 2020-05-21 21:17:48.661428 7fa2e9a95ec0 1 bluefs mount
>>> 2020-05-21 21:17:48.662040 7fa2e9a95ec0 1 bluefs _init_alloc id 1
>>> alloc_size 0x10000 size 0xe83a3400000
>>> 2020-05-21 21:52:43.858464 7fa2e9a95ec0 -1 bluefs mount failed to replay
>>> log: (5) Input/output error
>>> 2020-05-21 21:52:43.858589 7fa2e9a95ec0 1 fbmap_alloc 0x55c6bba92e00
>>> shutdown
>>> 2020-05-21 21:52:43.858728 7fa2e9a95ec0 -1
>>> bluestore(/var/lib/ceph/osd/cephpa1-72) _open_db failed bluefs mount:
>>> (5) Input/output error
>>> 2020-05-21 21:52:43.858790 7fa2e9a95ec0 1 bdev(0x55c6bbdb6600
>>> /var/lib/ceph/osd/cephpa1-72/block) close
>>> 2020-05-21 21:52:44.103536 7fa2e9a95ec0 1 bdev(0x55c6bbdb8600
>>> /var/lib/ceph/osd/cephpa1-72/block) close
>>> 2020-05-21 21:52:44.352899 7fa2e9a95ec0 -1 osd.72 0 OSD:init: unable to
>>> mount object store
>>> 2020-05-21 21:52:44.352956 7fa2e9a95ec0 -1 ESC[0;31m ** ERROR: osd init
>>> failed: (5) Input/output errorESC[0m
>>>
>>> *Second problem*
>>> This affects 11 OSDs, which I use *in production* for Cinder block
>>> storage: looks like all PGs for this pool are currently OK.
>>> Here is the excerpt from the logs.
>>> .....
>>> -5> 2020-05-21 20:52:06.756469 7fd2ccc19ec0 0 _get_class not
>>> permitted to load kvs
>>> -4> 2020-05-21 20:52:06.759686 7fd2ccc19ec0 1 <cls>
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/rgw/cls_rgw.cc:3869:
>>>
>>> Loaded rgw class!
>>> -3> 2020-05-21 20:52:06.760021 7fd2ccc19ec0 1 <cls>
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/log/cls_log.cc:299:
>>>
>>> Loaded log class!
>>> -2> 2020-05-21 20:52:06.760730 7fd2ccc19ec0 1 <cls>
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/replica_log/cls_replica_log.cc:135:
>>>
>>> Loaded replica log class!
>>> -1> 2020-05-21 20:52:06.760873 7fd2ccc19ec0 -1 osd.63 0 failed to
>>> load OSD map for epoch 141282, got 0 bytes
>>> 0> 2020-05-21 20:52:06.763277 7fd2ccc19ec0 -1
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/osd/OSD.h:
>>>
>>> In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fd2ccc19ec0 time 2020-05-21 20:52:06.760916
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/osd/OSD.h:
>>>
>>> 994: FAILED assert(ret)
>>>
>>> Has anyone any idea how I could fix these problems, or what I could
>>> do to try and shed some light? And also, what caused them, and whether
>>> there is some magic configuration flag I could use to protect my cluster?
>>>
>>> Thanks a lot for your help!
>>>
>>> Fulvio
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
> --
> Fulvio Galeazzi
> GARR-CSD Department
> skype: fgaleazzi70
> tel.: +39-334-6533-250
>
--
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70
tel.: +39-334-6533-250