Here is the output of ceph-bluestore-tool bluefs-bdev-sizes:
inferring bluefs devices from bluestore path
slot 1 /var/lib/ceph/osd/ceph-5/block -> /dev/dm-17
1 : device size 0x746c0000000 : own 0x[37e1eb00000~4a82900000] = 0x4a82900000 : using 0x5bc780000(23 GiB)
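
For readers who prefer sizes in GiB, here is a minimal sketch that decodes the hex fields of the line above. It assumes the bracketed "own" extent is written as offset~length in hex; the regular expression and the field names (size, offset, length, own, used) are my own labels, not part of the tool.

```python
import re

# Line copied from the bluefs-bdev-sizes output above, with the mail wrap undone.
LINE = ("1 : device size 0x746c0000000 : own 0x[37e1eb00000~4a82900000] = "
        "0x4a82900000 : using 0x5bc780000(23 GiB)")

GIB = 2 ** 30

# Assumption: the "own" extent is printed as [offset~length], both in hex.
PATTERN = re.compile(
    r"device size 0x(?P<size>[0-9a-f]+) : "
    r"own 0x\[(?P<offset>[0-9a-f]+)~(?P<length>[0-9a-f]+)\] = "
    r"0x(?P<own>[0-9a-f]+) : using 0x(?P<used>[0-9a-f]+)"
)


def parse_bdev_sizes(line):
    """Return the hex fields of one bluefs-bdev-sizes line as integers (bytes)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised bluefs-bdev-sizes line")
    return {name: int(value, 16) for name, value in match.groupdict().items()}


fields = parse_bdev_sizes(LINE)
for name, value in fields.items():
    print(f"{name:>6}: {value / GIB:10.2f} GiB")
print(f"bluefs usage: {fields['used'] / fields['own']:.1%} of its own space")
```

Under that reading, the device is 7451 GiB, BlueFS owns a single extent of roughly 298 GiB, and it uses about 23 GiB of it, which matches the figure the tool prints.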
The startup log of osd.5 with debug-bluestore (and debug-bluefs) set to 20
is available at the following address (28 MB):
https://wetransfer.com/downloads/a193ab15ab5e2395fe2462c963507a7f2020042814…
Thanks for your help.
F.
On 28/04/2020 at 13:33, Igor Fedotov wrote:
Hi Francois,
Could you please share the OSD startup log with debug-bluestore (and
debug-bluefs) set to 20?
Also please run ceph-bluestore-tool's bluefs-bdev-sizes command and
share the output.
Thanks,
Igor
On 4/28/2020 12:55 AM, Francois Legrand wrote:
> Hi all,
>
> *** Short version ***
> Is there a way to repair a RocksDB database hit by the errors
> "Encountered error while reading data from compression dictionary
> block Corruption: block checksum mismatch" and "_open_db erroring
> opening db"?
>
>
> *** Long version ***
> We operate a nautilus ceph cluster (with 100 disks of 8TB in 6
> servers + 4 mons/mgr + 3 mds).
> We recently (Monday 20) upgraded from 14.2.7 to 14.2.8. This
> triggered a rebalancing of some data.
> Two days later (Wednesday 22) we had a very short power outage. Only
> one of the osd servers went down (and unfortunately died).
> This triggered a reconstruction of the lost OSDs. Operations went
> fine until Saturday 25, when some OSDs in the 5 remaining servers
> started to crash for no apparent reason.
> We tried to restart them, but they crashed again. We ended up with 18
> OSDs down (+ 16 in the dead server, so 34 OSDs down out of 100).
> Looking at the logs, we found the following for all the crashed OSDs:
>
> -237> 2020-04-25 16:32:51.835 7f1f45527a80 3 rocksdb:
> [table/block_based_table_reader.cc:1117] Encountered error while
> reading data from compression dictionary block Corruption: block
> checksum mismatch: expected 0, got 2729370997 in db/181355.sst
> offset 18446744073709551615 size 18446744073709551615
>
> and
>
> 2020-04-25 16:05:47.251 7fcbd1e46a80 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _open_db erroring opening db:
>
> We also noticed that the "Encountered error while reading data from
> compression dictionary block Corruption: block checksum mismatch" error
> was already present a few days before the crash.
> Some OSDs also show this error but are still up.
>
> We tried to repair with:
> ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 repair
> This did not succeed (it ends with _open_db erroring opening db).
>
> Does anybody have an idea how to fix this, or at least know whether
> it is possible to repair the "Encountered error while reading
> data from compression dictionary block Corruption: block checksum
> mismatch" and "_open_db erroring opening db" errors?
> Thanks for your help (we are desperate because we will lose data
> and are fighting to save something)!!!
> F.
>