Hi Francois,
Could you please share the OSD startup log with debug-bluestore (and
debug-bluefs) set to 20?
Also please run ceph-bluestore-tool's bluefs-bdev-sizes command and
share the output.
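
For reference, a minimal sketch of how both could be collected. It
assumes OSD id 3 (taken from the path in the quoted log), the default
cluster name "ceph" and the usual ceph user/group; adjust to your
setup. The script only prints the command lines so they can be
reviewed before being run by hand on the OSD host, with the OSD
stopped:

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not execute them.
# Assumption: OSD id 3, as in the quoted /var/lib/ceph/osd/ceph-3 path.
OSD_ID="${OSD_ID:-3}"
OSD_PATH="/var/lib/ceph/osd/ceph-${OSD_ID}"

# Report the BlueFS device sizes for this OSD (the OSD must not be running).
sizes_cmd="ceph-bluestore-tool bluefs-bdev-sizes --path ${OSD_PATH}"

# Start the OSD in the foreground with verbose BlueStore/BlueFS logging.
# Assumption: cluster name "ceph" and user/group "ceph".
start_cmd="ceph-osd -f --cluster ceph --id ${OSD_ID} --setuser ceph --setgroup ceph --debug-bluestore 20 --debug-bluefs 20"

echo "$sizes_cmd"
echo "$start_cmd"
```

With default settings the startup log should end up in
/var/log/ceph/ceph-osd.3.log. Setting "debug bluestore = 20" and
"debug bluefs = 20" in the [osd] section of ceph.conf before starting
the OSD the usual way should give the same result.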
Thanks,
Igor
On 4/28/2020 12:55 AM, Francois Legrand wrote:
> Hi all,
>
> *** Short version ***
> Is there a way to repair a RocksDB that fails with the errors
> "Encountered error while reading data from compression dictionary
> block Corruption: block checksum mismatch" and "_open_db erroring
> opening db"?
>
>
> *** Long version ***
> We operate a Nautilus Ceph cluster (100 disks of 8 TB in 6 servers
> + 4 mons/mgrs + 3 MDSs).
> We recently (Monday 20) upgraded from 14.2.7 to 14.2.8. This triggered
> a rebalancing of some data.
> Two days later (Wednesday 22) we had a very short power outage. Only
> one of the OSD servers went down (and unfortunately died).
> This triggered a reconstruction of the lost OSDs. Operations went
> fine until Saturday 25, when some OSDs on the 5 remaining servers
> started to crash for no apparent reason.
> We tried to restart them, but they crashed again. We ended up with 18
> OSDs down (+ 16 in the dead server, so 34 OSDs down out of 100).
> Looking at the logs, we found the following for all the crashed OSDs:
>
> -237> 2020-04-25 16:32:51.835 7f1f45527a80 3 rocksdb:
> [table/block_based_table_reader.cc:1117] Encountered error while
> reading data from compression dictionary block Corruption: block
> checksum mismatch: expected 0, got 2729370997 in db/181355.sst offset
> 18446744073709551615 size 18446744073709551615
>
> and
>
> 2020-04-25 16:05:47.251 7fcbd1e46a80 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _open_db erroring opening db:
>
> We also noticed that the "Encountered error while reading data from
> compression dictionary block Corruption: block checksum mismatch"
> message was already present a few days before the crash.
> Some OSDs also log this error but are still up.
>
> We tried to repair with:
> ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 repair
> but without success (it ends with "_open_db erroring opening db").
>
> Does anybody have an idea how to fix this, or at least know whether
> it is possible to repair the "Encountered error while reading
> data from compression dictionary block Corruption: block checksum
> mismatch" and "_open_db erroring opening db" errors?
> Thanks for your help (we are desperate because we will lose data and
> are fighting to save something)!
> F.