Hi Simon,
Do you mean both standalone DB and(!!) standalone WAL devices/partitions
by having SSD DB/WAL?
If so, then BlueFS might eventually overwrite some data on your DB
volume with BlueFS log content, which most probably makes the OSD crash
and become unable to restart one day. This is a fairly random and
infrequent event, to some degree dependent on cluster load. And the
period between the actual data corruption and any evidence of it is
non-zero most of the time - we tend to see it mostly when RocksDB is
performing compaction.
Another OSD configuration that might suffer from the issue is main
device + WAL device.
The failure probability is much lower for the main + DB layout; it
requires an almost full DB for the issue to have any chance of appearing.
Main-only device configurations aren't under threat as far as I can
tell.
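To summarize the ranking above, here is a tiny sketch (the layout names
are my own illustrative shorthand, not anything Ceph itself reports; to
see whether a real OSD actually has dedicated devices, you can check the
bluefs_dedicated_db / bluefs_dedicated_wal fields in "ceph osd metadata"):

```shell
# Hedged sketch of the risk ranking described above. Layout names are
# illustrative shorthand, not a Ceph API.
risk_for_layout() {
    case "$1" in
        main+db+wal|main+wal) echo high ;;    # a standalone WAL device is involved
        main+db)              echo low  ;;    # needs an almost-full DB to trigger
        main)                 echo none ;;    # main-only layouts are not affected
        *)                    echo unknown ;;
    esac
}

risk_for_layout main+wal    # prints "high"
```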
Thanks,
Igor
On 11/15/2019 12:40 PM, Simon Ironside wrote:
> Hi,
>
> I have two new-ish 14.2.4 clusters that began life on 14.2.0, all
> with HDD OSDs with SSD DB/WALs but neither have experienced obvious
> problems yet.
>
> What's the impact of this? Does possible data corruption mean possible
> silent data corruption?
> Or does the corruption cause the OSD failures mentioned on the tracker
> and you're basically ok if you either haven't had a failure or if you
> keep on top of failures the way you would if they were normal disk
> failures?
>
> Thanks,
> Simon
>
> On 14/11/2019 16:10, Sage Weil wrote:
>> Hi everyone,
>>
>> We've identified a data corruption bug[1], first introduced[2] (by yours
>> truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption
>> appears as an assertion that looks like
>>
>> os/bluestore/fastbmap_allocator_impl.h: 750: FAILED
>> ceph_assert(available >= allocated)
>>
>> or in some cases a rocksdb checksum error. It only affects BlueStore
>> OSDs
>> that have a separate 'db' or 'wal' device.
>>
>> We have a fix[3] that is working its way through testing, and will
>> expedite the next Nautilus point release (14.2.5) once it is ready.
>>
>> If you are running 14.2.2 or 14.2.1 and use BlueStore OSDs with
>> separate 'db' volumes, you should consider waiting to upgrade
>> until 14.2.5 is released.
>>
>> A big thank you to Igor Fedotov and several *extremely* helpful users
>> who
>> managed to reproduce and track down this problem!
>>
>> sage
>>
>>
>> [1]
https://tracker.ceph.com/issues/42223
>> [2]
>>
https://github.com/ceph/ceph/commit/096033b9d931312c0688c2eea7e14626bfde0ad…
>> [3]
https://github.com/ceph/ceph/pull/31621
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io