I attached the log, but it was too big and got moderated.
Here it is in a pastebin:
https://pastebin.pl/view/69b2beb9
I have cut the log to start from the point of the original upgrade.
Thanks
---- On Wed, 20 May 2020 20:55:51 +0800 Igor Fedotov <ifedotov@suse.de> wrote ----
Dan, thanks for the info. Good to know.
The failed QA run in the ticket uses snappy, though.
And in fact any stuff writing to process memory can introduce data
corruption in a similar manner.
So I will keep that in mind, but IMO the relation to compression is still not
evident...
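For anyone wanting to verify whether compression is actually in play on a pool, the relevant settings can be inspected like this (the pool name here is only an example; substitute your own):

```shell
# Per-pool compression settings (pool name is illustrative)
ceph osd pool get cephfs_metadata compression_mode
ceph osd pool get cephfs_metadata compression_algorithm

# Cluster-wide BlueStore defaults that apply when the pool does not override them
ceph config get osd bluestore_compression_mode
ceph config get osd bluestore_compression_algorithm
```

A pool with `compression_mode` unset falls back to the OSD-level default, so both layers are worth checking.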
Kind regards,
Igor
On 5/20/2020 3:32 PM, Dan van der Ster wrote:
> lz4? It's not obviously related, but I've seen it involved in really
> non-obvious ways: https://tracker.ceph.com/issues/39525
>
> -- dan
>
> On Wed, May 20, 2020 at 2:27 PM Ashley Merrick
> <singapore@amerrick.co.uk> wrote:
>> Thanks, FYI the OSD's that went down back two pools: an erasure-coded meta
>> (RBD) and the CephFS meta. The CephFS pool does have compression enabled (I noticed it
>> mentioned in the ceph tracker).
>>
>> Thanks
>>
>> ---- On Wed, 20 May 2020 20:17:33 +0800 Igor Fedotov
>> <ifedotov@suse.de> wrote ----
>>
>>
>>
>> Hi Ashley,
>>
>> looks like this is a regression. Neha observed similar error(s) during
>> her QA run, see
>> https://tracker.ceph.com/issues/45613
>>
>>
>> Please preserve the broken OSDs for a while if possible; likely I'll come
>> back to you for more information to troubleshoot.
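A minimal way to keep a broken OSD around for debugging, without letting the cluster start rebalancing in the meantime, might look like this (the OSD id and unit name are examples, not taken from the thread):

```shell
# Stop the daemon but do NOT zap or recreate the OSD, so its data survives
systemctl stop ceph-osd@0
# On a cephadm-managed cluster the equivalent is:
#   ceph orch daemon stop osd.0

# Prevent the cluster from marking the OSD out and rebalancing while investigating
ceph osd set noout
```

Remember to `ceph osd unset noout` once the investigation is finished, or recovery will stay blocked.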
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 5/20/2020 1:26 PM, Ashley Merrick wrote:
>>
>>> Reading online it looked like a dead-end error, so I recreated the 3 OSD's
>>> on that node and they are now working fine after a reboot.
>>>
>>>
>>>
>>> However, I restarted the next server with 3 OSD's and one of them is now
>>> facing the same issue.
>>>
>>>
>>>
>>> Let me know if you need any more logs.
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> ---- On Wed, 20 May 2020 17:02:31 +0800 Ashley Merrick
>>> <singapore@amerrick.co.uk> wrote ----
>>>
>>>
>>> I just upgraded a cephadm cluster from 15.2.1 to 15.2.2.
>>>
>>>
>>>
>>> Everything went fine with the upgrade; however, after restarting one node that
>>> has 3 OSD's for ecmeta, two of the 3 OSD's now won't boot with the following error:
>>>
>>>
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-002768 succeeded,manifest_file_number is 2768, next_file_number is 2775, last_sequence is 188026749, log_number is 2767,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 2767
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1589963382599157, "job": 1, "event": "recovery_started", "log_files": [2769]}
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #2769 mode 0
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 537526 bytes; Corruption: error in middle of record
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 23263 bytes; Corruption: missing start of fragmented record(2)
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 rocksdb: Corruption: error in middle of record
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0700 /var/lib/ceph/osd/ceph-0/block) close
>>>
>>> May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.870+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0000 /var/lib/ceph/osd/ceph-0/block) close
>>>
>>> May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 osd.0 0 OSD:init: unable to mount object store
>>>
>>> May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 ** ERROR: osd init failed: (5) Input/output error
>>>
>>>
>>>
>>> Have I hit a bug, or is there something I can do to try and fix these OSD's?
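Before recreating an OSD in this state, a non-destructive first diagnostic pass is possible with `ceph-bluestore-tool`, run against the stopped OSD (paths and the OSD id below are examples):

```shell
# Consistency check of the BlueStore instance (does not modify the store)
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

# Export the embedded RocksDB/BlueFS files for offline inspection,
# so the corrupt WAL segment (db/002769.log above) can be examined safely
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-0 \
    --out-dir /tmp/osd0-bluefs
```

Exporting first means the original on-disk state is preserved even if later repair attempts make things worse.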
>>>
>>>
>>>
>>> Thanks
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-leave@ceph.io