Do you still have any original failure logs?
On 5/20/2020 3:45 PM, Ashley Merrick wrote:
It is a single shared main device.
Sadly I had already rebuilt the failed OSDs to bring me back into the
green after a while.
I have just tried a few restarts and none are failing (it seems that
after a rebuild using 15.2.2 they are stable?).
I don't have any other servers/OSDs I am willing to risk not starting
right this minute; if it does happen again I will grab the logs.
*@Dan* yes, it is using lz4.
Thanks
---- On Wed, 20 May 2020 20:30:27 +0800 *Igor Fedotov <ifedotov@suse.de>* wrote ----
I don't believe compression is related, to be honest.
I'm wondering whether these OSDs have standalone WAL and/or DB devices or
just a single shared main device.
Also, could you please set debug-bluefs/debug-bluestore to 20 and
collect a startup log for a broken OSD?
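For reference, the debug levels can be raised and a startup log captured roughly as follows; this is a sketch, assuming `osd.0` is the broken OSD and the host was deployed with cephadm (adjust the id and unit name to match):

```shell
# Sketch: osd.0 is a placeholder for the broken OSD's id.
# Raise BlueFS/BlueStore log verbosity for that OSD:
ceph config set osd.0 debug_bluefs 20
ceph config set osd.0 debug_bluestore 20

# Attempt to start the daemon again. On a cephadm host the systemd
# unit name embeds the cluster fsid, e.g. ceph-<fsid>@osd.0.service:
systemctl start 'ceph-<fsid>@osd.0.service'

# Capture the resulting startup output for the tracker:
journalctl -u 'ceph-<fsid>@osd.0.service' --since '10 minutes ago' > osd.0-startup.log
```
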
Kind regards,
Igor
On 5/20/2020 3:27 PM, Ashley Merrick wrote:
Thanks. FYI, the OSDs that went down back two pools: an
erasure-coded meta (RBD) pool and the CephFS meta pool. The CephFS pool does
have compression enabled (I noticed it mentioned in the Ceph
tracker).
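For reference, a pool's compression settings can be inspected with something like the following (the pool name `cephfs_metadata` is a placeholder, not necessarily the name used on this cluster):

```shell
# Show whether and how compression is enabled on a pool:
ceph osd pool get cephfs_metadata compression_mode
ceph osd pool get cephfs_metadata compression_algorithm
```
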
Thanks
---- On Wed, 20 May 2020 20:17:33 +0800 *Igor Fedotov <ifedotov@suse.de>* wrote ----
Hi Ashley,
looks like this is a regression. Neha observed similar
error(s) during her QA run, see
https://tracker.ceph.com/issues/45613
Please preserve the broken OSDs for a while if possible;
I'll likely come back to you for more information to troubleshoot.
Thanks,
Igor
On 5/20/2020 1:26 PM, Ashley Merrick wrote:
So reading online it looked like a dead-end error, so I
recreated the 3 OSDs on that node and they are now working fine
after a reboot.
However, I restarted the next server with 3 OSDs, and one
of them is now facing the same issue.
Let me know if you need any more logs.
Thanks
---- On Wed, 20 May 2020 17:02:31 +0800 Ashley Merrick <singapore@amerrick.co.uk> wrote ----
I just upgraded a cephadm cluster from 15.2.1 to 15.2.2.
Everything went fine with the upgrade; however, after
restarting one node that has 3 OSDs for ecmeta, two of the
3 OSDs now won't boot with the following error:
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-002768 succeeded,manifest_file_number is 2768, next_file_number is 2775, last_sequence is 188026749, log_number is 2767,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 2767
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1589963382599157, "job": 1, "event": "recovery_started", "log_files": [2769]}
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #2769 mode 0
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 537526 bytes; Corruption: error in middle of record
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 23263 bytes; Corruption: missing start of fragmented record(2)
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 rocksdb: Corruption: error in middle of record
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0700 /var/lib/ceph/osd/ceph-0/block) close
May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.870+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0000 /var/lib/ceph/osd/ceph-0/block) close
May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 osd.0 0 OSD:init: unable to mount object store
May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 ** ERROR: osd init failed: (5) Input/output error
Have I hit a bug, or is there something I can do to try
and fix these OSDs?
Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io