lz4? It's not obviously related, but I've seen it involved in really
non-obvious ways:
https://tracker.ceph.com/issues/39525
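
When comparing the broken OSDs, it may help to total what RocksDB dropped during WAL replay. Below is a minimal sketch, assuming the journal lines look like the "dropping N bytes; Corruption: ..." messages quoted further down. The function name summarize_drops is hypothetical, not part of any Ceph tool.

```python
import re
from collections import Counter

# Matches RocksDB WAL recovery messages such as:
#   db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
# \s+ after "bytes;" tolerates a line wrap introduced by a mail client.
DROP_RE = re.compile(r"(\S+\.log): dropping (\d+) bytes;\s+Corruption: ([^\n]+)")


def summarize_drops(log_text):
    """Return (total_bytes_dropped, Counter of corruption reasons).

    Hypothetical helper for illustration; it only parses text and
    never touches the OSD or its RocksDB files.
    """
    total = 0
    reasons = Counter()
    for _wal_file, nbytes, reason in DROP_RE.findall(log_text):
        total += int(nbytes)
        reasons[reason.strip()] += 1
    return total, reasons
```

Pass it the text of the failing OSD's journal output. A large first drop followed by many "missing start of fragmented record" entries would match the pattern in the log below, though that alone says nothing about the root cause.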
-- dan
On Wed, May 20, 2020 at 2:27 PM Ashley Merrick <singapore(a)amerrick.co.uk> wrote:
>
> Thanks. FYI, the OSDs that went down back two pools: an erasure-code meta pool (RBD)
> and the CephFS meta pool. The CephFS pool does have compression enabled (I noticed it
> mentioned in the Ceph tracker).
>
>
>
> Thanks
>
>
>
>
>
> ---- On Wed, 20 May 2020 20:17:33 +0800 Igor Fedotov <ifedotov(a)suse.de> wrote ----
>
>
>
> Hi Ashley,
>
> looks like this is a regression. Neha observed similar error(s) during
> her QA run, see
> https://tracker.ceph.com/issues/45613
>
>
> Please preserve the broken OSDs for a while if possible; I'll likely come
> back to you for more information to troubleshoot.
>
>
> Thanks,
>
> Igor
>
> On 5/20/2020 1:26 PM, Ashley Merrick wrote:
>
> > From reading online it looked like a dead-end error, so I recreated the 3 OSDs on
> > that node, and they are now working fine after a reboot.
> >
> >
> >
> > However, I restarted the next server with 3 OSDs and one of them is now
> > facing the same issue.
> >
> >
> >
> > Let me know if you need any more logs.
> >
> >
> >
> > Thanks
> >
> >
> >
> > ---- On Wed, 20 May 2020 17:02:31 +0800 Ashley Merrick <singapore@amerrick.co.uk> wrote ----
> >
> >
> > I just upgraded a cephadm cluster from 15.2.1 to 15.2.2.
> >
> >
> >
> > Everything went fine on the upgrade. However, after restarting one node that has
> > 3 OSDs for ecmeta, two of the 3 OSDs now won't boot, with the following error:
> >
> >
> >
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-002768 succeeded,manifest_file_number is 2768, next_file_number is 2775, last_sequence is 188026749, log_number is 2767,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 2767
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1589963382599157, "job": 1, "event": "recovery_started", "log_files": [2769]}
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #2769 mode 0
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 537526 bytes; Corruption: error in middle of record
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 32757 bytes; Corruption: missing start of fragmented record(1)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 3 rocksdb: [db/db_impl_open.cc:518] db/002769.log: dropping 23263 bytes; Corruption: missing start of fragmented record(2)
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 rocksdb: Corruption: error in middle of record
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.602+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0700 /var/lib/ceph/osd/ceph-0/block) close
> > May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.870+0000 7fbcc46f7ec0 1 bdev(0x558a28dd0000 /var/lib/ceph/osd/ceph-0/block) close
> > May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 osd.0 0 OSD:init: unable to mount object store
> > May 20 08:29:43 sn-m01 bash[6833]: debug 2020-05-20T08:29:43.118+0000 7fbcc46f7ec0 -1 ** ERROR: osd init failed: (5) Input/output error
> >
> >
> >
> > Have I hit a bug, or is there something I can do to try and fix these OSDs?
> >
> >
> >
> > Thanks
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-leave@ceph.io