Unfortunately, we are not prepared to upgrade to Nautilus yet. Are there
any other ideas to try with Mimic?
On Tue, Jul 30, 2019 at 9:48 PM Yan, Zheng <ukernel(a)gmail.com> wrote:
On Tue, Jul 30, 2019 at 9:53 PM Wyllys Ingersoll
<wyllys.ingersoll(a)keepertech.com> wrote:
I had a bad experience upgrading from Luminous to Mimic, in no small
part due to a few critical errors on my part. Some data was lost (none of
it critical), but the majority was unaffected. At this point I'd just like
to regain access to the data that is still there, but I cannot get the MDS
servers to start. They were initially failing with the following errors:
2019-07-29 11:38:55.481 7f6167a35700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7f6167a35700 time 2019-07-29 11:38:55.478864
/build/ceph-13.2.6/src/mds/MDCache.cc: 290: FAILED assert(!p)

ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f616f6f397e]
2: (()+0x2fab07) [0x7f616f6f3b07]
3: /usr/bin/ceph-mds() [0x5a9d0e]
4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xfcf) [0x55e37f]
5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xd5d) [0x560a3d]
6: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
7: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
9: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
10: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x4f1e43]
11: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x4d2073]
12: (DispatchQueue::entry()+0xb92) [0x7f616f7b48c2]
13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f616f85172d]
14: (()+0x76ba) [0x7f616ef6f6ba]
15: (clone()+0x6d) [0x7f616e79841d]
2019-07-29 11:38:55.485 7f6167a35700 -1 *** Caught signal (Aborted) ** in thread 7f6167a35700 thread_name:ms_dispatch

ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (()+0x11390) [0x7f616ef79390]
2: (gsignal()+0x38) [0x7f616e6c6428]
3: (abort()+0x16a) [0x7f616e6c802a]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f616f6f3a86]
5: (()+0x2fab07) [0x7f616f6f3b07]
6: /usr/bin/ceph-mds() [0x5a9d0e]
7: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xfcf) [0x55e37f]
8: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xd5d) [0x560a3d]
9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
10: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
13: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x4f1e43]
14: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x4d2073]
15: (DispatchQueue::entry()+0xb92) [0x7f616f7b48c2]
16: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f616f85172d]
17: (()+0x76ba) [0x7f616ef6f6ba]
18: (clone()+0x6d) [0x7f616e79841d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I then followed all of the MDS recovery instructions (
http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster…),
which took several days to complete. Now, when I attempt to start the
MDS, I get a different crash and the MDS still fails to start up:
-10000> 2019-07-30 09:29:31.943 7f6ba2c21700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f6ba2c21700 time 2019-07-30 09:29:31.943654
/build/ceph-13.2.6/src/mds/MDCache.cc: 1680: FAILED assert(follows >= realm->get_newest_seq())

ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f6bada9497e]
2: (()+0x2fab07) [0x7f6bada94b07]
3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
5: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
6: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
7: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
8: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
9: (Locker::scatter_tick()+0x1e4) [0x6535a4]
10: (Locker::tick()+0x9) [0x6538b9]
11: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
12: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
13: (Context::complete(int)+0x9) [0x4d31d9]
14: (SafeTimer::timer_thread()+0x18b) [0x7f6bada9120b]
15: (SafeTimerThread::entry()+0xd) [0x7f6bada9286d]
16: (()+0x76ba) [0x7f6bad3106ba]
17: (clone()+0x6d) [0x7f6bacb3941d]
-10000> 2019-07-30 09:29:31.951 7f6ba2c21700 -1 *** Caught signal (Aborted) ** in thread 7f6ba2c21700 thread_name:safe_timer

ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (()+0x11390) [0x7f6bad31a390]
2: (gsignal()+0x38) [0x7f6baca67428]
3: (abort()+0x16a) [0x7f6baca6902a]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f6bada94a86]
5: (()+0x2fab07) [0x7f6bada94b07]
6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
8: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
9: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
10: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
11: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
12: (Locker::scatter_tick()+0x1e4) [0x6535a4]
13: (Locker::tick()+0x9) [0x6538b9]
14: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
15: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
16: (Context::complete(int)+0x9) [0x4d31d9]
17: (SafeTimer::timer_thread()+0x18b) [0x7f6bada9120b]
18: (SafeTimerThread::entry()+0xd) [0x7f6bada9286d]
19: (()+0x76ba) [0x7f6bad3106ba]
20: (clone()+0x6d) [0x7f6bacb3941d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Is there any hope of getting my MDS servers back up and accessing the
cephfs data
at this point? None of the guides I've followed thus far have
been successful.
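[For readers following along: the disaster-recovery-experts guide referenced
above boils down to roughly the sequence below. This is a sketch of the
documented Mimic-era procedure, not the exact commands run here;
<fs_name> and <data_pool> are placeholders for the cluster's own names, and
several of these steps are destructive, so they should only be run with the
guide's warnings in mind and with a journal backup taken first.]

    # back up the journal before touching anything
    cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin
    # salvage what metadata can be recovered from the journal, then reset it
    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary
    cephfs-journal-tool --rank=<fs_name>:0 journal reset
    # wipe stale client sessions
    cephfs-table-tool all reset session
    # rebuild metadata from the data pool (slow on large pools)
    cephfs-data-scan init
    cephfs-data-scan scan_extents <data_pool>
    cephfs-data-scan scan_inodes <data_pool>
    cephfs-data-scan scan_links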
The Nautilus version (14.2.2) of 'cephfs-data-scan scan_links' can fix the
snaptable; hopefully it will fix your issue.
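[A note for readers: cephfs-data-scan is an offline tool that talks to the
cluster over RADOS, so one commonly suggested approach is to run the Nautilus
build of the tool from a single host without upgrading the daemons. That
compatibility is an assumption, not something verified in this thread, so it
should be tried against a test cluster or with backups first. With all MDS
daemons stopped, the idea is roughly:]

    # On one host with the ceph 14.2.2 (Nautilus) packages, plus the cluster's
    # ceph.conf and an admin keyring. Stop every MDS daemon first, e.g.:
    systemctl stop ceph-mds.target      # on each MDS host
    # then run the Nautilus tool, which includes the snaptable repair:
    cephfs-data-scan scan_links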
> thanks,
> Wyllys Ingersoll
> _______________________________________________
> Dev mailing list -- dev(a)ceph.io
> To unsubscribe send an email to dev-leave(a)ceph.io