Oh, good idea. ok thanks, I'll try that.
On Wed, Jul 31, 2019 at 9:28 AM Yan, Zheng <ukernel(a)gmail.com> wrote:
On Wed, Jul 31, 2019 at 9:24 PM Wyllys Ingersoll
<wyllys.ingersoll(a)keepertech.com> wrote:
Unfortunately, we are not prepared to upgrade to Nautilus yet. Are there
any other ideas to try with Mimic?
You don't need to upgrade the cluster. Just install nautilus on a temp
machine, or compile ceph from source.
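The temp-machine approach Yan describes might look like the following on a Debian/Ubuntu host with network access to the cluster's monitors. This is only a sketch: the hostname `mon-host`, the release codename, and the package name are placeholders/assumptions (the exact package containing `cephfs-data-scan` varies by release), so adjust for your environment.

```shell
# Add the upstream nautilus (14.2.x) repository on the scratch machine.
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
echo 'deb https://download.ceph.com/debian-nautilus/ bionic main' | \
    sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update && sudo apt-get install -y ceph-common

# Copy the cluster config and an admin keyring from a monitor host
# ("mon-host" is a placeholder).
scp mon-host:/etc/ceph/ceph.conf /etc/ceph/
scp mon-host:/etc/ceph/ceph.client.admin.keyring /etc/ceph/

# With every MDS daemon stopped, run the nautilus tool against the live
# cluster; per Yan, the 14.2.2 version of scan_links can also repair the
# snap table.
cephfs-data-scan scan_links
```

Because the tool only talks to the monitors/OSDs over the network, nothing on the existing Mimic nodes needs to be upgraded for this to work.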
> On Tue, Jul 30, 2019 at 9:48 PM Yan, Zheng <ukernel(a)gmail.com> wrote:
>>
>> On Tue, Jul 30, 2019 at 9:53 PM Wyllys Ingersoll
>> <wyllys.ingersoll(a)keepertech.com> wrote:
>>
>>
>> > I had a bad experience upgrading from Luminous to Mimic, in no small
>> > part due to a few critical errors on my part. Some data was lost (none
>> > of it critical), but the majority was unaffected. At this point I'd
>> > just like to get access to the data that is still there, but I cannot
>> > get the MDS servers to start. They initially were failing with the
>> > following errors:
>>
>>
>> > 2019-07-29 11:38:55.481 7f6167a35700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7f6167a35700 time 2019-07-29 11:38:55.478864
>> > /build/ceph-13.2.6/src/mds/MDCache.cc: 290: FAILED assert(!p)
>>
>> > ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f616f6f397e]
>> > 2: (()+0x2fab07) [0x7f616f6f3b07]
>> > 3: /usr/bin/ceph-mds() [0x5a9d0e]
>> > 4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xfcf) [0x55e37f]
>> > 5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xd5d) [0x560a3d]
>> > 6: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
>> > 7: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
>> > 8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
>> > 9: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
>> > 10: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x4f1e43]
>> > 11: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x4d2073]
>> > 12: (DispatchQueue::entry()+0xb92) [0x7f616f7b48c2]
>> > 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f616f85172d]
>> > 14: (()+0x76ba) [0x7f616ef6f6ba]
>> > 15: (clone()+0x6d) [0x7f616e79841d]
>>
>> > 2019-07-29 11:38:55.485 7f6167a35700 -1 *** Caught signal (Aborted) **
>> > in thread 7f6167a35700 thread_name:ms_dispatch
>>
>> > ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> > 1: (()+0x11390) [0x7f616ef79390]
>> > 2: (gsignal()+0x38) [0x7f616e6c6428]
>> > 3: (abort()+0x16a) [0x7f616e6c802a]
>> > 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f616f6f3a86]
>> > 5: (()+0x2fab07) [0x7f616f6f3b07]
>> > 6: /usr/bin/ceph-mds() [0x5a9d0e]
>> > 7: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xfcf) [0x55e37f]
>> > 8: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xd5d) [0x560a3d]
>> > 9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
>> > 10: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
>> > 11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
>> > 12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
>> > 13: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x4f1e43]
>> > 14: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x4d2073]
>> > 15: (DispatchQueue::entry()+0xb92) [0x7f616f7b48c2]
>> > 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f616f85172d]
>> > 17: (()+0x76ba) [0x7f616ef6f6ba]
>> > 18: (clone()+0x6d) [0x7f616e79841d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>
>> > I then followed all of the MDS recovery instructions
>> > (http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster…),
>> > which took several days to complete. Now, when I attempt to start the
>> > MDS, I get a different crash and the MDS still fails to start up:
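For anyone following along, the expert-mode sequence on that page is roughly the following sketch. Exact flags vary by release (some versions require an explicit `--rank=<fsname>:0` on `cephfs-journal-tool`), `<data_pool>` is a placeholder for your CephFS data pool name, and every step below rewrites on-disk metadata, so keep the journal export safe.

```shell
# All commands assume every MDS daemon is stopped. Back up first: these
# operations are not reversible.
cephfs-journal-tool journal export backup.bin        # save a copy of the journal
cephfs-journal-tool event recover_dentries summary   # write salvageable dentries back to the metadata pool
cephfs-journal-tool journal reset                    # truncate the damaged journal
cephfs-table-tool all reset session                  # drop stale client session records

# Offline rebuild from the data pool (this is the multi-day part; the scan
# passes can be run in parallel across several workers):
cephfs-data-scan init
cephfs-data-scan scan_extents <data_pool>
cephfs-data-scan scan_inodes <data_pool>
```

Which of these steps Wyllys ran, and with what options, isn't stated in the thread; this is only the order the linked page describes.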
>>
>> > -10000> 2019-07-30 09:29:31.943 7f6ba2c21700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f6ba2c21700 time 2019-07-30 09:29:31.943654
>> > /build/ceph-13.2.6/src/mds/MDCache.cc: 1680: FAILED assert(follows >= realm->get_newest_seq())
>>
>> > ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f6bada9497e]
>> > 2: (()+0x2fab07) [0x7f6bada94b07]
>> > 3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
>> > 4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
>> > 5: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
>> > 6: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
>> > 7: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
>> > 8: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
>> > 9: (Locker::scatter_tick()+0x1e4) [0x6535a4]
>> > 10: (Locker::tick()+0x9) [0x6538b9]
>> > 11: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
>> > 12: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
>> > 13: (Context::complete(int)+0x9) [0x4d31d9]
>> > 14: (SafeTimer::timer_thread()+0x18b) [0x7f6bada9120b]
>> > 15: (SafeTimerThread::entry()+0xd) [0x7f6bada9286d]
>> > 16: (()+0x76ba) [0x7f6bad3106ba]
>> > 17: (clone()+0x6d) [0x7f6bacb3941d]
>>
>> > -10000> 2019-07-30 09:29:31.951 7f6ba2c21700 -1 *** Caught signal (Aborted) **
>> > in thread 7f6ba2c21700 thread_name:safe_timer
>>
>> > ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> > 1: (()+0x11390) [0x7f6bad31a390]
>> > 2: (gsignal()+0x38) [0x7f6baca67428]
>> > 3: (abort()+0x16a) [0x7f6baca6902a]
>> > 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f6bada94a86]
>> > 5: (()+0x2fab07) [0x7f6bada94b07]
>> > 6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
>> > 7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
>> > 8: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
>> > 9: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
>> > 10: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
>> > 11: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
>> > 12: (Locker::scatter_tick()+0x1e4) [0x6535a4]
>> > 13: (Locker::tick()+0x9) [0x6538b9]
>> > 14: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
>> > 15: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
>> > 16: (Context::complete(int)+0x9) [0x4d31d9]
>> > 17: (SafeTimer::timer_thread()+0x18b) [0x7f6bada9120b]
>> > 18: (SafeTimerThread::entry()+0xd) [0x7f6bada9286d]
>> > 19: (()+0x76ba) [0x7f6bad3106ba]
>> > 20: (clone()+0x6d) [0x7f6bacb3941d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>
>>
>>
>> > Is there any hope of getting my MDS servers back up and accessing the
>> > cephfs data at this point? None of the guides I've followed thus far
>> > have been successful.
>>
>>
>> The nautilus version (14.2.2) of 'cephfs-data-scan scan_links' can fix
>> the snaptable. Hopefully it will fix your issue.
>>
>> > thanks,
>> > Wyllys Ingersoll
>>
>>
>> > _______________________________________________
>> > Dev mailing list -- dev(a)ceph.io
>> > To unsubscribe send an email to dev-leave(a)ceph.io