Wow, thanks! That actually worked and my cephfs is back online again.

On Wed, Jul 31, 2019 at 9:33 AM Wyllys Ingersoll <wyllys.ingersoll@keepertech.com> wrote:
Oh, good idea. ok thanks, I'll try that.

On Wed, Jul 31, 2019 at 9:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
On Wed, Jul 31, 2019 at 9:24 PM Wyllys Ingersoll
<wyllys.ingersoll@keepertech.com> wrote:
>
> Unfortunately, we are not prepared to upgrade to Nautilus yet. Are there any other ideas to try with Mimic?
>

You don't need to upgrade the cluster. Just install nautilus on a temp
machine (or compile ceph from source) and run the tool from there.
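
Roughly like this on a scratch box that can reach your monitors (an
untested sketch; adjust the distro codename, package names and paths to
your environment):

  # Add the nautilus repo and install the tools.
  # Assumes Ubuntu xenial; cephfs-data-scan should come with the
  # ceph-common/ceph-base packages, otherwise install the full 'ceph' package.
  wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
  echo "deb https://download.ceph.com/debian-nautilus/ xenial main" | \
      sudo tee /etc/apt/sources.list.d/ceph-nautilus.list
  sudo apt-get update && sudo apt-get install -y ceph-common ceph-base

  # Copy /etc/ceph/ceph.conf and the admin keyring from the cluster,
  # make sure all MDS daemons are stopped, then run:
  cephfs-data-scan scan_links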

> On Tue, Jul 30, 2019 at 9:48 PM Yan, Zheng <ukernel@gmail.com> wrote:
>>
>> On Tue, Jul 30, 2019 at 9:53 PM Wyllys Ingersoll
>> <wyllys.ingersoll@keepertech.com> wrote:
>> >
>> >
>> > I had a bad experience upgrading from Luminous to Mimic, in no small part due to a few critical errors on my part.  Some data was lost (none of it critical), but the majority was unaffected. At this point I'd just like to get access to the data that is still there, but I cannot get the MDS servers to start.  They initially were failing with the following errors:
>> >
>> >
>> > 2019-07-29 11:38:55.481 7f6167a35700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7f6167a35700 time 2019-07-29 11:38:55.478864
>> > /build/ceph-13.2.6/src/mds/MDCache.cc: 290: FAILED assert(!p)
>> >
>> >  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f616f6f397e]
>> >  2: (()+0x2fab07) [0x7f616f6f3b07]
>> >  3: /usr/bin/ceph-mds() [0x5a9d0e]
>> >  4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xfcf) [0x55e37f]
>> >  5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xd5d) [0x560a3d]
>> >  6: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
>> >  7: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
>> >  8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
>> >  9: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
>> >  10: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x4f1e43]
>> >  11: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x4d2073]
>> >  12: (DispatchQueue::entry()+0xb92) [0x7f616f7b48c2]
>> >  13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f616f85172d]
>> >  14: (()+0x76ba) [0x7f616ef6f6ba]
>> >  15: (clone()+0x6d) [0x7f616e79841d]
>> >
>> > 2019-07-29 11:38:55.485 7f6167a35700 -1 *** Caught signal (Aborted) **
>> >  in thread 7f6167a35700 thread_name:ms_dispatch
>> >
>> >  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> >  1: (()+0x11390) [0x7f616ef79390]
>> >  2: (gsignal()+0x38) [0x7f616e6c6428]
>> >  3: (abort()+0x16a) [0x7f616e6c802a]
>> >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f616f6f3a86]
>> >  5: (()+0x2fab07) [0x7f616f6f3b07]
>> >  6: /usr/bin/ceph-mds() [0x5a9d0e]
>> >  7: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xfcf) [0x55e37f]
>> >  8: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xd5d) [0x560a3d]
>> >  9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x563beb]
>> >  10: (Server::dispatch(Message*)+0x2fb) [0x5678cb]
>> >  11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x4da3c4]
>> >  12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x4f17db]
>> >  13: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x4f1e43]
>> >  14: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x4d2073]
>> >  15: (DispatchQueue::entry()+0xb92) [0x7f616f7b48c2]
>> >  16: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f616f85172d]
>> >  17: (()+0x76ba) [0x7f616ef6f6ba]
>> >  18: (clone()+0x6d) [0x7f616e79841d]
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >
>> >
>> > I then followed all of the MDS recovery instructions (http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts), which took several days to complete.  Now, when I attempt to start the MDS, I get a different crash and it still fails to start up:
>> >
>> > -10000> 2019-07-30 09:29:31.943 7f6ba2c21700 -1 /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f6ba2c21700 time 2019-07-30 09:29:31.943654
>> > /build/ceph-13.2.6/src/mds/MDCache.cc: 1680: FAILED assert(follows >= realm->get_newest_seq())
>> >
>> >  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f6bada9497e]
>> >  2: (()+0x2fab07) [0x7f6bada94b07]
>> >  3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
>> >  4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
>> >  5: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
>> >  6: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
>> >  7: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
>> >  8: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
>> >  9: (Locker::scatter_tick()+0x1e4) [0x6535a4]
>> >  10: (Locker::tick()+0x9) [0x6538b9]
>> >  11: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
>> >  12: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
>> >  13: (Context::complete(int)+0x9) [0x4d31d9]
>> >  14: (SafeTimer::timer_thread()+0x18b) [0x7f6bada9120b]
>> >  15: (SafeTimerThread::entry()+0xd) [0x7f6bada9286d]
>> >  16: (()+0x76ba) [0x7f6bad3106ba]
>> >  17: (clone()+0x6d) [0x7f6bacb3941d]
>> >
>> > -10000> 2019-07-30 09:29:31.951 7f6ba2c21700 -1 *** Caught signal (Aborted) **
>> >  in thread 7f6ba2c21700 thread_name:safe_timer
>> >
>> >  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
>> >  1: (()+0x11390) [0x7f6bad31a390]
>> >  2: (gsignal()+0x38) [0x7f6baca67428]
>> >  3: (abort()+0x16a) [0x7f6baca6902a]
>> >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f6bada94a86]
>> >  5: (()+0x2fab07) [0x7f6bada94b07]
>> >  6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xd3f) [0x5f821f]
>> >  7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc0) [0x5f8450]
>> >  8: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x4b1) [0x5f9141]
>> >  9: (Locker::scatter_writebehind(ScatterLock*)+0x465) [0x64a615]
>> >  10: (Locker::simple_sync(SimpleLock*, bool*)+0x176) [0x64e506]
>> >  11: (Locker::scatter_nudge(ScatterLock*, MDSInternalContextBase*, bool)+0x3dd) [0x652f6d]
>> >  12: (Locker::scatter_tick()+0x1e4) [0x6535a4]
>> >  13: (Locker::tick()+0x9) [0x6538b9]
>> >  14: (MDSRankDispatcher::tick()+0x1e9) [0x4f00d9]
>> >  15: (FunctionContext::finish(int)+0x2c) [0x4d52dc]
>> >  16: (Context::complete(int)+0x9) [0x4d31d9]
>> >  17: (SafeTimer::timer_thread()+0x18b) [0x7f6bada9120b]
>> >  18: (SafeTimerThread::entry()+0xd) [0x7f6bada9286d]
>> >  19: (()+0x76ba) [0x7f6bad3106ba]
>> >  20: (clone()+0x6d) [0x7f6bacb3941d]
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >
>> >
>> >
>> > Is there any hope of getting my MDS servers back up and accessing the cephfs data at this point?  None of the guides I've followed thus far have been successful.
>> >
>>
>> The nautilus version (14.2.2) of ‘cephfs-data-scan scan_links’ can fix the
>> snaptable; hopefully it will fix your issue.
>>
>> > thanks,
>> >   Wyllys Ingersoll
>> >
>> >
>> > _______________________________________________
>> > Dev mailing list -- dev@ceph.io
>> > To unsubscribe send an email to dev-leave@ceph.io