Hi Saber,
I don't think this is related. The new assertion happens along the write
path, while the original one occurred on allocator shutdown.
Unfortunately there is not much information to troubleshoot this...
Are you able to reproduce the case?
Thanks,
Igor
On 9/25/2020 4:21 AM, Saber@PlanetHoster.info wrote:
Hi Igor,
We had an OSD crash a week after moving to Nautilus. I have attached the
logs; is it related to the same bug?
Thanks,
Saber
CTO @PlanetHoster
On Sep 14, 2020, at 10:22 AM, Igor Fedotov <ifedotov@suse.de> wrote:
Thanks!
I've now got the root cause. The fix is on its way...
Meanwhile you might want to work around the issue by setting
"bluestore_hybrid_alloc_mem_cap" to 0 or by using a different allocator,
e.g. avl for bluestore_allocator (and optionally for bluefs_allocator
too).
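A minimal ceph.conf sketch of both workarounds (whether you put this under [osd] or in the global section depends on your deployment, and the OSDs need a restart to pick the change up):

```ini
[osd]
# Workaround 1: disable the hybrid allocator's memory cap
bluestore_hybrid_alloc_mem_cap = 0

# Workaround 2 (alternative): switch to the avl allocator instead
# bluestore_allocator = avl
# bluefs_allocator = avl
```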
Hope this helps,
Igor.
On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:
> Alright, here’s the full log file.
>
>
>
>
>
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
> 4414-4416 Louis B Mayer
> Laval, QC, H7P 0G1, Canada
> TEL : +1.514.802.1644 - Poste : 2644
> FAX : +1.514.612.0678
> CA/US : 1.855.774.4678
> FR : 01 76 60 41 43
> UK : 0808 189 0423
>
>
>
>
>
>
>> On 14 Sep 2020, at 06:49, Igor Fedotov <ifedotov@suse.de> wrote:
>>
>> Well, I can see a duplicate admin socket command
>> registration/de-registration (and the second de-registration
>> asserts), but I don't understand how this could happen.
>>
>> Would you share the full log, please?
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:
>>> Here’s the out file, as requested.
>>>
>>>
>>>
>>>
>>> Jean-Philippe Méthot
>>>
>>>> On 11 Sep 2020, at 10:38, Igor Fedotov <ifedotov@suse.de> wrote:
>>>>
>>>> Could you please run:
>>>>
>>>> CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool
>>>> repair --path <...> ; cat log | grep asok > out
>>>>
>>>> and share 'out' file.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:
>>>>> Hi,
>>>>>
>>>>> We’re upgrading our cluster from Mimic to Nautilus, one OSD node
>>>>> at a time. The release notes recommended running the following
>>>>> command to fix stats after an upgrade:
>>>>>
>>>>> ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
>>>>>
>>>>> However, running that command gives us the following error message:
>>>>>
>>>>>>
>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
>>>>>> ceph version 14.2.11
>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>> char const*)+0x14a) [0x7f1a5a823025]
>>>>>> 2: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>> 3: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>> 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>> 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>> 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>>>> 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>> [0x55b3352749a1]
>>>>>> 8: (main()+0x10b3) [0x55b335187493]
>>>>>> 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>> 10: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>> 2020-09-10 14:40:25.873 7f1a6467eec0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
>>>>>>
>>>>>> ceph version 14.2.11
>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>> char const*)+0x14a) [0x7f1a5a823025]
>>>>>> 2: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>> 3: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>> 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>> 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>>>> 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>> [0x55b3352749a1]
>>>>>> 8: (main()+0x10b3) [0x55b335187493]
>>>>>> 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>> 10: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>> *** Caught signal (Aborted) **
>>>>>> in thread 7f1a6467eec0 thread_name:ceph-bluestore-
>>>>>> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
>>>>>> nautilus (stable)
>>>>>> 1: (()+0xf630) [0x7f1a58cf0630]
>>>>>> 2: (gsignal()+0x37) [0x7f1a574be387]
>>>>>> 3: (abort()+0x148) [0x7f1a574bfa78]
>>>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>> char const*)+0x199) [0x7f1a5a823074]
>>>>>> 5: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>> 6: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>> 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>> 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>>>> 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>> [0x55b3352749a1]
>>>>>> 11: (main()+0x10b3) [0x55b335187493]
>>>>>> 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>> 13: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>> 2020-09-10 14:40:25.874 7f1a6467eec0 -1 *** Caught signal
>>>>>> (Aborted) **
>>>>>> in thread 7f1a6467eec0 thread_name:ceph-bluestore-
>>>>>>
>>>>>> ceph version 14.2.11
>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>> 1: (()+0xf630) [0x7f1a58cf0630]
>>>>>> 2: (gsignal()+0x37) [0x7f1a574be387]
>>>>>> 3: (abort()+0x148) [0x7f1a574bfa78]
>>>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>> char const*)+0x199) [0x7f1a5a823074]
>>>>>> 5: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>> 6: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>> 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>> 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>>>> 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>> [0x55b3352749a1]
>>>>>> 11: (main()+0x10b3) [0x55b335187493]
>>>>>> 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>> 13: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>
>>>>> What could be the source of this error? I haven’t found much of
>>>>> anything about it online.
>>>>>
>>>>>
>>>>> Jean-Philippe Méthot
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>>> To unsubscribe send an email to ceph-users-leave@ceph.io
>>>
>