Thanks!
I've now found the root cause; the fix is on its way.
Meanwhile, you might want to work around the issue by setting
"bluestore_hybrid_alloc_mem_cap" to 0 or by using a different allocator, e.g.
avl for bluestore_allocator (and optionally for bluefs_allocator too).
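For anyone wanting to apply that workaround before the fix lands, a sketch of both options (behaviour not verified here; OSD daemons need a restart to pick up ceph.conf changes, and ceph-bluestore-tool takes options via CEPH_ARGS as in the repro command earlier in this thread):

    # ceph.conf on the affected OSD hosts
    [osd]
    # keep the hybrid allocator but disable its memory cap
    bluestore_hybrid_alloc_mem_cap = 0
    # ...or switch allocators entirely
    bluestore_allocator = avl
    bluefs_allocator = avl

    # one-off for the offline tool
    CEPH_ARGS="--bluestore-allocator=avl" ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0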
Hope this helps,
Igor.
On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:
Alright, here’s the full log file.
Jean-Philippe Méthot
Senior OpenStack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423
On Sep 14, 2020, at 06:49, Igor Fedotov <ifedotov@suse.de> wrote:
Well, I can see a duplicate admin socket command
registration/de-registration (and the second de-registration asserts),
but I don't understand how this could happen.
Would you share the full log, please?
Thanks,
Igor
On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:
> Here’s the out file, as requested.
>> On Sep 11, 2020, at 10:38, Igor Fedotov <ifedotov@suse.de> wrote:
>>
>> Could you please run:
>>
>> CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool repair --path <...> ; cat log | grep asok > out
>>
>> and share 'out' file.
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:
>>> Hi,
>>>
>>> We’re upgrading our cluster from Mimic to Nautilus, one OSD node at a
>>> time. The release notes recommended running the following command to
>>> fix stats after the upgrade:
>>>
>>> ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
>>>
>>> However, running that command gives us the following error message:
>>>
>>>>
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In
>>>> function 'virtual Allocator::SocketHook::~SocketHook()' thread
>>>> 7f1a6467eec0 time 2020-09-10 14:40:25.872353
>>>>
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53:
>>>> FAILED ceph_assert(r == 0)
>>>> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
>>>> nautilus (stable)
>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x14a) [0x7f1a5a823025]
>>>> 2: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>> 3: (()+0x3c7a4f) [0x55b33537ca4f]
>>>> 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>> 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>> 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>> 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>> [0x55b3352749a1]
>>>> 8: (main()+0x10b3) [0x55b335187493]
>>>> 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>> 10: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>> 2020-09-10 14:40:25.873 7f1a6467eec0 -1
>>>>
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual
>>>> Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time
>>>> 2020-09-10 14:40:25.872353
>>>>
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
>>>>
>>>> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
>>>> nautilus (stable)
>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x14a) [0x7f1a5a823025]
>>>> 2: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>> 3: (()+0x3c7a4f) [0x55b33537ca4f]
>>>> 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>> 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>> 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>> 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>> [0x55b3352749a1]
>>>> 8: (main()+0x10b3) [0x55b335187493]
>>>> 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>> 10: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>> *** Caught signal (Aborted) **
>>>> in thread 7f1a6467eec0 thread_name:ceph-bluestore-
>>>> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
>>>> nautilus (stable)
>>>> 1: (()+0xf630) [0x7f1a58cf0630]
>>>> 2: (gsignal()+0x37) [0x7f1a574be387]
>>>> 3: (abort()+0x148) [0x7f1a574bfa78]
>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x199) [0x7f1a5a823074]
>>>> 5: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>> 6: (()+0x3c7a4f) [0x55b33537ca4f]
>>>> 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>> 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>> 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>> [0x55b3352749a1]
>>>> 11: (main()+0x10b3) [0x55b335187493]
>>>> 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>> 13: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>> 2020-09-10 14:40:25.874 7f1a6467eec0 -1 *** Caught signal
>>>> (Aborted) **
>>>> in thread 7f1a6467eec0 thread_name:ceph-bluestore-
>>>>
>>>> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
>>>> nautilus (stable)
>>>> 1: (()+0xf630) [0x7f1a58cf0630]
>>>> 2: (gsignal()+0x37) [0x7f1a574be387]
>>>> 3: (abort()+0x148) [0x7f1a574bfa78]
>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x199) [0x7f1a5a823074]
>>>> 5: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>> 6: (()+0x3c7a4f) [0x55b33537ca4f]
>>>> 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>> 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
>>>> 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>> [0x55b3352749a1]
>>>> 11: (main()+0x10b3) [0x55b335187493]
>>>> 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>> 13: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>`
>>>> is needed to interpret this.
>>>
>>> What could be the source of this error? I haven’t found much of
>>> anything about it online.
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-leave@ceph.io