On Thu, Jan 19, 2023 at 9:07 PM Lo Re Giuseppe <giuseppe.lore(a)cscs.ch> wrote:
Dear all,
We have started using CephFS more intensively for some WLCG-related workloads.
We have 3 active MDS instances spread across 3 servers, with mds_cache_memory_limit=12G; most
other settings are at their defaults.
One of them crashed last night, leaving the log below.
Do you have any hints as to what could be causing this and how to avoid it?
Not at the moment. Telemetry has reported similar crashes:
https://tracker.ceph.com/issues/54959 (cephfs)
https://tracker.ceph.com/issues/54685 (mgr)
The backtrace (BT) indicates tcmalloc involvement, but I'm not sure what's going on.
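One way to judge that two crashes are "similar" is to compare their backtraces after discarding details that change from host to host, such as return addresses. The Python sketch below illustrates such a scheme under stated assumptions: it drops `[0x...]` addresses, `+0x...` offsets and argument lists, keeps only the function name or library path of each frame, and hashes the result with SHA-256. The names `normalise_frame` and `crash_signature` are hypothetical, and this is not the algorithm Ceph telemetry actually uses.

```python
import hashlib
import re

# Return addresses such as " [0x7fe291e4fce0]" differ between hosts and runs.
_ADDRESS = re.compile(r"\s*\[0x[0-9a-f]+\]")


def normalise_frame(frame: str) -> str:
    """Reduce one backtrace frame to its function name or library path.

    Hypothetical scheme: addresses, "+0x..." offsets and argument lists are
    dropped, so only the call path itself contributes to the signature.
    """
    frame = _ADDRESS.sub("", frame).strip()
    if frame.startswith("("):  # symbolised frames are printed as "(Func(args)+0x..)"
        frame = frame[1:]
    # Everything from the first "(" onwards is arguments and/or an offset.
    return frame.split("(", 1)[0]


def crash_signature(frames: list[str]) -> str:
    """SHA-256 over the normalised frames, in backtrace order."""
    normalised = "\n".join(normalise_frame(f) for f in frames)
    return hashlib.sha256(normalised.encode()).hexdigest()
```

For example, `(DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]` and the same frame at a different address both normalise to `DispatchQueue::entry`, so identical call paths from different hosts hash to the same value. The trade-off of dropping library offsets is that two different unsymbolised frames in the same library (e.g. `/lib64/libstdc++.so.6`) become indistinguishable.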
Regards,
Giuseppe
[root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
...
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 2: abort()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 6: __gxx_personality_v0()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 8: _Unwind_Resume()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 11: gsignal()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 12: abort()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 33: clone()
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events ---
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
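The backtrace is easier to reason about once the numbered frames are pulled out of the journal lines. The Python sketch below is a hypothetical helper, not part of Ceph: it assumes the syslog-style `Jan 19 04:49:40` timestamp prefix shown above, tolerates entries wrapped over several lines (as happens when journal output is pasted into mail), maps each frame number to its text, and can list frames from the outermost caller down to the innermost one. The function names are illustrative only.

```python
import re

# Journal entries start with a syslog-style timestamp, e.g. "Jan 19 04:49:40 ".
# Flattening the text and splitting in front of each timestamp undoes any
# line wrapping introduced when the log was pasted into mail.
_ENTRY_START = re.compile(r"(?=\b[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} )")
# Backtrace entries look like "...[4397]: 18: (tcmalloc::ThreadCache::...".
_FRAME = re.compile(r"\]:\s+(\d+):\s+(.*)$")


def parse_frames(journal_text: str) -> dict[int, str]:
    """Map backtrace frame number -> frame text; other entries are skipped."""
    flat = " ".join(line.strip() for line in journal_text.splitlines())
    frames: dict[int, str] = {}
    for entry in _ENTRY_START.split(flat):
        match = _FRAME.search(entry.strip())
        if match:
            frames[int(match.group(1))] = match.group(2).strip()
    return frames


def call_chain(frames: dict[int, str]) -> list[str]:
    """Outermost caller first: the highest-numbered frame is the thread entry."""
    return [frames[n] for n in sorted(frames, reverse=True)]


def frames_mentioning(frames: dict[int, str], needle: str) -> list[int]:
    """Frame numbers whose text contains `needle`, e.g. "tcmalloc"."""
    return sorted(n for n, text in frames.items() if needle in text)
```

Run against the log above, `frames_mentioning(frames, "tcmalloc")` should return frames 17 and 18, and `frames_mentioning(frames, "Migrator")` frames 22 to 25. Read via `call_chain`, this shows the abort path sitting beneath tcmalloc's `FetchFromCentralCache`, reached while decoding an inode for an incoming `MExportDir` message (`Migrator::handle_export_dir`).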
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
--
Cheers,
Venky