If the crash is easily reproducible at your end, could you set debug_client
to 20 in the client-side conf file and then reattempt the operation?
You could then send over the collected logs and we could take a look at
them.
FYI - there's also a tracker ticket that describes a similar problem:
https://tracker.ceph.com/issues/56288, but we don't have logs for it.
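For reference, a minimal sketch of the steps above (the exact client name and log path are assumptions; adjust them for your deployment):

```shell
# Option 1: set it in the client host's ceph.conf under the [client] section:
#   [client]
#   debug_client = 20
#
# Option 2: set it centrally via the monitors (applies to all clients):
ceph config set client debug_client 20

# Then remount / restart ceph-fuse, reproduce the crash, and collect the
# client log (path depends on your log settings), e.g.:
#   /var/log/ceph/ceph-client.<name>.log
```

Remember to lower the debug level again afterwards, since level 20 client logging is very verbose.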
On Fri, Jun 30, 2023 at 11:16 AM <hakesudu(a)gmail.com> wrote:
Hi,
I've deployed a Ceph Quincy cluster for HPC. Recently I keep running into
ceph-fuse crashes.
The kernel version is 4.18.0-348.el8.0.2.x86_64.
Here is part of the ceph-fuse log:
-59> 2023-06-28T09:51:00.452+0800 155546ff7700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3
-58> 2023-06-28T09:51:00.452+0800 15554cc49700 3 client.159239
ll_opendir 0x10003e1408d.head
-57> 2023-06-28T09:51:00.452+0800 15554cc49700 3 client.159239
may_open 0x1554e79123d0 = 0
-56> 2023-06-28T09:51:00.452+0800 15554cc49700 3 client.159239
ll_opendir 0x10003e1408d.head = 0 (0x155328079380)
-55> 2023-06-28T09:51:00.453+0800 1555473f9700 3 client.159239
seekdir(0x155328079380, 0)
-54> 2023-06-28T09:51:00.452+0800 155546bf5700 5 client.159239
put_cap_ref dropped last FILE_CACHE ref on 0x20004a6e548.head(faked_ino=0
nref=14 ll_ref=2 cap_refs={1024=0,2048=1} open={1=1} mode=100644
size=5626/0 nlink=1 btime=2023-06-05T14:38:36.471178+0800
mtime=2023-06-05T14:38:36.471178+0800 ctime=2023-06-05T14:38:36.471178+0800
change_attr=1 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[0x20004a6e548 ts
0/0 objects 1 dirty_or_tx 0] 0x1554e7902300)
-53> 2023-06-28T09:51:00.453+0800 155546bf5700 3 client.159239 ll_read
0x15531806b970 0~8192 = 5626
-52> 2023-06-28T09:51:00.453+0800 155546ff7700 3 client.159239
may_lookup 0x155420004840 = 0
-51> 2023-06-28T09:51:00.453+0800 1554a5dee700 3 client.159239
ll_lookup 0x10003e1406f.head html
-50> 2023-06-28T09:51:00.453+0800 155546ff7700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3 -> 0 (100019b7a43)
-49> 2023-06-28T09:51:00.453+0800 1554a5dee700 3 client.159239
may_lookup 0x1554c1a89e40 = 0
-48> 2023-06-28T09:51:00.453+0800 1554a7dfe700 3 client.159239
ll_flush 0x15531806b970 0x20004a6e548
-47> 2023-06-28T09:51:00.453+0800 1555469f4700 3 client.159239
ll_lookup 0x100019b7a43.head envs
-46> 2023-06-28T09:51:00.453+0800 1555469f4700 3 client.159239
may_lookup 0x155420343eb0 = 0
-45> 2023-06-28T09:51:00.453+0800 1555469f4700 3 client.159239
ll_lookup 0x100019b7a43.head envs -> 0 (200035de5d8)
-44> 2023-06-28T09:51:00.453+0800 155545fef700 3 client.159239
ll_release (fh)0x15531806b970 0x20004a6e548
-43> 2023-06-28T09:51:00.453+0800 155546bf5700 3 client.159239
seekdir(0x15544c054010, 1152360438801891331)
-42> 2023-06-28T09:51:00.453+0800 155546ff7700 3 client.159239
seekdir(0x15544c029ee0, 1152690945930559491)
-41> 2023-06-28T09:51:00.453+0800 15554cc49700 3 client.159239
ll_releasedir 0x15544c029ee0
-40> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
ll_lookup 0x20004a66459.head tests
-39> 2023-06-28T09:51:00.453+0800 15554c244700 3 client.159239
ll_lookup 0x100040bd04e.head att5410-w -> 0 (200047cbc02)
-38> 2023-06-28T09:51:00.453+0800 1554a7dfe700 3 client.159239
ll_lookup 0x200035de5d8.head steven-colossal
-37> 2023-06-28T09:51:00.453+0800 1554a7dfe700 3 client.159239
may_lookup 0x1554c20229c0 = 0
-36> 2023-06-28T09:51:00.453+0800 1554a7dfe700 3 client.159239
ll_lookup 0x200035de5d8.head steven-colossal -> 0 (100040bcf3a)
-35> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
may_lookup 0x155533c970f0 = 0
-34> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
ll_lookup 0x20004a66459.head tests -> 0 (20004a6e3e0)
-33> 2023-06-28T09:51:00.453+0800 155545bed700 3 client.159239
ll_releasedir 0x15544c054010
-32> 2023-06-28T09:51:00.453+0800 155546bf5700 3 client.159239
ll_getattr 0x200047cbc02.head = 0
-31> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3
-30> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
may_lookup 0x155420004840 = 0
-29> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3 -> 0 (100019b7a43)
-28> 2023-06-28T09:51:00.453+0800 155545fef700 3 client.159239
ll_lookup 0x100040bcf3a.head lib
-27> 2023-06-28T09:51:00.453+0800 155545fef700 3 client.159239
may_lookup 0x1554d7d1d240 = 0
-26> 2023-06-28T09:51:00.453+0800 155545fef700 3 client.159239
ll_lookup 0x100040bcf3a.head lib -> 0 (100040bd018)
-25> 2023-06-28T09:51:00.453+0800 1555469f4700 3 client.159239
ll_lookup 0x20004a6e3e0.head test_trainer
-24> 2023-06-28T09:51:00.453+0800 1555469f4700 3 client.159239
may_lookup 0x155531e5a6c0 = 0
-23> 2023-06-28T09:51:00.453+0800 15554cc49700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3
-22> 2023-06-28T09:51:00.453+0800 15554cc49700 3 client.159239
may_lookup 0x155420004840 = 0
-21> 2023-06-28T09:51:00.453+0800 15554cc49700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3 -> 0 (100019b7a43)
-20> 2023-06-28T09:51:00.453+0800 15554c244700 3 client.159239
ll_lookup 0x100019b7a43.head envs
-19> 2023-06-28T09:51:00.453+0800 15554c244700 3 client.159239
may_lookup 0x155420343eb0 = 0
-18> 2023-06-28T09:51:00.453+0800 15554c244700 3 client.159239
ll_lookup 0x100019b7a43.head envs -> 0 (200035de5d8)
-17> 2023-06-28T09:51:00.453+0800 155546ff7700 3 client.159239
ll_lookup 0x100019b7a43.head envs
-16> 2023-06-28T09:51:00.453+0800 155546ff7700 3 client.159239
may_lookup 0x155420343eb0 = 0
-15> 2023-06-28T09:51:00.453+0800 155546ff7700 3 client.159239
ll_lookup 0x100019b7a43.head envs -> 0 (200035de5d8)
-14> 2023-06-28T09:51:00.453+0800 155545bed700 3 client.159239
ll_lookup 0x100040bd018.head terminfo
-13> 2023-06-28T09:51:00.453+0800 1555475fa700 3 client.159239
ll_lookup 0x1000383ca0a.head train.py -> 0 (1000383caf7)
-12> 2023-06-28T09:51:00.453+0800 1555465f2700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3
-11> 2023-06-28T09:51:00.453+0800 1555465f2700 3 client.159239
may_lookup 0x155420004840 = 0
-10> 2023-06-28T09:51:00.453+0800 1555465f2700 3 client.159239
ll_lookup 0x200017f674a.head anaconda3 -> 0 (100019b7a43)
-9> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
ll_lookup 0x200035de5d8.head steven-colossal
-8> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
may_lookup 0x1554c20229c0 = 0
-7> 2023-06-28T09:51:00.453+0800 1555452da700 3 client.159239
ll_lookup 0x200035de5d8.head steven-colossal -> 0 (100040bcf3a)
-6> 2023-06-28T09:51:00.453+0800 15554cc49700 3 client.159239
ll_lookup 0x100019b7a43.head envs
-5> 2023-06-28T09:51:00.453+0800 1555465f2700 3 client.159239
ll_getattr 0x1000383caf7.head = 0
-4> 2023-06-28T09:51:00.453+0800 155546bf5700 3 client.159239
ll_lookup 0x200035de5d8.head llmzoo
-3> 2023-06-28T09:51:00.453+0800 155546bf5700 3 client.159239
may_lookup 0x1554c20229c0 = 0
-2> 2023-06-28T09:51:00.453+0800 155546bf5700 3 client.159239
ll_lookup 0x200035de5d8.head llmzoo -> 0 (10003e11a9e)
-1> 2023-06-28T09:51:00.453+0800 15554c646700 3 client.159239
ll_lookup 0x10003e11a9e.head lib
0> 2023-06-28T09:51:00.458+0800 1554a77fb700 -1 *** Caught signal
(Segmentation fault) **
in thread 1554a77fb700 thread_name:ceph-fuse
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
(stable)
1: /lib64/libpthread.so.0(+0x12ce0) [0x1555535eece0]
2: (Client::_readdir_cache_cb(dir_result_t*, int (*)(void*, dirent*,
ceph_statx*, long, Inode*), void*, int, bool)+0x2f4) [0x555555647d64]
3: (Client::readdir_r_cb(dir_result_t*, int (*)(void*, dirent*,
ceph_statx*, long, Inode*), void*, unsigned int, unsigned int, bool)+0xae7)
[0x55555564cd37]
4: ceph-fuse(+0xadbf8) [0x555555601bf8]
5: /lib64/libfuse.so.2(+0x16706) [0x1555550fd706]
6: /lib64/libfuse.so.2(+0x17868) [0x1555550fe868]
7: /lib64/libfuse.so.2(+0x14440) [0x1555550fb440]
8: /lib64/libpthread.so.0(+0x81cf) [0x1555535e41cf]
9: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 rbd_pwl
0/ 5 journaler
0/ 5 objectcacher
0/ 5 immutable_obj_cache
1/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/ 5 rgw_datacache
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 fuse
2/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
1/ 5 prioritycache
0/ 5 test
0/ 5 cephfs_mirror
0/ 5 cephsqlite
0/ 5 seastore
0/ 5 seastore_onode
0/ 5 seastore_odata
0/ 5 seastore_omap
0/ 5 seastore_tm
0/ 5 seastore_cleaner
0/ 5 seastore_lba
0/ 5 seastore_cache
0/ 5 seastore_journal
0/ 5 seastore_device
0/ 5 alienstore
1/ 5 mclock
1/ 5 ceph_exporter
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
1554947e3700 / ceph-fuse
155494fe7700 / ceph-fuse
1554975fa700 / ceph-fuse
1554a57eb700 / ceph-fuse
1554a59ec700 / ceph-fuse
1554a5dee700 / ceph-fuse
1554a5fef700 / ceph-fuse
1554a61f0700 /
1554a63f1700 / ceph-fuse
1554a65f2700 / ceph-fuse
1554a67f3700 / ceph-fuse
1554a6bf5700 / ceph-fuse
1554a6df6700 / ceph-fuse
1554a71f8700 / ceph-fuse
1554a73f9700 / ceph-fuse
1554a75fa700 /
1554a77fb700 / ceph-fuse
1554a7dfe700 / ceph-fuse
1554a7fff700 / ceph-fuse
1555452da700 / ceph-fuse
1555455ea700 / ceph-fuse
1555459ec700 / ceph-fuse
155545bed700 / ceph-fuse
155545fef700 / ceph-fuse
1555465f2700 / ceph-fuse
1555469f4700 / ceph-fuse
155546bf5700 / ceph-fuse
155546df6700 / ceph-fuse
155546ff7700 / ceph-fuse
1555471f8700 / ceph-fuse
1555473f9700 / ceph-fuse
1555475fa700 / ceph-fuse
1555477fb700 /
155547dfe700 / ceph-fuse
155547fff700 / ceph-fuse
15554c244700 / ceph-fuse
15554c646700 / ceph-fuse
15554c847700 / ceph-fuse
15554cc49700 / ceph-fuse
15554ce4a700 / ceph-fuse
15554da50700 / ms_dispatch
max_recent 10000
max_new 10000
log_file
/var/lib/ceph/crash/2023-06-28T01:51:00.459615Z_3ddbaa44-d8cd-437b-908b-c3772520c7a6/log
--- end dump of recent events ---
Has anyone encountered this kind of problem?
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Milind