The OOM-killer is on the rampage and striking down hapless OSDs when
the cluster is under heavy client IO.
The memory target does not seem to be much of a limit; is this intentional?
root@cnx-11:~# ceph-conf --show-config|fgrep osd_memory_target
osd_memory_target = 4294967296
osd_memory_target_cgroup_limit_ratio = 0.800000
root@cnx-31:~# pmap 4327|fgrep total
total 6794892K
Are there any tips for controlling the OSD memory consumption?
The hosts involved have 128 GB or 192 GB of memory and 12 SATA OSDs each,
so even at 4 GB per OSD there should be a large amount of free memory.
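For reference, this is how we would try to lower the target further
(assuming the centralized config store available from Mimic onward; the
3 GiB value and the OSD id are only examples):

ceph config set osd osd_memory_target 3221225472
ceph daemon osd.0 config get osd_memory_target   # verify on a running daemon

As far as we understand, osd_memory_target is a best-effort target for the
BlueStore cache autotuner, not a hard limit on the process RSS.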
hi there,
recently, we've come across a lot of advice to only use replicated rados
pools as default (i.e. root) data pools for cephfs¹.
unfortunately, we either skipped or blatantly ignored this advice while
creating our cephfs, so our default data pool is an erasure-coded one
with k=2 and m=4, which _should_ be fine availability-wise. could anyone
elaborate on the impact this has on the performance of the whole setup?
if a migration to a replicated pool is recommended: would a simple
ceph osd pool set $default_data crush_rule $something_replicated
suffice, or would you recommend a more elaborate approach, something
along the lines of taking the cephfs down, copying the contents of
default_pool to default_new, renaming default_new to default_pool, and
bringing the cephfs up again?
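for illustration, these are the kind of commands we imagine the more
elaborate variant would involve (pool, fs and directory names are
placeholders; note that a changed layout only affects newly created files):

# create a replicated pool and attach it as an additional data pool
ceph osd pool create cephfs_data_rep 128 replicated
ceph fs add_data_pool cephfs cephfs_data_rep
# steer new files under a directory to the new pool via file layouts
setfattr -n ceph.dir.layout.pool -v cephfs_data_rep /mnt/cephfs/somedir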
thank you very much & with kind regards,
t.
¹ - see, for instance, https://tracker.ceph.com/issues/42450 .
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus
(stable)
I made a bucket named "test_lc" and ran `s3cmd expire
--expiry-date=2019-01-01 s3://test_lc` to set the lifecycle (2019-01-01 is
earlier than the current date, so every object will be removed).
Then I ran `radosgw-admin lc process`; the objects were deleted as expected,
and the status from `radosgw-admin lc list` was "completed". However, if I
upload some objects and run `radosgw-admin lc process` again, the objects
are not deleted.
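For reference, the full sequence was roughly the following (s3cmd is
configured against the RGW endpoint; the local file names are illustrative):

s3cmd mb s3://test_lc
s3cmd expire --expiry-date=2019-01-01 s3://test_lc
s3cmd put ./obj1 s3://test_lc/
radosgw-admin lc process    # first run: obj1 gets deleted
radosgw-admin lc list       # status: "completed"
s3cmd put ./obj2 s3://test_lc/
radosgw-admin lc process    # second run: obj2 is NOT deleted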
Could you please tell me what the reason is and what I should do in this
case? Thanks in advance!
Hi,
is it possible to run the MDS on a newer version than the monitor nodes?
I mean we run the monitors on 12.2.10 and would like to upgrade
the MDS to 12.2.13. Is this possible?
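For reference, I would verify the mixed-version state afterwards with:

ceph versions

which reports the running release per daemon type (available since Luminous).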
Best,
Martin
Hi all,
today, all of a sudden, our standby-replay metadata server started to
continuously write the following log messages:
2020-02-13 11:56:50.216102 7fd2ad229700 1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2020-02-13 11:56:50.287699 7fd2ad229700 0 mds.beacon.dcucmds401
Skipping beacon heartbeat to monitors (last acked 100.836s ago); MDS
internal heartbeat is not healthy!
and its memory keeps growing until no memory is available any more; the
service then gets restarted and stops. The funny thing is that on the
active MDS we see neither these log messages nor any increase in memory
usage.
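In case it helps with diagnosis, this is what we can collect on the
affected standby (admin socket access on that host assumed; the daemon
name is taken from the log above):

ceph daemon mds.dcucmds401 cache status   # cache memory usage
ceph daemon mds.dcucmds401 perf dump      # includes the mds_mem counters
ceph tell mds.dcucmds401 heap stats       # tcmalloc heap statistics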
We are running ceph version 12.2.10 on all nodes of our Ceph cluster.
Any suggestions?
Best,
Martin
Hi,
The Ceph Berlin MeetUp is a community-organized group that has met
every two months in the past years: https://www.meetup.com/Ceph-Berlin/
The meetups start at 6 pm and consist of a presentation or talk followed
by a discussion. The discussion often continues over dinner in a nearby
restaurant.
The next date would be March 23rd, four weeks from now, which leaves
enough time to organize a meetup. Before fixing that date I would like to
ask whether someone is willing to host our MeetUp. This is quite
uncomplicated: we just need a room for up to 20 people and a projector for
a short talk. Catering is completely optional but very welcome. If March
23rd does not fit your schedule, please suggest another day.
So if you or your company in Berlin is able and willing to host the next
Ceph meetup, please contact me.
If you have done something with Ceph in the last year and want to talk
about it, please also do not hesitate to contact me.
Kindest Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hello Team,
I am getting frequent LARGE_OMAP_OBJECTS warnings ("1 large omap objects")
in one of my CephFS metadata pools. Can anyone explain why this pool keeps
getting into this state and how I could prevent it in the future?
# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs01-metadata'
Search the cluster log for 'Large omap object found' for more details.
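I assume the offending object can be identified like this (the cluster log
path on the mon host may vary; <object> is a placeholder for the name
reported there):

zgrep -i 'large omap object' /var/log/ceph/ceph.log*
rados -p cephfs01-metadata listomapkeys <object> | wc -l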
Thanks,
Uday