January 2020 - ceph-users

Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

by Ilya Dryomov

On Tue, Jan 14, 2020 at 10:31 AM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote: > > > I think this is new since I upgraded to 14.2.6. kworker/7:3 100% > > [@~]# echo l > /proc/sysrq-trigger > > [Tue Jan 14 10:05:08 2020] CPU: 7 PID: 2909400 Comm: kworker/7:0 Not > tainted 3.10.0-1062.4.3.el7.x86_64 #1 > > [Tue Jan 14 10:05:08 2020] Workqueue: ceph-msgr ceph_con_workfn > [libceph] > [Tue Jan 14 10:05:08 2020] task: ffffa0d2cb9db150 ti: ffffa0d3040f0000 > task.ti: ffffa0d3040f0000 > [Tue Jan 14 10:05:08 2020] RIP: 0010:[<ffffffffb0192e7e>] > [<ffffffffb0192e7e>] generic_swap+0x1e/0x30 > [Tue Jan 14 10:05:08 2020] RSP: 0018:ffffa0d3040f3a20 EFLAGS: 00000206 > [Tue Jan 14 10:05:08 2020] RAX: 0000000000000072 RBX: 0000000000000060 > RCX: 0000000000000072 > [Tue Jan 14 10:05:08 2020] RDX: 0000000000000006 RSI: ffffa0c788914a7a > RDI: ffffa0c788914a42 > [Tue Jan 14 10:05:08 2020] RBP: ffffa0d3040f3a20 R08: 0000000000000028 > R09: 0000000000000016 > [Tue Jan 14 10:05:08 2020] R10: 0000000000000036 R11: ffffe94984224500 > R12: ffffa0c788914a40 > [Tue Jan 14 10:05:08 2020] R13: ffffffffc08d7da0 R14: ffffa0c788914a18 > R15: ffffa0c788914a78 > [Tue Jan 14 10:05:08 2020] FS: 0000000000000000(0000) > GS:ffffa0d2cfbc0000(0000) knlGS:0000000000000000 > [Tue Jan 14 10:05:08 2020] CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > [Tue Jan 14 10:05:08 2020] CR2: 000055f0404d5e40 CR3: 0000001813010000 > CR4: 00000000000627e0 > [Tue Jan 14 10:05:08 2020] Call Trace: > [Tue Jan 14 10:05:08 2020] [<ffffffffb0193055>] sort+0x1c5/0x260 > [Tue Jan 14 10:05:08 2020] [<ffffffffb0192e60>] ? u32_swap+0x10/0x10 > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d807b>] > build_snap_context+0x12b/0x290 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d820c>] > rebuild_snap_realms+0x2c/0x90 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d822b>] > rebuild_snap_realms+0x4b/0x90 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d91fc>] > ceph_update_snap_trace+0x3ec/0x530 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08e2239>] > handle_reply+0x359/0xc60 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08e48ba>] dispatch+0x11a/0xb00 > [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffb042e56a>] ? > kernel_recvmsg+0x3a/0x50 > [Tue Jan 14 10:05:08 2020] [<ffffffffc05fcff4>] try_read+0x544/0x1300 > [libceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffafee13ce>] ? > account_entity_dequeue+0xae/0xd0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafee4d5c>] ? > dequeue_entity+0x11c/0x5e0 > [Tue Jan 14 10:05:08 2020] [<ffffffffb042e417>] ? > kernel_sendmsg+0x37/0x50 > [Tue Jan 14 10:05:08 2020] [<ffffffffc05fdfb4>] > ceph_con_workfn+0xe4/0x1530 [libceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffb057f568>] ? > __schedule+0x448/0x9c0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafebe21f>] > process_one_work+0x17f/0x440 > [Tue Jan 14 10:05:08 2020] [<ffffffffafebf336>] > worker_thread+0x126/0x3c0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafebf210>] ? > manage_workers.isra.26+0x2a0/0x2a0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafec61f1>] kthread+0xd1/0xe0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafec6120>] ? > insert_kthread_work+0x40/0x40 > [Tue Jan 14 10:05:08 2020] [<ffffffffb058cd37>] > ret_from_fork_nospec_begin+0x21/0x21 > [Tue Jan 14 10:05:08 2020] [<ffffffffafec6120>] ? > insert_kthread_work+0x40/0x40 > > > > [Tue Jan 14 10:05:08 2020] CPU: 7 PID: 2909400 Comm: kworker/7:0 Not > tainted 3.10.0-1062.4.3.el7.x86_64 #1 > > [Tue Jan 14 10:05:08 2020] Workqueue: ceph-msgr ceph_con_workfn > [libceph] > [Tue Jan 14 10:05:08 2020] task: ffffa0d2cb9db150 ti: ffffa0d3040f0000 > task.ti: ffffa0d3040f0000 > [Tue Jan 14 10:05:08 2020] RIP: 0010:[<ffffffffb0192200>] > [<ffffffffb0192200>] __x86_indirect_thunk_rax+0x0/0x20 > [Tue Jan 14 10:05:08 2020] RSP: 0018:ffffa0d3040f3a28 EFLAGS: 00000286 > [Tue Jan 14 10:05:08 2020] RAX: ffffffffb0192e60 RBX: 0000000000000010 > RCX: 0000000000000010 > [Tue Jan 14 10:05:08 2020] RDX: 0000000000000008 RSI: ffffa0d0e108b828 > RDI: ffffa0d0e108b818 > [Tue Jan 14 10:05:08 2020] RBP: ffffa0d3040f3a98 R08: 0000000000000000 > R09: 0000000000000016 > [Tue Jan 14 10:05:08 2020] R10: 0000000000000036 R11: ffffe949a9842280 > R12: ffffa0d0e108b818 > [Tue Jan 14 10:05:08 2020] R13: ffffffffc08d7da0 R14: ffffa0d0e108b818 > R15: ffffa0d0e108b828 > [Tue Jan 14 10:05:08 2020] FS: 0000000000000000(0000) > GS:ffffa0d2cfbc0000(0000) knlGS:0000000000000000 > [Tue Jan 14 10:05:08 2020] CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > [Tue Jan 14 10:05:08 2020] CR2: 000055f0404d5e40 CR3: 0000001813010000 > CR4: 00000000000627e0 > [Tue Jan 14 10:05:08 2020] Call Trace: > [Tue Jan 14 10:05:08 2020] [<ffffffffb0193055>] ? sort+0x1c5/0x260 > [Tue Jan 14 10:05:08 2020] [<ffffffffb0192e60>] ? u32_swap+0x10/0x10 > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d807b>] > build_snap_context+0x12b/0x290 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d820c>] > rebuild_snap_realms+0x2c/0x90 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d822b>] > rebuild_snap_realms+0x4b/0x90 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08d91fc>] > ceph_update_snap_trace+0x3ec/0x530 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08e2239>] > handle_reply+0x359/0xc60 [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffc08e48ba>] dispatch+0x11a/0xb00 > [ceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffb042e56a>] ? > kernel_recvmsg+0x3a/0x50 > [Tue Jan 14 10:05:08 2020] [<ffffffffc05fcff4>] try_read+0x544/0x1300 > [libceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffafee13ce>] ? > account_entity_dequeue+0xae/0xd0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafee4d5c>] ? > dequeue_entity+0x11c/0x5e0 > [Tue Jan 14 10:05:08 2020] [<ffffffffb042e417>] ? > kernel_sendmsg+0x37/0x50 > [Tue Jan 14 10:05:08 2020] [<ffffffffc05fdfb4>] > ceph_con_workfn+0xe4/0x1530 [libceph] > [Tue Jan 14 10:05:08 2020] [<ffffffffb057f568>] ? > __schedule+0x448/0x9c0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafebe21f>] > process_one_work+0x17f/0x440 > [Tue Jan 14 10:05:08 2020] [<ffffffffafebf336>] > worker_thread+0x126/0x3c0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafebf210>] ? > manage_workers.isra.26+0x2a0/0x2a0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafec61f1>] kthread+0xd1/0xe0 > [Tue Jan 14 10:05:08 2020] [<ffffffffafec6120>] ? > insert_kthread_work+0x40/0x40 > [Tue Jan 14 10:05:08 2020] [<ffffffffb058cd37>] > ret_from_fork_nospec_begin+0x21/0x21 > [Tue Jan 14 10:05:08 2020] [<ffffffffafec6120>] ? > insert_kthread_work+0x40/0x40 Hi Marc, It's busy processing snapshot contexts, I think it's more or less the same issue that you reported in "ceph node crashed with these errors "kernel: ceph: build_snap_context" (maybe now it is urgent?)". Did "ceph: build_snap_context 100020c9287 ffff911a9a26bd00 fail -12" errors go away? How many snapshots do you have now? Did you file a tracker ticket? It's better to have all details in one place rather than scattered across different threads on the list. Adding Jeff and Zheng, who can take a closer look. Thanks, Ilya

4 years, 1 month

2
2
0 0

RGWReshardLock::lock failed to acquire lock ret=-16

by Josh Haft

Hi, Currently running Mimic 13.2.5. We had reports this morning of timeouts and failures with PUT and GET requests to our Ceph RGW cluster. I found these messages in the RGW log: RGWReshardLock::lock failed to acquire lock on bucket_name:bucket_instance ret=-16 NOTICE: resharding operation on bucket index detected, blocking block_while_resharding ERROR: bucket is still resharding, please retry Which were preceded by many of these, which I think are normal/expected. check_bucket_shards: resharding needed: stats.num_objects=6415879 shard max_objects=6400000 Our RGW cluster sits behind haproxy which notified me approx 90 seconds after the first 'resharding needed' message that no backends were available. It appears this dynamic reshard process caused the RGWs to lock up for a period of time. Roughly 2 minutes later the reshard error messages stop and operation returns to normal. Looking back through previous RGW logs, I see a similar event from about a week ago, on the same bucket. We have several buckets with shard counts exceeding 1k (this one only has 128), and much larger object counts, so clearly this isn't the first time dynamic sharding has been invoked on this cluster. Has anyone seen this? I expect it will come up again, and can turn up debugging if that'll help. Thanks for any assistance! Josh

4 years, 1 month

1
1
0 0

High CPU usage by ceph-mgr in 14.2.6

by jbardgett＠godaddy.com

After upgrading one of our clusters from Luminous 12.2.12 to Nautilus 14.2.6, I am seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H'). The way we found this was due to Prometheus being unable to report out certain pieces of data, specifically OSD Usage, OSD Apply and Commit Latency. Which are all similar issues people were having in previous versions of Nautilus. Bryan Stillwell reported this previously on a separate cluster, 14.2.5, we have here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VW3GNVJGOOW… That issue was resolved with the upgrade to 14.2.6. We are seeing a similar issue on this other cluster with a couple differences. This cluster has 1900+ OSD in it, the previous one had 300+ The top user is libceph-common, instead of mmap 4.86% libceph-common.so.0 [.] EventCenter::create_time_event 2.78% [kernel] [k] nmi 2.64% libstdc++.so.6.0.19 [.] __dynamic_cast On all our other clusters that have been upgraded to 14.2.6 we are not experiencing this issue, the next largest being 800+ OSD. We feel this is related to the size of the cluster, similarly to the previous report. Anyone else experiencing this and/or can provide some direction on how to go about resolving this? Thanks, Joe

4 years, 1 month

6
7
0 0

official ceph.com buster builds?

by Chad W Seys

Hi all, Am I missing the ceph buster build built by ceph.com ? http://download.ceph.com/debian-nautilus/dists/ Should I be using the Croit supplied builds? Thanks! Chad.

4 years, 1 month

2
2
0 0

ceph-mon using 100% CPU after upgrade to 14.2.5

by Bryan Stillwell

On our test cluster after upgrading to 14.2.5 I'm having problems with the mons pegging a CPU core while moving data around. I'm currently converting the OSDs from FileStore to BlueStore by marking the OSDs out in multiple nodes, destroying the OSDs, and then recreating them with ceph-volume lvm batch. This seems too get the ceph-mon process into a state where it pegs a CPU core on one of the mons: 1764450 ceph 20 0 4802412 2.1g 16980 S 100.0 28.1 4:54.72 ceph-mon Has anyone else run into this with 14.2.5 yet? I didn't see this problem while the cluster was running 14.2.4. Thanks, Bryan

4 years, 1 month

5
8
0 0

slow using ISCSI - Help-me

by Gesiel Galvão Bernardes

Hi, I am having an unusual slowdown using VMware with ISCSI gws. I have two ISCSI gateways with two RBD images. I have checked the following in the logs: Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:557 rbd/pool1.vmware_iscsi1: Lock acquisition operation is already in process.2019-12-24 09:00:26.973 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock. Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock. Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.099 969 [WARN] tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1 Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1 Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation. Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.824 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.2019-12-24 09:00:28.990 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock. Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock. Can anyone help-me please? Gesiel

4 years, 2 months

2
14
0 0

ceph status reports: slow ops - this is related to long running process /usr/bin/ceph-osd

by Thomas

Hi, ceph status reports: root@ld3955:~# ceph -s cluster: id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae health: HEALTH_ERR 1 filesystem is degraded 1 filesystem has a failed mds daemon 1 filesystem is offline insufficient standby MDS daemons available 4 nearfull osd(s) 1 pool(s) nearfull Reduced data availability: 59 pgs inactive, 16 pgs peering Degraded data redundancy: 597/153910758 objects degraded (0.000%), 2 pgs degraded, 1 pg undersized Degraded data redundancy (low space): 23 pgs backfill_toofull 1 pgs not deep-scrubbed in time 4 pgs not scrubbed in time 3 pools have too many placement groups 164 slow requests are blocked > 32 sec 1082 stuck requests are blocked > 4096 sec 1490 slow ops, oldest one blocked for 19711 sec, daemons [osd,0,osd,175,osd,186,osd,5,osd,6,osd,63,osd,68,osd,9,mon,ld5505,mon,ld5506]... have slow ops. services: mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 5h) mgr: ld5507(active, since 5h), standbys: ld5506, ld5505 mds: pve_cephfs:0/1, 1 failed osd: 419 osds: 416 up, 416 in; 6024 remapped pgs data: pools: 6 pools, 8864 pgs objects: 51.30M objects, 196 TiB usage: 594 TiB used, 907 TiB / 1.5 PiB avail pgs: 0.666% pgs not active 597/153910758 objects degraded (0.000%) 52964415/153910758 objects misplaced (34.412%) 5954 active+remapped+backfill_wait 2786 active+clean 40 active+remapped+backfilling 35 activating 23 active+remapped+backfill_wait+backfill_toofull 16 peering 7 activating+remapped 1 activating+undersized+degraded 1 active+clean+scrubbing 1 active+recovering+degraded io: client: 3.5 KiB/s wr, 0 op/s rd, 0 op/s wr recovery: 551 MiB/s, 137 objects/s I'm concerned about the slow ops on osd.0 and osd.9. On the relevant OSD node I can see 2 relevant services running for hours: ceph 14795 1 99 09:58 ? 08:49:22 /usr/bin/ceph-osd -f --cluster ceph --id 9 --setuser ceph --setgroup ceph ceph 15394 1 99 09:58 ? 07:10:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph In the relevant osd log I can find similar messages: root@ld5505:~# tail -f /var/log/ceph/ceph-osd.0.log 2019-10-08 15:35:32.830 7ff60c7cc700 -1 osd.0 233323 get_health_metrics reporting 236 slow ops, oldest is osd_pg_create(e233257 38.0:199987) 2019-10-08 15:35:33.806 7ff60c7cc700 -1 osd.0 233323 get_health_metrics reporting 236 slow ops, oldest is osd_pg_create(e233257 38.0:199987) 2019-10-08 15:35:34.842 7ff60c7cc700 -1 osd.0 233323 get_health_metrics reporting 236 slow ops, oldest is osd_pg_create(e233257 38.0:199987) 2019-10-08 15:35:35.862 7ff60c7cc700 -1 osd.0 233323 get_health_metrics reporting 236 slow ops, oldest is osd_pg_create(e233257 38.0:199987) root@ld5505:~# tail -f /var/log/ceph/ceph-osd.9.log 2019-10-08 15:35:38.822 7f8957599700 -1 osd.9 233407 get_health_metrics reporting 818 slow ops, oldest is osd_op(client.53385387.0:23 30.f7 30.bcc140f7 (undecoded) ondisk+retry+read+known_if_redirected e233362) 2019-10-08 15:35:39.854 7f8957599700 -1 osd.9 233407 get_health_metrics reporting 818 slow ops, oldest is osd_op(client.53385387.0:23 30.f7 30.bcc140f7 (undecoded) ondisk+retry+read+known_if_redirected e233362) 2019-10-08 15:35:40.850 7f8957599700 -1 osd.9 233407 get_health_metrics reporting 818 slow ops, oldest is osd_op(client.53385387.0:23 30.f7 30.bcc140f7 (undecoded) ondisk+retry+read+known_if_redirected e233362) 2019-10-08 15:35:41.862 7f8957599700 -1 osd.9 233407 get_health_metrics reporting 818 slow ops, oldest is osd_op(client.53385387.0:23 30.f7 30.bcc140f7 (undecoded) ondisk+retry+read+known_if_redirected e233362) Question: How can I analyse and solve the issue with slow ops? THX

4 years, 2 months

3
2
0 0

RBD-mirror instabilities

by Oliver Freyermuth

Dear Cephalopodians, running 13.2.6 on the source cluster and 14.2.5 on the rbd mirror nodes and the target cluster, I observe regular failures of rbd-mirror processes. With failures, I mean that traffic stops, but the daemons are still listed as active rbd-mirror daemons in "ceph -s", and the daemons are still running. This comes in sync with a hefty load of below messages in the mirror logs. This happens "sometimes" when some OSDs go down and up in the target cluster (which happens each night since the disks in that cluster shortly go offline during "online" smart self-tests - that's a problem in itself, but it's a cluster built from hardware that would have been trashed otherwise). The rbd daemons keep running in any case, but synchronization stops. If not all rbd mirror daemons have failed (we have three running, and it usually does not hit all of them), the "surviving" seem(s) not to take care of the images the other daemons had locked. Right now, I am eyeing with a "quick solution" of regularly restarting the rbd-mirror daemons, but if there are any good ideas on which debug info I could collect to get this analyzed and fixed, that would of course be appreciated :-). Cheers, Oliver ----------------------------------------------- 2019-12-24 02:08:51.379 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb968d00 [2/aabba863-89fd-4ea5-bb8c-0f417225d394] handle_process_entry_safe: failed to commit journal event: (108) Cannot send after transport endpoint shutdown 2019-12-24 02:08:51.379 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb968d00 [2/aabba863-89fd-4ea5-bb8c-0f417225d394] handle_replay_complete: replay encountered an error: (108) Cannot send after transport endpoint shutdown ... 2019-12-24 02:08:54.392 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb87bb00 [2/23699357-a611-4557-9d73-6ff5279da991] handle_process_entry_safe: failed to commit journal event: (125) Operation canceled 2019-12-24 02:08:54.392 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb87bb00 [2/23699357-a611-4557-9d73-6ff5279da991] handle_replay_complete: replay encountered an error: (125) Operation canceled 2019-12-24 02:08:55.707 7f31ea358700 -1 rbd::mirror::image_replayer::GetMirrorImageIdRequest: 0x559dce2e05b0 handle_get_image_id: failed to retrieve image id: (108) Cannot send after transport endpoint shutdown 2019-12-24 02:08:55.707 7f31ea358700 -1 rbd::mirror::image_replayer::GetMirrorImageIdRequest: 0x559dcf47ee70 handle_get_image_id: failed to retrieve image id: (108) Cannot send after transport endpoint shutdown ... 2019-12-24 02:08:55.716 7f31f5b6f700 -1 rbd::mirror::ImageReplayer: 0x559dcb997680 [2/f8218221-6608-4a2b-8831-84ca0c2cb418] operator(): start failed: (108) Cannot send after transport endpoint shutdown 2019-12-24 02:09:25.707 7f31f5b6f700 -1 rbd::mirror::InstanceReplayer: 0x559dcabd5b80 start_image_replayer: global_image_id=0577bd16-acc4-4e9a-81f0-c698a24f8771: blacklisted detected during image replay 2019-12-24 02:09:25.707 7f31f5b6f700 -1 rbd::mirror::InstanceReplayer: 0x559dcabd5b80 start_image_replayer: global_image_id=05bd4cca-a561-4a5c-ad83-9905ad5ce34e: blacklisted detected during image replay 2019-12-24 02:09:25.707 7f31f5b6f700 -1 rbd::mirror::InstanceReplayer: 0x559dcabd5b80 start_image_replayer: global_image_id=0e614ece-65b1-4b4a-99bd-44dd6235eb70: blacklisted detected during image replay -----------------------------------------------

4 years, 2 months

2
6
0 0

Write i/o in CephFS metadata pool

by Samy Ascha

Hi! I've been running CephFS for a while now and ever since setting it up, I've seen unexpectedly large write i/o on the CephFS metadata pool. The filesystem is otherwise stable and I'm seeing no usage issues. I'm in a read-intensive environment, from the clients' perspective and throughput for the metadata pool is consistently larger than that of the data pool. For example: # ceph osd pool stats pool cephfs_data id 1 client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr pool cephfs_metadata id 2 client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr I realise, of course, that this is a momentary display of statistics, but I see this unbalanced r/w activity consistently when monitoring it live. I would like some insight into what may be causing this large imbalance in r/w, especially since I'm in a read-intensive (web hosting) environment. Some of it may be expected in when considering details of my environment and CephFS implementation specifics, so please ask away if more details are needed. With my experience using NFS, I would start by looking at client io stats, like `nfsstat` and tuning e.g. mount options, but I haven't been able to find such statistics for CephFS clients. Is there anything of the sort for CephFS? Are similar stats obtainable in some other way? This might be a somewhat broad question and shallow description, so yeah, let me know if there's anything you would like more details on. Thanks a lot, Samy

4 years, 2 months

5
8
0 0

getting rid of incomplete pg errors

by Hartwig Hauschild

Hi. before I descend into what happened and why it happened: I'm talking about a test-cluster so I don't really care about the data in this case. We've recently started upgrading from luminous to nautilus, and for us that means we're retiring ceph-disk in favour of ceph-volume with lvm and dmcrypt. Our setup is in containers and we've got DBs separated from Data. When testing our upgrade-path we discovered that running the host on ubuntu-xenial and the containers on centos-7.7 leads to lvm inside the containers not using lvmetad because it's too old. That in turn means that not running `vgscan --cache` on the host before adding a LV to a VG essentially zeros the metadata for all LVs in that VG. That happened on two out of three hosts for a bunch of OSDs and those OSDs are gone. I have no way of getting them back, they've been overwritten multiple times trying to figure out what went wrong. So now I have a cluster that's got 16 pgs in 'incomplete', 14 of them with 0 objects, 2 with about 150 objects each. I have found a couple of howtos that tell me to use ceph-objectstore-tool to find the pgs on the active osds and I've given that a try, but ceph-objectstore-tool always tells me it can't find the pg I am looking for. Can I tell ceph to re-init the pgs? Do I have to delete the pools and recreate them? There's no data I can't get back in there, I just don't feel like scrapping and redeploying the whole cluster. -- Cheers, Hardy

4 years, 2 months

3
4
0 0

2024

2023

2022

2021

2020

2019

ceph-users January 2020