August 2023 - ceph-users

by Christian Rohmann

On 11/05/2022 23:21, Joost Nieuwenhuijse wrote: > After a reboot the OSD turned out to be corrupt. Not sure if > ceph-volume lvm new-db caused the problem, or failed because of > another problem. I just ran into the same issue trying to add a db to an existing OSD. Apparently this is a known bug: https://tracker.ceph.com/issues/55260 It's already fixed master, but the backports are all still pending ... Regards Christian

9 months, 1 week

1
1
0 0

how to set load balance on multi active mds?

by zxcs

Hi, experts, we have a product env build with ceph version 16.2.11 pacific, and using CephFS. Also enable multi active mds(more than 10), but we usually see load unbalance on our client request with these mds. see below picture. the top 1 mds has 32.2k client request. and the last one only 331. this always lead our cluster into very bad situation. say many MDS report slow requests… ... 7 MDSs report slow requests 1 MDSs behind on trimming … So our question is how to set those mdss load balance? Could any one please help to shed some light here? Thanks a ton! Thanks, xz

9 months, 1 week

3
4
0 0

Ceph bucket notification events stop working

by daniel.yordanov1＠swisscom.com

Hello, We started to use the Ceph bucket notification events with subscription to an HTTP endpoint. We encountered an issue when the receiver endpoint was changed. Which means the events from Ceph weren't consumed. We deleted the bucket notifications and the topic, and created a new topic with the new endpoint and new bucket notifications. (We are using the REST api to create bucket notifications and topics. We also used the CLI commands, but there we found out that deleting a topic doesn't delete the notifications that are subscribed to it. Ceph version is Pacific.) From that moment we didn't receive any more notification events to our new endpoint. We tried many times to create new topics and new bucket notifications, but we don't receive anymore events to our endpoint. We suspect that the notification queues don't get fully cleaned and they stay in some broken state. We have been able to reproduce this locally and the only solution was to wipe all the containers and recreate them. The problem is that this issue is on a staging environment where we cannot destroy everything. We are looking for a solution or a command to clean the notification queues, to be able to start anew. We also are looking for a way to know programatically if the notifications broke and have a way to automatically recover as such a flaw is critical for our application. Thanks for your time! Daniel Yordanov

9 months, 1 week

2
2
0 0

libcephfs init hangs, is there a 'timeout' argument?

by Harry G Coin

Libcephfs's 'init' call hangs when passed arguments that once worked normally, but later refer to a cluster that's either broken, is on its way out of service, has too few mons, etc. At least the python libcephfs wrapper hangs on init. Of course mount and session timeouts work, but is there a way to error out a failed init call and not just hang the client? Thanks!

9 months, 2 weeks

1
0
0 0

Ceph Leadership Team Meeting: 2023-08-09 Minutes

by Patrick Donnelly

Today we discussed: - Delegating more privileges for internal hardware to allow on-call folks to fix issues. - Maybe using CephFS for the teuthology VM /home directory (it became full on Friday night) - Preparation for Open Source Day: we are seeking "low-hanging-fruit" tickets for new developers to try fixing. - Reef is released! Time for blog posts. We are gathering options from PTLs. - Ceph organization Github plan migration from the "bronze legacy plan" to the FOSS "free" plan. There is some uncertainty about surprise drawbacks, Ernesto is continuing his investigation. - Case is updating contributors to generate accurate credits for the new reef release: https://github.com/ceph/ceph/pull/52868 -- Patrick Donnelly, Ph.D. He / Him / His Red Hat Partner Engineer IBM, Inc. GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

9 months, 2 weeks

1
0
0 0

Backfill Performance for

by Jonathan Suever

I am in the process of expanding our cluster capacity by ~50% and have noticed some unexpected behavior during the backfill and recovery process that I'd like to understand and see if there is a better configuration that will yield a faster and smoother backfill. Pool Information: OSDs: 243 spinning HDDs PGs: 1024 (yes, this is low for our number of disks) I inherited the cluster and it has the following settings which seem to have been done in an attempt to get the cluster to recover quickly: osd_max_backfills: 6 (default is 1) osd_recovery_sleep_hdd: 0.0 (default is 0.1) osd_recovery_max_active_hdd: 9 When watching the PGs recover I am noticing a few things: - All PGs seem to be backfilling at the same time which seems to be in violation of osd_max_backfills. I understand that there should be 6 readers and 6 writers at a time, but I'm seeing a given OSD participate in more than 6 PG backfills. Is an OSD only considered as backfilling if it is not present in both the UP and ACTING groups (e.g. it will have it's data altered)? - Some PGs are recovering at a much slower rate than others (some as little as kilobytes per second) despite the disks being all of a similar speed. Is there some way to dig into why that may be? - In general, the recovery is happening very slowly (between 1 and 5 objects per second per PG). Is it possible the settings above are too aggressive and causing performance degradation due to disk thrashing? - Currently, all misplaced PGs are backfilling, if I were to change some of the settings above (specifically `osd_max_backfills`) would that essentially pause backfilling PGs or will those backfills have to end and then start over when it is done waiting? - Given that all PGs are backfilling simultaneously there is no way to prioritize one PG over another (we have some disks with very high usage that we're trying to reduce). Would reducing those max backfills allow for proper prioritization of PGs with force-backfill? - We have had some OSDs restart during the process and their misplaced object count is now zero but they are incrementing their recovering objects bytes. Is that expected and is there a way to estimate when that will complete? Thanks for the help! -Jonathan

9 months, 2 weeks

2
1
0 0

Re: cephfs snapshot mirror peer_bootstrap import hung

by Venky Shankar

Hi Anantha, On Mon, Aug 7, 2023 at 11:52 PM Adiga, Anantha <anantha.adiga(a)intel.com> wrote: > > Hi Venky, > > > > I tried on another secondary Quincy cluster and it is the same problem. The peer_bootstrap mport command hangs. A pacific cluster generated peer token should be importable in a quincy source cluster. Looking at the logs, I suspect that the perceived hang is the mirroring module blocked on connecting to the secondary cluster (to set mirror info xattr). Are you able to connect to the secondary cluster from the host running ceph-mgr on the primary cluster using its monitor address (and a key)? > > > > > > root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ== > > …… > > ……. > > ..command does not complete..waits here > > ^C to exit. > > Thereafter some commands do not complete… > > root@fl31ca104ja0201:/# ceph -s > > cluster: > > id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e > > health: HEALTH_OK > > > > services: > > mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 2d) > > mgr: fl31ca104ja0201.kkoono(active, since 3d), standbys: fl31ca104ja0202, fl31ca104ja0203 > > mds: 1/1 daemons up, 2 standby > > osd: 44 osds: 44 up (since 2d), 44 in (since 5w) > > cephfs-mirror: 1 daemon active (1 hosts) > > rgw: 3 daemons active (3 hosts, 1 zones) > > > > data: > > volumes: 1/1 healthy > > pools: 25 pools, 769 pgs > > objects: 614.40k objects, 1.9 TiB > > usage: 2.9 TiB used, 292 TiB / 295 TiB avail > > pgs: 769 active+clean > > > > io: > > client: 32 KiB/s rd, 0 B/s wr, 33 op/s rd, 1 op/s wr > > > > root@fl31ca104ja0201:/# > > root@fl31ca104ja0201:/# ceph fs status cephfs > > This command also waits. …… > > > > I have attached the mgr log > > root@fl31ca104ja0201:/# ceph service status > > { > > "cephfs-mirror": { > > "5306346": { > > "status_stamp": "2023-08-07T17:35:56.884907+0000", > > "last_beacon": "2023-08-07T17:45:01.903540+0000", > > "status": { > > "status_json": "{\"1\":{\"name\":\"cephfs\",\"directory_count\":0,\"peers\":{}}}" > > } > > } > > > > Quincy secondary cluster > > > > root@a001s008-zz14l47008:/# ceph mgr module enable mirroring > > root@a001s008-zz14l47008:/# ceph fs authorize cephfs client.mirror_remote / rwps > > [client.mirror_remote] > > key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA== > > root@a001s008-zz14l47008:/# ceph auth get client.mirror_remote > > [client.mirror_remote] > > key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA== > > caps mds = "allow rwps fsname=cephfs" > > caps mon = "allow r fsname=cephfs" > > caps osd = "allow rw tag cephfs data=cephfs" > > root@a001s008-zz14l47008:/# > > root@a001s008-zz14l47008:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote shgR-site > > {"token": "eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ=="} > > root@a001s008-zz14l47008:/# > > > > Thank you, > > Anantha > > > > From: Adiga, Anantha > Sent: Friday, August 4, 2023 11:55 AM > To: Venky Shankar <vshankar(a)redhat.com>; ceph-users(a)ceph.io > Subject: RE: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung > > > > Hi Venky, > > > > Thank you so much for the guidance. Attached is the mgr log. > > > > Note: the 4th node in the primary cluster has smaller capacity drives, the other 3 nodes have the larger capacity drives. > > 32 ssd 6.98630 1.00000 7.0 TiB 44 GiB 44 GiB 183 KiB 148 MiB 6.9 TiB 0.62 0.64 40 up osd.32 > > -7 76.84927 - 77 TiB 652 GiB 648 GiB 20 MiB 3.0 GiB 76 TiB 0.83 0.86 - host fl31ca104ja0203 > > 1 ssd 6.98630 1.00000 7.0 TiB 73 GiB 73 GiB 8.0 MiB 333 MiB 6.9 TiB 1.02 1.06 54 up osd.1 > > 4 ssd 6.98630 1.00000 7.0 TiB 77 GiB 77 GiB 1.1 MiB 174 MiB 6.9 TiB 1.07 1.11 55 up osd.4 > > 7 ssd 6.98630 1.00000 7.0 TiB 47 GiB 47 GiB 140 KiB 288 MiB 6.9 TiB 0.66 0.68 51 up osd.7 > > 10 ssd 6.98630 1.00000 7.0 TiB 75 GiB 75 GiB 299 KiB 278 MiB 6.9 TiB 1.05 1.09 44 up osd.10 > > 13 ssd 6.98630 1.00000 7.0 TiB 94 GiB 94 GiB 1018 KiB 291 MiB 6.9 TiB 1.31 1.36 72 up osd.13 > > 16 ssd 6.98630 1.00000 7.0 TiB 31 GiB 31 GiB 163 KiB 267 MiB 7.0 TiB 0.43 0.45 49 up osd.16 > > 19 ssd 6.98630 1.00000 7.0 TiB 14 GiB 14 GiB 756 KiB 333 MiB 7.0 TiB 0.20 0.21 50 up osd.19 > > 22 ssd 6.98630 1.00000 7.0 TiB 105 GiB 104 GiB 1.3 MiB 313 MiB 6.9 TiB 1.46 1.51 48 up osd.22 > > 25 ssd 6.98630 1.00000 7.0 TiB 17 GiB 16 GiB 257 KiB 272 MiB 7.0 TiB 0.23 0.24 45 up osd.25 > > 28 ssd 6.98630 1.00000 7.0 TiB 72 GiB 72 GiB 6.1 MiB 180 MiB 6.9 TiB 1.01 1.05 43 up osd.28 > > 31 ssd 6.98630 1.00000 7.0 TiB 47 GiB 46 GiB 592 KiB 358 MiB 6.9 TiB 0.65 0.68 56 up osd.31 > > -9 64.04089 - 64 TiB 728 GiB 726 GiB 17 MiB 1.8 GiB 63 TiB 1.11 1.15 - host fl31ca104ja0302 > > 33 ssd 5.82190 1.00000 5.8 TiB 65 GiB 65 GiB 245 KiB 144 MiB 5.8 TiB 1.09 1.13 47 up osd.33 > > 34 ssd 5.82190 1.00000 5.8 TiB 14 GiB 14 GiB 815 KiB 83 MiB 5.8 TiB 0.24 0.25 55 up osd.34 > > 35 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 224 KiB 213 MiB 5.7 TiB 1.30 1.34 44 up osd.35 > > 36 ssd 5.82190 1.00000 5.8 TiB 117 GiB 117 GiB 8.5 MiB 284 MiB 5.7 TiB 1.96 2.03 52 up osd.36 > > 37 ssd 5.82190 1.00000 5.8 TiB 58 GiB 58 GiB 501 KiB 132 MiB 5.8 TiB 0.98 1.01 40 up osd.37 > > 38 ssd 5.82190 1.00000 5.8 TiB 123 GiB 123 GiB 691 KiB 266 MiB 5.7 TiB 2.07 2.14 73 up osd.38 > > 39 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 609 KiB 193 MiB 5.7 TiB 1.30 1.34 62 up osd.39 > > 40 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 262 KiB 148 MiB 5.7 TiB 1.29 1.34 55 up osd.40 > > 41 ssd 5.82190 1.00000 5.8 TiB 44 GiB 44 GiB 4.4 MiB 140 MiB 5.8 TiB 0.75 0.77 44 up osd.41 > > 42 ssd 5.82190 1.00000 5.8 TiB 45 GiB 45 GiB 886 KiB 135 MiB 5.8 TiB 0.75 0.78 47 up osd.42 > > 43 ssd 5.82190 1.00000 5.8 TiB 28 GiB 28 GiB 187 KiB 104 MiB 5.8 TiB 0.48 0.49 58 up osd.43 > > > > [Also: Yesterday I had two cfs-mirror running one on fl31ca104ja0201 and fl31ca104ja0302. The cfs-mirror on fl31ca104ja0201 was stopped. When the import token was run on fl31ca104ja0302, the cfs-mirror log was active. Just in case it is useful I have attached that log (cfsmirror-container.log) as well. ] > > > > How can I list the token on the target cluster after running the create peer_bootstrap command? > > > > Here is today’s status with your suggestion: > > There is only one cfs-mirror daemon running now. It is on fl31ca104ja0201 node. > > > > root@fl31ca104ja0201:/# ceph -s > > cluster: > > id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e > > health: HEALTH_OK > > > > services: > > mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 7m) > > mgr: fl31ca104ja0201.kkoono(active, since 13m), standbys: fl31ca104ja0202, fl31ca104ja0203 > > mds: 1/1 daemons up, 2 standby > > osd: 44 osds: 44 up (since 7m), 44 in (since 4w) > > cephfs-mirror: 1 daemon active (1 hosts) > > rgw: 3 daemons active (3 hosts, 1 zones) > > > > data: > > volumes: 1/1 healthy > > pools: 25 pools, 769 pgs > > objects: 614.40k objects, 1.9 TiB > > usage: 2.8 TiB used, 292 TiB / 295 TiB avail > > pgs: 769 active+clean > > > > io: > > client: 32 MiB/s rd, 0 B/s wr, 57 op/s rd, 1 op/s wr > > > > root@fl31ca104ja0201:/# > > root@fl31ca104ja0201:/# > > root@fl31ca104ja0201:/# ceph tell mgr.fl31ca104ja0201.kkoono config set debug_mgr 20 > > { > > "success": "" > > } > > root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= > > ^CInterrupted > > > > Ctrl-C after 15 min. Once the command is run, the health status goes to WARN . > > > > root@fl31ca104ja0201:/# ceph -s > > cluster: > > id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e > > health: HEALTH_WARN > > 6 slow ops, oldest one blocked for 1095 sec, mon.fl31ca104ja0203 has slow ops > > > > services: > > mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 30m) > > mgr: fl31ca104ja0201.kkoono(active, since 35m), standbys: fl31ca104ja0202, fl31ca104ja0203 > > mds: 1/1 daemons up, 2 standby > > osd: 44 osds: 44 up (since 29m), 44 in (since 4w) > > cephfs-mirror: 1 daemon active (1 hosts) > > rgw: 3 daemons active (3 hosts, 1 zones) > > > > data: > > volumes: 1/1 healthy > > pools: 25 pools, 769 pgs > > objects: 614.40k objects, 1.9 TiB > > usage: 2.8 TiB used, 292 TiB / 295 TiB avail > > pgs: 769 active+clean > > > > io: > > client: 67 KiB/s rd, 0 B/s wr, 68 op/s rd, 21 op/s wr > > > > > > -----Original Message----- > From: Venky Shankar <vshankar(a)redhat.com> > Sent: Thursday, August 3, 2023 11:03 PM > To: Adiga, Anantha <anantha.adiga(a)intel.com> > Cc: ceph-users(a)ceph.io > Subject: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung > > > > Hi Anantha, > > > > On Fri, Aug 4, 2023 at 2:27 AM Adiga, Anantha <anantha.adiga(a)intel.com> wrote: > > > > > > Hi > > > > > > Could you please provide guidance on how to diagnose this issue: > > > > > > In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, 3 node, in different locations. Both are already running RGW multi-site, A is master. > > > > > > Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. Cephfs snapshot mirroring is being configured. The bootstrap import step on the primary node hangs. > > > > > > On the target cluster : > > > --------------------------- > > > "version": "16.2.5", > > > "release": "pacific", > > > "release_type": "stable" > > > > > > root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create > > > cephfs client.mirror_remote flex2-site > > > {"token": > > > "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJma > > > Wxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiw > > > gInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd > > > 1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTU > > > uNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6M > > > zMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} > > > > Seems fine uptil here. > > > > > root@cr21meg16ba0101:/var/run/ceph# > > > > > > On the source cluster: > > > ---------------------------- > > > "version": "17.2.6", > > > "release": "quincy", > > > "release_type": "stable" > > > > > > root@fl31ca104ja0201:/# ceph -s > > > cluster: > > > id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e > > > health: HEALTH_OK > > > > > > services: > > > mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) > > > mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202 > > > mds: 1/1 daemons up, 2 standby > > > osd: 44 osds: 44 up (since 111m), 44 in (since 4w) > > > cephfs-mirror: 1 daemon active (1 hosts) > > > rgw: 3 daemons active (3 hosts, 1 zones) > > > > > > data: > > > volumes: 1/1 healthy > > > pools: 25 pools, 769 pgs > > > objects: 614.40k objects, 1.9 TiB > > > usage: 2.8 TiB used, 292 TiB / 295 TiB avail > > > pgs: 769 active+clean > > > > > > root@fl31ca104ja0302:/# ceph mgr module enable mirroring module > > > 'mirroring' is already enabled root@fl31ca104ja0302:/# ceph fs > > > snapshot mirror peer_bootstrap import cephfs > > > eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaW > > > xlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwg > > > InNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1 > > > h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUu > > > NzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6Mz > > > MwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= > > > > Going by your description, I'm guessing this is the command that hangs? If that's the case, set `debug_mgr=20`, repeat the token import step and share the ceph-mgr log. Also note that you can check the mirror daemon status as detailed in > > > > https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#mirror-daemon-status > > > > > > > > > > > root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon > > > /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status { > > > "metadata": { > > > "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", > > > "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", > > > "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", > > > "hostname": "fl31ca104ja0302", > > > "pid": "7", > > > "root": "/" > > > }, > > > "dentry_count": 0, > > > "dentry_pinned_count": 0, > > > "id": 5194553, > > > "inst": { > > > "name": { > > > "type": "client", > > > "num": 5194553 > > > }, > > > "addr": { > > > "type": "v1", > > > "addr": "10.45.129.5:0", > > > "nonce": 2497002034 > > > } > > > }, > > > "addr": { > > > "type": "v1", > > > "addr": "10.45.129.5:0", > > > "nonce": 2497002034 > > > }, > > > "inst_str": "client.5194553 10.45.129.5:0/2497002034", > > > "addr_str": "10.45.129.5:0/2497002034", > > > "inode_count": 1, > > > "mds_epoch": 118, > > > "osd_epoch": 6266, > > > "osd_epoch_barrier": 0, > > > "blocklisted": false, > > > "fs_name": "cephfs" > > > } > > > > > > root@fl31ca104ja0302:/home/general# docker logs > > > ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja030 > > > 2-sypagt --tail 10 debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280 0 > > > ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy > > > (stable), process cephfs-mirror, pid 7 debug > > > 2023-08-03T05:24:27.413+0000 7f8eb6fc0280 0 pidfile_write: ignore > > > empty --pid-file debug 2023-08-03T05:24:27.445+0000 7f8eb6fc0280 1 > > > mgrc service_daemon_register cephfs-mirror.5184622 metadata > > > {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 > > > (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy > > > (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104ja0302, > > > container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c > > > 64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 6252 CPU > > > @ 2.10GHz,distro=centos,distro_description=CentOS Stream > > > 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.sypagt, > > > instance_id=5184622,kernel_description=#82-Ubuntu SMP Tue Jun 6 > > > 23:10:23 UTC > > > 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_total_kb > > > =527946928,os=Linux} debug 2023-08-03T05:27:10.419+0000 7f8ea1b2c700 > > > 0 client.5194553 ms_handle_reset on v2:10.45.128.141:3300/0 debug > > > 2023-08-03T05:50:10.917+0000 7f8ea1b2c700 0 client.5194553 > > > ms_handle_reset on v2:10.45.128.139:3300/0 > > > > > > Thank you, > > > Anantha > > > _______________________________________________ > > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > > > email to ceph-users-leave(a)ceph.io > > > > > > > > > -- > > Cheers, > > Venky > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io -- Cheers, Venky

9 months, 2 weeks

2
6
0 0

RBD Disk Usage

by mahnoosh shahidi

Hi all, I have an rbd image that `rbd disk-usage` shows it has 31GB usage but in the filesystem `du` shows its usage is 40KB. Does anyone know the reason for this difference? Best Regards, Mahnoosh

9 months, 2 weeks

3
3
0 0

help, ceph fs status stuck with no response

by Zhang Bao

Hi, I have a ceph stucked at `ceph --verbose stats fs fsname`. And in the monitor log, I can found something like `audit [DBG] from='client.431973 -' entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname", "target": ["mon-mgr", ""]}]: dispatch`. What happened and what should I do? -- ZhangBao +6585021702

9 months, 2 weeks

2
1
0 0

Problems with UFS / FreeBSD on rbd volumes?

by Roland Giesler

We have a FreeBSD 12.3 guest machine that works well on an RBD volume until it is live migrated to another node (on Proxmox). After migration, the processes almost all go into D state (waiting for this disk) and they don't exist from it (ie they don't "get" the disk the requested. I'm not sure what it causing this, so I'm asking here if anyone has come across such a problem?

9 months, 2 weeks

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users August 2023