If I understand the documentation for the placements in "ceph orch
apply" correctly, I can place the daemons by number or on specific
host. But what I want is:
"Start 3 mgr services, and one of it should be started on node ceph01."
How I can achieve this?
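For reference, these are the two documented forms I mean (the extra hostnames are just examples):
# ceph orch apply mgr --placement=3
# ceph orch apply mgr --placement="ceph01 ceph02 ceph03"
but neither seems to let me say "3 in total, and one of them must be on ceph01".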
Thanks!
Hi all,
I'm a new Ceph user trying to install my first cluster.
I'm trying to install Pacific, but what I get is Octopus.
What's wrong here?
I've done:
# curl --silent --remote-name --location
https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
# chmod +x cephadm
# ./cephadm add-repo --release pacific
# ./cephadm install
# cephadm install ceph-common
# ceph -v
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus
(stable)
# cat /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-pacific/ focal main
??
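I guess the next step would be to check which repository apt actually picks the packages from, e.g.:
# apt update
# apt-cache policy ceph-common cephadm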
Kind regards,
Jana
On Mon, Jun 21, 2021 at 8:13 PM opengers <zijian1012(a)gmail.com> wrote:
>
> Thanks for the answer. I'm still a bit confused by the documentation's explanation of "MDS_SLOW_REQUEST", which reads as follows:
> ------
> MDS_SLOW_REQUEST
>
> Message
> “N slow requests are blocked”
>
> Description
> One or more client requests have not been completed promptly, indicating that the MDS is either running very slowly, or that the RADOS cluster is not acknowledging journal writes promptly, or that there is a bug. Use the ops admin socket command to list outstanding metadata operations. This message appears if any client requests have taken longer than mds_op_complaint_time (default 30s).
>
> FROM: https://docs.ceph.com/en/latest/cephfs/health-messages/
> ------
>
> "or that the RADOS cluster is not acknowledging journal writes promptly", from this sentence, it seems that "MDS_SLOW_REQUEST" also contains OSD operations by the MDS?
Yes. If you have slow metadata IO warnings you will likely also have
slow request warnings.
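To list the outstanding operations that health message refers to, the admin socket command looks roughly like this (the mds name is a placeholder):
# ceph daemon mds.<name> ops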
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hello,
This is a follow-up to the thread "RBD migration between 2 EC pools : very slow".
I'm running Octopus 15.2.13.
RBD migration seems really fragile.
I started a migration to change the data pool (from an EC 3+2 to an EC 8+2):
- rbd migration prepare
- rbd migration execute
=> 4% after 6h, and no progress 12h later
- rbd migration abort
=> does not return
After that, the state of the migration on the destination image is "unknown".
rbd info on the source image (in the trash) and on the destination image shows
migrating in the features.
Debugging another abort command shows that it tries to take/put a lock, and
one is already held, due to either the cancelled execute or the subsequent
abort.
rbd lock delete does not work; I get a strange message about a read-only
filesystem...
So, I'm stuck.
My production on the source image is stopped for now, because I use krbd and I
must terminate the migration (either commit or abort) before being able to map
and mount it again.
What I think I'm starting to understand is that abort, like commit, is meant
to be used only once the migration execution has finished? Am I wrong?
If so, how can we stop an ongoing migration, and also, how do we recover the
source image?
Do I need to:
- delete the destination image
- restore the source image from the trash
What about the migrating feature that prevents krbd from mapping the image?
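For reference, the recovery path I have in mind for the source image (pool name and image id are placeholders):
# rbd trash ls <pool>
# rbd trash restore -p <pool> <image-id>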
Hello!
We are in the process of expanding our Ceph cluster (by both adding OSD
hosts and replacing smaller-sized HDDs on our existing hosts). So far we
have gone host by host, removing the old OSDs, swapping the physical
HDDs, and re-adding them. This process has gone smoothly, aside from one
issue: upon any action taken on the cluster (adding new OSDs, replacing
old ones, etc.), PGs get stuck "activating", which causes around 3.5% of
PGs to go inactive, stopping IO.
Here is the current output of ceph -s:
  cluster:
    id:     e8ffe2eb-f8fc-4110-a4bc-1715e878fb7b
    health: HEALTH_WARN
            Reduced data availability: 166 pgs inactive
            Degraded data redundancy: 137153907/3658405707 objects degraded (3.749%), 930 pgs degraded, 928 pgs undersized
            10 pgs not deep-scrubbed in time
            33709 slow ops, oldest one blocked for 35956 sec, daemons [osd.103,osd.104,osd.105,osd.106,osd.107,osd.109,osd.111,osd.112,osd.113,osd.114]... have slow ops.

  services:
    mon: 3 daemons, quorum lb3,lb2,lb1 (age 8w)
    mgr: lb1(active, since 6w), standbys: lb3, lb2
    osd: 117 osds: 117 up (since 15m), 117 in (since 10h); 2033 remapped pgs
    rgw: 3 daemons active (lb1.rgw0, lb2.rgw0, lb3.rgw0)

  task status:

  data:
    pools:   8 pools, 5793 pgs
    objects: 609.74M objects, 169 TiB
    usage:   308 TiB used, 430 TiB / 738 TiB avail
    pgs:     2.866% pgs not active
             137153907/3658405707 objects degraded (3.749%)
             262215404/3658405707 objects misplaced (7.167%)
             3754 active+clean
              963 active+remapped+backfill_wait
              892 active+undersized+degraded+remapped+backfill_wait
              136 activating+remapped
               27 activating+undersized+degraded+remapped
                8 active+undersized+degraded+remapped+backfilling
                6 active+clean+scrubbing+deep
                3 activating+degraded+remapped
                3 active+remapped+backfilling
                1 active+undersized+remapped+backfill_wait

  io:
    client:   94 KiB/s rd, 94 op/s rd, 0 op/s wr
    recovery: 112 MiB/s, 372 objects/s

  progress:
    Rebalancing after osd.20 marked in (10h)
      [............................] (remaining: 11d)
    Rebalancing after osd.41 marked in (10h)
      [=...........................] (remaining: 8d)
    Rebalancing after osd.30 marked in (10h)
      [=...........................] (remaining: 9d)
    Rebalancing after osd.1 marked in (10h)
      [=======.....................] (remaining: 2h)
    Rebalancing after osd.10 marked in (10h)
      [............................] (remaining: 12d)
    Rebalancing after osd.50 marked in (10h)
      [............................] (remaining: 2w)
    Rebalancing after osd.71 marked out (10h)
      [==..........................] (remaining: 5d)
What you may find interesting are the "slow ops" warnings. This is where
our inactive PGs become stuck. Once the cluster gets into this state,
I'm usually able to recover IO by restarting the OSDs with slow ops.
However, what's extremely strange is that this workaround only works
about 12 hours after the last OSD addition. Restarting the slow-ops OSDs
before roughly 12 hours have passed results in the slow ops returning
immediately.
Our first thought was hardware issues; however, we ruled this out after
the slow ops warnings appeared on brand-new HDDs and OSD hosts.
Monitoring the IO saturation of the OSDs reporting slow ops shows actual
usage nowhere near saturation, and no hardware issues are present on the
drives themselves.
Looking at the journalctl logs of one of the affected OSDs above, we see
the following repeated multiple times:
osd.103 56934 get_health_metrics reporting 2 slow ops, oldest is
osd_op(client.467952.0:1520304537 8.6fbs0 8.1e6826fb (undecoded)
ondisk+retry+write+known_if_redirected e56923
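For reference, the individual blocked ops can also be dumped via the admin socket on the OSD's host (the osd id here is just an example):
# ceph daemon osd.103 dump_ops_in_flight
# ceph daemon osd.103 dump_historic_ops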
So far my procedure for the disk swaps has been as follows (the flag
commands are sketched below):
1. Set noout, norebalance, and norecover on the cluster.
2. Use ceph-ansible to remove the old disk OSD IDs.
3. Swap physical HDDs, re-add with ceph-ansible.
4. Unset noout, norebalance, and norecover.
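For completeness, steps 1 and 4 are just the standard flag commands:
# ceph osd set noout
# ceph osd set norebalance
# ceph osd set norecover
and the matching "ceph osd unset" commands afterwards.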
I should note this issue appears even with simple OSD additions (not
removals), as we added 2 brand new hosts to the cluster and saw the same
issue.
I've been trying to think of any possible cause of this issue. I should
mention our cluster is messy at the moment hardware-wise (we have a mix
of 7 TB, 4 TB, and 10 TB HDDs; we're moving to all 10 TB HDDs, but the
swap process has been taking a while). One warning I've noticed
during the old disk removals is about too many PGs per OSD; however,
this warning clears once the new OSDs are added, which is to be
expected, I assume.
If anyone would be willing to provide any hints of where to look, it
would be much appreciated!
Thanks for your time.
--
Justin Goetz
Systems Engineer, TeraSwitch Inc.
jgoetz(a)teraswitch.com
412-945-7045 (NOC) | 412-459-7945 (Direct)
Sorry for the very naive question:
I know how to set/check the rgw quota for a user (using radosgw-admin)
But how can a radosgw user check the quota assigned to his/her
account, using the S3 and/or Swift interface?
I don't get this information using "swift stat", and I can't find an
s3cmd quota-related command...
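For context, on the admin side the quota shows up in the user info dump (the uid is a placeholder):
# radosgw-admin user info --uid=<uid>
What I'm looking for is the equivalent from the user's own side, via S3 or Swift.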
Thanks, Massimo
From what kernel / ceph version is krbd usage on an OSD node problematic?
Currently I am running Nautilus 14.2.11 and an el7 3.10 kernel without any issues.
I can remember using a CephFS mount without any issues as well, until a specific Luminous update surprised me. So it would be nice to know when to expect this.
> -----Original Message-----
> Sent: Wednesday, 23 June 2021 11:25
> Subject: *****SPAM***** [ceph-users] Re: Can not mount rbd device
> anymore
>
> On Wed, Jun 23, 2021 at 9:59 AM Matthias Ferdinand
> wrote:
> >
> > On Tue, Jun 22, 2021 at 02:36:00PM +0200, Ml Ml wrote:
> > > Hello List,
> > >
> > > all of a sudden I can not mount a specific rbd device anymore:
> > >
> > > root@proxmox-backup:~# rbd map backup-proxmox/cluster5 -k
> > > /etc/ceph/ceph.client.admin.keyring
> > > /dev/rbd0
> > >
> > > root@proxmox-backup:~# mount /dev/rbd0 /mnt/backup-cluster5/
> > > (just never times out)
> >
> >
> > Hi,
> >
> > there used to be some kernel lock issues when the kernel rbd client
> > tried to access an OSD on the same machine. Not sure if these issues
> > still exist (but I would guess so) and if you use your proxmox cluster
> > in a hyperconverged manner (nodes providing VMs and storage service at
> > the same time) you may just have been lucky that it had worked before.
> >
> > Instead of the kernel client mount you can try to export the volume as
> > an NBD device (https://docs.ceph.com/en/latest/man/8/rbd-nbd/) and
> > mounting that. rbd-nbd runs in userspace and should not have that
> > locking problem.
>
> rbd-nbd is also susceptible to locking up in such setups, likely more
> so than krbd. Don't forget that it also has a kernel component and
> there are actually more opportunities for things to go sideways/lock up
> because there is an extra daemon involved allocating some additional
> memory for each I/O request.
>
> Thanks,
>
> Ilya
Hello Cephers,
On a capacity-oriented Ceph cluster (13 nodes, 130 OSDs on 8 TB HDDs), I'm
migrating a 40 TB image from a 3+2 EC pool to an 8+2 one.
The use case is Veeam backup on XFS filesystems, mounted via krbd.
Backups are running, and I can see 200 MB/s of throughput.
But my migration (rbd migration prepare / execute) has been stalled at 4% for 6h now.
When the backups are not running, I see a mere 20 MB/s of throughput,
most likely my migration.
At that rate I would need about a month to migrate 40 TB (40 TB at ~20 MB/s is roughly 23 days)!
As I use a krbd client, I cannot remap the rbd image straight after the rbd
migration prepare. So the filesystem is not usable until the migration is
completed. Not really workable for me...
Does anyone have a clue, either on how to speed up the rbd migration, or on
another method to move/copy an image between 2 pools with minimal downtime?
I thought of rbd export-diff | rbd import-diff, one pass while mounted, and a
final one while unmapped, before switching...
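Roughly something like this (pool and image names are placeholders; the destination image would have to exist first, and a snapshot-based incremental pass would follow):
# rbd export-diff <src-pool>/<image> - | rbd import-diff - <dst-pool>/<image>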
But that forces me to rename my image, because even if I use another data
pool, the metadata pool stays the same.
Can you see another method ?
--
Gilles
Hi all,
newbie question:
The documentation seems to suggest that with ceph-volume, one OSD is created for each HDD (cf. the 4-HDD example in
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/)
This seems odd: what if a server has a large number of disks? I was going to try CephFS on ~10 servers with 70 HDDs each. That would mean each system
has to deal with 70 OSDs, on 70 LVs?
Really no aggregation of the disks?
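For reference, the per-device pattern I understand the docs to describe looks like this (the device name is just an example):
# ceph-volume lvm create --data /dev/sdb
i.e. one such OSD per HDD, which would mean 70 of them per server here.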
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: IT
GSI Helmholtzzentrum für Schwerionenforschung GmbH
www.gsi.de