Just making sure this makes the list:
Mac Wynkoop
---------- Forwarded message ---------
From: 胡 玮文 <huww98(a)outlook.com>
Date: Wed, Oct 7, 2020 at 9:00 PM
Subject: Re: pool pgp_num not updated
To: Mac Wynkoop <mwynkoop(a)netdepot.com>
Hi,
You can read about this behavior at
https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/
In short, Ceph will not increase pgp_num while more than 5% of objects are
misplaced (by default); once misplaced drops below 5%, it increases pgp_num
gradually until it reaches the value you set. The 5% threshold is controlled
by the target_max_misplaced_ratio config option.
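For reference, a hedged sketch of inspecting and raising that threshold
(assumes a Mimic-or-later config database; raising it means accepting more
concurrent data movement):

  # show the current threshold (default 0.05, i.e. 5%)
  ceph config get mgr target_max_misplaced_ratio
  # let pgp_num keep increasing while up to 7% of objects are misplaced
  ceph config set mgr target_max_misplaced_ratio 0.07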
On Oct 8, 2020, at 03:22, Mac Wynkoop <mwynkoop(a)netdepot.com> wrote:
Well, backfilling sure, but will it allow me to actually change the pgp_num
as more space frees up? Because the issue is that I cannot modify that
value.
Thanks,
Mac Wynkoop, Senior Datacenter Engineer
NetDepot.com: Cloud Servers; Delivered
Houston | Atlanta | NYC | Colorado Springs
1-844-25-CLOUD Ext 806
On Wed, Oct 7, 2020 at 1:50 PM Eugen Block <eblock(a)nde.ag> wrote:
Yes, I think that’s exactly the reason. As soon as the cluster has
more space the backfill will continue.
Zitat von Mac Wynkoop <mwynkoop(a)netdepot.com>:
The cluster is currently in a warn state, here's the scrubbed output of
ceph -s:
  cluster:
    id:     *redacted*
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            22 nearfull osd(s)
            2 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 277 pgs backfill_toofull
            Degraded data redundancy: 32652738/3651947772 objects degraded (0.894%), 281 pgs degraded, 341 pgs undersized
            1214 pgs not deep-scrubbed in time
            2647 pgs not scrubbed in time
            2 daemons have recently crashed

  services:
    mon:         5 daemons, *redacted* (age 44h)
    mgr:         *redacted*
    osd:         162 osds: 162 up (since 44h), 162 in (since 4d); 971 remapped pgs
                 flags noscrub,nodeep-scrub
    rgw:         3 daemons active *redacted*
    tcmu-runner: 18 daemons active *redacted*

  data:
    pools:   10 pools, 2648 pgs
    objects: 409.56M objects, 738 TiB
    usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
    pgs:     32652738/3651947772 objects degraded (0.894%)
             517370913/3651947772 objects misplaced (14.167%)
             1677 active+clean
             477  active+remapped+backfill_wait
             100  active+remapped+backfill_wait+backfill_toofull
             80   active+undersized+degraded+remapped+backfill_wait
             60   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             42   active+undersized+degraded+remapped+backfill_toofull
             33   active+undersized+degraded+remapped+backfilling
             25   active+remapped+backfilling
             25   active+remapped+backfill_toofull
             24   active+undersized+remapped+backfilling
             23   active+forced_recovery+undersized+degraded+remapped+backfill_wait
             19   active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
             15   active+undersized+remapped+backfill_wait
             14   active+undersized+remapped+backfill_wait+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfilling
             5    active+undersized+remapped+backfill_toofull
             3    active+remapped
             1    active+undersized+remapped
             1    active+forced_recovery+undersized+remapped+backfilling

  io:
    client:   287 MiB/s rd, 40 MiB/s wr, 1.94k op/s rd, 165 op/s wr
    recovery: 425 MiB/s, 225 objects/s
Now as you can see, we do have a lot of backfill operations going on at the
moment. Does that actually prevent Ceph from modifying the pgp_num value of a
pool?
Thanks,
Mac Wynkoop
On Wed, Oct 7, 2020 at 8:57 AM Eugen Block <eblock(a)nde.ag> wrote:
What is the current cluster status, is it healthy? Maybe increasing
pg_num would hit the limit of mon_max_pg_per_osd? Can you share 'ceph
-s' output?
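A hedged aside: that limit can be read straight from the config database on
Mimic or later, e.g.:

  ceph config get mon mon_max_pg_per_osd   # default is 250 on recent releases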
Zitat von Mac Wynkoop <mwynkoop(a)netdepot.com>:
Right, both Norman and I set pg_num before pgp_num. For example, here are my
current pool settings:
"pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target 2048 last_change 8458830 lfor 0/0/8445757 flags hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read 1 application rgw"
So, when I set:
"ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048"
it returns:
"set pool 40 pgp_num to 2048"
But upon checking the pool details again:
"pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target 2048 last_change 8458870 lfor 0/0/8445757 flags hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read 1 application rgw"
the pgp_num value still has not increased. Am I just doing something totally
wrong?
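For what it's worth, a hedged way to watch whether pgp_num creeps up at all
as the misplaced ratio falls (pool name pattern assumed):

  watch -n 60 'ceph osd pool ls detail | grep rgw.buckets.data; ceph -s | grep misplaced'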
Thanks,
Mac Wynkoop
On Tue, Oct 6, 2020 at 2:32 PM Marc Roos <M.Roos(a)f1-outsourcing.eu>
wrote:
pg_num and pgp_num need to be the same, not?
3.5.1. Set the Number of PGs
To set the number of placement groups in a pool, you must specify the number
of placement groups at the time you create the pool. See Create a Pool for
details. Once you set placement groups for a pool, you can increase the
number of placement groups (but you cannot decrease the number of placement
groups). To increase the number of placement groups, execute the following:
ceph osd pool set {pool-name} pg_num {pg_num}
Once you increase the number of placement groups, you must also increase the
number of placement groups for placement (pgp_num) before your cluster will
rebalance. The pgp_num should be equal to the pg_num. To increase the number
of placement groups for placement, execute the following:
ceph osd pool set {pool-name} pgp_num {pgp_num}
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/s…
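A concrete run of both steps, using a hypothetical pool named mypool:

  ceph osd pool set mypool pg_num 2048
  ceph osd pool set mypool pgp_num 2048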
-----Original Message-----
To: norman
Cc: ceph-users
Subject: [ceph-users] Re: pool pgp_num not updated
Hi everyone,
I'm seeing a similar issue here. Any ideas on this?
Mac Wynkoop,
On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern(a)gmx.com> wrote:
Hi guys,
When I updated the pg_num of a pool, I found it didn't work (no rebalancing
happened). Does anyone know the reason? The pool's info:
pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024 autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256 pgp_num_target 256 autoscale_mode warn last_change 84769 lfor 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
The pgp_num_target is set, but pgp_num is not. I had scaled out new OSDs and
the cluster was backfilling before I set the value; could that be the reason?
Hi guys,
I have stumbled over an error that I cannot solve after more than two days of trying. Now I turn to you in the hope of getting some help! :)
First of all, I am totally new to Ceph and this is just a test setup for now. I have followed the instructions on the ceph.com homepage as far as I understand.
I am about to set up a Ceph storage cluster on three different nodes (running Ubuntu server 20.04). I am using Ceph Octopus.
On these 3 nodes I have installed 3 monitors, 3 managers and 12 OSDs (I do know that OSDs and Monitors on the same machines are not recommended but for this test setup I do it anyway).
So far everything is working well for me. However, when I try to create an RBD pool it fails with an error message that is strange (at least to me):
2020-10-08T07:48:36.601+0000 7f8b9b7fe700 -1 librbd::image::GetMetadataRequest: 0x564279b9d670 handle_metadata_list: failed to retrieve image metadata: (1) Operation not permitted
2020-10-08T07:48:36.601+0000 7f8bb6af8380 -1 librbd::PoolMetadata: list: failed listing metadata: (1) Operation not permitted
2020-10-08T07:48:36.601+0000 7f8bb6af8380 -1 librbd::Config: apply_pool_overrides: failed to read pool config overrides: (1) Operation not permitted
2020-10-08T07:48:36.601+0000 7f8b9b7fe700 -1 librbd::image::ValidatePoolRequest: handle_read_rbd_info: failed to read RBD info: (1) Operation not permitted
rbd: pool already registered to a different application.
I run the commands as the Linux root user and as the Ceph user client.admin (I have turned off AppArmor and other hardening things as well). The Ceph user client.admin has the following setup in its keyring:
[client.admin]
key = .....
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
For more information regarding permissions in the system see the attached file permissions.txt that contains the output of the ceph auth list command.
I have also attached a file (report.txt) containing the output from the ceph report command.
(NOTE: I have removed some information from the files, i.e. keys, fsid, fingerprints, uuid, and replaced them with "..." instead)
Can anyone please help me understand why I get this error and what I need to do in order to solve it?
Thanks in advance!
Best regards,
Fredrik
Hi,
When I run `ceph orch ps` I see a couple of containers running on our MON
nodes whose names end with the `-safe` suffix, and I was wondering what
they are?
I couldn't find information about it in https://docs.ceph.com
This cluster is running Ceph 15.2.5, recently upgraded from 15.2.4
Many thanks,
Sebastian
Normally I would install ceph-common.rpm and access some rbd image via
rbdmap. What would be the best way to do this on an old el6? There is
not even a luminous el6 on download.ceph.com.
Hello all!
I am configuring a new storage class on my Kubernetes cluster,
pointing to a pool on a Ceph cluster which was recently upgraded to
Nautilus (was Luminous).
The old storage class points to a Luminous pool in a separate cluster
and works fine. On the new one, I think I did the configuration
properly, yet when creating a volume I get this:
2020-10-07 10:00:53.849128 7f2f8c6f9700 0 -- 10.2.3.13:0/3520982056 >>
10.2.3.23:6789/0 pipe(0x7f2f780008c0 sd=3 :60192 s=1 pgs=0 cs=0 l=1
c=0x7f2f780068e0).connect protocol feature mismatch, my 27ffffffefdfbfff
< peer 27fddff8efacbfff missing 200000
Looking at
https://ceph.io/geen-categorie/feature-set-mismatch-error-on-ceph-kernel-cl…
it looks like the 200000 corresponds to CEPH_FEATURE_MON_GV, which, by the
way, is listed here
way is listed here
https://github.com/ceph/ceph/pull/8214
as a feature which could/should be removed.
Things being as I described, I guess it would be safe to change the
value of the tunable, correct?
Unfortunately, I was unable to find any way to achieve this... the
trivial "ceph osd crush set-tunable mon_gv 0" does not work.
Any idea, please, how to fix my error?
Upgrading the Ceph packages on the Kubernetes workers (now at Luminous) would
help, maybe?
Thanks!
Fulvio
--
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70
tel.: +39-334-6533-250
Hello everyone,
I have a single Ceph cluster with multiple pools, used for Kubernetes PVs and
for virtualization (the biggest pool; we use snapshots there). Every client
cluster has its own pool with max_bytes and max_objects quotas in place. When
quotas are reached, Ceph halts all write I/O (which I suppose is OK).
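(For context, the quotas in question are the per-pool ones, set along these
lines; the values match my test pool below:)

  ceph osd pool set-quota mirektest max_objects 4000
  ceph osd pool set-quota mirektest max_bytes 16106127360   # 15 GiB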
I am trying to set up the best possible monitoring to avoid hitting pool
quotas, but I am really struggling to find the exact metrics Ceph is driven
by, because almost every command I try gives me different results (rados df,
rbd du, ceph health detail), and I feel like the quotas are not usable for
this purpose.
After some fiddling with a test pool, I have somehow gotten it into a state
where there are phantom objects in the stats (maybe also written to the OSDs)
which cannot be listed, accessed, or deleted, but which are counted against
the pool quotas.
-> ceph version
ceph version 14.2.10 (9f0d3f5a3ce352651da4c2437689144fcbec0131) nautilus
(stable)
I have my test pool here (quotas 4k objects, 15 GiB data):
-> ceph osd pool ls detail | grep mirektest
pool 2 'mirektest' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 21360 lfor 0/10929/10927 flags hashpspool,selfmanaged_snaps max_bytes 16106127360 max_objects 4000 stripe_width 0 target_size_bytes 10737418240 application rbd
Cluster health says the pool is getting full:
-> ceph health detail
HEALTH_WARN 1 pools nearfull
POOL_NEAR_FULL 1 pools nearfull
pool 'mirektest' has 3209 objects (max 4000)
pool 'mirektest' has 13 GiB (max 15 GiB)
But the pool looks completely empty (no output from rbd ls, so I tried JSON output as well):
-> rbd -p mirektest --format json ls
[]
-> rbd -p mirektest --format json du
{"images":[],"total_provisioned_size":0,"total_used_size":0}
I am not using pool snapshots:
-> rados -p mirektest lssnap
0 snaps
But there are some RADOS object in it:
-> rados -p mirektest df
POOL_NAME  USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS    RD       WR_OPS   WR       USED COMPR  UNDER COMPR
mirektest  192 KiB  3209     0       9627    0                   0        0         23998515  192 GiB  1221030  297 GiB  0 B         0 B
Okay, let's try to list them:
-> rados -p mirektest ls --all
rbd_directory
rbd_info
rbd_trash
That's weird, I expected more than 3 objects. Let's check their sizes and
content:
-> rados -p mirektest stat rbd_directory
mirektest/rbd_directory mtime 2020-10-07 11:55:16.000000, size 0
-> rados -p mirektest stat rbd_info
mirektest/rbd_info mtime 2019-12-05 12:11:39.000000, size 19
-> rados -p mirektest get rbd_info -
overwrite validated
-> rados -p mirektest stat rbd_trash
mirektest/rbd_trash mtime 2020-10-07 11:55:17.000000, size 0
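One more check, since an rbd_trash object exists: images moved to the RBD
trash do not show up in plain "rbd ls", so (a hedged guess) the phantom
objects could belong to trashed images:

  rbd trash ls --all mirektest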
Do any of you have an idea what's happening here? I am trying to find a way
to clean up the pool without disturbing existing content in general. I also
have no idea how to replicate this, and it's not the case for every pool in
the cluster. For example, there is a pool which was used for Kubernetes that
has no issues:
-> rados df
POOL_NAME  USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS   WR       USED COMPR  UNDER COMPR
pool12     192 KiB  3        0       9       0                   0        0         8710688  807 GiB  5490880  576 GiB  0 B         0 B
I don't know how to check the other pools, because there is data in them. How
can I rely on quotas when an empty pool already shows 13 GiB of data and 3.2k
objects (which at least is internally consistent, since 3200 * 4 MiB = 12.8
GiB)?
Also, is there any way to calculate the same object/byte counts that Ceph
uses to enforce the quotas?
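The closest I have found (hedged, from my own reading): "ceph df detail"
prints per-pool OBJECTS and STORED next to the QUOTA OBJECTS / QUOTA BYTES
columns, and those appear to be the same pool stats the quota check compares
against:

  ceph df detail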
Thanks for your time if you read all the way to the end, really appreciated!
:-)
Any idea or hint is welcome.
--
Miroslav Kalina
Systems development specialist
miroslav.kalina(a)livesport.eu
+420 773 071 848
Livesport s.r.o.
Aspira Business Centre
Bucharova 2928/14a, 158 00 Praha 5
www.livesport.eu
Hi all,
We have a Ceph cluster which has been expanded from 10 to 16 nodes.
Each node has between 14 and 16 OSDs of which 2 are NVMe disks.
Most disks (except NVMe's) are 16TB large.
The expansion to 16 nodes went OK, but we configured the system to prevent
automatic rebalancing onto the new disks (weight was set to 0) so we could
control the expansion.
We started adding 6 disks last week (1 disk on each new node), which didn't
cause many issues. When the Ceph status indicated the PG degradation recovery
was almost finished, we added 2 more disks on each node.
All seemed to go fine until yesterday morning... I/Os towards the system were
slowing down. Diving into the nodes, we could see that the OSD daemons were
consuming the CPU, resulting in average CPU loads going near 10 (!).
Neither the RGWs nor the monitors nor other involved servers are having CPU
issues (except for the management server, which is fighting with Prometheus),
so the latency seems to be related to the OSD hosts.
All of the hosts are interconnected with 25Gbit connections, no bottlenecks
are reached on the network either.
Important piece of information: We are using erasure coding (6/3), and we
do have a lot of small files...
The current health detail indicates degraded data redundancy:
1192911/103387889228 objects degraded (1 pg degraded, 1 pg undersized).
Diving into the historic ops of an OSD, we can see that the main latency sits
between the events "queued_for_pg" and "reached_pg" (averaging about 3
seconds).
As the system load is quite high, I assume the systems are busy recalculating
the erasure code chunks to make use of the new disks we've added (though I'm
not sure), but I was wondering how I can better fine-tune the system or
pinpoint the exact bottleneck.
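For reference, the usual knobs for trading recovery speed for client latency
while investigating (values are illustrative, defaults vary by release;
revert once the cluster settles):

  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  ceph config set osd osd_recovery_sleep_hdd 0.2   # seconds between recovery ops on HDD OSDs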
Latency towards the disks doesn't seem to be an issue at first sight...
We are running Ceph 14.2.11
Who can give me some thoughts on how I can better pinpoint the bottleneck?
Thanks
Kristof
All;
I've finally gotten around to setting up iSCSI gateways on my primary production cluster, and performance is terrible.
We're talking 1/4 to 1/3 of our current solution.
I see no evidence of network congestion on any involved network link. I see no evidence CPU or memory being a problem on any involved server (MON / OSD / gateway /client).
What can I look at to tune this, preferably on the iSCSI gateways?
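A hedged first step: benchmark an RBD image with librbd directly from a
gateway node, bypassing iSCSI entirely, to see whether the slowdown lives in
the RBD layer or in the iSCSI path (pool/image names are placeholders):

  rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G iscsi-pool/test-image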
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
Hi guys,
When I updated the pg_num of a pool, I found it didn't work (no rebalancing
happened). Does anyone know the reason? The pool's info:
pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024 autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256 pgp_num_target 256 autoscale_mode warn last_change 84769 lfor 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
The pgp_num_target is set, but pgp_num is not. I had scaled out new OSDs and
the cluster was backfilling before I set the value; could that be the reason?