Well, sure, the backfill will continue, but will it actually let me change
the pgp_num as more space frees up? Because the issue is that I cannot
modify that value.
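(For anyone skimming the thread later: the "cannot modify" symptom shows up as pgp_num lagging behind a set pgp_num_target in `ceph osd pool ls detail`. A minimal sketch of pulling both values out of a dump line, using an abridged version of the pool 40 line quoted further down; the pool name is a stand-in:)

```shell
# Abridged 'ceph osd pool ls detail' line from this thread (pool 40);
# the pool name is a placeholder, not the real redacted name.
line="pool 40 'x.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target 2048"

# pgp_num is what is currently in effect; pgp_num_target is what was
# requested. The 'pgp_num ' pattern (trailing space) cannot match
# 'pgp_num_target', so the two greps stay distinct.
pgp_num=$(echo "$line" | grep -o 'pgp_num [0-9]*' | awk '{print $2}')
pgp_target=$(echo "$line" | grep -o 'pgp_num_target [0-9]*' | awk '{print $2}')

echo "pgp_num=$pgp_num pgp_num_target=$pgp_target"
# While these differ, the mgr still owes the pool a gradual pgp_num increase.
```

If the two numbers differ, the change was accepted but is being applied gradually rather than rejected outright.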
Thanks,
Mac Wynkoop, Senior Datacenter Engineer
*NetDepot.com:* Cloud Servers; Delivered
Houston | Atlanta | NYC | Colorado Springs
1-844-25-CLOUD Ext 806
On Wed, Oct 7, 2020 at 1:50 PM Eugen Block <eblock(a)nde.ag> wrote:
Yes, I think that’s exactly the reason. As soon as the cluster has more
space the backfill will continue.
Quoting Mac Wynkoop <mwynkoop(a)netdepot.com>:
The cluster is currently in a warn state, here's the scrubbed output of
ceph -s:
  cluster:
    id:     *redacted*
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            22 nearfull osd(s)
            2 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't
            resolve itself): 277 pgs backfill_toofull
            Degraded data redundancy: 32652738/3651947772 objects degraded
            (0.894%), 281 pgs degraded, 341 pgs undersized
            1214 pgs not deep-scrubbed in time
            2647 pgs not scrubbed in time
            2 daemons have recently crashed

  services:
    mon:         5 daemons, *redacted* (age 44h)
    mgr:         *redacted*
    osd:         162 osds: 162 up (since 44h), 162 in (since 4d);
                 971 remapped pgs
                 flags noscrub,nodeep-scrub
    rgw:         3 daemons active *redacted*
    tcmu-runner: 18 daemons active *redacted*

  data:
    pools:   10 pools, 2648 pgs
    objects: 409.56M objects, 738 TiB
    usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
    pgs:     32652738/3651947772 objects degraded (0.894%)
             517370913/3651947772 objects misplaced (14.167%)
             1677 active+clean
             477  active+remapped+backfill_wait
             100  active+remapped+backfill_wait+backfill_toofull
             80   active+undersized+degraded+remapped+backfill_wait
             60   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             42   active+undersized+degraded+remapped+backfill_toofull
             33   active+undersized+degraded+remapped+backfilling
             25   active+remapped+backfilling
             25   active+remapped+backfill_toofull
             24   active+undersized+remapped+backfilling
             23   active+forced_recovery+undersized+degraded+remapped+backfill_wait
             19   active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
             15   active+undersized+remapped+backfill_wait
             14   active+undersized+remapped+backfill_wait+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfilling
             5    active+undersized+remapped+backfill_toofull
             3    active+remapped
             1    active+undersized+remapped
             1    active+forced_recovery+undersized+remapped+backfilling

  io:
    client:   287 MiB/s rd, 40 MiB/s wr, 1.94k op/s rd, 165 op/s wr
    recovery: 425 MiB/s, 225 objects/s
Now as you can see, we do have a lot of backfill operations going on at the
moment. Does that actually prevent Ceph from modifying the pgp_num value of
a pool?
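One detail worth noting here (stated as an assumption about recent Ceph releases, not something confirmed in this thread): since Nautilus the mgr applies pgp_num changes gradually, and it defers further splitting while the cluster's misplaced-object ratio exceeds `target_max_misplaced_ratio` (default 0.05). Plugging in the numbers from the ceph -s output above:

```shell
# Misplaced-object counts taken from the 'ceph -s' output above
misplaced=517370913
total=3651947772

# Assumed default of mgr target_max_misplaced_ratio in recent releases
max_ratio=0.05

ratio=$(awk -v m="$misplaced" -v t="$total" 'BEGIN { printf "%.5f", m/t }')
echo "misplaced ratio: $ratio"   # 0.14167, the 14.167% shown by ceph -s

# While the ratio stays above the throttle, the mgr holds pgp_num back
if awk -v r="$ratio" -v x="$max_ratio" 'BEGIN { exit !(r > x) }'; then
    echo "pgp_num increase deferred until backfill catches up"
fi
```

If that is indeed the gating mechanism on this cluster, `ceph config get mgr target_max_misplaced_ratio` should show the threshold in effect.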
Thanks,
Mac Wynkoop
On Wed, Oct 7, 2020 at 8:57 AM Eugen Block <eblock(a)nde.ag> wrote:
> What is the current cluster status, is it healthy? Maybe increasing
> pg_num would hit the limit of mon_max_pg_per_osd? Can you share 'ceph
> -s' output?
>
>
> Quoting Mac Wynkoop <mwynkoop(a)netdepot.com>:
>
> > Right, both Norman and I set the pg_num before the pgp_num. For
> > example, here are my current pool settings:
> >
> > "pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
> > crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024
> > pgp_num_target 2048 last_change 8458830 lfor 0/0/8445757 flags
> > hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576
> > fast_read 1 application rgw"
> >
> > So, when I set:
> >
> > "ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048"
> >
> > it returns:
> >
> > "set pool 40 pgp_num to 2048"
> >
> > But upon checking the pool details again:
> >
> > "pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
> > crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024
> > pgp_num_target 2048 last_change 8458870 lfor 0/0/8445757 flags
> > hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576
> > fast_read 1 application rgw"
> >
> > the pgp_num value has not increased. Am I just doing something
> > totally wrong?
> >
> > Thanks,
> > Mac Wynkoop
> >
> >
> >
> >
> > On Tue, Oct 6, 2020 at 2:32 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
> >
> >> pg_num and pgp_num need to be the same, not?
> >>
> >> 3.5.1. Set the Number of PGs
> >>
> >> To set the number of placement groups in a pool, you must specify the
> >> number of placement groups at the time you create the pool. See Create
> >> a Pool for details. Once you set placement groups for a pool, you can
> >> increase the number of placement groups (but you cannot decrease the
> >> number of placement groups). To increase the number of placement
> >> groups, execute the following:
> >>
> >> ceph osd pool set {pool-name} pg_num {pg_num}
> >>
> >> Once you increase the number of placement groups, you must also
> >> increase the number of placement groups for placement (pgp_num) before
> >> your cluster will rebalance. The pgp_num should be equal to the
> >> pg_num. To increase the number of placement groups for placement,
> >> execute the following:
> >>
> >> ceph osd pool set {pool-name} pgp_num {pgp_num}
> >>
> >> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/s…
> >>
> >> -----Original Message-----
> >> To: norman
> >> Cc: ceph-users
> >> Subject: [ceph-users] Re: pool pgp_num not updated
> >>
> >> Hi everyone,
> >>
> >> I'm seeing a similar issue here. Any ideas on this?
> >> Mac Wynkoop,
> >>
> >>
> >>
> >> On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern(a)gmx.com> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > When I updated the pg_num of a pool, I found it didn't work (no
> >> > rebalancing happened). Does anyone know the reason? Pool's info:
> >> >
> >> > pool 21 'openstack-volumes-rs' replicated size 3 min_size 2
> >> > crush_rule 21 object_hash rjenkins pg_num 1024 pgp_num 512
> >> > pgp_num_target 1024 autoscale_mode warn last_change 85103 lfor
> >> > 82044/82044/82044 flags hashpspool,nodelete,selfmanaged_snaps
> >> > stripe_width 0 application rbd removed_snaps
> >> > [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
> >> > pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule
> >> > 22 object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256
> >> > pgp_num_target 256 autoscale_mode warn last_change 84769 lfor
> >> > 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width
> >> > 0 application rbd
> >> >
> >> > The pgp_num_target is set, but pgp_num is not.
> >> >
> >> > I had scaled out new OSDs and backfilling was in progress before
> >> > setting the value; could that be the reason?
> >> > _______________________________________________
> >> > ceph-users mailing list -- ceph-users(a)ceph.io
> >> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >> >