I wanted to report back that splitting worked *exactly* as you described.
After running "ceph osd pool set default.rgw.buckets.data pg_num 32", the
whole process took approximately 2 minutes to split the placement groups
from 8 to 32 and re-peer them, for 10 OSDs on 5 hosts.
I had an OSD crash during that time, but Ceph handled it gracefully.
Downtime was really minimal. I set target_max_misplaced_ratio to 3%, but
the misplaced objects were around 9% (2 active backfills and 2 waiting),
which probably has to do with the fact that each OSD has too many objects.
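For what it's worth, here is some back-of-envelope arithmetic (my own toy
model, not Ceph source) suggesting the overshoot may be a granularity
effect: each unit increase of pgp_num remaps roughly 1/(pgp_num + 1) of the
pool's objects, so starting from pgp_num 8 the smallest possible step is
already around 11%, well above a 3% target.

```python
# Back-of-envelope sketch (not Ceph source code): estimate the smallest
# misplaced-ratio step available when pgp_num grows one at a time.
# With few placement targets, each increment remaps a large slice of the
# pool, so a small target_max_misplaced_ratio cannot be honored.

def min_misplaced_step(pgp_num: int) -> float:
    """Approximate fraction of objects remapped when pgp_num -> pgp_num + 1.

    Assumes evenly sized PGs: the new placement target takes roughly an
    even share of the pool, i.e. about 1 / (pgp_num + 1) of all objects.
    """
    return 1.0 / (pgp_num + 1)

if __name__ == "__main__":
    for pgp in (8, 16, 64, 256):
        step = min_misplaced_step(pgp)
        print(f"pgp_num {pgp:>4} -> {pgp + 1:>4}: ~{step:.1%} of objects remapped")
```

Under that assumption, the ~9% observed is close to the minimum step
available at pgp_num 8, independent of how many objects each PG holds.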
On Tue, Aug 3, 2021 at 4:51 PM 胡 玮文 <huww98(a)outlook.com> wrote:
On Aug 3, 2021, at 21:32, Gabriel Tzagkarakis <gabrieltz(a)gmail.com> wrote:
Hi, thank you for replying.
Does this method refer to manually setting the number of placement groups
while keeping the autoscale_mode setting off?
Also, from what I can see in the documentation,
target_max_misplaced_ratio implies using the balancer feature, which
I am currently not using.
I believe this “auto pgp_num increase” feature works independently of the
autoscaler and the balancer. The last time I increased pg_num to 1024, I
had autoscale mode set to warn and the balancer off. I recommend you read
the documentation near “Starting in Nautilus, this second step is no longer
necessary: …”
And target_max_misplaced_ratio is not only used by the balancer, but also
by this feature.
If I understood correctly, the existing PGs will be split in place and act
as primaries for the backfills that will be required to distribute the data
evenly to all OSDs.
Can I use the manual way to slowly increase pgp_num in the pool, and when
my PGs have a more manageable size I will enable the balancer?
Will there be a considerable amount of downtime while splitting PGs and
peering?
I didn’t observe any significant downtime the last time I did this. I
think it was several seconds at most.
I'm sorry for asking too many questions; I'm trying not to break stuff :)
On Tue, Aug 3, 2021 at 3:46 PM 胡 玮文 <huww98(a)outlook.com> wrote:
Each placement group will get split into 4 pieces in place, all at nearly
the same time; no empty PGs will be created.
Normally, you only set pg_num and do not touch pgp_num. Instead, you can
set “target_max_misplaced_ratio” (default 5%). Then the mgr will increase
pgp_num for you. It increases pgp_num so that some PGs get placed onto
other OSDs, until the misplaced ratio reaches the target. Then it waits for
some backfilling to finish before increasing pgp_num again. (This behavior
seems to have been introduced in Nautilus.)
So I don’t think you need to worry about full OSDs. The “backfillfull
ratio” should throttle backfill when an OSD is nearly full, which in turn
will throttle the pgp_num increases.
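To illustrate the ramp-up behavior described above, here is a toy model (my
own simplification, not the actual mgr code), assuming evenly sized PGs and
that each unit increase of pgp_num remaps about 1/(new pgp_num) of the
pool's objects:

```python
# Toy model of the mgr's gradual pgp_num ramp (NOT the actual mgr code).
# Assumptions: evenly sized PGs; each unit increase of pgp_num remaps
# about 1 / (pgp_num + 1) of the pool; increments stop once the projected
# misplaced ratio would exceed the target, then the mgr waits for
# backfill to drain before continuing.

def plan_pgp_steps(pgp_num: int, pg_num: int, target: float = 0.05) -> list:
    """Return the pgp_num value reached after each wait-for-backfill
    pause, until pgp_num catches up with pg_num."""
    rounds = []
    while pgp_num < pg_num:
        misplaced = 0.0
        while pgp_num < pg_num:
            step = 1.0 / (pgp_num + 1)  # share taken by the new placement target
            if misplaced > 0.0 and misplaced + step > target:
                break  # next bump would overshoot the target; wait for backfill
            misplaced += step
            pgp_num += 1
        rounds.append(pgp_num)
    return rounds

if __name__ == "__main__":
    # With only 8 -> 32 PGs and a 3% target, every single bump overshoots,
    # so the mgr would advance one pgp_num at a time between backfills.
    print(plan_pgp_steps(8, 32, target=0.03))
    # With many PGs (256 -> 1024), several bumps fit under a 5% target,
    # so each round advances pgp_num by a batch.
    print(len(plan_pgp_steps(256, 1024, target=0.05)))
```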
*From: *Gabriel Tzagkarakis <gabrieltz(a)gmail.com>
*Sent: *Aug 3, 2021, 19:42
*Subject: *[ceph-users] PG scaling questions
I would like to know how autoscaling or manual scaling actually works, to
prevent my cluster from running out of disk space.
Let's say I want to scale a pool of 8 PGs, each ~400 GB, to 32 PGs.
1) does each placement group get split into 4 pieces IN-PLACE, all at the
same time?
2) does autoscaling choose one of the existing placement groups at random,
for example X.Y, create new empty placement groups, migrate data onto them,
and then continue to the next big PG, with or without deleting the original
PG?
3) something else ?
I am more concerned about the time period when both the
initial/pre-existing PGs and the newly created ones co-exist in the
cluster, so that I can prevent full OSDs. In my case each PG has many small
files, and deleting stray PGs takes a long time.
Would it be better if I used something like
ceph osd pool set default.rgw.buckets.data pg_num 32
and then increased pgp_num in increments of 8, assuming one of the original
PGs is affected at a time? But my assumption may be wrong again.
I could not find anything relevant in the documentation.
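As a rough sanity check on the increments-of-8 idea (my own arithmetic,
assuming evenly sized PGs), each manual jump of pgp_num would remap about
(new - old) / new of the pool's objects, since the newly activated
placement targets each take an even share:

```python
# Rough estimate (assumes evenly sized PGs) of the object fraction
# remapped by each manual pgp_num jump, e.g. steps of 8 from 8 to 32.

def remapped_fraction(old_pgp: int, new_pgp: int) -> float:
    """Approximate fraction of objects that move when pgp_num jumps
    from old_pgp to new_pgp: the new targets' combined even share."""
    return (new_pgp - old_pgp) / new_pgp

if __name__ == "__main__":
    schedule = [(8, 16), (16, 24), (24, 32)]
    for old, new in schedule:
        frac = remapped_fraction(old, new)
        print(f"pgp_num {old} -> {new}: ~{frac:.0%} of objects move")
```

So steps of 8 would move roughly half, a third, and a quarter of the data
per jump, respectively; far more per step than letting the mgr ramp
pgp_num against target_max_misplaced_ratio.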
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io