There are a few factors to consider. I've gone from 16k PGs to 32k PGs
before and learned some lessons.
The first and most immediate is the peering that happens when you increase
the PG count. I like to increase the pg_num and pgp_num values slowly to
mitigate this. Something like [1] should do the trick: it increases your
PG count in steps, waiting for all peering and the like to finish before
continuing. It also waits out a few other states during which you
shouldn't be doing maintenance like this.
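Before kicking anything off, it's worth confirming the pool's current
values so you know what you're stepping from; something like this (the
pool name "mypool" is a placeholder, substitute your own):

```shell
# Check the pool's current pg_num and pgp_num before starting.
# "mypool" is a placeholder pool name -- replace with yours.
ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num
```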
The second is that mons do not compact their databases while any PG is in
a non-"clean" state. That means that while your cluster is creating these
new PGs and moving data around, your mon stores will grow with new maps
until everything is healthy again. This is desired behavior, keeping
everything healthy in Ceph in the face of failures, BUT it means that you
need to be aware of how much space your mons have for the mon store to
grow. When I was increasing from 16k to 32k PGs, we could only create 4k
PGs at a time, and in that cluster each batch took about two weeks to
finish. When we tried to do more than that, our mons ran out of space and
we had to add disks to the mons, and move the mon stores onto them, so
that the mons could continue to run.
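A rough way to keep an eye on this while the split is running is to watch
the store.db size on each mon host; the path below is the common default
and may differ in your deployment:

```shell
# Check the on-disk size of the mon store on a mon host.
# The default path is an assumption -- adjust for your setup.
du -sh /var/lib/ceph/mon/*/store.db

# Ceph will also raise its own health warning once the store
# grows past the warn threshold, so check health output too.
ceph health detail
```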
Finally, know that this is just going to take a while (depending on how
much data is in your cluster and how full it is). Be patient. Either you
increase max_backfills, lower the backfill sleep, and so on to make the
backfilling go faster (at the cost of IOPS that clients then can't use),
or you keep these throttled to limit the impact on clients. Keep a good
balance, though: putting off finishing the recovery for too long leaves
your cluster in a riskier position for that much longer.
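For reference, tuning those throttles cluster-wide looks something like
the following; the specific values are illustrative only, not
recommendations, and the right numbers depend on your hardware and how
much client impact you can tolerate:

```shell
# Loosen recovery throttles to speed up backfill
# (values are examples, not recommendations):
ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-sleep 0'

# ...and tighten them back down if clients start to suffer:
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-sleep 0.1'
```

Settings applied via injectargs do not survive an OSD restart, so they
are handy for temporarily tuning during a maintenance window like this.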
Good luck.
[1] *Note that I typed this in Gmail rather than copying it from a
script. Please test before using.
# Set $pool to your pool name before running.
ceph osd set nobackfill
ceph osd set norebalance

function healthy_wait() {
    while ceph health | grep -q 'peering\|inactive\|activating\|creating\|down\|inconsistent\|stale'; do
        echo waiting for ceph to be healthier
        sleep 10
    done
}

for count in {2048..4096..256}; do
    healthy_wait
    ceph osd pool set "$pool" pg_num "$count"
    healthy_wait
    ceph osd pool set "$pool" pgp_num "$count"
done
healthy_wait
ceph osd unset nobackfill
ceph osd unset norebalance
On Thu, Nov 14, 2019 at 11:19 AM Frank R <frankaritchie(a)gmail.com> wrote:
Hi all,
When increasing the number of placement groups for a pool by a large
amount (say 2048 to 4096) is it better to go in small steps or all at once?
This is a filestore cluster.
Thanks,
Frank
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io