There are a few factors to consider. I've gone from 16k PGs to 32k PGs
before and learned some lessons.
The first and most immediate is the peering that happens when you increase
the PG count. I like to increase the pg_num and pgp_num values slowly to
mitigate this. Something like [1] should do the trick: it increases your
PG count in steps, waiting for all peering and the like to finish before
continuing. It also waits out a few other states during which you
shouldn't be doing maintenance like this.
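Before kicking anything off, it's worth confirming the pool's current
values so you know what you're stepping from; something like this (the
pool name "mypool" is a placeholder, substitute your own):

```shell
# Check the pool's current pg_num and pgp_num before starting.
# "mypool" is a placeholder pool name -- replace with yours.
ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num
```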
The second is that mons do not compact their databases while any PG is in
a non-"clean" state. That means that while your cluster is creating these
new PGs and moving data around, your mon stores will grow with new maps
until everything is healthy again. This is desired behavior, keeping
everything healthy in Ceph in the face of failures, BUT it means that you
need to be aware of how much space your mons have for the mon store to
grow. When I was increasing from 16k to 32k PGs, we could only create 4k
PGs at a time, and in that cluster each batch took about two weeks to
finish. When we tried to do more than that, our mons ran out of space and
we had to add disks to the mons, and move the mon stores onto them, so
that the mons could continue to run.
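A rough way to keep an eye on this while the split is running is to watch
the store.db size on each mon host; the path below is the common default
and may differ in your deployment:

```shell
# Check the on-disk size of the mon store on a mon host.
# The default path is an assumption -- adjust for your setup.
du -sh /var/lib/ceph/mon/*/store.db

# Ceph will also raise its own health warning once the store
# grows past the warn threshold, so check health output too.
ceph health detail
```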
Finally, know that this is just going to take a while (depending on how
much data is in your cluster and how full it is). Be patient. Either you
increase max_backfills, lower the backfill sleep, and so on to make the
backfilling go faster (at the cost of IOPS that clients then can't use),
or you keep these throttled to limit the impact on clients. Keep a good
balance, though: putting off finishing the recovery for too long leaves
your cluster in a riskier position for that much longer.
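For reference, tuning those throttles cluster-wide looks something like
the following; the specific values are illustrative only, not
recommendations, and the right numbers depend on your hardware and how
much client impact you can tolerate:

```shell
# Loosen recovery throttles to speed up backfill
# (values are examples, not recommendations):
ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-sleep 0'

# ...and tighten them back down if clients start to suffer:
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-sleep 0.1'
```

Settings applied via injectargs do not survive an OSD restart, so they
are handy for temporarily tuning during a maintenance window like this.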
Good luck.
[1] *Note that I typed this in Gmail rather than copying it from a
script. Please test before using.
# Set $pool to your pool name before running.
ceph osd set nobackfill
ceph osd set norebalance

function healthy_wait() {
    while ceph health | grep -q 'peering\|inactive\|activating\|creating\|down\|inconsistent\|stale'; do
        echo waiting for ceph to be healthier
        sleep 10
    done
}

for count in {2048..4096..256}; do
    healthy_wait
    ceph osd pool set "$pool" pg_num "$count"
    healthy_wait
    ceph osd pool set "$pool" pgp_num "$count"
done
healthy_wait
ceph osd unset nobackfill
ceph osd unset norebalance
On Thu, Nov 14, 2019 at 11:19 AM Frank R <frankaritchie(a)gmail.com> wrote:
Hi all,
When increasing the number of placement groups for a pool by a large
amount (say 2048 to 4096) is it better to go in small steps or all at once?
This is a filestore cluster.
Thanks,
Frank
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io