Hi Fulvio,
I suggest removing only the upmaps which are clearly incorrect, and
then seeing whether the upmap balancer re-creates them.
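Something along these lines should do it (untested; the pg id is the
one from your mail below, repeat for any other entry that looks wrong):

    ceph osd rm-pg-upmap-items 116.453   # drop the exception for that PG
    ceph balancer status                 # check the balancer is on and in upmap mode
    ceph osd dump | grep 116.453         # later, see whether a new upmap appears
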
Perhaps they were created when they were not incorrect, when you had a
different crush rule?
Or perhaps you're running an old version of Ceph which had a buggy
balancer implementation?
Cheers, Dan
On Thu, May 27, 2021 at 5:16 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
>
> Hallo Dan, Nathan, thanks for your replies and apologies for my silence.
>
> Sorry, I had made a typo... the rule is really 6+4. And to reply to
> Nathan's message, the rule was built like this in anticipation of
> getting additional servers, at which point I will relax the "2
> chunks per host" part.
>
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd pool get
> default.rgw.buckets.data erasure_code_profile
> erasure_code_profile: ec_6and4_big
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd erasure-code-profile get
> ec_6and4_big
> crush-device-class=big
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=4
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Indeed, Dan:
>
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd dump | grep upmap | grep 116.453
> pg_upmap_items 116.453 [76,49,129,108]
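>
> If I read that output right, the pairs mean "remap osd.76 -> osd.49"
> and "remap osd.129 -> osd.108", which I suppose is what ends up putting
> extra chunks on the same host. Removing just that entry should be
> something like:
>
> ceph osd rm-pg-upmap-items 116.453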
>
> I don't think I ever set such an upmap myself. Do you think it would be
> good to try and remove all upmaps, let the upmap balancer do its magic,
> and check again?
>
> Thanks!
>
> Fulvio
>
>
> On 20/05/2021 18:59, Dan van der Ster wrote:
> > Hold on: 8+4 needs 12 osds but you only show 10 there. Shouldn't you
> > choose 6 type host and then chooseleaf 2 type osd?
> >
> > .. Dan
> >
> >
> > On Thu, May 20, 2021, 1:30 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
> >
> > Hallo Dan, Bryan,
> >     I have a rule similar to yours, for an 8+4 pool, with the only
> >     difference that I replaced the second "choose" with "chooseleaf",
> >     which I understand should make no difference:
> >
> > rule default.rgw.buckets.data {
> > id 6
> > type erasure
> > min_size 3
> > max_size 10
> > step set_chooseleaf_tries 5
> > step set_choose_tries 100
> > step take default class big
> > step choose indep 5 type host
> > step chooseleaf indep 2 type osd
> > step emit
> > }
> >
> >     I am on Nautilus 14.2.16 and, while performing maintenance the other
> >     day, I noticed 2 PGs were incomplete and caused trouble for some users.
> > I then verified that (thanks Bryan for the command):
> >
> >     [cephmgr(a)cephAdmCT1.cephAdmCT1 clusterCT]$ for osd in $(ceph pg map
> >     116.453 -f json | jq -r '.up[]'); do ceph osd find $osd | jq -r '.host'
> >     ; done | sort | uniq -c | sort -n -k1
> > 2 r2srv07.ct1.box.garr
> > 2 r2srv10.ct1.box.garr
> > 2 r3srv07.ct1.box.garr
> > 4 r1srv02.ct1.box.garr
> >
> >     You see that 4 chunks of that PG were put on r1srv02.
> >     Maybe this happened due to some temporary unavailability of the host
> >     at some point? As all my servers are now up and running, is there a
> >     way to force the placement rule to rerun?
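> >
> >     Or at least, is there a way to see what the rule alone would pick?
> >     I guess something like the following (rule id 6 and 10 chunks, as
> >     above; the file name is arbitrary) should show the mappings crush
> >     would compute without any upmaps:
> >
> >     ceph osd getcrushmap -o cm.bin
> >     crushtool --test -i cm.bin --rule 6 --num-rep 10 --show-mappings | head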
> >
> > Thanks!
> >
> > Fulvio
> >
> >
> >     On 5/16/2021 11:40 PM, Dan van der Ster wrote:
> > > Hi Bryan,
> > >
> > > I had to do something similar, and never found a rule to place
> > "up to"
> > > 2 chunks per host, so I stayed with the placement of *exactly* 2
> > > chunks per host.
> > >
> > > But I did this slightly differently to what you wrote earlier: my
> > rule
> > > chooses exactly 4 hosts, then chooses exactly 2 osds on each:
> > >
> > > type erasure
> > > min_size 3
> > > max_size 10
> > > step set_chooseleaf_tries 5
> > > step set_choose_tries 100
> > > step take default class hdd
> > > step choose indep 4 type host
> > > step choose indep 2 type osd
> > > step emit
> > >
> > >     If you really need the "up to 2" approach then maybe you can split
> > >     each host into two "host" crush buckets, with half the OSDs in each.
> > >     Then a normal host-wise rule should work.
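> > >
> > >     Roughly like this in the decompiled crush map (bucket names, ids
> > >     and weights below are made up; you'd use your real ones):
> > >
> > >         host node01-a {
> > >                 id -101
> > >                 alg straw2
> > >                 hash 0
> > >                 item osd.0 weight 3.638
> > >                 item osd.1 weight 3.638
> > >         }
> > >         host node01-b {
> > >                 id -102
> > >                 alg straw2
> > >                 hash 0
> > >                 item osd.2 weight 3.638
> > >                 item osd.3 weight 3.638
> > >         }
> > >
> > >     and then a plain "step chooseleaf indep 0 type host" in the rule,
> > >     so each half-host bucket gets at most one chunk.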
> > >
> > > Cheers, Dan
> > >
>