Hallo Dan,
I am running a slightly outdated Nautilus version, 14.2.16, and I
don't remember ever playing with upmaps in the past.
Following your suggestion, I removed a bunch of upmaps (the "longer"
lines) and after a while I verified that all PGs are properly mapped.
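In case it is useful to others, the clean-up boils down to something
like this (116.453 below is just one example PG):

  # list the current upmap exceptions
  ceph osd dump | grep pg_upmap_items

  # drop the exception for a single PG
  ceph osd rm-pg-upmap-items 116.453

  # verify nothing is left remapped
  ceph pg ls remapped
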
Thanks!
Fulvio
On 5/27/2021 5:33 PM, Dan van der Ster wrote:
Hi Fulvio,
I suggest removing only the upmaps which are clearly incorrect, and
then see if the upmap balancer re-creates them.
Perhaps they were created when they were not incorrect, when you had a
different crush rule?
Or perhaps you're running an old version of ceph which had a buggy
balancer implementation?
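For reference, you can check what the balancer is doing, and which
version you are actually running, with something like:

  ceph balancer status   # shows the mode (e.g. upmap) and whether it is active
  ceph versions          # shows the exact ceph version of each daemon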
Cheers, Dan
On Thu, May 27, 2021 at 5:16 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
>
> Hallo Dan, Nathan, thanks for your replies and apologies for my silence.
>
> Sorry, I had made a typo... the rule is really 6+4. And to reply to
> Nathan's message, the rule was built like this in anticipation of
> getting additional servers, at which point I will relax the "2
> chunks per host" part.
>
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd pool get
> default.rgw.buckets.data erasure_code_profile
> erasure_code_profile: ec_6and4_big
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd erasure-code-profile get
> ec_6and4_big
> crush-device-class=big
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=4
> plugin=jerasure
> technique=reed_sol_van
> w=8
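> For completeness, the matching crush rule can be checked with something
> like:
>
> ceph osd pool get default.rgw.buckets.data crush_rule
> ceph osd crush rule dump default.rgw.buckets.data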
>
> Indeed, Dan:
>
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd dump | grep upmap | grep 116.453
> pg_upmap_items 116.453 [76,49,129,108]
>
> I don't think I ever set such an upmap myself. Do you think it would be
> good to try and remove all upmaps, let the upmap balancer do its magic,
> and check again?
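> If that turns out to be the way to go, I guess removing them all in one
> go would be something like (untested):
>
> ceph osd dump | awk '/^pg_upmap_items/ {print $2}' | \
>    while read pg; do ceph osd rm-pg-upmap-items "$pg"; done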
>
> Thanks!
>
> Fulvio
>
>
> On 20/05/2021 18:59, Dan van der Ster wrote:
>> Hold on: 8+4 needs 12 osds but you only show 10 there. Shouldn't you
>> choose 6 type host and then chooseleaf 2 type osd?
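>>
>> I.e. keeping everything else in the rule the same, roughly (untested;
>> max_size would also need to become 12):
>>
>> step choose indep 6 type host
>> step chooseleaf indep 2 type osd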
>>
>> .. Dan
>>
>>
>> On Thu, May 20, 2021, 1:30 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
>>
>> Hallo Dan, Bryan,
>> I have a rule similar to yours, for an 8+4 pool, with the only
>> difference that I replaced the second "choose" with "chooseleaf",
>> which I understand should make no difference:
>>
>> rule default.rgw.buckets.data {
>> id 6
>> type erasure
>> min_size 3
>> max_size 10
>> step set_chooseleaf_tries 5
>> step set_choose_tries 100
>> step take default class big
>> step choose indep 5 type host
>> step chooseleaf indep 2 type osd
>> step emit
>> }
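>>
>> (For what it's worth, a rule like this can be exercised offline with
>> something like
>>
>> ceph osd getcrushmap -o crushmap.bin
>> crushtool -i crushmap.bin --test --rule 6 --num-rep 10 --show-mappings
>>
>> where 6 is the rule id and 10 the pool size.)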
>>
>> I am on Nautilus 14.2.16 and, while performing maintenance the other
>> day, I noticed 2 PGs were incomplete and caused trouble to some users.
>> I then verified that (thanks Bryan for the command):
>>
>> [cephmgr(a)cephAdmCT1.cephAdmCT1 clusterCT]$ for osd in $(ceph pg map
>> 116.453 -f json | jq -r '.up[]'); do ceph osd find $osd | jq -r '.host'
>> ; done | sort | uniq -c | sort -n -k1
>> 2 r2srv07.ct1.box.garr
>> 2 r2srv10.ct1.box.garr
>> 2 r3srv07.ct1.box.garr
>> 4 r1srv02.ct1.box.garr
>>
>> You see that 4 chunks of this PG were put on r1srv02.
>> Maybe this happened due to some temporary unavailability of the host
>> at some point? As all my servers are now up and running, is there a
>> way to force the placement rule to rerun?
>>
>> Thanks!
>>
>> Fulvio
>>
>>
>> On 5/16/2021 11:40 PM, Dan van der Ster wrote:
>> > Hi Bryan,
>> >
>> > I had to do something similar, and never found a rule to place
>> > "up to" 2 chunks per host, so I stayed with the placement of
>> > *exactly* 2 chunks per host.
>> >
>> > But I did this slightly differently to what you wrote earlier: my
>> > rule chooses exactly 4 hosts, then chooses exactly 2 osds on each:
>> >
>> > type erasure
>> > min_size 3
>> > max_size 10
>> > step set_chooseleaf_tries 5
>> > step set_choose_tries 100
>> > step take default class hdd
>> > step choose indep 4 type host
>> > step choose indep 2 type osd
>> > step emit
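>> >
>> > (FWIW, one way to install a rule like this is via the decompiled
>> > crushmap:
>> >
>> >     ceph osd getcrushmap -o crushmap.bin
>> >     crushtool -d crushmap.bin -o crushmap.txt
>> >     # add/edit the rule in crushmap.txt, then recompile and inject it
>> >     crushtool -c crushmap.txt -o crushmap-new.bin
>> >     ceph osd setcrushmap -i crushmap-new.bin
>> > )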
>> >
>> > If you really need the "up to 2" approach then maybe you can split
>> > each host into two "host" crush buckets, with half the OSDs in each.
>> > Then a normal host-wise rule should work.
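>> >
>> > Roughly something like this (untested; the bucket name, osd id and
>> > weight below are just placeholders):
>> >
>> >     # create a second "host" bucket for hostA and put it under the root
>> >     ceph osd crush add-bucket hostA-b host
>> >     ceph osd crush move hostA-b root=default
>> >
>> >     # then re-place half of hostA's OSDs into it, one by one
>> >     # (3.64 = the OSD's current crush weight)
>> >     ceph osd crush set osd.0 3.64 root=default host=hostA-b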
>> >
>> > Cheers, Dan
>> >
>
--
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70
tel.: +39-334-6533-250