Hallo Dan,
I am running a slightly outdated Nautilus version, 14.2.16, and I
don't remember ever playing with upmaps in the past.
Following your suggestion, I removed a bunch of upmaps (the "longer"
lines) and after a while I verified that all PGs are properly mapped.
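In case it is useful to others, the clean-up boils down to something
like this (116.453 below is just one example PG):

  # list the current upmap exceptions
  ceph osd dump | grep pg_upmap_items

  # drop the exception for a single PG
  ceph osd rm-pg-upmap-items 116.453

  # verify nothing is left remapped
  ceph pg ls remapped
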
Thanks!
Fulvio
On 5/27/2021 5:33 PM, Dan van der Ster wrote:
Hi Fulvio,
I suggest removing only the upmaps which are clearly incorrect, and
then see if the upmap balancer re-creates them.
Perhaps they were created when they were not incorrect, when you had a
different crush rule?
Or perhaps you're running an old version of ceph which had a buggy
balancer implementation?
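For reference, you can check what the balancer is doing, and which
version you are actually running, with something like:

  ceph balancer status   # shows the mode (e.g. upmap) and whether it is active
  ceph versions          # shows the exact ceph version of each daemon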
Cheers, Dan
On Thu, May 27, 2021 at 5:16 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
>
> Hallo Dan, Nathan, thanks for your replies and apologies for my silence.
>
> Sorry, I had made a typo... the rule is really 6+4. And to reply to
> Nathan's message, the rule was built like this in anticipation of
> getting additional servers, at which point I will relax the "2
> chunks per host" part.
>
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd pool get
> default.rgw.buckets.data erasure_code_profile
> erasure_code_profile: ec_6and4_big
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd erasure-code-profile get
> ec_6and4_big
> crush-device-class=big
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=4
> plugin=jerasure
> technique=reed_sol_van
> w=8
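> For completeness, the matching crush rule can be checked with something
> like:
>
> ceph osd pool get default.rgw.buckets.data crush_rule
> ceph osd crush rule dump default.rgw.buckets.data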
>
> Indeed, Dan:
>
> [cephmgr(a)cephAdmPA1.cephAdmPA1 ~]$ ceph osd dump | grep upmap | grep 116.453
> pg_upmap_items 116.453 [76,49,129,108]
>
> I don't think I ever set such an upmap myself. Do you think it would be
> good to try and remove all upmaps, let the upmap balancer do its magic,
> and check again?
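> If that turns out to be the way to go, I guess removing them all in one
> go would be something like (untested):
>
> ceph osd dump | awk '/^pg_upmap_items/ {print $2}' | \
>    while read pg; do ceph osd rm-pg-upmap-items "$pg"; done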
>
> Thanks!
>
> Fulvio
>
>
> On 20/05/2021 18:59, Dan van der Ster wrote:
>> Hold on: 8+4 needs 12 osds but you only show 10 there. Shouldn't you
>> choose 6 type host and then chooseleaf 2 type osd?
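>>
>> I.e. keeping everything else in the rule the same, roughly (untested;
>> max_size would also need to become 12):
>>
>> step choose indep 6 type host
>> step chooseleaf indep 2 type osd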
>>
>> .. Dan
>>
>>
>> On Thu, May 20, 2021, 1:30 PM Fulvio Galeazzi <fulvio.galeazzi(a)garr.it> wrote:
>>
>> Hallo Dan, Bryan,
>> I have a rule similar to yours, for an 8+4 pool, with the only
>> difference that I replaced the second "choose" with "chooseleaf",
>> which I understand should make no difference:
>>
>> rule default.rgw.buckets.data {
>> id 6
>> type erasure
>> min_size 3
>> max_size 10
>> step set_chooseleaf_tries 5
>> step set_choose_tries 100
>> step take default class big
>> step choose indep 5 type host
>> step chooseleaf indep 2 type osd
>> step emit
>> }
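>>
>> (For what it's worth, a rule like this can be exercised offline with
>> something like
>>
>> ceph osd getcrushmap -o crushmap.bin
>> crushtool -i crushmap.bin --test --rule 6 --num-rep 10 --show-mappings
>>
>> where 6 is the rule id and 10 the pool size.)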
>>
>> I am on Nautilus 14.2.16 and, while performing maintenance the other
>> day, I noticed 2 PGs were incomplete and caused trouble to some users.
>> I then verified that (thanks Bryan for the command):
>>
>> [cephmgr(a)cephAdmCT1.cephAdmCT1 clusterCT]$ for osd in $(ceph pg map
>> 116.453 -f json | jq -r '.up[]'); do ceph osd find $osd | jq -r '.host'
>> ; done | sort | uniq -c | sort -n -k1
>> 2 r2srv07.ct1.box.garr
>> 2 r2srv10.ct1.box.garr
>> 2 r3srv07.ct1.box.garr
>> 4 r1srv02.ct1.box.garr
>>
>> You see that 4 chunks of this PG were put on r1srv02.
>> Maybe this happened due to some temporary unavailability of the host
>> at some point? As all my servers are now up and running, is there a
>> way to force the placement rule to rerun?
>>
>> Thanks!
>>
>> Fulvio
>>
>>
>> On 5/16/2021 11:40 PM, Dan van der Ster wrote:
>> > Hi Bryan,
>> >
>> > I had to do something similar, and never found a rule to place
>> > "up to" 2 chunks per host, so I stayed with the placement of
>> > *exactly* 2 chunks per host.
>> >
>> > But I did this slightly differently to what you wrote earlier: my
>> > rule chooses exactly 4 hosts, then chooses exactly 2 osds on each:
>> >
>> > type erasure
>> > min_size 3
>> > max_size 10
>> > step set_chooseleaf_tries 5
>> > step set_choose_tries 100
>> > step take default class hdd
>> > step choose indep 4 type host
>> > step choose indep 2 type osd
>> > step emit
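>> >
>> > (FWIW, one way to install a rule like this is via the decompiled
>> > crushmap:
>> >
>> >     ceph osd getcrushmap -o crushmap.bin
>> >     crushtool -d crushmap.bin -o crushmap.txt
>> >     # add/edit the rule in crushmap.txt, then recompile and inject it
>> >     crushtool -c crushmap.txt -o crushmap-new.bin
>> >     ceph osd setcrushmap -i crushmap-new.bin
>> > )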
>> >
>> > If you really need the "up to 2" approach then maybe you can split
>> > each host into two "host" crush buckets, with half the OSDs in each.
>> > Then a normal host-wise rule should work.
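>> >
>> > Roughly something like this (untested; the bucket name, osd id and
>> > weight below are just placeholders):
>> >
>> >     # create a second "host" bucket for hostA and put it under the root
>> >     ceph osd crush add-bucket hostA-b host
>> >     ceph osd crush move hostA-b root=default
>> >
>> >     # then re-place half of hostA's OSDs into it, one by one
>> >     # (3.64 = the OSD's current crush weight)
>> >     ceph osd crush set osd.0 3.64 root=default host=hostA-b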
>> >
>> > Cheers, Dan
>> >
>
--
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70
tel.: +39-334-6533-250