Ceph EC PG calculation - ceph-users

18 Nov 2020

Hi,

I have this error:
I have 36 osd and get this:
Error ERANGE:  pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 10500
(mon_max_pg_per_osd 250 * num_in_osds 42)

If I want to calculate the max pg in my server, how it works if I have EC pool?

I have 4:2 data EC pool, and the others are replicated.

These are the pools:
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 2
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 597 flags
hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins
pg_num 32 pgp_num 32 autoscale_mode warn last_change 598 flags hashpspool stripe_width 0
application rgw
pool 6 'sin.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 599 flags hashpspool
stripe_width 0 application rgw
pool 7 'sin.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 600 flags hashpspool
stripe_width 0 application rgw
pool 8 'sin.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash
rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 601 lfor 0/393/391 flags
hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 10 'sin.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1
object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 602 lfor 0/529/527
flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 11 'sin.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 603 flags
hashpspool stripe_width 0 application rgw
pool 12 'sin.rgw.buckets.data' erasure profile data-ec size 6 min_size 5
crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 604
flags hashpspool,ec_overwrites stripe_width 16384 application rgw

So how I can calculate the pgs?

This is my osd tree:
ID   CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
-1         534.38354  root default
-5          89.06392      host cephosd-6s01
36   nvme    1.74660          osd.36                up   1.00000  1.00000
  0    ssd   14.55289          osd.0                 up   1.00000  1.00000
  8    ssd   14.55289          osd.8                 up   1.00000  1.00000
15    ssd   14.55289          osd.15                up   1.00000  1.00000
18    ssd   14.55289          osd.18                up   1.00000  1.00000
24    ssd   14.55289          osd.24                up   1.00000  1.00000
30    ssd   14.55289          osd.30                up   1.00000  1.00000
-3          89.06392      host cephosd-6s02
37   nvme    1.74660          osd.37                up   1.00000  1.00000
  1    ssd   14.55289          osd.1                 up   1.00000  1.00000
11    ssd   14.55289          osd.11                up   1.00000  1.00000
17    ssd   14.55289          osd.17                up   1.00000  1.00000
23    ssd   14.55289          osd.23                up   1.00000  1.00000
28    ssd   14.55289          osd.28                up   1.00000  1.00000
35    ssd   14.55289          osd.35                up   1.00000  1.00000
-11          89.06392      host cephosd-6s03
41   nvme    1.74660          osd.41                up   1.00000  1.00000
  2    ssd   14.55289          osd.2                 up   1.00000  1.00000
  6    ssd   14.55289          osd.6                 up   1.00000  1.00000
13    ssd   14.55289          osd.13                up   1.00000  1.00000
19    ssd   14.55289          osd.19                up   1.00000  1.00000
26    ssd   14.55289          osd.26                up   1.00000  1.00000
32    ssd   14.55289          osd.32                up   1.00000  1.00000
-13          89.06392      host cephosd-6s04
38   nvme    1.74660          osd.38                up   1.00000  1.00000
  5    ssd   14.55289          osd.5                 up   1.00000  1.00000
  7    ssd   14.55289          osd.7                 up   1.00000  1.00000
14    ssd   14.55289          osd.14                up   1.00000  1.00000
20    ssd   14.55289          osd.20                up   1.00000  1.00000
25    ssd   14.55289          osd.25                up   1.00000  1.00000
31    ssd   14.55289          osd.31                up   1.00000  1.00000
-9          89.06392      host cephosd-6s05
40   nvme    1.74660          osd.40                up   1.00000  1.00000
  3    ssd   14.55289          osd.3                 up   1.00000  1.00000
10    ssd   14.55289          osd.10                up   1.00000  1.00000
12    ssd   14.55289          osd.12                up   1.00000  1.00000
21    ssd   14.55289          osd.21                up   1.00000  1.00000
29    ssd   14.55289          osd.29                up   1.00000  1.00000
33    ssd   14.55289          osd.33                up   1.00000  1.00000
-7          89.06392      host cephosd-6s06
39   nvme    1.74660          osd.39                up   1.00000  1.00000
  4    ssd   14.55289          osd.4                 up   1.00000  1.00000
  9    ssd   14.55289          osd.9                 up   1.00000  1.00000
16    ssd   14.55289          osd.16                up   1.00000  1.00000
22    ssd   14.55289          osd.22                up   1.00000  1.00000
27    ssd   14.55289          osd.27                up   1.00000  1.00000
34    ssd   14.55289          osd.34                up   1.00000  1.00000

This is the crush rules:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_nvme",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -21,
                "item_name": "default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "replicated_ssd",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 3,
        "rule_name": "sin.rgw.buckets.data.new",
        "ruleset": 3,
        "type": 3,
        "min_size": 3,
        "max_size": 6,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_indep",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

So everything else rather than the data pool are on SSD and nvme with replica 3.
If I calculate the pg in the ec like 36osd*100/6=600 which means the max pg in the EC pool
is 512?
But how this affect the SSD replica pools then?

This is the EC pool definition:
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Thank you in advance.

________________________________
This message is confidential and is for the sole use of the intended recipient(s). It may
also be privileged or otherwise protected by copyright or other legal rules. If you have
received it by mistake please let us know by reply email and delete it from your system.
It is prohibited to copy this message or disclose its content to anyone. Any
confidentiality or privilege is not waived or lost by any mistaken delivery or
unauthorized disclosure of the message. All messages sent to and from Agoda may be
monitored to ensure compliance with company policies, to protect the company's
interests and to remove potential malware. Electronic messages may be intercepted,
amended, lost or deleted, or contain viruses.