Sorry, I meant a k-times replicated pool has replication factor R=k.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans(a)dtu.dk>
Sent: 18 November 2020 09:25:46
To: Szabo, Istvan (Agoda); ceph-users(a)ceph.io
Subject: [ceph-users] Re: Ceph EC PG calculation
It's the same formula. A k-times replicated pool has replication factor R. With the formula I stated below, you can compute the entire PG budget depending on what your PG target per OSD is. I'm afraid you will have to do that yourself.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
Sent: 18 November 2020 09:21:50
To: Frank Schilder; ceph-users(a)ceph.io
Subject: RE: Ceph EC PG calculation
Hi,
Thank you Frank.
And how does this affect the non-EC pools? They will use the same device class, which is SSD.
So I'd calculate with 100 PGs/OSD, because this will grow.
If I calculate for the EC pool it comes out to 512, but I still have many replicated pools 😊
Or should I just leave the autoscaler in warn mode and act when it instructs?
To be honest, I just want to be sure my setup is correct and that I haven't missed or done something wrong.
-----Original Message-----
From: Frank Schilder <frans(a)dtu.dk>
Sent: Wednesday, November 18, 2020 3:11 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>; ceph-users(a)ceph.io
Subject: Re: Ceph EC PG calculation
Roughly speaking, if you have N OSDs, a replication factor of R and aim for P PGs/OSD on
average, you can assign (N*P)/R PGs to the pool.
Example: 4+2 EC has replication 6. There are 36 OSDs. If you want to place, say, 50 PGs
per OSD, you can assign
(36*50)/6=300 PGs
to the EC pool. You may pick a close power of 2 if you wish and then calculate how many PGs will be placed on each OSD on average. For example, if we choose 256 PGs, then 256*6/36 = 42.7 PGs per OSD will be added.
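If you want to script it, here is a quick Python sketch (purely illustrative; the function names are my own, not from any Ceph tooling):

# PG budget for a pool: N OSDs, target of P PGs per OSD, replication factor R.
def pg_budget(n_osds, pgs_per_osd, repl_factor):
    return n_osds * pgs_per_osd // repl_factor

def lower_power_of_two(n):
    # largest power of 2 that does not exceed n
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

budget = pg_budget(36, 50, 6)        # (36*50)/6 = 300
pg_num = lower_power_of_two(budget)  # 256
print(pg_num * 6 / 36)               # ~42.7 PGs added per OSD

Here R is the full replication factor: k+m for an EC pool, k for a k-times replicated pool.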
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
Sent: 18 November 2020 04:58:38
To: ceph-users(a)ceph.io
Subject: [ceph-users] Ceph EC PG calculation
Hi,
I have 36 OSDs and I get this error:
Error ERANGE: pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 10500
(mon_max_pg_per_osd 250 * num_in_osds 42)
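If I understand the check correctly, the monitor sums pg_num * size over all pools: my existing pools contribute 627 PG placements (see the pool list below), and raising the EC pool from 32 to 4096 PGs would add 4096*6 - 32*6 = 24384 more, giving 25011, which exceeds 250 * 42 = 10500.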
If I want to calculate the maximum number of PGs in my cluster, how does it work if I have an EC pool? I have a 4+2 data EC pool, and the others are replicated.
These are the pools:
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 597 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 598 flags hashpspool stripe_width 0 application rgw
pool 6 'sin.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 599 flags hashpspool stripe_width 0 application rgw
pool 7 'sin.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 600 flags hashpspool stripe_width 0 application rgw
pool 8 'sin.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 601 lfor 0/393/391 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 10 'sin.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 602 lfor 0/529/527 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 11 'sin.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 603 flags hashpspool stripe_width 0 application rgw
pool 12 'sin.rgw.buckets.data' erasure profile data-ec size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 604 flags hashpspool,ec_overwrites stripe_width 16384 application rgw
So how can I calculate the PGs?
This is my osd tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 534.38354 root default
-5 89.06392 host cephosd-6s01
36 nvme 1.74660 osd.36 up 1.00000 1.00000
0 ssd 14.55289 osd.0 up 1.00000 1.00000
8 ssd 14.55289 osd.8 up 1.00000 1.00000
15 ssd 14.55289 osd.15 up 1.00000 1.00000
18 ssd 14.55289 osd.18 up 1.00000 1.00000
24 ssd 14.55289 osd.24 up 1.00000 1.00000
30 ssd 14.55289 osd.30 up 1.00000 1.00000
-3 89.06392 host cephosd-6s02
37 nvme 1.74660 osd.37 up 1.00000 1.00000
1 ssd 14.55289 osd.1 up 1.00000 1.00000
11 ssd 14.55289 osd.11 up 1.00000 1.00000
17 ssd 14.55289 osd.17 up 1.00000 1.00000
23 ssd 14.55289 osd.23 up 1.00000 1.00000
28 ssd 14.55289 osd.28 up 1.00000 1.00000
35 ssd 14.55289 osd.35 up 1.00000 1.00000
-11 89.06392 host cephosd-6s03
41 nvme 1.74660 osd.41 up 1.00000 1.00000
2 ssd 14.55289 osd.2 up 1.00000 1.00000
6 ssd 14.55289 osd.6 up 1.00000 1.00000
13 ssd 14.55289 osd.13 up 1.00000 1.00000
19 ssd 14.55289 osd.19 up 1.00000 1.00000
26 ssd 14.55289 osd.26 up 1.00000 1.00000
32 ssd 14.55289 osd.32 up 1.00000 1.00000
-13 89.06392 host cephosd-6s04
38 nvme 1.74660 osd.38 up 1.00000 1.00000
5 ssd 14.55289 osd.5 up 1.00000 1.00000
7 ssd 14.55289 osd.7 up 1.00000 1.00000
14 ssd 14.55289 osd.14 up 1.00000 1.00000
20 ssd 14.55289 osd.20 up 1.00000 1.00000
25 ssd 14.55289 osd.25 up 1.00000 1.00000
31 ssd 14.55289 osd.31 up 1.00000 1.00000
-9 89.06392 host cephosd-6s05
40 nvme 1.74660 osd.40 up 1.00000 1.00000
3 ssd 14.55289 osd.3 up 1.00000 1.00000
10 ssd 14.55289 osd.10 up 1.00000 1.00000
12 ssd 14.55289 osd.12 up 1.00000 1.00000
21 ssd 14.55289 osd.21 up 1.00000 1.00000
29 ssd 14.55289 osd.29 up 1.00000 1.00000
33 ssd 14.55289 osd.33 up 1.00000 1.00000
-7 89.06392 host cephosd-6s06
39 nvme 1.74660 osd.39 up 1.00000 1.00000
4 ssd 14.55289 osd.4 up 1.00000 1.00000
9 ssd 14.55289 osd.9 up 1.00000 1.00000
16 ssd 14.55289 osd.16 up 1.00000 1.00000
22 ssd 14.55289 osd.22 up 1.00000 1.00000
27 ssd 14.55289 osd.27 up 1.00000 1.00000
34 ssd 14.55289 osd.34 up 1.00000 1.00000
These are the CRUSH rules:
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 1,
"rule_name": "replicated_nvme",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -21,
"item_name": "default~nvme"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 2,
"rule_name": "replicated_ssd",
"ruleset": 2,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -2,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 3,
"rule_name": "sin.rgw.buckets.data.new",
"ruleset": 3,
"type": 3,
"min_size": 3,
"max_size": 6,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -2,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
]
So everything other than the data pool is on SSD and NVMe with replica 3.
If I calculate the PGs for the EC pool like 36 OSDs * 100 / 6 = 600, does that mean the max PG count for the EC pool is 512 (the nearest power of 2)?
But how does this affect the SSD replicated pools then?
This is the EC pool definition:
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
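So if I understand correctly, the pool size is k+m = 4+2 = 6 chunks, which is where the "size 6" in the error above comes from.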
Thank you in advance.
________________________________
This message is confidential and is for the sole use of the intended recipient(s). It may
also be privileged or otherwise protected by copyright or other legal rules. If you have
received it by mistake please let us know by reply email and delete it from your system.
It is prohibited to copy this message or disclose its content to anyone. Any
confidentiality or privilege is not waived or lost by any mistaken delivery or
unauthorized disclosure of the message. All messages sent to and from Agoda may be
monitored to ensure compliance with company policies, to protect the company's
interests and to remove potential malware. Electronic messages may be intercepted,
amended, lost or deleted, or contain viruses.
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io