So I wanted to report some strange behaviour with crush rules / EC profiles and radosgw pools,
which I am not sure is a bug or whether it is supposed to work that way.
I am trying to implement the scenario below in my home lab:
By default there is a "default" erasure-code-profile with the below settings:
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8
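For reference, you can print these settings yourself with:
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get default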
From the above we see that it uses the default root bucket. Of course you would normally want
to create your own EC profile with a custom algorithm, crush buckets, etc.
Let's say, for example, we create two new EC profiles: one with crush-root=ssd-performance2
and one with crush-root=default (there are no disks under the default root according to the
ceph osd tree output at the end of this mail):
ceph osd erasure-code-profile set test-ec crush-device-class= crush-failure-domain=host
crush-root=ssd-performance2 jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure
technique=reed_sol_van w=8
ceph osd erasure-code-profile set test-ec2 crush-device-class= crush-failure-domain=host
crush-root=default jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure
technique=reed_sol_van w=8
Now let's create the associated crush rules to use these profiles:
ceph osd crush rule create-erasure erasure-test-rule test-ec
ceph osd crush rule create-erasure erasure-test-rule2 test-ec2
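To double-check what was created:
ceph osd erasure-code-profile get test-ec
ceph osd crush rule ls
ceph osd crush rule dump erasure-test-rule
(the full dump of erasure-test-rule is pasted at the end of this mail)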
Now let's say you have a radosgw server that has started; by default it creates the 5 default
radosgw pools (assuming you have also uploaded some data):
default.rgw.buckets.data
default.rgw.buckets.index
default.rgw.control
default.rgw.log
default.rgw.meta
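You can see which crush rule each of them uses with, for example:
ceph osd dump | grep rgw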
If you check these pools in the ceph osd dump output you will see that all of them are using
replicated rules, but we want erasure coding for the radosgw data pool, so let's migrate the
default.rgw.buckets.data pool to an erasure-coded one.
1) We shut down the radosgw server so that no new requests come in.
2) ceph osd pool rename default.rgw.buckets.data default.rgw.buckets.data-old
3) ceph osd pool create default.rgw.buckets.data 8 8 erasure test-ec erasure-test-rule
-> We use the newly created erasure crush rule with the profile we created, which places data
under the ssd-performance2 root bucket.
4) rados cppool default.rgw.buckets.data-old default.rgw.buckets.data
5) Start the radosgw server again.
At this point I can see the old objects and I can upload new objects through radosgw, and
everything works fine.
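You can confirm the pool is now on the new rule with:
ceph osd pool get default.rgw.buckets.data crush_rule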
Now I see strange behavior after I do the following: we set default.rgw.buckets.data to use
the other erasure crush rule (the one whose root bucket is default, which has no disks under it):
ceph osd pool set default.rgw.buckets.data crush_rule erasure-test-rule2
Bug 1? You can still browse the data, but any attempt to upload/download hangs with the log
messages below:
2019-12-18 17:07:07.037 7f05a1ece700 0 ERROR: client_io->complete_request() returned
Input/output error
2019-12-18 17:07:07.037 7f05a1ece700 2 req 712 0.004s s3:list_buckets op status=
The monitor nodes don't display anything, and it seems that new objects cannot be saved (which
is correct, as CRUSH doesn't know where to place them), but shouldn't the monitors at least
raise a warning, or shouldn't there be a CRUSH check beforehand to verify that the rule can
actually be applied?
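For what it's worth, a rule can be tested offline with crushtool before applying it. A rough
sketch, assuming erasure-test-rule2 got rule id 3 (check ceph osd crush rule dump) and using
num-rep = k+m = 3; with no OSDs under the default root it should report every mapping as bad:
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 3 --num-rep 3 --show-bad-mappings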
Reverting the rule back to erasure-test-rule makes everything work fine again.
=================================
Bug 2? If you modify the test-ec profile (the one behind erasure-test-rule) to use an empty
crush root bucket (like erasure-test-rule2 uses), the change is not parsed and picked up by the
crush rule. It seems the crush rule skips that part.
Example:
ceph osd erasure-code-profile set test-ec crush-root=default --force
At this point nothing happens and radosgw keeps working fine, which it shouldn't, because it
should now see that the data cannot be stored anywhere. Unless placement keeps the crush root
bucket from the crush rule and not from the erasure-code profile... even if you force-change it
in the erasure profile as above.
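You can see the mismatch by comparing the two after the change: the profile now reports
crush-root=default, while (presumably) the rule's take step still points at ssd-performance2,
as in the rule dump at the end of this mail:
ceph osd erasure-code-profile get test-ec
ceph osd crush rule dump erasure-test-rule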
=================================
Bug 3? From ceph osd dump you can't tell which erasure-code profile a rule is using. You only
see that this pool is using crush rule number 1, but if you dump that crush rule it doesn't say
which erasure-code profile it was created from; it only shows the item_name, e.g. the root
bucket. Even with telemetry on in the latest release, "ceph telemetry show basic" gives the
output below, and no crush-root is mentioned. So does the crush rule take precedence over the
erasure_code_profile when it comes to parsing the crush_root buckets?
{
    "min_size": 2,
    "erasure_code_profile": {
        "crush-failure-domain": "host",
        "k": "2",
        "technique": "reed_sol_van",
        "m": "1",
        "plugin": "jerasure"
    },
    "pg_autoscale_mode": "warn",
    "pool": 860,
    "size": 3,
    "cache_mode": "none",
    "target_max_objects": 0,
    "pg_num": 8,
    "pgp_num": 8,
    "target_max_bytes": 0,
    "type": "erasure"
}
root@ceph-mon01:~# ceph osd crush rule dump erasure-test-rule
{
    "rule_id": 2,
    "rule_name": "erasure-test-rule",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -2,
            "item_name": "ssd-performance2"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
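As far as I can tell, the profile can only be recovered per pool, not per rule, e.g.:
ceph osd pool get default.rgw.buckets.data crush_rule
ceph osd pool get default.rgw.buckets.data erasure_code_profile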
root@ceph-mon01:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-37 0.18398 root really-low
-40 0.09799 host ceph-osd01-really-low
11 hdd 0.09799 osd.11 up 1.00000 1.00000
-41 0.04799 host ceph-osd02-really-low
1 hdd 0.01900 osd.1 up 1.00000 1.00000
9 hdd 0.02899 osd.9 up 1.00000 1.00000
-42 0.03799 host ceph-osd03-really-low
6 hdd 0.01900 osd.6 up 1.00000 1.00000
7 hdd 0.01900 osd.7 up 1.00000 1.00000
-23 10.67598 root spinning-rust
-20 2.04900 rack rack1
-3 2.04900 host ceph-osd01
3 hdd 0.04900 osd.3 up 0.95001 1.00000
22 hdd 1.00000 osd.22 up 0.90002 1.00000
17 ssd 1.00000 osd.17 up 1.00000 1.00000
-25 3.07799 rack rack2
-5 3.07799 host ceph-osd02
4 hdd 0.04900 osd.4 up 1.00000 1.00000
8 hdd 0.02899 osd.8 up 1.00000 1.00000
23 hdd 1.00000 osd.23 up 1.00000 1.00000
25 hdd 1.00000 osd.25 up 1.00000 1.00000
12 ssd 1.00000 osd.12 up 1.00000 1.00000
-28 3.54900 rack rack3
-7 3.54900 host ceph-osd03
0 hdd 1.00000 osd.0 up 0.90002 1.00000
5 hdd 0.04900 osd.5 up 1.00000 1.00000
30 hdd 0.50000 osd.30 up 1.00000 1.00000
21 ssd 1.00000 osd.21 up 0.95001 1.00000
24 ssd 1.00000 osd.24 up 1.00000 1.00000
-55 2.00000 rack rack4
-49 2.00000 host ceph-osd04
26 hdd 1.00000 osd.26 up 1.00000 1.00000
27 hdd 1.00000 osd.27 up 1.00000 1.00000
-2 9.10799 root ssd-performance2
-32 2.09799 host ceph-osd01-ssd
2 ssd 0.09799 osd.2 up 1.00000 1.00000
13 ssd 1.00000 osd.13 up 1.00000 1.00000
16 ssd 1.00000 osd.16 up 1.00000 1.00000
-31 3.00000 host ceph-osd02-ssd
14 ssd 1.00000 osd.14 up 1.00000 1.00000
18 ssd 1.00000 osd.18 up 1.00000 1.00000
19 ssd 1.00000 osd.19 up 1.00000 1.00000
-9 2.00999 host ceph-osd03-ssd
10 ssd 0.00999 osd.10 up 0.90002 1.00000
15 ssd 1.00000 osd.15 up 1.00000 1.00000
20 ssd 1.00000 osd.20 up 1.00000 1.00000
-52 2.00000 host ceph-osd04-ssd
28 ssd 1.00000 osd.28 up 1.00000 1.00000
29 ssd 1.00000 osd.29 up 1.00000 1.00000
-1 0 root default
root@ceph-mon01:~#
Thanks,
Anastasios