I'm trying to figure out a CRUSH rule that will spread data out across my cluster as
much as possible, while placing no more than 2 chunks per host.
If I use the default rule with an osd failure domain like this:
step take default
step choose indep 0 type osd
step emit
I get clustering of 3-4 chunks on some of the hosts:
# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r
'.pg_stats[].pgid'); do
echo $pg
for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
ceph osd find $osd | jq -r '.host'
done | sort | uniq -c | sort -n -k1
> done
8.0
1 harrahs
3 paris
4 aladdin
8.1
1 aladdin
1 excalibur
2 mandalaybay
4 paris
8.2
1 harrahs
2 aladdin
2 mirage
3 paris
...
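(For reference, those steps sit inside a full rule definition in the decompiled CRUSH map; a minimal sketch of the osd-failure-domain version, where the rule name, id, and tries values are placeholders rather than copied from my cluster:)

```
rule cephfs_data_ec62 {
        id 1
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 0 type osd
        step emit
}
```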
However, if I change the rule to use:
step take default
step choose indep 0 type host
step chooseleaf indep 2 type osd
step emit
I get the data spread across 4 hosts with 2 chunks per host:
# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r
'.pg_stats[].pgid'); do
echo $pg
for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
ceph osd find $osd | jq -r '.host'
done | sort | uniq -c | sort -n -k1
> done
8.0
2 aladdin
2 harrahs
2 mandalaybay
2 paris
8.1
2 aladdin
2 harrahs
2 mandalaybay
2 paris
8.2
2 harrahs
2 mandalaybay
2 mirage
2 paris
...
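As an aside, rule variants like these can be simulated offline with crushtool before applying them; a rough sketch of the workflow (the rule id and filenames here are placeholders):

```
# Grab and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit the rule in crushmap.txt ...
# Recompile and simulate placements for 8 chunks (6+2)
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 1 --num-rep 8 --show-mappings
# Apply once the mappings look right
ceph osd setcrushmap -i crushmap.new
```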
Is it possible to get the data to spread out over more hosts? I plan on expanding the
cluster in the near future and would like to see more hosts get 1 chunk instead of 2.

Also, before you recommend adding two more hosts and switching to a host-based failure
domain: the cluster runs on a mix of hardware with 2-6 drives per host, ranging from
4 TB to 12 TB each (it's part of my home lab).
Thanks,
Bryan