Thank you for sharing your experience. Glad to hear that someone has already used this
strategy and that it works well.
On Oct 27, 2020, at 03:10, Reed Dier
<reed.dier(a)focusvq.com> wrote:
Late reply, but I have been using what I refer to as a "hybrid" crush topology
for some data for a while now.
Initially with just rados objects, and later with RBD.
We found that we were able to accelerate reads to roughly all-SSD performance levels,
while also lifting the tail end of write performance a bit.
Writes didn't improve by orders of magnitude, but the SSD-write-then-replicate-to-HDD
cycle did seem to help with reducing slow ops, etc.
I will see if I can follow up with some rough benchmarks I can dig up.
As for implementation, I have SSD-only hosts, and HDD-only hosts, bifurcated at the root
level of crush.
{
    "rule_id": 2,
    "rule_name": "hybrid_ruleset",
    "ruleset": 2,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -13,
            "item_name": "ssd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 1,
            "type": "host"
        },
        {
            "op": "emit"
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": -1,
            "type": "chassis"
        },
        {
            "op": "emit"
        }
    ]
}
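If you want to sanity-check the mappings a rule like this produces before pointing a
pool at it, crushtool can test it offline against a dump of the map (filenames below
are just placeholders; rule id 2 matches the dump above):

# export the in-use crush map to a file
ceph osd getcrushmap -o crushmap.bin
# show which OSDs each input would map to under rule 2 with 3 replicas
crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-mappings
# list any inputs for which crush could not find enough OSDs
crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-bad-mappings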
I don't remember having to do any primary-affinity tweaking to make it work;
for the most part it seemed to *just work*, with the SSD copy becoming the primary.
Yes, from my investigation it should just work, as long as you don't change the primary
affinity of the SSD OSDs.
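If you want to verify which OSD ends up primary, the primary is the first entry of a
PG's acting set, so something like this shows it (pool name and PG id are placeholders):

# list PGs in the pool with their up/acting sets; the first OSD listed is the primary
ceph pg ls-by-pool hybrid_pool
# or map a single PG; the first entry of "acting" is the primary
ceph pg map 2.1a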
One thing to keep in mind is that I find balancer
distribution to be a bit skewed due to the hybrid pools, though that could just be my
perception.
I've got 3x rep hdd, 3x rep hybrid, 3x rep ssd, and EC 7+3 hdd pools, so my pool
topology is a bit wonky, and that could also contribute to distribution issues.
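If anyone wants to check whether they see the same skew, a couple of standard places to
look (nothing hybrid-specific here):

# per-OSD utilization and PG counts laid out along the crush tree
ceph osd df tree
# which balancer mode is active, if any
ceph balancer status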
Hope this is helpful.
Reed
On Oct 25, 2020, at 2:10 AM, huww98(a)outlook.com wrote:
Hi all,
We are planning a new pool to store our dataset using CephFS. The data is almost
read-only (but not guaranteed to be) and consists of a lot of small files. Each node in
our cluster has 1 × 1 TB SSD and 2 × 6 TB HDDs, and we will deploy about 10 such nodes.
We are aiming for the highest read throughput.
If we just use a replicated pool of size 3 on SSD, we should get the best performance;
however, that leaves us only 1/3 of the SSD space usable. And EC pools are not friendly
to such a small-object read workload, I think.
Now I’m evaluating a mixed SSD and HDD replication strategy. Ideally, I want 3 replicas,
each on a different host (failure domain): 1 on SSD and the other 2 on HDD, with every
read request normally directed to the SSD. So, as long as every SSD OSD is up, I’d
expect the same read throughput as an all-SSD deployment.
I’ve read the documents and did some tests. Here is the crush rule I’m testing with:
rule mixed_replicated_rule {
    id 3
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 1 type host
    step emit
    step take default class hdd
    step chooseleaf firstn -1 type host
    step emit
}
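As far as I know, a multi-step rule like this cannot be built with
ceph osd crush rule create-replicated, so it has to go in by editing the decompiled
crush map (filenames are placeholders):

ceph osd getcrushmap -o map.bin      # export the current map
crushtool -d map.bin -o map.txt      # decompile to editable text
# ...append the rule above to the rules section of map.txt...
crushtool -c map.txt -o map-new.bin  # recompile
ceph osd setcrushmap -i map-new.bin  # inject the edited map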
From this I have drawn the following conclusions, but I’m not very sure about them:
* The first OSD produced by crush will be the primary OSD (at least if I don’t change
the “primary affinity”). So, the above rule is guaranteed to map an SSD OSD as the
primary of each PG, and every read request will be served from the SSD as long as it is up.
* It is currently not possible to force the SSD and HDD OSDs to be chosen from different
hosts. So, if I want the data to stay available even when 2 hosts fail, I need to choose
1 SSD and 3 HDD OSDs. That means setting the replication size to 4, instead of the ideal
value of 3, on the pool using the above crush rule (see the sketch after this list).
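For concreteness, a minimal sketch of creating such a pool (pool name and PG count are
arbitrary):

# create a replicated pool that uses the rule above, then raise size to 4
ceph osd pool create mixed_test 128 128 replicated mixed_replicated_rule
ceph osd pool set mixed_test size 4
# allow I/O as long as at least 2 replicas are up
ceph osd pool set mixed_test min_size 2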
Am I correct about the above statements? How has this worked in your experience?
Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io