Hi Ondrej,
When running multiple OSDs on a shared DB/WAL NVMe, it is important to take
into account, when designing your redundancy/failure domains, that the loss
of a single NVMe drive will take out a number of OSDs. You must design your
redundancy so that it is acceptable to lose that many OSDs
simultaneously and still be able to rebuild without data loss. In most
scenarios, this is easily addressed simply by using failure_domain=Host, as
you won't be sharing DB/WAL NVMes across multiple hosts. I don't think
there's any generally agreed perfect number of OSDs per DB/WAL NVMe, but
I've seen others argue for a best practice of a maximum of 3 OSDs per DB/WAL
NVMe, and have myself adopted that as a standard. I run hosts with 12 HDD
OSDs and 4 DB/WAL NVMes, and a failure_domain=host.
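For reference, a host-level failure domain is expressed in a CRUSH rule like the following (decompiled crushmap syntax; the rule name and id here are illustrative, not from my cluster):

```
rule replicated_host {
    id 1
    type replicated
    step take default
    # chooseleaf at type host: no two replicas land on the same host,
    # so losing one shared DB/WAL NVMe (and its OSDs) costs at most one copy
    step chooseleaf firstn 0 type host
    step emit
}
```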
Best Regards,
Simon Kepp,
Founder,
Kepp Technologies.
On Fri, Apr 19, 2024 at 2:07 PM Ondřej Kukla <ondrej(a)kuuk.la> wrote:
Hello,
I’m going to mainly answer the practical questions Niklaus had.
Our standard setup is 12 HDDs and 2 enterprise NVMes per node, which means
we have 6 OSDs per NVMe. For the partitioning we use LVM.
The fact that one failed NVMe takes down 6 OSDs isn't great, but our
osd-node count is more than double the K + M value for erasure coding,
which means losing 6 OSDs should be ok-ish. Failing multiple NVMes could be
an issue. If you use replicated pools then this isn't that problematic.
When it comes to recovery, Ceph can handle that easily. Just recreate the
LVs and OSDs and you are good to go.
One other benefit for us is that because we use large NVMes (7.7TiB) we
can use the spare space for a fast pool.
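The spare-space math works out roughly like this (a sketch; the 300 GiB per-OSD DB/WAL size is an assumed figure for illustration, not from our actual layout):

```python
# Sketch of the LVM split on one 7.7 TiB NVMe shared by 6 OSDs.
# The 300 GiB DB/WAL LV size per OSD is an assumption for illustration.
TIB = 1024  # GiB per TiB

nvme_gib = 7.7 * TIB      # ~7885 GiB usable
db_lv_gib = 300           # assumed DB/WAL LV per OSD
osds_per_nvme = 6

spare_gib = nvme_gib - osds_per_nvme * db_lv_gib
print(f"spare for fast pool: {spare_gib:.0f} GiB (~{spare_gib / TIB:.1f} TiB)")
# -> spare for fast pool: 6085 GiB (~5.9 TiB)
```

Even with generous DB/WAL allocations, most of a large NVMe is left over for a fast pool.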
Ondrej
On 19. 4. 2024, at 12:04, Torkil Svensgaard
<torkil(a)drcmr.dk> wrote:
Hi
Red Hat Ceph support told us back in the day that 16 DB/WAL partitions per
NVMe were the max supported by RHCS, because their testing showed
performance suffered beyond that. We are running with 11 per NVMe.
We are prepared to lose a bunch of OSDs if we have an NVMe die. We expect
Ceph will handle it and we can redeploy the OSDs with a new NVMe device.
We use a service spec for the chopping up bit:
service_type: osd
service_id: slow
service_name: osd.slow
placement:
host_pattern: '*'
spec:
block_db_size: 290966113186
data_devices:
rotational: 1
db_devices:
rotational: 0
size: '1000G:'
filter_logic: AND
objectstore: bluestore
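As a sanity check on that spec, the odd-looking block_db_size decodes to roughly 271 GiB per OSD; with 11 DB partitions per NVMe that adds up to about 3.2 TB, i.e. one drive's worth (a sketch; the drive-size interpretation is my own arithmetic):

```python
# Decode the block_db_size from the service spec above.
block_db_size = 290966113186      # bytes, straight from the spec
GIB = 2**30

per_osd_gib = block_db_size / GIB
print(f"per OSD: {per_osd_gib:.0f} GiB")    # ~271 GiB

# With 11 DB partitions per NVMe, the total allocation is:
total_tb = 11 * block_db_size / 1e12
print(f"11 partitions: {total_tb:.2f} TB")  # ~3.20 TB
```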
Mvh.
Torkil
On 19-04-2024 11:02, Niklaus Hofer wrote:
> Dear all
> We have an HDD Ceph cluster that could do with some more IOPS. One
solution we are considering is installing NVMe SSDs into the storage nodes
and using them as WAL and/or DB devices for the BlueStore OSDs.
> However, we have some questions about this and are looking for some
guidance and advice.
> The first one is about the expected benefits. Before we undergo the
efforts involved in the transition, we are wondering if it is even worth
it. How much of a performance boost can one expect when adding NVMe SSDs
for WAL devices to an HDD cluster? Plus, how much faster than that does it
get with the DB also being on SSD? Are there rule-of-thumb numbers for
that? Or maybe someone has done benchmarks in the past?
> The second question is of a more practical nature. Are there any best
practices on how to implement this? I was thinking we won't do one SSD per
HDD - surely an NVMe SSD is plenty fast to handle the traffic from multiple
OSDs. But what is a good ratio? Do I have one NVMe SSD per 4 HDDs? Per 6,
or even 8? Also, how should I chop up the SSD, using partitions or using
LVM? Last but not least, if I have one SSD handle WAL and DB for multiple
OSDs, losing that SSD means losing multiple OSDs. How do people deal with
this risk? Is it generally deemed acceptable, or is this something people
tend to mitigate, and if so, how? Do I run multiple SSDs in RAID?
> I do realize that for some of these, there might not be one perfect
answer that fits all use cases. I am looking for best practices and am in
general just trying to avoid any obvious mistakes. Any advice is much
appreciated.
Sincerely
Niklaus Hofer
--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io