Hi Brent,
> 1. If we do a jbod setup, the servers can hold 48 NVMes, if the servers
> were bought with 48 cores and 100+ GB of RAM, would this make sense?
Do you seriously mean 48 NVMes per server? How would you even come remotely close to
supporting them in terms of board connectivity (PCIe lanes) and network bandwidth?
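To put rough numbers on this (a back-of-the-envelope sketch of my own; the ~3 GB/s per drive and 2x100GbE figures are assumptions, not specs from your hardware):

```python
# Back-of-the-envelope bandwidth check for a hypothetical 48-NVMe node.
# Assumed figures: ~3 GB/s sequential throughput per NVMe drive
# (typical for a PCIe 3.0 x4 device), 2x100GbE NICs.

nvme_count = 48
per_drive_gbps = 3.0                            # GB/s per drive (assumed)
aggregate_disk = nvme_count * per_drive_gbps    # GB/s the drives could deliver

network_gbps = 2 * 100 / 8                      # 2x100GbE = 25 GB/s

print(f"Aggregate NVMe throughput: {aggregate_disk:.0f} GB/s")
print(f"Network ceiling:           {network_gbps:.0f} GB/s")
print(f"Drives oversubscribe the network ~{aggregate_disk / network_gbps:.1f}x")
```

Even with generous networking, the drives could deliver several times what the node can push out, so most of that flash capability would sit idle.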
Regarding some of your points, there are valuable comments by Mark Nelson in the
archives. Hopefully he is okay with me quoting them here, but it is of course better
to look them up in the archives for full context.
RAM with NVMe OSDs:

So basically the answer is that how much memory you need depends largely
on how much you care about performance, how many objects are present on
an OSD, and how many objects (and how much data) you have in your active
data set. 4GB is sort of our current default memory target per OSD, but
as someone else mentioned bumping that up to 8-12GB per OSD might make
sense for OSDs on large NVMe drives. You can also lower that down to
about 2GB before you start having real issues, but it definitely can
have an impact on OSD performance.
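Applying that guidance to the proposed box (a rough illustration, not a Ceph formula; the 16 GB OS overhead is my own assumption):

```python
# Rough per-host RAM sizing for NVMe OSDs, following the 4-12 GB/OSD
# guidance quoted above. All figures are illustrative.

def host_ram_gb(osd_count, gb_per_osd=8.0, os_overhead_gb=16.0):
    """Estimate host RAM: per-OSD memory target plus OS headroom (assumed)."""
    return osd_count * gb_per_osd + os_overhead_gb

# 48 OSDs at the suggested 8 GB each already need ~400 GB of RAM,
# far beyond the "100+ GB" in the original question.
print(host_ram_gb(48))   # 48 * 8 + 16 = 400.0
```

In Ceph itself, the per-OSD figure corresponds to the osd_memory_target option (set in bytes).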
CPUs with NVMe OSDs:

With 10 NVMe drives per node, I'm guessing that a single EPYC 7451 is
going to be CPU bound for small IO workloads (2.4c/4.8t per OSD), but
will be network bound for large IO workloads unless you are sticking
2x100GbE in. You might want to consider jumping up to the 7601. That
would get you closer to where you want to be for 10 NVMe drives
(3.2c/6.4t per OSD).
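Those ratios are simply cores (and SMT threads) divided by OSD count; reproducing them makes it easy to see how far a 48-OSD node falls short (CPU specs assumed from AMD's published figures: EPYC 7451 = 24c/48t, EPYC 7601 = 32c/64t):

```python
# Core/thread-per-OSD ratios from the quote above, computed explicitly.

def per_osd(cores, threads, osds):
    """Return (cores per OSD, threads per OSD)."""
    return cores / osds, threads / osds

print(per_osd(24, 48, 10))   # EPYC 7451, 10 OSDs -> (2.4, 4.8)
print(per_osd(32, 64, 10))   # EPYC 7601, 10 OSDs -> (3.2, 6.4)

# Scaling the same rule of thumb to the proposed 48-OSD box
# (assuming 48 cores with SMT, i.e. 96 threads):
print(per_osd(48, 96, 48))   # -> (1.0, 2.0), well below either figure above
```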
Vitaly Filipov (I think) has also compiled interesting information in his wiki here:
https://yourcmc.ru/wiki/Ceph_performance
Best regards
André
----- On 23 Sep 2020 at 7:39, Brent Kennedy bkennedy(a)cfl.rr.com wrote:
> We currently run a SSD cluster and HDD clusters and are looking at possibly
> creating a cluster for NVMe storage. For spinners and SSDs, it seemed the
> max recommended per osd host server was 16 OSDs ( I know it depends on the
> CPUs and RAM, like 1 cpu core and 2GB memory ).
>
>
>
> Questions:
> 1. If we do a jbod setup, the servers can hold 48 NVMes, if the servers
> were bought with 48 cores and 100+ GB of RAM, would this make sense?
>
> 2. Should we just raid 5 by groups of NVMe drives instead ( and buy less
> CPU/RAM )? There is a reluctance to waste even a single drive on raid
> because redundancy is basically cephs job.
> 3. The plan was to build this with octopus ( hopefully there are no issues
> we should know about ). Though I just saw one posted today, but this is a
> few months off.
>
> 4. Any feedback on max OSDs?
>
> 5. Right now they run 10Gb everywhere with 80Gb uplinks, I was thinking
> this would need at least 40Gb links to every node ( the hope is to use these
> to speed up image processing at the application layer locally in the DC ).
> I haven't spoken to the Dell engineers yet but my concern with NVMe is that
> the raid controller would end up being the bottleneck ( next in line after
> network connectivity ).
--
Dipl.-Inf. André Gemünd, Leiter IT / Head of IT
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemuend(a)scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend