Hi
We currently run an SSD cluster and HDD clusters and are looking at possibly
creating a cluster for NVMe storage. For spinners and SSDs, the recommended
maximum per OSD host seemed to be 16 OSDs (I know it depends on the CPUs and
RAM, e.g. 1 CPU core and 2 GB of memory per OSD).
What do you want to achieve?
NVMe drives aren't much better than good SATA SSDs in Ceph for random workloads.
They only pull ahead on linear (sequential) workloads.
Questions:
1. If we do a JBOD setup, the servers can hold 48 NVMes. If the servers
were bought with 48 cores and 100+ GB of RAM, would this make sense?
In my opinion, no. One NVMe OSD needs 3-6 CPU cores in Ceph. And JBODs of multiple
NVMe drives... O_o why would you want that? You won't get NVMe speeds out of such a JBOD.
2. Should we just RAID 5 groups of NVMe drives instead (and buy less
CPU/RAM)? There is a reluctance to waste even a single drive on RAID,
because redundancy is basically Ceph's job.
Ceph over RAID 5 is a bad idea.
3. The plan was to build this with Octopus (hopefully there are no issues
we should know about). I did just see one posted today, but our build is a
few months off.
4. Any feedback on max OSDs?
min(CPU cores / 3..6, network bandwidth / 1..2 GB/s per OSD, free PCIe lanes / 4,
free memory / at least 4 GB per OSD)
Otherwise you have a bottleneck and you're wasting money. :-)
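That sizing rule can be sketched in Python. The per-OSD divisors are the rules of
thumb above; the example server specs (a 2x25GbE bond giving roughly 6 GB/s, 64
spare PCIe lanes) are hypothetical numbers, not from the thread:

```python
# Rough per-host OSD ceiling for NVMe, per the rules of thumb above:
# 3-6 CPU cores, ~1-2 GB/s of network, 4 PCIe lanes, and >= 4 GB RAM per OSD.
def max_osds(cpu_cores, net_gb_per_sec, free_pcie_lanes, free_mem_gb,
             cores_per_osd=3, net_per_osd=1, mem_per_osd_gb=4):
    return min(cpu_cores // cores_per_osd,
               int(net_gb_per_sec // net_per_osd),
               free_pcie_lanes // 4,
               int(free_mem_gb // mem_per_osd_gb))

# The 48-drive server from question 1, with optimistic divisors:
print(max_osds(cpu_cores=48, net_gb_per_sec=6,
               free_pcie_lanes=64, free_mem_gb=100))  # -> 6, network-bound
```

With the conservative end of the ranges (6 cores and 2 GB/s per OSD) the ceiling
drops further, which is why 48 NVMe OSDs in one host doesn't pencil out.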
Also, I think you should forget about NVMe RAID controllers. 1 NVMe = 4 PCIe lanes,
that's all.