We currently run an SSD cluster and HDD clusters and are looking at possibly
creating a cluster for NVMe storage. For spinners and SSDs, the max
recommended per OSD host seemed to be 16 OSDs (I know it depends on the CPUs
and RAM, something like 1 CPU core and 2GB of memory per OSD).
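As a rough sanity check for question 1 below, here is the back-of-envelope
math I've been using. All figures are assumptions on my part, not vendor
guidance: the 1 core / 2GB rule of thumb above, plus the 4 GiB default
osd_memory_target that BlueStore uses in recent releases.

# Rough per-host sizing for a 48-OSD NVMe node (assumed numbers)
osds_per_host = 48
cores_per_osd = 1    # rule-of-thumb floor; NVMe OSDs can use several cores under load
ram_low_gib = 2      # the 2GB-per-OSD rule of thumb above
ram_high_gib = 4     # BlueStore's default osd_memory_target is 4 GiB per OSD

print(f"cores (floor):     {osds_per_host * cores_per_osd}")      # 48
print(f"RAM at 2 GiB/OSD:  {osds_per_host * ram_low_gib} GiB")    # 96
print(f"RAM at 4 GiB/OSD:  {osds_per_host * ram_high_gib} GiB")   # 192

So 100+ GB of RAM only clears the 2 GiB/OSD bar; at the default memory
target it would be closer to 192 GiB, before any headroom for recovery.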
Questions:
1. If we do a JBOD setup, the servers can hold 48 NVMes. If the servers
were bought with 48 cores and 100+ GB of RAM, would this make sense?
2. Should we instead RAID 5 the NVMe drives in groups (and buy less
CPU/RAM)? There is a reluctance to waste even a single drive on RAID,
because redundancy is basically Ceph's job.
3. The plan was to build this with Octopus (hopefully there are no issues
we should know about, though I just saw one posted today). This build is
still a few months off.
4. Any feedback on the maximum number of OSDs per host?
5. Right now they run 10Gb everywhere with 80Gb uplinks. I was thinking
this would need at least 40Gb links to every node (the hope is to use these
to speed up image processing at the application layer locally in the DC).
I haven't spoken to the Dell engineers yet, but my concern with NVMe is
that the RAID controller would end up being the bottleneck (next in line
after network connectivity); rough bandwidth numbers are sketched below.
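Back-of-envelope for that concern, assuming roughly 3 GB/s of sequential
read per enterprise NVMe (my assumption; real drives and workloads vary):

# How quickly do 48 NVMes outrun the network / a controller? (assumed figures)
nvme_per_host = 48
gbps_per_nvme = 3 * 8    # ~3 GB/s per drive sequential read ~= 24 Gb/s

print(f"aggregate read bandwidth: ~{nvme_per_host * gbps_per_nvme} Gb/s")  # ~1152
print(f"drives to fill a 40Gb link:  {40 / gbps_per_nvme:.1f}")            # ~1.7
print(f"drives to fill a 100Gb link: {100 / gbps_per_nvme:.1f}")           # ~4.2

Roughly two drives can saturate a 40Gb link, so the network (and anything
the drives sit behind, such as a RAID controller) hits its ceiling long
before the NVMes do.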
Regards,
-Brent
Existing Clusters:
Test: Nautilus 14.2.11 with 3 OSD servers, 1 mon/mgr, 1 gateway, 2 iSCSI
gateways (all virtual on NVMe)
US Production (HDD): Nautilus 14.2.11 with 12 OSD servers, 3 mons, 4
gateways, 2 iSCSI gateways
UK Production (HDD): Nautilus 14.2.11 with 12 OSD servers, 3 mons, 4 gateways
US Production (SSD): Nautilus 14.2.11 with 6 OSD servers, 3 mons, 3 gateways,
2 iSCSI gateways