Hi,
my name is Moritz and I work for a 3D production company. Because of the corona
virus I have too much time on my hands and also too much unused hardware. That is why I
started playing around with Ceph as a file server for us. Here I want to share my
experience with everyone who is interested. To start off, here is my currently running
test system. I am interested in the thoughts of the community and also in suggestions on
what else to try out with my available hardware. I don't really know how to test it
properly yet, because I am a newbie to Ceph and our production file server is a super
user-friendly but high-performance Synology NAS 😉. All I have done so far was running
CrystalDiskMark on one Windows machine against the SMB share.
3 nodes (originally these were render workstations that are not in use right now):
Each node runs MON, MGR and OSDs.
Mainboard: ASRock TRX40 Creator
CPU: AMD Ryzen Threadripper 3960X, 24 Cores, 3.8Ghz
RAM: 2x Samsung 32 GB 2Rx8 DDR4 2666 MHz 288-pin DIMM, unregistered, ECC (64 GB total)
NIC Public: onboard Aquantia AQC107, 10 Gbit
NIC Ceph: Intel XXV710-DA2, 2x SFP28, 25Gbit
System Drive: 2x Samsung SSD 860 PRO 256GB, SATA, ZFS Raid 1
System: Proxmox VE 6.2, Debian Buster, Ceph Nautilus
HBA: Broadcom SAS 9305-16i
OSDs:
6x Seagate Exos, 16 TB, 7,200 rpm, 12 Gb/s SAS
Cache:
1x Micron 9300 MAX 3.2TB U.2 NVMe
I played around with setting it up as a WAL/DB device. Right now I have the Micron NVMe
configured as a bcache device in front of the six Seagate drives, in writeback mode.
Because in this configuration bcache takes care of turning random writes into sequential
ones for the HDDs, I turned the separate Ceph WAL/DB device off. I think bcache gives me
more options to tune the system for my use case than just putting the WAL/DB on the NVMe.
I can also easily add or remove cache drives without touching the OSDs.
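For anyone curious, a minimal sketch of how one such bcache-backed OSD could be set up. The device names and the cache-set UUID are placeholders for my setup, not something you can copy verbatim (bcache-tools and ceph-volume assumed installed):

```shell
# One Exos HDD as backing device, one partition of the Micron NVMe as cache:
make-bcache -B /dev/sda
make-bcache -C /dev/nvme0n1p1

# Attach the backing device to the cache set
# (get the UUID from 'bcache-super-show /dev/nvme0n1p1'):
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

# Switch the device to writeback mode, as described above:
echo writeback > /sys/block/bcache0/bcache/cache_mode

# The resulting /dev/bcache0 is what the OSD gets created on:
ceph-volume lvm create --data /dev/bcache0
```

One bcache device is created per HDD, so six in total here, all sharing the same NVMe cache set.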
I set up SMB shares with the vfs_ceph module. I still have to add CTDB to distribute
Samba across all nodes.
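For reference, this is roughly what such a share looks like in smb.conf. The share name, the CephX user and the path are placeholders from my test setup:

```ini
[renderdata]
   path = /
   vfs objects = ceph
   ceph:config_file = /etc/ceph/ceph.conf
   ceph:user_id = samba
   read only = no
   # vfs_ceph bypasses the kernel, so kernel share modes must be off:
   kernel share modes = no
```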
My next steps are to keep playing around with tuning the system and testing stability and
performance. After that I want to put the Ceph cluster in front of our production NAS.
Because our data is not super critical, I thought of setting the replica count to 2 and
running rsync overnight to our NAS. That way I can switch back to the old NAS at any time
and wouldn't lose more than one day of work, which is acceptable for us. This way I can
compare the two solutions side by side with a real-life workload.
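Concretely, that plan boils down to something like the following; the pool name and mount points are placeholders, and size 2 is a deliberate trade-off (a single disk failure leaves only one copy):

```shell
# Reduce the data pool to two replicas (pool name is an example):
ceph osd pool set cephfs_data size 2

# Nightly one-way sync from the CephFS mount to the Synology,
# e.g. as a cron entry running at 02:00:
# 0 2 * * * rsync -a --delete /mnt/cephfs/ /mnt/synology-backup/
```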
I know that Ceph might not be the best solution right now, but if I can get at least
similar performance to our Synology HDD NAS out of it, it would give us a super scalable
solution, in both capacity and performance, to grow with our needs. And who knows what
performance improvements Ceph will see in the next three years.
I am happy to hear your thoughts and ideas. I know this might be kind of a crazy setup,
but I am having fun with it and I have learned a lot over the last few weeks. If my
experiment fails, I will go back to my original plan: put FreeNAS on two of the nodes
with overnight replication and send the third node back to its render friends. 😃
By the way, I also have a spare Dell server: 2x Xeon E5-2630 v3 2.40 GHz, 128 GB RAM. I
just don't have an idea how to utilize it yet. Maybe as an extra OSD node, or as a
separate Samba server to keep the SMB traffic off the public Ceph network.
Moritz Wilhelm