Thanks everyone.
So the 3/30/300GB restriction no longer exists in Octopus; can I make it
10GB and have it use all 10GB?
Is there a migration strategy that allows me to set up the DB on the OSD,
see how much metadata my 25TB is using, make a partition on the Optane, say
quadruple that size, and then move the DB to the Optane?
Or maybe the best strategy would be to start with a small logical volume on
the Optane, copy over my 25TB of existing data and extend it if required?
The bluefs-bdev-migrate and bluefs-bdev-expand commands seem to be the
ticket.
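For reference, a rough sketch of what that could look like (OSD id and
VG/LV names are made up here, the OSD must be stopped first, and depending
on the release you may also need to fix up the block.db symlink and LVM
tags afterwards):

    # stop the OSD before touching BlueFS
    systemctl stop ceph-osd@0

    # move the DB off the main device onto a new LV on the Optane
    ceph-bluestore-tool bluefs-bdev-migrate \
        --path /var/lib/ceph/osd/ceph-0 \
        --devs-source /var/lib/ceph/osd/ceph-0/block \
        --dev-target /dev/optane/db-0

    # later, after growing the LV (lvextend -L +20G /dev/optane/db-0),
    # tell BlueFS about the extra space
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0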
On 27 Nov 2020 at 6:19:06 am, Christian Wuerdig <christian.wuerdig(a)gmail.com> wrote:
Sorry, I replied to the wrong email thread before, so reposting this:
I think it's time to start pointing out that the 3/30/300 logic no longer
really holds true post-Octopus:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/CKRCB3HUR7…
Although I suppose in a way this makes it even harder to provide a sizing
recommendation.
On Fri, 27 Nov 2020 at 04:49, Burkhard Linke
<Burkhard.Linke(a)computational.bio.uni-giessen.de> wrote:
Hi,
On 11/26/20 12:45 PM, Richard Thornton wrote:
Hi,
Sorry to bother you all.
It’s a home server setup.
Three nodes (ODROID-H2+ with 32GB RAM and dual 2.5Gbit NICs), two 14TB
7200rpm SATA drives and an Optane 118GB NVMe in each node (OS boots from
eMMC).
*snipsnap*
Is there a rough CephFS calculation (each file uses x bytes of metadata)?
I think I should be safe with 30GB, but now I read I should double that
(you should allocate twice the size of the biggest layer to allow for
compaction). Since I only have 118GB and two OSDs, will I have to go for
59GB (or whatever will fit)?
The recommended size of 30 GB is due to the level design of rocksdb: data
is stored in a hierarchy of levels with increasing sizes. 30 GB is a kind
of sweet spot between 3 GB (too small for most use cases) and 300 GB (way
too large for most use cases). The recommendation to double the size for
compaction is OK, but you will waste that capacity most of the time.
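To put rough numbers on that, assuming BlueStore's default rocksdb tuning
(max_bytes_for_level_base of about 256 MB and a level multiplier of 10;
these defaults are an assumption, check your OSD config), and keeping in
mind that pre-Octopus a level only lived on the fast device if it fit
there entirely:

    L1 ~ 0.25 GB   cumulative ~ 0.25 GB
    L2 ~ 2.5  GB   cumulative ~ 2.75 GB   -> ~3 GB partition
    L3 ~ 25   GB   cumulative ~ 28 GB     -> ~30 GB partition
    L4 ~ 250  GB   cumulative ~ 278 GB    -> ~300 GB partition

Anything between those steps was wasted, which is where the 3/30/300
figures come from.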
In our cephfs instance we have ~115,000,000 files. Metadata is stored on
18 SSD-based OSDs. About 30-35 GB of raw capacity is currently in use,
almost exclusively for metadata, omap and other internal data. You might
be able to scale this down to your use case. Our average file size is
approx. 5 MB, so you can also put a little bit on top in your case.
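As a back-of-the-envelope figure from those numbers (actual usage will
vary with directory layout, snapshots and replication):

    35 GB / 115,000,000 files ~ 300 bytes of raw metadata per file
    e.g. 10,000,000 files * 300 bytes ~ 3 GB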
If your working set (files accessed in a given time span) is rather small,
you also have the option to use the SSD for a block device caching layer
like bcache or dm-cache. In this setup the whole capacity will be used,
and data operations on the OSDs will also benefit from the faster SSDs.
Your failure domain will be the same; if the SSD dies, your data disks
will be useless.
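If you go that route, a minimal bcache sketch might look like this
(bcache-tools; device names are placeholders):

    # format the HDD as backing device and the Optane partition as
    # cache, attaching them in one step
    make-bcache -B /dev/sda -C /dev/nvme0n1p1

    # then create the OSD on the resulting cached device
    ceph-volume lvm create --data /dev/bcache0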
Otherwise I would recommend using DB partitions of the recommended size
(do not forget to include some extra space for the WAL), and using the
remaining capacity for extra SSD-based OSDs similar to our setup. This
will ensure that metadata access will be fast[tm].
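For illustration, one possible carve-up of the 118GB Optane along those
lines (VG/LV names and sizes are just examples):

    pvcreate /dev/nvme0n1
    vgcreate optane /dev/nvme0n1
    lvcreate -L 34G -n db-0 optane            # DB + WAL headroom for OSD 0
    lvcreate -L 34G -n db-1 optane            # DB + WAL headroom for OSD 1
    lvcreate -l 100%FREE -n meta-osd optane   # leftover as a small SSD OSD
    ceph-volume lvm create --data /dev/sda --block.db optane/db-0
    ceph-volume lvm create --data /dev/sdb --block.db optane/db-1
    ceph-volume lvm create --data optane/meta-osd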
Regards,
Burkhard
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io