> - Bluestore requires OSD hosts with 8GB+ of RAM
With Filestore I found that in production I needed to raise vm.min_free_kbytes, though
inheriting the terrible mkfs.xfs -n size=65536 mistake didn't help.
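For anyone reproducing this: the knob is the standard kernel sysctl, set persistently via sysctl.d. The value below is purely illustrative, not a recommendation; tune per host.

```ini
# /etc/sysctl.d/90-ceph.conf -- illustrative value only
# Reserve more free memory so atomic allocations still succeed when
# the page cache and OSD processes are under pressure.
vm.min_free_kbytes = 262144
```

Apply with sysctl --system, or echo a value into /proc/sys/vm/min_free_kbytes to try it out before persisting.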
A handful of years back WD Labs did their "microserver" project: a cluster of 504 drives,
each with an onboard ARM CPU and 1GB of RAM (8TB HDDs, I think). But yes, that was most
likely Filestore.
At a Ceph Day in Hillsboro someone (forgive me for not remembering who) spoke of running
production on servers with 2GB RAM per OSD. He said that it was painful, required a lot
of work, and that he would not recommend it. YMMV.
More recently I suffered a protracted cluster outage that cascaded, with the OOM killer
gleefully reaping OSDs as their memory footprints expanded while trying to peer and
recover. That cluster mixed Filestore and BlueStore; the Filestore OSDs were much more
impacted than the BlueStore ones … because of the memory target.
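(For context: the knob in question is BlueStore's osd_memory_target, which makes the OSD shrink its caches to stay near a byte budget; Filestore has no comparable cap. An illustrative fragment:)

```ini
# ceph.conf fragment -- illustrative only
[osd]
# BlueStore OSDs trim their caches to stay near this many bytes;
# Filestore OSDs have no equivalent ceiling.
osd_memory_target = 4294967296   # 4 GiB, the shipped default
```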
> - In my experience, it performs poorly on HDD-based clusters with a small number of disks
Don’t HDD clusters with a small number of disks *always* perform poorly?
> Also, only one Ethernet port.
Worse yet they have *zero* HIPPI ports! Can you imagine!?
> - Intel NUCs and similar machines can do Ceph work, but only one Ethernet port is a limitation.
Why the fixation on multiple network interfaces?
> (Plus the need to use a console
> to manage them instead of using a BMC with a server board or
> a multiplexed serial console is a nuisance.)
Not all of us using Ceph are big corporates with deep pockets. BMCs have an incremental
cost.
> Not all of us using Ceph are big corporates with deep pockets.
I’ve heard that!
In all seriousness, these aren't limitations for a PoC cluster, but then functional PoCs
don't need BMCs and are easy to deploy on VMs. For production I wouldn't think there
would be many good use-cases for a small number of SBC nodes; some sort of RAID solution
is often a better fit. There's also lots of used gear available: for small-scale clusters
with modest performance needs, that should be a viable alternative. I've seen any number
of folks in that situation, running donated / abandoned / repurposed hardware.
> FWIW, you can lower the osd_memory_target and tweak a couple of other
> settings that will lower bluestore memory usage. A 2GB target is about
> the lowest you can reasonably set it to (and you'll likely hurt
> performance due to cache misses),
Indeed, though assuming that we're talking small clusters with small drives, one can set
the OSD max low to reduce map size, apply various related tunings, provision a small
number of PGs, etc., which I would think would help. Blacklist unneeded kernel modules?
Disable nf_conntrack with extreme prejudice?
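To make that concrete, here's the sort of thing I mean; the values are an illustrative sketch for a small, memory-constrained cluster, not recommendations, so test before deploying:

```ini
# ceph.conf fragment -- illustrative, not a recommendation
[osd]
osd_memory_target = 2147483648      # ~2 GB, about the practical floor
osd_memory_cache_min = 134217728    # floor for the autotuned caches (128 MiB)

[global]
# Fewer PGs per pool keeps pg_log and per-PG overhead down.
osd_pool_default_pg_num = 32
osd_pool_default_pgp_num = 32
```

On the map-size point, ceph osd setmaxosd <n> trims max_osd down to the real OSD count, which I believe shrinks the OSDMap a bit. The module blacklisting lives in /etc/modprobe.d: blacklist nf_conntrack, or install nf_conntrack /bin/false if you mean it with extreme prejudice.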
> but saying you need a host with 8+GB of RAM is probably a little excessive.
Especially for a single OSD.
> There's also a good chance that filestore memory usage isn't as
> consistently low as you think it is.
So, so true. See above. I’ve also seen ceph-mgr balloon randomly like mad, but that’s a
tangent.
> Yes you can avoid the in-memory caches that bluestore has since filestore
> relies more heavily on page cache, but things like osdmap, pglog, and
> various other buffers are still going to use memory in filestore just like
> bluestore. You might find yourself working fine 99% of the time and then
> going OOM during recovery or something if you try to deploy filestore on a
> low memory SBC.
Or when something fails, or a customer issues a few thousand snap trims at the same time
:-x