On Sat, 26 Jun 2021 10:06:10 -0700
Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
A handful of years back WD Labs did their “microserver” project, a cluster
of 504 drives with an onboard ARM CPU and 1GB of RAM, 8TB HDDs I think.
But yeah that most likely was Filestore.
At a Ceph Day in Hillsboro someone, forgive me for not remembering
who, spoke of running production on servers with 2GB RAM per OSD. He
said that it was painful, required a lot of work, and would not
recommend it. ymmv.
Yeah, I wouldn't want to go below 4GB RAM.
> - In my experience, it performs poorly on HDD-based clusters with a
> small number of disks
Don’t HDD clusters with a small number of disks *always* perform poorly?
Originally when I deployed my 3-node cluster, I was getting comparable
performance to Microsoft Azure's cheaper offerings. (Not a glowing
endorsement of their cloud, I might add, but it was quite acceptable.)
> Also, only one Ethernet port.
Worse yet they have *zero* HIPPI ports! Can you imagine!?
Never used HIPPI.
A 48-port gigabit managed switch is reasonably accessible to the home
gamer, both in terms of availability and cost.
Second-hand 10GbE switches can be found for reasonable prices, but a
new one is pricey! Too expensive for my liking.
> - Intel NUCs and similar machines can do Ceph work, but only one
> Ethernet port is a limitation.
Why the fixation on multiple network interfaces?
… because Ceph needs one interface for the "public" network and one for
the "private" (cluster) network? Plus, 802.3ad link aggregation helps.
> (Plus the need to use a console to manage them instead of using a BMC
> with a server board or a multiplexed serial console is a nuisance.)
Not all of us using Ceph are big corporates with deep pockets. BMCs have
an incremental cost.
Truth be told, I'd like to ditch the BMCs, but most BIOSes have a fixation
on needing a monitor and keyboard to configure them. CoreBoot has the
right idea, but isn't widely available on kit accessible to the home
experimenter.
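Where the board can at least redirect to a serial port, getting a headless
OS console is only a GRUB tweak; roughly something like this (the port and
speed here are assumptions that vary by board):

  # /etc/default/grub -- then regenerate with update-grub / grub2-mkconfig
  GRUB_TERMINAL="serial console"
  GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
  GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"

It's the BIOS setup screens themselves that still insist on a monitor and
keyboard, which is the annoying bit.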
> Not all of us using Ceph are big corporates with deep pockets.
I’ve heard that!
In all seriousness, these aren't limitations for a PoC cluster, but then
functional PoCs don't need BMCs and are easy to deploy on VMs. For
production I wouldn't think there are many good use-cases for a small
number of SBC nodes, and some sort of RAID solution is often a better fit.
There's also lots of used gear available; for small-scale clusters with
modest performance needs, that can be a viable alternative. I've seen any
number of folks in that situation: donated / abandoned / repurposed
hardware.
Well, my use case is a small-scale cluster in a SOHO-type environment.
It started out as a project at my workplace to investigate how to set up
a small private cloud arrangement, then I replicated the set-up at home
to better explore the options, with a view to applying what I had learned
to the cluster at my workplace.
So, not production in the sense a business runs on it, but my mail
server and numerous other workloads do run from this cluster.
A nice feature over a RAID system is that I can bring one node down for
maintenance, and still be "online", albeit with degraded performance.
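The only ritual is telling Ceph not to start rebalancing the moment the
node disappears, something like:

  ceph osd set noout     # don't mark the down OSDs "out" during the window
  # ...do the maintenance, reboot the node...
  ceph osd unset noout   # back to normal recovery behaviour

and the cluster just backfills whatever changed once the OSDs come back.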
FWIW, you can lower both the osd_memory_target and tweak a couple of
other settings that will lower bluestore memory usage. A 2GB target is
about the lowest you can reasonably set it to (and you'll likely hurt
performance due to cache misses),
Indeed, though assuming that we’re talking small clusters with small
drives, one can set the OSD max low to reduce map size, various
related tunings, provision a small number of PGs, etc, which I would
think would help. Blacklist unneeded kernel modules? Disable
nf_conntrack with extreme prejudice?
but saying you need a host with 8+GB of RAM is probably a little
excessive.
Especially for a single OSD.
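(On the memory target: as I understand it that's a per-OSD figure and can
be pushed down cluster-wide from the MON config database, e.g. something
like

  ceph config set osd osd_memory_target 2147483648   # ~2GiB, the floor mentioned above

bearing in mind it's a target the OSD aims for, not a hard cap.)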
In this case, 3 of my nodes are running two OSDs each: a Samsung SSD 860
2TB (Bluestore) and a WDC WD20SPZX-00U (Filestore). Built on these boards:
https://www.supermicro.com/products/motherboard/atom/X10/A1SAi-2750F.cfm
and mounted in a DIN-rail case. (Presently with the OS running off a
USB 3.0 external drive.)
I added two more to give me breathing room when re-deploying nodes (in
particular, going from Filestore on BTRFS to Bluestore, then back to
Filestore on XFS); these just have one WDC WD20SPZX-00U each (also
Filestore). They were built on Intel NUCs because I needed them in a
hurry and I had some DDR4 SO-DIMMs that I bought by mistake.
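(For the curious, re-creating a Filestore OSD with ceph-volume looks
roughly like this; /dev/sdX and /dev/sdY1 below are placeholders for the
data disk and journal partition:

  ceph-volume lvm create --filestore --data /dev/sdX --journal /dev/sdY1

ceph-volume takes care of the XFS filesystem and registering the OSD with
the cluster.)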
So that's 5 WDC WD20SPZX-00U OSDs and 3 Samsung SSD 860 OSDs.
I'm looking to move away from the DIN-rail cases as it looks like I'm
outgrowing them, so in the future I might replace these with 3.5" drives,
but right now this is what I have.
Filestore may not set the world on fire, and may be worse off in bigger
deployments, but from what I've seen it works really well in smaller ones.
--
Stuart Longland (aka Redhatter, VK4MSL)
I haven't lost my mind...
...it's backed up on a tape somewhere.