Thanks for the useful information.
Could you also say which of the kernel and disk settings are still valid
for BlueStore, and which are not? I mean read_ahead_kb and the disk
scheduler.
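For context, I mean the sysfs knobs like these (nvme0n1 is just an example
device name):

```shell
# Per-device readahead size, in KB
cat /sys/block/nvme0n1/queue/read_ahead_kb

# Active I/O scheduler (the one in brackets is selected)
cat /sys/block/nvme0n1/queue/scheduler
```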
Thanks.
On Tue, Nov 3, 2020 at 10:55 PM Alexander E. Patrakov <patrakov(a)gmail.com>
wrote:
On Tue, Nov 3, 2020 at 6:30 AM Seena Fallah
<seenafallah(a)gmail.com> wrote:
Hi all,
Is this guide still valid for a BlueStore deployment with Nautilus or
Octopus?
https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments
Some of the guidance is of course outdated.
E.g., at the time of that writing, 1x 40GbE was indeed state of the
art in the networking world, but now 100GbE network cards are
affordable, and with 6 NVMe drives per server, even that might be a
bottleneck if the clients use a large block size (>64KB) and do an
fsync() only at the end.
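A back-of-the-envelope calculation illustrates this (the ~3 GB/s
per-drive figure is an assumption; check your drive's spec sheet):

```shell
# 6 NVMe drives at ~3 GB/s sequential read each, vs one 100GbE link.
nvme_total=$(( 6 * 3 ))                      # aggregate NVMe: 18 GB/s
nic_total=$(awk 'BEGIN { print 100 / 8 }')   # 100 Gbit/s = 12.5 GB/s
echo "NVMe aggregate: ${nvme_total} GB/s vs NIC: ${nic_total} GB/s"
```

So the drives alone can out-run the network card, before replication
traffic is even counted.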
Regarding NUMA tuning, Ceph has made some progress. If it finds that an
OSD's NVMe drive and network card are on the same NUMA node then, with
Nautilus or later, the OSD will pin itself to that NUMA node
automatically. In other words: choose strategically which PCIe slots to
use, maybe use two network cards, and you will not have to do any manual
tuning or pinning.
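You can check what the autodetection decided; a sketch, assuming a running
Nautilus-or-later cluster (osd.3 is just an example ID):

```shell
# Show each OSD's detected network/storage NUMA affinity
ceph osd numa-status

# If autodetection did not apply, an OSD can still be pinned manually,
# e.g. to NUMA node 0:
ceph config set osd.3 osd_numa_node 0
```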
Partitioning the NVMe drive was also popular advice in the past, but now
that the "osd op num shards" and "osd op num threads per shard"
parameters exist, with sensible default values, partitioning is something
that tends not to help.
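If you want to see the values in effect, a sketch (the CLI uses
underscores; the _ssd variants apply to flash-backed OSDs, and the
defaults shown in the comments are the documented ones as of Nautilus):

```shell
# Number of operation queue shards per OSD (default for SSD: 8)
ceph config get osd osd_op_num_shards_ssd

# Worker threads per shard (default for SSD: 2)
ceph config get osd osd_op_num_threads_per_shard_ssd
```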
Filesystem considerations in that document obviously apply only to
Filestore, which is something you should not use.
A large PG count per OSD gives a more uniform data distribution, but
actually hurts performance a little.
The advice regarding the "performance" cpufreq governor is valid, but
you might also look at disabling the deepest CPU idle states (benchmark
for your workload specifically).
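Both can be done from the shell; a sketch, assuming a typical Linux
cpufreq setup and root access (cpupower ships with the kernel tools
package; the latency threshold of 2 usec is just an example):

```shell
# Set the "performance" governor on every CPU
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done

# Disable all idle states with an exit latency above 2 microseconds,
# effectively keeping the CPU out of deep C-states
cpupower idle-set -D 2
```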
--
Alexander E. Patrakov
CV:
http://pc.cd/PLz7