On 9/23/19 9:38 AM, Robert LeBlanc wrote:
On Wed, Sep 18, 2019 at 11:47 AM Shawn A Kwang
<kwangs(a)uwm.edu> wrote:
We are planning our ceph architecture and I have a question:
How should NVMe drives be used when our spinning storage devices use
bluestore:
1. block WAL and DB partitions (example below)
(https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-re…)
2. Cache tier
(https://docs.ceph.com/docs/nautilus/rados/operations/cache-tiering/)
3. Something else?
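For option 1, my rough understanding is that it would look something like the
following with ceph-volume; the device names and the one-NVMe-partition-per-OSD
layout are only placeholders, not a worked-out plan:

  # Sketch only: one DB partition on the NVMe per HDD-backed OSD.
  ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/nvme0n1p1
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p2
  ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p3
  # As I understand it, with block.db on the NVMe the WAL ends up on that
  # device as well, so a separate --block.wal is only needed if there were a
  # third, even faster device.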
Hardware: each node has:
3x 8 TB HDD
1x 450 GB NVMe drive
192 GB RAM
2x Xeon CPUs (24 cores total)
I plan to have three OSD daemons running on each node. There are 95 nodes
in total, all with the same hardware.
Use Case:
The plan is to create a CephFS filesystem and use it to store people's home
directories and data. I anticipate more read operations than writes.
Regarding cache tiering: the online documentation says cache tiering will
often degrade performance, but when I read various threads on this ML there
do seem to be people using it with success. I gather that it is heavily
dependent on one's use case. In 2019, are there any updated recommendations
on whether to use cache tiering?
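For reference, my understanding of what a cache tier in front of the data
pool involves, and of the knobs that govern promotion and eviction, is
roughly the following (pool names and values are placeholders only):

  # Sketch only: attach a fast pool in front of the CephFS data pool.
  ceph osd tier add cephfs_data cache_pool
  ceph osd tier cache-mode cache_pool writeback
  ceph osd tier set-overlay cephfs_data cache_pool
  # Promotion/eviction behaviour is then tuned per pool, e.g.:
  ceph osd pool set cache_pool hit_set_type bloom
  ceph osd pool set cache_pool hit_set_count 12
  ceph osd pool set cache_pool hit_set_period 3600
  ceph osd pool set cache_pool min_read_recency_for_promote 2
  ceph osd pool set cache_pool target_max_bytes 1099511627776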
If people have a third suggestion, I would be interested in hearing it.
Thanks in advance.
I've had good success when I've been able to hold all the 'hot' data for
24 hours in a cache tier. That reduces the amount of data being evicted
from and added to the tier, which reduces the penalty from those
operations. You can adjust the config (hit rate, etc.) to help reduce
promotions of rarely accessed objects. Given its size, the NVMe drive may
be best suited for the WAL (I highly recommend that for any HDD install)
for each OSD; then carve out the rest as an SSD pool that you can put the
CephFS metadata pool on. I don't think you would have a good experience
with a cache tier at that size. However, you know your access patterns far
better than I do, and it may be a good fit.
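Roughly, assuming the leftover NVMe space is deployed as its own OSDs
carrying the 'nvme' device class (rule and pool names below are just
examples), pinning the metadata pool would look like:

  # Sketch only: steer the CephFS metadata pool onto NVMe-backed OSDs.
  ceph osd crush rule create-replicated nvme-only default host nvme
  ceph osd pool set cephfs_metadata crush_rule nvme-only

With three OSDs per node, an even three-way split of the 450 GB drive is
roughly 150 GB per OSD before anything is set aside for that metadata pool.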
Robert,
I like your idea of partitioning each SSD for bluestore's DB [1] and then
using the extra space for the CephFS metadata pool.
[1] Question: You wrote 'WAL', but did you mean block.wal or block.db?
Or both?
Sincerely,
Shawn
--
Associate Scientist
Center for Gravitation, Cosmology, and Astrophysics
University of Wisconsin-Milwaukee
office: +1 414 229 4960
kwangs(a)uwm.edu