Hi João,
You can see how much RocksDB space has been used with this command: "ceph daemon osd.X perf
dump", where X is an OSD id on the node you are running the command on.
You are looking for this section in the output:
"bluefs": {
"gift_bytes": 0,
"reclaim_bytes": 0,
"db_total_bytes": 23966253056,
"db_used_bytes": 1714421760,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 0,
"slow_used_bytes": 0,
"num_files": 24,
"log_bytes": 552120320,
"log_compactions": 0,
"logged_bytes": 537051136,
"files_written_wal": 1,
"files_written_sst": 11,
"bytes_written_wal": 429315193,
"bytes_written_sst": 601384180,
"bytes_written_slow": 0,
"max_bytes_wal": 0,
"max_bytes_db": 1714421760,
"max_bytes_slow": 0
},
If you have non-zero values in the slow_* entries then your RocksDB is spilling over onto
the HDD.
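If you want to pull just those counters out quickly, something like the following works (a
rough sketch only; it assumes the jq utility is installed on the OSD node and uses osd.0 as
an example id):

# Show DB capacity/usage and any spillover onto the slow device for one OSD
ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'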
As to whether moving RocksDB and the WAL onto HDD can cause performance degradation, it
depends on how busy your disks are. If your HDDs are already working hard and you are now
going to throw a lot more workload onto them, then performance will degrade, possibly
substantially. I have seen performance impacts of up to 75% when things have started
spilling over from NVMe to HDD.
By that I mean I had a lovely flat line ingesting objects and that line suddenly dropped
by 75% once the RocksDB had filled up and spilt over onto the HDD.
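If you want to keep an eye on this across all the OSDs on a node, you can loop over the
admin sockets (again just a sketch; it assumes the default /var/run/ceph socket layout and
jq being available):

# Report slow-device bytes in use for every OSD admin socket on this node;
# anything non-zero means that OSD's RocksDB has spilt onto the HDD
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "$sock: $(ceph daemon "$sock" perf dump | jq '.bluefs.slow_used_bytes')"
done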
From: João Victor Rodrigues Soares <jvrs2683(a)gmail.com>
Date: Wednesday, 25 September 2019 at 14:37
To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
Subject: [ceph-users] Slow Write Issues
Hello,
In my company, we currently have the following infrastructure:
- Ceph Luminous
- OpenStack Pike
We have a cluster of 3 osd nodes with the following configuration:
- 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
- 128GB RAM
- 128GB ROOT DISK
- 12 x 10TB SATA ST10000NM0146 (OSD)
- 1 x Intel Optane P4800X SSD DC 375GB (block.DB / block.wal)
- Ubuntu 16.04
- 2 x 10Gb network interfaces configured with LACP
The compute nodes have
- 4 x 10Gb network interfaces with LACP.
We also have 4 monitors with:
- 4 x 10Gb LACP network interfaces.
- The monitor nodes are at approx. 90% CPU idle time with 32GB / 256GB available RAM
For each OSD disk we have created a 33GB partition for block.db and block.wal.
We have recently been facing a number of performance issues. Virtual machines created in
OpenStack are experiencing slow write speeds (approx. 50MB/s).
Monitoring on the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
Memory consumption is around 30%.
We have no latency issues (9ms average).
My question is whether what is happening may have to do with the amount of disk dedicated
to DB/WAL. The Ceph documentation recommends that the block.db size be no smaller than 4%
of block. In that case, for each disk in my environment, block.db should be no less than
400GB per OSD.
Another question is whether configuring my disks to use block.db/block.wal on the
mechanical disks themselves could lead to performance degradation.
Regards,
João Victor Rodrigues Soares