If you look at your used bytes, it's only 2.5GB on the NVMe. Because of the way RocksDB
works with levels, where each level is a factor of 10 larger than the last, it can't store
the next level on the NVMe.
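A rough sketch of that level arithmetic, assuming stock RocksDB defaults (max_bytes_for_level_base = 256MB, max_bytes_for_level_multiplier = 10 — Ceph may tune these differently), and simplifying BlueFS placement to "a level stays on the fast device only if it fits in full":

```python
# Sketch only: why a ~30GB DB partition can spill even at 2.5GB used.
# Assumes RocksDB defaults (base level 256MB, 10x multiplier); actual
# Ceph/BlueFS behaviour is more nuanced.

GB = 1024 ** 3

def level_sizes(base_bytes=256 * 1024 ** 2, multiplier=10, levels=5):
    """Target size of each RocksDB level under the default settings."""
    return [base_bytes * multiplier ** i for i in range(levels)]

def levels_fitting(db_partition_bytes, sizes):
    """Simplified placement: keep whole levels on the fast device until
    the cumulative size exceeds the partition."""
    fitting, used = [], 0
    for size in sizes:
        if used + size > db_partition_bytes:
            break
        used += size
        fitting.append(size)
    return fitting

sizes = level_sizes()
fit = levels_fitting(30 * GB, sizes)
print([s / GB for s in sizes])  # [0.25, 2.5, 25.0, 250.0, 2500.0]
print(sum(fit) / GB)            # 27.75 -- the next level (250GB) spills
```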
You need to look at slow_used_bytes, which says that you have used 4GB of space on your
HDD. The other number is the space it has currently reserved.
Increasing the RocksDB and WAL space to around 70GB per disk would allow all of that to go
on the NVMe.
Darren
On 26/09/2019, 15:39, "jvsoares(a)binario.cloud" <jvsoares(a)binario.cloud>
wrote:
I see...
That is the info of one of my spinning disks:
    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 30169620480,
        "db_used_bytes": 2517630976,
        "wal_total_bytes": 1073737728,
        "wal_used_bytes": 524288000,
        "slow_total_bytes": 400033841152,
        "slow_used_bytes": 3996123136,
        "num_files": 583,
        "log_bytes": 7798784,
        "log_compactions": 39,
        "logged_bytes": 786444288,
        "files_written_wal": 2,
        "files_written_sst": 48742,
        "bytes_written_wal": 2410376267722,
        "bytes_written_sst": 2620043565235
    },
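To put numbers on that spillover, here is a small sketch that reads the "bluefs" section of the perf dump (the values below are copied from the spinning-disk dump above) and reports how much has landed on the slow device:

```python
# Sketch: quantify BlueFS spillover from the "bluefs" section of
# `ceph daemon osd.X perf dump`. Values copied from the dump above.

bluefs = {
    "db_total_bytes": 30169620480,
    "db_used_bytes": 2517630976,
    "slow_total_bytes": 400033841152,
    "slow_used_bytes": 3996123136,
}

def spillover_report(b):
    """Return (is_spilling, GB on the slow device, DB partition utilisation)."""
    spilling = b["slow_used_bytes"] > 0
    slow_gb = b["slow_used_bytes"] / 1024 ** 3
    db_util = b["db_used_bytes"] / b["db_total_bytes"]
    return spilling, round(slow_gb, 2), round(db_util, 2)

print(spillover_report(bluefs))  # (True, 3.72, 0.08)
```

So this OSD has ~3.7GB of RocksDB data on the HDD even though the NVMe DB partition is only ~8% full, which matches the level-placement explanation above.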
On my solid-state disks, I don't see any slow_ usage:
    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 153631064064,
        "db_used_bytes": 6822035456,
        "wal_total_bytes": 0,
        "wal_used_bytes": 0,
        "slow_total_bytes": 0,
        "slow_used_bytes": 0,
        "num_files": 250,
        "log_bytes": 16420864,
        "log_compactions": 511,
        "logged_bytes": 9285406720,
        "files_written_wal": 2,
        "files_written_sst": 79316,
        "bytes_written_wal": 4393750671932,
        "bytes_written_sst": 4626359292945
    },
In my understanding, the solid-state disks manage the DB automatically, so about 150GB of
the 3.84TB total disk was reserved.
My spinning disks show about 384GB of slow_total_bytes, while I have only dedicated 33GB
of NVMe for each raw disk.
Well, I believe I have to think about increasing my DB partitions.
Thank you for your feedback, Darren!
Joao Victor R Soares
Darren Soothill wrote:
Hi Joao,
You can see how much RocksDB space has been used with the command “ceph daemon osd.X
perf dump”, where X is an OSD id on the node you are running the command on.
You are looking for this section in the output:
    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 23966253056,
        "db_used_bytes": 1714421760,
        "wal_total_bytes": 0,
        "wal_used_bytes": 0,
        "slow_total_bytes": 0,
        "slow_used_bytes": 0,
        "num_files": 24,
        "log_bytes": 552120320,
        "log_compactions": 0,
        "logged_bytes": 537051136,
        "files_written_wal": 1,
        "files_written_sst": 11,
        "bytes_written_wal": 429315193,
        "bytes_written_sst": 601384180,
        "bytes_written_slow": 0,
        "max_bytes_wal": 0,
        "max_bytes_db": 1714421760,
        "max_bytes_slow": 0
    },
If you have numbers in the slow_ entries then your RocksDB is spilling over onto the
HDD.
As to whether moving RocksDB and the WAL onto the HDD can cause performance degradation:
it depends how busy your disks are. If your HDDs are working hard and you are now going
to throw a lot more workload onto them, then performance will degrade, possibly
substantially. I have seen performance impacts of up to 75% when things have started
spilling over from NVMe to HDD.
By that I mean I had a lovely flat line ingesting objects, and that line suddenly dropped
by 75% once the RocksDB had filled up and spilt over onto the HDD.
From: João Victor Rodrigues Soares <jvrs2683(a)gmail.com>
Date: Wednesday, 25 September 2019 at 14:37
To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
Subject: [ceph-users] Slow Write Issues
Hello,
In my company, we currently have the following infrastructure:
- Ceph Luminous
- OpenStack Pike.
We have a cluster of 3 osd nodes with the following configuration:
- 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
- 128GB RAM
- 128GB ROOT DISK
- 12 x 10TB SATA ST10000NM0146 (OSD)
- 1 x Intel Optane P4800X SSD DC 375GB (block.DB / block.wal)
- Ubuntu 16.04
- 2 x 10Gb network interfaces configured with LACP
The compute nodes have:
- 4 x 10Gb network interfaces with LACP
We also have 4 monitors with:
- 4 x 10Gb LACP network interfaces
- The monitor nodes average approx. 90% CPU idle time, with 32GB / 256GB RAM available
For each OSD disk we have created a 33GB partition for block.db and block.wal.
We have recently been facing a number of performance issues. Virtual machines created in
OpenStack are experiencing slow writes (approx. 50MB/s).
Monitoring of the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
Memory consumption is around 30%.
We have no latency issues (9ms average).
My question is whether what is happening may have to do with the amount of disk dedicated
to the DB / WAL. The Ceph documentation recommends that the block.db size be no smaller
than 4% of block.
In that case, for each disk in my environment, block.db could not be less than 400GB per
OSD.
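The 4% arithmetic can be sketched as follows (the per-node total is my own extrapolation from the 12-OSD layout above, not a figure from the documentation):

```python
# Sketch of the 4% block.db sizing guideline from the Ceph docs:
# block.db should be no smaller than 4% of the data device.

TB = 10 ** 12
GB = 10 ** 9

def min_db_size_bytes(osd_bytes, fraction=0.04):
    """Minimum block.db size under the 4% guideline."""
    return osd_bytes * fraction

per_osd_gb = min_db_size_bytes(10 * TB) / GB
print(per_osd_gb)            # 400.0 GB per 10TB OSD
print(12 * per_osd_gb / GB)  # far beyond one 375GB Optane per node
```

Against the 375GB Optane shared by 12 OSDs (~31GB each), this shows how far below the guideline the current 33GB partitions sit.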
Another question is whether, if I set my disks to use block.db / block.wal on the
mechanical disks themselves, that could lead to performance degradation.
Regards,
João Victor Rodrigues Soares
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io