I see...
This is the info from one of my spinning disks:
"bluefs": {
"gift_bytes": 0,
"reclaim_bytes": 0,
"db_total_bytes": 30169620480,
"db_used_bytes": 2517630976,
"wal_total_bytes": 1073737728,
"wal_used_bytes": 524288000,
"slow_total_bytes": 400033841152,
"slow_used_bytes": 3996123136,
"num_files": 583,
"log_bytes": 7798784,
"log_compactions": 39,
"logged_bytes": 786444288,
"files_written_wal": 2,
"files_written_sst": 48742,
"bytes_written_wal": 2410376267722,
"bytes_written_sst": 2620043565235
},
On my solid-state disks, I don't have any slow_ usage at all:
"bluefs": {
"gift_bytes": 0,
"reclaim_bytes": 0,
"db_total_bytes": 153631064064,
"db_used_bytes": 6822035456,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 0,
"slow_used_bytes": 0,
"num_files": 250,
"log_bytes": 16420864,
"log_compactions": 511,
"logged_bytes": 9285406720,
"files_written_wal": 2,
"files_written_sst": 79316,
"bytes_written_wal": 4393750671932,
"bytes_written_sst": 4626359292945
},
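For reference, this is roughly how I read those counters: just a quick Python sketch
(the osd.0.json file name is only an example of a saved perf dump, not any official
Ceph tooling):

#!/usr/bin/env python3
# Quick sketch: summarize the "bluefs" section of a perf dump saved with e.g.
#   ceph daemon osd.0 perf dump > osd.0.json
import json

def gib(n):
    # Convert bytes to GiB for readability.
    return n / 1024 ** 3

with open("osd.0.json") as f:
    bluefs = json.load(f)["bluefs"]

db_total, db_used = bluefs["db_total_bytes"], bluefs["db_used_bytes"]
slow_total, slow_used = bluefs["slow_total_bytes"], bluefs["slow_used_bytes"]

print(f"DB:   {gib(db_used):6.1f} GiB used of {gib(db_total):6.1f} GiB total")
if slow_total:
    # Any non-zero slow_used_bytes means RocksDB has spilled onto the slow (HDD) device.
    print(f"slow: {gib(slow_used):6.1f} GiB used of {gib(slow_total):6.1f} GiB total")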
As I understand it, the solid-state disks manage the DB automatically, so about 150GB of the
3.84TB total disk was reserved for it.
My spinning disk is showing about 400GB of slow_total_bytes, while I have dedicated only
33GB of NVMe for each raw disk.
Well, I believe I have to think about increasing my DB partitions.
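Just to make the sizing explicit, here is a rough back-of-the-envelope sketch based on the
~4% block.db guideline mentioned further down in the thread (the figures are my own
assumptions for this cluster):

# Rough block.db sizing check, assuming the ~4% guideline from the Ceph docs.
osd_size_gb = 10 * 1000                     # one 10TB SATA OSD, in GB
recommended_db_gb = osd_size_gb * 0.04      # ~400 GB of block.db per OSD
current_db_gb = 33                          # current NVMe partition per OSD
shortfall_gb = recommended_db_gb - current_db_gb
print(f"recommended ~{recommended_db_gb:.0f} GB, current {current_db_gb} GB, "
      f"short by ~{shortfall_gb:.0f} GB per OSD")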
Thank you for your feedback, Darren!
Joao Victor R Soares
Darren Soothill wrote:
> Hi Joao,
>
> You can see how much RocksDB space has been used with this command “ceph daemon osd.X perf
> dump”, where X is an OSD id on the node you are running the command on.
>
> You are looking for this section in the output :-
> "bluefs": {
> "gift_bytes": 0,
> "reclaim_bytes": 0,
> "db_total_bytes": 23966253056,
> "db_used_bytes": 1714421760,
> "wal_total_bytes": 0,
> "wal_used_bytes": 0,
> "slow_total_bytes": 0,
> "slow_used_bytes": 0,
> "num_files": 24,
> "log_bytes": 552120320,
> "log_compactions": 0,
> "logged_bytes": 537051136,
> "files_written_wal": 1,
> "files_written_sst": 11,
> "bytes_written_wal": 429315193,
> "bytes_written_sst": 601384180,
> "bytes_written_slow": 0,
> "max_bytes_wal": 0,
> "max_bytes_db": 1714421760,
> "max_bytes_slow": 0
> },
>
> If you have numbers in the slow_ entries then your RocksDB is spilling over onto the
> HDD.
>
> As to whether moving RocksDB and WAL onto the HDD can cause a performance degradation, it
> depends how busy your disks are. If your HDDs are working hard and you are now going to
> throw a lot more workload onto them, then performance will degrade, possibly substantially.
> I have seen performance impacts of up to 75% when things have started spilling over from
> NVMe to HDD.
> By that I mean I had a lovely flat line ingesting objects, and that line suddenly dropped
> by 75% once the RocksDB had filled up and spilt over onto the HDD.
>
> From: João Victor Rodrigues Soares <jvrs2683(a)gmail.com>
> Date: Wednesday, 25 September 2019 at 14:37
> To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
> Subject: [ceph-users] Slow Write Issues
>
> Hello,
>
> In my company, we currently have the following infrastructure:
>
> - Ceph Luminous
> - OpenStack Pike.
>
> We have a cluster of 3 osd nodes with the following configuration:
>
> - 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
> - 128GB RAM
> - 128GB ROOT DISK
> - 12 x 10TB SATA ST10000NM0146 (OSD)
> - 1 x Intel Optane P4800X SSD DC 375GB (block.db / block.wal)
> - Ubuntu 16.04
> - 2 X 10Gb network interface configured with lacp
>
>
> The compute nodes have
> - 4 x 10Gb network interfaces with lacp.
>
> We also have 4 monitors with:
> - 4 x 10Gb lacp network interfaces.
> - The monitor nodes show approx. 90% CPU idle time with 32GB / 256GB available RAM
>
> For each OSD disk we have created a 33GB partition for block.db and block.wal.
>
> We have recently been facing a number of performance issues. Virtual machines created in
> OpenStack are experiencing slow write speeds (approx. 50MB/s).
>
> Monitoring on the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
> Memory consumption is around 30%.
> We have no latency issues (9ms average).
>
> My question is whether what is happening may have to do with the amount of disk dedicated
> to DB/WAL. The Ceph documentation says it is recommended that the block.db size is not
> smaller than 4% of the block device size.
>
> In this case, for each disk in my environment, block.db could not be less than 400GB per
> OSD.
>
> Another question is whether setting my disks to use block.db / block.wal on the mechanical
> disks themselves could lead to a performance degradation.
>
> Regards,
> João Victor Rodrigues Soares