Hi João,

 

You can see how much RocksDB space has been used with the command "ceph daemon osd.X perf dump", where X is the ID of an OSD on the node you are running the command on.

 

You are looking for this section in the output:

    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 23966253056,
        "db_used_bytes": 1714421760,
        "wal_total_bytes": 0,
        "wal_used_bytes": 0,
        "slow_total_bytes": 0,
        "slow_used_bytes": 0,
        "num_files": 24,
        "log_bytes": 552120320,
        "log_compactions": 0,
        "logged_bytes": 537051136,
        "files_written_wal": 1,
        "files_written_sst": 11,
        "bytes_written_wal": 429315193,
        "bytes_written_sst": 601384180,
        "bytes_written_slow": 0,
        "max_bytes_wal": 0,
        "max_bytes_db": 1714421760,
        "max_bytes_slow": 0
    },

 

If the slow_ entries are non-zero, then your RocksDB is spilling over onto the HDD.
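
If you want to check this across several OSDs, something along these lines might help. It is only a rough sketch: the helper names (read_perf_dump, bluefs_summary) are made up for illustration, and it assumes that a non-zero slow_used_bytes is the value to watch for spillover. The sample values are copied from the output above.

```python
import json
import subprocess


def read_perf_dump(osd_id):
    """Run `ceph daemon osd.<id> perf dump` on the local node and parse the JSON."""
    result = subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def bluefs_summary(perf_dump):
    """Summarise DB usage and spillover from the "bluefs" section."""
    bluefs = perf_dump["bluefs"]
    total = bluefs["db_total_bytes"]
    used = bluefs["db_used_bytes"]
    slow_used = bluefs["slow_used_bytes"]
    return {
        # Percentage of the fast DB device currently in use.
        "db_used_pct": 100.0 * used / total if total else 0.0,
        "slow_used_bytes": slow_used,
        # Assumption: any bytes on the slow device mean RocksDB has spilled over.
        "spilled_over": slow_used > 0,
    }


# Values copied from the sample output above; no cluster is needed for this part.
sample = {
    "bluefs": {
        "db_total_bytes": 23966253056,
        "db_used_bytes": 1714421760,
        "slow_total_bytes": 0,
        "slow_used_bytes": 0,
    }
}
summary = bluefs_summary(sample)
print(f"DB used: {summary['db_used_pct']:.1f}%, spilled over: {summary['spilled_over']}")
```

On a real node you would call bluefs_summary(read_perf_dump(X)) for each local OSD id X.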

 

Whether moving RocksDB and the WAL onto the HDDs causes a performance degradation depends on how busy your disks are. If your HDDs are already working hard and you throw a lot more workload onto them, then performance will degrade, possibly substantially. I have seen performance impacts of up to 75% when things have started spilling over from NVMe to HDD.

By that I mean I had a lovely flat line ingesting objects, and that line suddenly dropped by 75% once the RocksDB had filled up and spilt over onto the HDD.


From: João Victor Rodrigues Soares <jvrs2683@gmail.com>
Date: Wednesday, 25 September 2019 at 14:37
To: "ceph-users@ceph.io" <ceph-users@ceph.io>
Subject: [ceph-users] Slow Write Issues

 

Hello,

In my company, we currently have the following infrastructure:

- Ceph Luminous
- OpenStack Pike.

We have a cluster of 3 osd nodes with the following configuration:

- 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
- 128GB RAM
- 128GB ROOT DISK
- 12 x 10TB SATA ST10000NM0146 (OSD)
- 1 x Intel Optane P4800X SSD DC 375GB (block.DB / block.wal)
- Ubuntu 16.04
- 2 x 10Gb network interfaces configured with lacp


The compute nodes have
- 4 x 10Gb network interfaces with lacp.

We also have 4 monitors with:
- 4 x 10Gb lacp network interfaces.
- The monitor nodes are approx. 90% cpu idle time with 32GB / 256GB available RAM

For each OSD disk we have created a 33GB partition for block.db and block.wal.

We have recently been facing a number of performance issues. Virtual machines created in OpenStack are experiencing slow writes (approx. 50MB/s).

Monitoring of the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
Memory consumption is around 30%.
We have no latency issues (9ms average).

My question is whether what is happening may be related to the amount of disk space dedicated to DB / WAL. The Ceph documentation recommends that block.db be no smaller than 4% of block.

In this case, block.db for each disk in my environment should be no smaller than 400GB per OSD.
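
The arithmetic behind that figure can be sketched as follows. It assumes decimal units (1TB = 10^12 bytes) and uses the 4% guideline, 10TB disks, 12 OSDs per node and 33GB partitions mentioned in this message; the names are illustrative only.

```python
TB = 10**12  # decimal terabyte (assumption: drive sizes are quoted in decimal units)
GB = 10**9


def recommended_db_bytes(block_bytes, percent=4):
    """Minimum block.db size under the "4% of block" guideline."""
    return block_bytes * percent // 100


osd_size = 10 * TB        # one 10TB SATA disk per OSD
osd_count = 12            # OSDs per node
db_partition = 33 * GB    # current block.db / block.wal partition per OSD

per_osd = recommended_db_bytes(osd_size)
per_node = per_osd * osd_count
actual_pct = 100 * db_partition / osd_size

print(f"Recommended block.db per OSD:  {per_osd // GB} GB")
print(f"Recommended block.db per node: {per_node // GB} GB")
print(f"Current partition is {actual_pct:.2f}% of block")
```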

Another question is whether placing block.db / block.wal on the mechanical disks themselves could lead to a performance degradation.

 

Att.

João Victor Rodrigues Soares