In my company, we currently have the following infrastructure:
- Ceph Luminous
- OpenStack Pike
We have a cluster of 3 OSD nodes with the following configuration:
- 1 x Xeon(R) D-2146NT CPU @ 2.30GHz
- 128GB RAM
- 128GB root disk
- 12 x 10TB SATA ST10000NM0146 (OSD)
- 1 x Intel Optane P4800X DC 375GB SSD (block.db / block.wal)
- Ubuntu 16.04
- 2 x 10Gb network interfaces bonded with LACP
The compute nodes have 4 x 10Gb network interfaces bonded with LACP.
We also have 4 monitor nodes:
- 4 x 10Gb LACP network interfaces
- approx. 90% CPU idle time, with 32GB of 256GB RAM in use
For each OSD disk we have created a 33GB partition on the Optane SSD for block.db and block.wal.
We have recently been facing performance issues: virtual machines created in OpenStack suffer from slow writes (approx. 50MB/s).
Monitoring on the OSD nodes shows an average of 20% CPU iowait and 70% CPU idle. Memory usage is around 30%. We see no latency problems (9ms average).
My question is whether what we are seeing could be related to the amount of space dedicated to DB/WAL. The Ceph documentation recommends that block.db be no smaller than 4% of the size of the block device.
By that rule, each 10TB disk in my environment would need a block.db of at least 400GB per OSD.
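A quick back-of-the-envelope check of that 4% guideline against our current layout (sizes in decimal units, 1TB = 1000GB, as quoted on the drive labels):

```python
# Rough sizing check for the BlueStore block.db 4% guideline.
# Sizes are decimal (1 TB = 1000 GB), matching drive vendor labelling.

osd_size_gb = 10_000       # one 10TB SATA OSD
db_partition_gb = 33       # our current block.db/block.wal partition

recommended_db_gb = 0.04 * osd_size_gb
print(f"recommended block.db: {recommended_db_gb:.0f} GB")  # 400 GB
print(f"current partition is {db_partition_gb} GB, "
      f"well below the recommendation")
```

So each 33GB partition is well under a tenth of the recommended 400GB, which is what prompted the question.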
Another question: if I reconfigured the OSDs to keep block.db/block.wal on the mechanical disks themselves, would that cause a performance degradation?