From various search results I read that disabling cephx can help. The slides at
https://static.linaro.org/connect/san19/presentations/san19-120.pdf
also recommend some BlueStore cache settings changes:
[osd]
bluestore_cache_autotune = 0
bluestore_cache_kv_ratio = 0.2
bluestore_cache_meta_ratio = 0.8
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,write_buffer_size=64M,compaction_readahead_size=2M
bluestore_cache_size_hdd = 536870912   # BlueStore in-memory cache size (512 MiB) for HDD-backed OSDs
osd_min_pg_log_entries = 10
osd_max_pg_log_entries = 10
osd_pg_log_dups_tracked = 10
osd_pg_log_trim_min = 10
But none of this changed much. The problem seems to be mostly with small
block sizes: when I ran the same test with a 128k or even 64k block size,
the results were much better.
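For reference, the small-block vs. large-block comparison could be reproduced with a fio job file along these lines (the directory path is a placeholder; this assumes fio with the libaio engine is available on a client node):

```ini
; hypothetical fio job file: compare 4k vs 128k random writes
; on a filesystem mounted from an RBD
[global]
; placeholder mount point of one of the mapped RBDs
directory=/mnt/rbd0
size=1G
direct=1
ioengine=libaio
rw=randwrite
iodepth=16
runtime=60
time_based

[small-4k]
bs=4k

[large-128k]
; run only after the 4k job finishes
stonewall
bs=128k
```

Comparing the reported IOPS and completion latencies between the two jobs would quantify how much worse the small-block case is.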
Any suggestions?
Thanks and Regards,
Athreya
On Tue, Nov 10, 2020 at 8:51 PM <athreyavc(a)gmail.com> wrote:
Hi,
We have recently deployed a Ceph cluster with:
12 OSD nodes (16 cores + 200 GB RAM + 30 disks of 14 TB each), running CentOS 8
3 monitor nodes (8 cores + 16 GB RAM), running CentOS 8
We are using Ceph Octopus with RBD block devices.
We have three Ceph client nodes (16 cores + 30 GB RAM, running CentOS 8)
across which the RBDs are mapped and mounted, 25 RBDs on each client node.
Each RBD is 10 TB and formatted with an EXT4 file system.
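As a back-of-the-envelope sanity check on the sizing above (assuming 3x replication, the default for replicated pools; adjust if using erasure coding or a different pool size):

```python
# Capacity arithmetic for the cluster described above.
# Assumes 3x replication (an assumption, not stated in the thread).

osd_nodes = 12
disks_per_node = 30
disk_tb = 14

raw_tb = osd_nodes * disks_per_node * disk_tb   # total raw capacity
usable_tb = raw_tb / 3                          # usable at 3x replication

clients = 3
rbds_per_client = 25
rbd_tb = 10
provisioned_tb = clients * rbds_per_client * rbd_tb  # thin-provisioned RBD total

print(raw_tb, usable_tb, provisioned_tb)  # 5040 1680.0 750
```

So the 750 TB of provisioned RBD images fits comfortably within the ~1680 TB usable, which rules out capacity pressure as a cause of the latency.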
On the network side, we have a 10Gbps active/passive bond on all the Ceph
cluster nodes, including the clients. Jumbo frames are enabled and the MTU is 9000.
This is a new cluster and its health reports OK, but we see high I/O wait
during writes.
From one of the clients (sar output):
15:14:30    CPU   %user  %nice  %system  %iowait  %steal   %idle
15:14:31    all    0.06   0.00     1.00    45.03    0.00   53.91
15:14:32    all    0.06   0.00     0.94    41.28    0.00   57.72
15:14:33    all    0.06   0.00     1.25    45.78    0.00   52.91
15:14:34    all    0.00   0.00     1.06    40.07    0.00   58.86
15:14:35    all    0.19   0.00     1.38    41.04    0.00   57.39
Average:    all    0.08   0.00     1.13    42.64    0.00   56.16
and the system load is very high:
top - 15:19:15 up 34 days, 41 min, 2 users, load average: 13.49, 13.62, 13.83
From 'atop', one of the CPUs shows this:
CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% | guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%
On the OSD nodes, we don't see much disk utilization.
RBD caching values are default.
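For what it's worth, the client-side cache settings could be pinned explicitly in the [client] section of ceph.conf; the values below are the documented defaults, shown here only as a starting point for tuning, not as recommendations:

```ini
[client]
rbd_cache = true
# stay in writethrough mode until the first flush is seen
rbd_cache_writethrough_until_flush = true
# per-image cache size, 32 MiB (default)
rbd_cache_size = 33554432
# dirty-data limit before writeback, 24 MiB (default)
rbd_cache_max_dirty = 25165824
```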
Are we overlooking some configuration item?
Thanks and Regards,
At
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io