Hi Mark,
RocksDB compactions were one of my first ideas as well, but they don't
correlate. I checked this with ceph_rocksdb_log_parser.py from
https://github.com/ceph/cbt.git and saw only a few compactions on the
whole cluster. They didn't seem to be the problem, although the
compactions sometimes took several seconds.
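In case it is useful, this is roughly how one could line up the two sets
of timestamps once they are pulled out of the logs. It is only a sketch:
the timestamp format and the way the compaction and slow-op times are
collected are assumptions, not output of the parser itself.

#!/usr/bin/env python3
# Sketch only: count how many slow ops start within a window around a
# rocksdb compaction. The timestamp format and the way the two event
# lists are collected (e.g. from ceph_rocksdb_log_parser.py output or by
# grepping the OSD log) are assumptions.

from datetime import datetime, timedelta

def parse_ts(s):
    # assumed OSD-log style timestamp, e.g. "2020-08-07 09:43:51.123456"
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f")

def correlated(slow_ops, compactions, window_s=10):
    # return the slow ops that started within window_s seconds of any compaction
    window = timedelta(seconds=window_s)
    return [op for op in slow_ops
            if any(abs(op - c) <= window for c in compactions)]

if __name__ == "__main__":
    slow_ops = [parse_ts("2020-08-07 09:43:51.000000")]
    compactions = [parse_ts("2020-08-07 09:20:00.000000")]
    hits = correlated(slow_ops, compactions)
    print("%d/%d slow ops near a compaction" % (len(hits), len(slow_ops)))

In my case the number of nearby compactions stayed low, which is why I
don't think they are the cause.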
BTW, I configured the following rocksdb options:
bluestore rocksdb options =
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
This reduced some IO spikes, but the slow ops issue during snaptrim was
not affected by it.
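To make sure such an option string really takes effect after the OSD
restarts, something like the sketch below can compare the intended value
with what a running daemon reports. The "ceph daemon osd.<id> config get"
call is the usual admin-socket query, but the exact JSON shape of its
reply and the OSD id used here are assumptions.

#!/usr/bin/env python3
# Sketch only: compare the intended bluestore_rocksdb_options string with
# what a running OSD reports, so a typo or a missed restart doesn't go
# unnoticed. Adjust the OSD id and paste the full option string.

import json
import subprocess

INTENDED = ("compression=kNoCompression,max_write_buffer_number=32,"
            "min_write_buffer_number_to_merge=2")  # shortened; paste the full string

def to_dict(opts):
    # "k1=v1,k2=v2" -> {"k1": "v1", "k2": "v2"}
    return dict(kv.split("=", 1) for kv in opts.split(",") if kv)

def running_opts(osd_id):
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "config", "get",
         "bluestore_rocksdb_options"])
    return to_dict(json.loads(out)["bluestore_rocksdb_options"])

if __name__ == "__main__":
    want = to_dict(INTENDED)
    have = running_opts(0)  # assumed OSD id
    for key, val in want.items():
        if have.get(key) != val:
            print("%s: want %s, have %s" % (key, val, have.get(key)))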
Manuel
On Fri, 7 Aug 2020 09:43:51 -0500
Mark Nelson <mnelson(a)redhat.com> wrote:
That is super interesting regarding scrubbing. I
would have expected
that to be affected as well. Any chance you can check and see if
there is any correlation between rocksdb compaction events, snap
trimming, and increased disk reads? Also (Sorry if you already
answered this) do we know for sure that it's hitting the
block.db/block.wal device? I suspect it is, just wanted to verify.
Mark