[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10

10 Aug 2020

Yeah, I know various folks have adopted those settings, though I'm not 
convinced they are better than our defaults.  Basically you have more 
smaller buffers and start compacting sooner and theoretically should 
have a more gradual throttle along with a bunch of changes to 
compaction, but every time I've tried a setup like that I see more write 
amplification in L0, presumably due to a larger number of pglog entries 
not being tombstoned before hitting it (at least on our systems it's not 
faster at this time, and it imposes more wear on the DB device).  I suspect 
something closer to those settings will be better though if we can 
change the pglog to create/delete new kv pairs for every pglog entry.

In any event, that's good to know about compaction not being involved.  
I think this may be a case where the double-caching fix might help 
significantly if we stop thrashing the rocksdb block cache: 
https://github.com/ceph/ceph/pull/27705

Mark

On 8/10/20 2:28 AM, Manuel Lausch wrote:
...
  Hi Mark,

 rocksdb compactions were one of my first ideas as well, but they don't
 correlate. I checked this with the ceph_rocksdb_log_parser.py from
 https://github.com/ceph/cbt.git
 I saw only a few compactions on the whole cluster. It didn't seem to be
 the problem, although the compactions sometimes took several seconds.

 BTW: I configured the following rocksdb options.
    bluestore rocksdb options =
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB

 This reduced some IO spikes, but the slow ops issue during snaptrim was not
 affected by this.
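
To make the option string above easier to reason about, here is a rough Python sketch that parses it and derives a few sizes from it. It is only back-of-the-envelope arithmetic based on the general meaning of these RocksDB options, not anything BlueStore itself computes. The helper names (parse_options, level_targets) are made up for illustration, and the per-flush size is an upper bound, since merging memtables drops overwritten and tombstoned keys.

```python
# Option string copied verbatim from the message above.
OPTIONS = (
    "compression=kNoCompression,max_write_buffer_number=32,"
    "min_write_buffer_number_to_merge=2,recycle_log_file_num=32,"
    "compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,"
    "target_file_size_base=67108864,max_background_compactions=31,"
    "level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,"
    "level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,"
    "compaction_threads=32,max_bytes_for_level_multiplier=8,"
    "flusher_threads=8,compaction_readahead_size=2MB"
)

MIB = 1024 * 1024


def parse_options(text):
    """Split a 'key=value,key=value' string into a dict of strings (hypothetical helper)."""
    return dict(item.split("=", 1) for item in text.split(",") if item)


def level_targets(base_bytes, multiplier, levels=3):
    """Target sizes for L1..Ln: each level is `multiplier` times the previous one."""
    return [base_bytes * multiplier ** i for i in range(levels)]


opts = parse_options(OPTIONS)
write_buffer = int(opts["write_buffer_size"])

# Worst-case memory held in memtables if all write buffers fill up.
memtable_bytes = int(opts["max_write_buffer_number"]) * write_buffer

# Upper bound on data written per flush (memtables merged before flushing).
flush_bytes = int(opts["min_write_buffer_number_to_merge"]) * write_buffer

# Approximate L0 data accumulated before L0 -> L1 compaction is triggered.
l0_trigger_bytes = int(opts["level0_file_num_compaction_trigger"]) * flush_bytes

targets = level_targets(
    int(opts["max_bytes_for_level_base"]),
    int(opts["max_bytes_for_level_multiplier"]),
)

print(f"max memtable memory : {memtable_bytes // MIB} MiB")
print(f"max data per flush  : {flush_bytes // MIB} MiB")
print(f"L0 before compaction: {l0_trigger_bytes // MIB} MiB")
for level, size in enumerate(targets, start=1):
    print(f"L{level} target size     : {size // MIB} MiB")
```

Under these assumptions the memtables can hold up to 2 GiB, roughly 1 GiB can accumulate in L0 before compaction starts, and the level targets are 512 MiB, 4 GiB and 32 GiB for L1 to L3.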

 Manuel

 On Fri, 7 Aug 2020 09:43:51 -0500
 Mark Nelson <mnelson(a)redhat.com> wrote:

  That is super interesting regarding scrubbing.  I would have expected
 that to be affected as well.  Any chance you can check and see if
 there is any correlation between rocksdb compaction events, snap
 trimming, and increased disk reads?  Also (Sorry if you already
 answered this) do we know for sure that it's hitting the
 block.db/block.wal device?  I suspect it is, just wanted to verify.

 Mark

  _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io 
