Yeah, I know various folks have adopted those settings, though I'm not
convinced they are better than our defaults. Basically you get more,
smaller buffers, start compacting sooner, and theoretically should
have a more gradual throttle, along with a bunch of changes to
compaction. But every time I've tried a setup like that I see more write
amplification in L0, presumably because a larger number of pglog entries
aren't tombstoned before hitting it (at least on our systems it's not
faster at this time, and it imposes more wear on the DB device). I suspect
something closer to those settings will be better, though, if we can
change the pglog to create/delete new kv pairs for every pglog entry.
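For illustration only, the style of tuning being discussed (more, smaller
memtables and an earlier L0 compaction trigger) usually looks something
like the fragment below. The option names are real RocksDB options, but
the values here are hypothetical and are not the specific settings
referenced above:

```ini
# Hypothetical example only -- not the settings referenced in this thread.
# More, smaller write buffers and an earlier L0 compaction trigger.
[osd]
bluestore_rocksdb_options = "write_buffer_size=33554432,max_write_buffer_number=8,min_write_buffer_number_to_merge=2,level0_file_num_compaction_trigger=4"
```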
In any event, that's good to know about compaction not being involved.
I think this may be a case where the double-caching fix might help
significantly if we stop thrashing the rocksdb block cache:
RocksDB compactions were one of my first ideas as well, but they don't
correlate. I checked this with the ceph_rocksdb_log_parser.py from
I saw only a few compactions on the whole cluster. They didn't seem to be
the problem, although the compactions sometimes took several seconds.
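For anyone wanting to do a similar correlation check by hand, the idea can
be sketched as below. This is a minimal sketch, not the actual
ceph_rocksdb_log_parser.py; it assumes RocksDB's structured `EVENT_LOG_v1`
JSON lines in the LOG file, with `compaction_started` /
`compaction_finished` events carrying a `time_micros` field:

```python
import json
import re

# Matches RocksDB structured event lines, e.g.
# 2020/08/07-09:43:51.123456 7fab... EVENT_LOG_v1 {"time_micros": ..., "event": "compaction_started", ...}
EVENT_RE = re.compile(r'EVENT_LOG_v1 (\{.*\})')

def compaction_events(lines):
    """Yield (time_micros, event_name) for compaction start/finish events."""
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ev = json.loads(m.group(1))
        if ev.get("event") in ("compaction_started", "compaction_finished"):
            yield ev["time_micros"], ev["event"]

def compaction_windows(lines):
    """Pair each start with the next finish; return (start_us, duration_us)."""
    windows, start = [], None
    for t, event in compaction_events(lines):
        if event == "compaction_started":
            start = t
        elif event == "compaction_finished" and start is not None:
            windows.append((start, t - start))
            start = None
    return windows
```

The resulting (start, duration) windows can then be lined up against the
snaptrim intervals and disk-read graphs to see whether they overlap.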
BTW: I configured the following rocksdb options.
bluestore rocksdb options =
This reduced some IO spikes, but the slow ops issue during snaptrim was
not affected by it.
On Fri, 7 Aug 2020 09:43:51 -0500
Mark Nelson <mnelson(a)redhat.com> wrote:
That is super interesting regarding scrubbing. I would have expected
that to be affected as well. Any chance you can check and see if
there is any correlation between rocksdb compaction events, snap
trimming, and increased disk reads? Also (sorry if you already
answered this), do we know for sure that it's hitting the
block.db/block.wal device? I suspect it is, just wanted to verify.
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io