My recollection is that rocksdb is always flushing, correct. There are
conveniently only a handful of writers in rocksdb, the main ones being
log files and sst files.
We could probably put an assertion in fsync() to ensure that the
FileWriter buffer is empty and flushed...?
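
For illustration only, the assertion idea might look roughly like the
sketch below. SketchFileWriter and all of its members are hypothetical
stand-ins, not the real BlueFS FileWriter or its API; the only point is
that fsync() refuses to run while buffered data is still pending.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a buffered file writer; not the real BlueFS type.
struct SketchFileWriter {
  std::string buffer;      // bytes appended but not yet handed to the device
  std::string device;      // stand-in for bytes submitted to the block device
  std::size_t synced = 0;  // number of bytes considered durable

  void append(const std::string& data) { buffer += data; }

  // Move all buffered bytes to the "device" and leave the buffer empty.
  void flush() {
    device += buffer;
    buffer.clear();
  }

  bool is_flushed() const { return buffer.empty(); }

  // The proposed check: callers must have flushed before asking for durability.
  void fsync() {
    assert(is_flushed() && "fsync() called with unflushed buffered data");
    synced = device.size();
  }
};
```

With a check like this, a caller that appends and then calls fsync()
without an intervening flush() would trip the assertion in a debug
build instead of silently syncing less data than it expected.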
Thanks for your reply, Sage :-) I will do that :-)
By the way, I've got another question here:
It seems that BlueStore tries to provide some kind of atomic
I/O mechanism in which data and metadata are either both modified or
both untouched. To accomplish this, for modifications whose size is
larger than prefer_defer_size, BlueStore will allocate new space for
the modifications and release the old storage space. I think, in the
long run, an initially contiguous file stored in BlueStore could become
scattered if there have been many random modifications to that file.
Actually, this is what we are experiencing in our test clusters. The
consequence is that after some period of random modification, the
sequential read performance of that file is significantly degraded.
Should we make this atomic I/O mechanism optional? It seems that most
hard disks only guarantee that a sector is never half-modified, and
for that, I think, deferred I/O is enough. Am I right? Thanks :-)
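
To make the fragmentation concern concrete, here is a toy model of the
behaviour described above. It is only a sketch under assumptions: the
block map, the bump-pointer allocator and all function names are
hypothetical and have nothing to do with BlueStore's real allocator or
extent map. Each large overwrite is redirected to a newly allocated
block, and the number of physically contiguous runs in the file grows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: map[i] is the physical block that holds logical block i.
using BlockMap = std::vector<std::uint64_t>;

// A file of n blocks laid out contiguously at physical blocks 0..n-1.
BlockMap make_contiguous(std::size_t n) {
  BlockMap m(n);
  for (std::size_t i = 0; i < n; ++i) m[i] = i;
  return m;
}

// Redirect-on-write: place the new data in a freshly allocated block and
// point the logical block at it. The allocator is a simple bump pointer
// (an assumption); the released old block is not tracked or reused here.
void overwrite_block(BlockMap& m, std::uint64_t& next_free, std::size_t logical) {
  m[logical] = next_free++;
}

// Number of physically contiguous runs a sequential read has to visit.
std::size_t count_extents(const BlockMap& m) {
  if (m.empty()) return 0;
  std::size_t extents = 1;
  for (std::size_t i = 1; i < m.size(); ++i)
    if (m[i] != m[i - 1] + 1) ++extents;
  return extents;
}
```

In this model an 8-block file starts as one extent; overwriting one
block in the middle splits it into three, and a second overwrite
elsewhere in the file splits it into five. A sequential read then has
to seek once per extent, which matches the degradation we observe.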