Hi Mark and others,
Last week we finally managed to solve the problem. We are using Gentoo on our test
cluster, and as it turned out, the official ebuilds do not set
CMAKE_BUILD_TYPE=RelWithDebInfo. This alone caused the performance degradation we had
been seeing since upgrading to Octopus. After patching the ebuild, we now get the
same results as with CentOS 8 and the official 15.2.3 RPMs, which we had installed
temporarily on the same hardware to narrow down the cause of the problem.
As far as we understand, the Nautilus ebuilds do not set
CMAKE_BUILD_TYPE=RelWithDebInfo either. So we retested Nautilus built with
CMAKE_BUILD_TYPE=RelWithDebInfo and compared the results to a build from the unpatched
ebuild. Interestingly, we saw hardly any difference there, so at least in our small
setup the missing setting seems to affect Octopus in particular.
We have reported the issue in the Gentoo bug tracker:
https://bugs.gentoo.org/733316
With Octopus, our results are now generally as good as or better than with
Nautilus.
You can find them here:
https://docs.google.com/spreadsheets/d/13XH3Uuvcq16rrEMp88_Lb-vJfkwjJCDYlBD…
Upgrading to Octopus gives us performance improvements comparable to the results
linked below, where you tested Nautilus vs. Octopus vs. master with 8 nodes and 64 NVMe
drives, but to our surprise only when testing with a single client. With nine clients,
we see a massive boost for sequential 4k writes, while all other results are roughly
the same as with Nautilus. We are not sure why, but perhaps the improvements mainly
benefit larger deployments like yours. On the other hand, compared with your results,
our small cluster seems to have been performing quite well with Nautilus already.
The only strange thing: after upgrading to Octopus, we lose about 60% of performance
for sequential 4k reads with a single client (and about 20% with nine clients), which,
as far as we can see, was not the case in your tests. We have tested this several
times; the result is reproducible and consistent, and it does not seem to be related to
the missing CMAKE_BUILD_TYPE setting.
We just wanted to share these results, as they might be of interest to you. Thank you
very much for your help.