Hi Loïc:
On Sun, Apr 4, 2021 at 4:14 PM Loïc Dachary <loic(a)dachary.org> wrote:
<snip>
> > Is the above assumption correct?
>
> Yes, absolutely right. I changed the variable to be 128 bytes aligned[0],
> is it ok? Maybe there is a constant somewhere that provides this number
> (number of bytes to be "cache aligned") so it is not hard coded?

Here are three ways you can get the cacheline size.
One is by reading:
    /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
Another is with:
    gcc -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` ...
A third is with:
    grep -m1 cache_alignment /proc/cpuinfo
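
If you would rather pick the value up at run time than on the compile line, here is a minimal C++ sketch. The helper name cacheline_size and the 64-byte fallback are assumptions made for illustration, not anything taken from ceph_test_c2c. It first tries sysconf(_SC_LEVEL1_DCACHE_LINESIZE), a glibc extension that is the programmatic counterpart of the getconf call above, and then falls back to the sysfs file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Hypothetical helper: returns the L1 data cacheline size in bytes,
// or `fallback` when the system does not report a usable value.
inline std::size_t cacheline_size(std::size_t fallback = 64) {
    assert(fallback > 0);
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    // glibc extension; other C libraries may not define this name.
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0) return static_cast<std::size_t>(n);
#endif
    // Same sysfs file as listed above; it may be absent in containers.
    std::ifstream f(
        "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");
    long v = 0;
    if (f >> v && v > 0) return static_cast<std::size_t>(v);
    return fallback;  // assumption: 64 bytes is the most common size
}
```

A run-time value like this is handy for logging or for sizing heap allocations, but it cannot feed alignas, which needs a compile-time constant. That is where the -D approach is the better fit.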
Most often it's 64 bytes. I believe the POWER CPUs are 128 bytes. Itanium
was 128 bytes.
However, even on the x86 platforms where the cacheline size is 64 bytes,
it's very often a good idea to pad your hot locks or hot data items out to
128 bytes (i.e. 2 cachelines instead of 1).
The reason is this: By default, when Intel processors fetch a cacheline of
data, the CPU will gratuitously fetch the next cacheline, just in case you
need it. However, if that next cacheline is a different hot cacheline, the
last thing you need is to invalidate it with gratuitous writes.
We have seen performance problems due to this, and the resolution was to
pad the hot locks and variables out to 128 bytes. Some of the big database
vendors pad out to 128 bytes because of this as well.
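
As a rough illustration of that padding, here is a minimal C++ sketch. The names kHotAlign, Padded and HotCounters are hypothetical, and the fixed value of 128 is an assumption that follows the advice above (two 64-byte cachelines on x86, one cacheline where the line size is already 128 bytes). Because alignas rounds sizeof up to a multiple of the alignment, each wrapped item ends up alone in its own 128-byte block.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: pad hot items to 128 bytes, as suggested above.
constexpr std::size_t kHotAlign = 128;

// Hypothetical wrapper: aligns T to 128 bytes; the compiler pads the
// struct so that sizeof is a multiple of the alignment.
template <typename T>
struct alignas(kHotAlign) Padded {
    T value;
};

// Two hot counters, each written by a different thread.
struct HotCounters {
    Padded<std::atomic<std::uint64_t>> produced;  // written by thread A
    Padded<std::atomic<std::uint64_t>> consumed;  // written by thread B
};

// Compile-time checks that the layout is what we intended.
static_assert(sizeof(Padded<std::atomic<std::uint64_t>>) == kHotAlign,
              "each padded item occupies exactly one 128-byte block");
static_assert(alignof(HotCounters) == kHotAlign,
              "the struct starts on a 128-byte boundary");
static_assert(offsetof(HotCounters, consumed) -
                  offsetof(HotCounters, produced) == kHotAlign,
              "the two counters are 128 bytes apart");
```

The static_assert lines cost nothing at run time and catch the case where someone later adds a field and quietly breaks the layout. Note that heap allocations of over-aligned types only honor the alignment with C++17 aligned new or an explicit aligned allocator.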
I looked at the 2nd tar.gz file that you uploaded
(ceph-c2c-jmario-2021-04-04-22-13.tar.gz).
As expected, the "without-sharding" case looked like it did earlier.
However, in the "with-sharding" case, it didn't look like your
ceph_test_c2c program was even running. I dumped the raw samples from
the perf.data file and didn't see any loads or stores from the program.
Can you double check that it ran correctly?
Joe