Hi Loïc:
Great to see this is moving forward.
A few questions and comments:
1) I do see the test code measuring the runtime differences between the
sharding and non-sharding cases. Were you going to add test code to
analyze the output of the perf c2c runs (for hottest cachelines and long
load latencies)? That can be challenging to do effectively, given all
the variables involved with different test environments.
2) Is there a check to verify the test is being run on an Intel-based
system?
3) I see the comment in the mempool.h file:
// Align shard to a cacheline.
//
// It would be possible to retrieve the value at runtime (for instance
// with getconf LEVEL1_DCACHE_LINESIZE or grep -m1 cache_alignment
// /proc/cpuinfo). It is easier to hard code the largest cache
// linesize for all known processors (128 bytes). If the actual cache
// linesize is smaller on a given processor, it will just waste a few
// bytes.
//
The above is slightly incorrect about wasting a few bytes. On Intel
platforms where the cacheline size is 64 bytes, it's smart to pad your hot
locks out to 128 bytes. (I briefly mentioned this earlier.)
The reason is the hardware prefetchers. When the prefetchers fetch a
cacheline, they anticipate that your program will also need the next
cacheline, so they gratuitously fetch that one as well, loading 128
bytes instead of the requested 64. If that next cacheline is on a
critical code path, any other users of that line will have their copy
marked invalid because the prefetch triggered by the earlier cacheline
just overwrote it.
The performance problems we see from this are rare, but are real and are
difficult to diagnose. The Linux kernel and some big database vendors pad
all their hot locks and variables out to 128 bytes just for this reason.
4) I see the test code retained all the commands I put into the c2c.sh file
so that I could understand what the system environment was like. You only
need to keep them, and the kernel perf c2c commands, if you find them
useful.
Nice to see this work is happening.
Thanks,
Joe
On Fri, Apr 30, 2021 at 10:04 AM Loïc Dachary <loic(a)dachary.org> wrote:
Hi Joe,
With Josh, Kefu & Nathan's help the minimal c2c test is now in Ceph[0] and
runs on CentOS, RHEL & Ubuntu.
It will help catch regressions and diagnose them: thanks a lot for your
invaluable help in making this happen.
The next, more ambitious, step is to run c2c on a Ceph cluster under load
and analyze the output of "perf c2c" to figure out if and how cacheline
contention can be optimized.
Cheers
[0]
https://github.com/ceph/ceph/pull/41014/files
On 29/04/2021 18:23, Loïc Dachary wrote:
Hi Kefu,
On 29/04/2021 18:14, kefu chai wrote:
>
>
> Loïc Dachary <loic(a)dachary.org> wrote on Wednesday, 28 April 2021 at
18:12:
>
> Hi Nathan,
>
>     Josh noticed that one line could be removed[0] from the test
>     script. I did it and repushed[1]. Would you be so kind as to push
>     the change to GitHub?
>
>
> Loïc and Nathan, when testing the change, I ran into an error like:
>
> Traceback (most recent call last):
>   File "/home/kchai/teuthology/virtualenv/bin/teuthology-suite", line 33, in <module>
>     sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-suite')())
>   File "/home/kchai/teuthology/scripts/suite.py", line 189, in main
>     return teuthology.suite.main(args)
>   File "/home/kchai/teuthology/teuthology/suite/__init__.py", line 143, in main
>     run.prepare_and_schedule()
>   File "/home/kchai/teuthology/teuthology/suite/run.py", line 397, in prepare_and_schedule
>     num_jobs = self.schedule_suite()
>   File "/home/kchai/teuthology/teuthology/suite/run.py", line 615, in schedule_suite
>     self.args.newest, job_limit)
>   File "/home/kchai/teuthology/teuthology/suite/run.py", line 467, in collect_jobs
>     self.package_versions
>   File "/home/kchai/teuthology/teuthology/suite/util.py", line 394, in get_package_versions
>     distro_version=os_version,
>   File "/home/kchai/teuthology/teuthology/suite/util.py", line 274, in package_version_for_hash
>     sha1=hash,
>   File "/home/kchai/teuthology/teuthology/packaging.py", line 853, in __init__
>     super(ShamanProject, self).__init__(project, job_config, ctx, remote)
>   File "/home/kchai/teuthology/teuthology/packaging.py", line 462, in __init__
>     self._init_from_config()
>   File "/home/kchai/teuthology/teuthology/packaging.py", line 497, in _init_from_config
>     OS.version_codename(self.os_type, self.os_version)
>   File "/home/kchai/teuthology/teuthology/orchestra/opsys.py", line 200, in version_codename
>     (version_or_codename, name))
> KeyError: '8.3 not a ubuntu version or codename'
> I think the root cause is that the rados/standalone test suite includes
> its own facets for choosing a random distro, and my test happened to
> pick rhel 8.3, but the distro name was overridden by the one specified
> by c2c.yaml. That's why I ended up with the combination "Ubuntu 8.3". I
> just took the liberty of pushing another commit to the pull request in
> the hope of testing sooner. If it looks sane to you, could you include
> it in your commit? Or I can do this with your permission.
This is perfect! I had doubts about running this against something other
than Ubuntu and was not sure which package to include. You have my
permission (and gratitude) to squash the commits together.
Cheers
>
>
>
> Thanks for your help!
>
> [0] https://github.com/ceph/ceph/pull/41014#pullrequestreview-645521549
> [1] https://lab.fedeproxy.eu/ceph/ceph/-/tree/wip-mempool-cacheline-49781
>
> On 25/04/2021 17:10, Loïc Dachary wrote:
> > Great! Thank you :-)
> >
> > On 25/04/2021 10:39, Nathan Cutler wrote:
> >> On Sat, Apr 24, 2021 at 11:13:39PM +0200, Loïc Dachary wrote:
> >>> Thanks for pushing the branch. I amended it a little and the
teuthology run now passes[0]. There are still issues, I'm sure, but it's
probably good enough for a pull request. Would you be so kind as to create
one based on my branch[1] with the following cover? Thanks a again for your
help
> > :-)
> >> Sure, here you go:
> >>
> >> https://github.com/ceph/ceph/pull/41014
> >
> > _______________________________________________
> > Dev mailing list -- dev(a)ceph.io
> > To unsubscribe send an email to dev-leave(a)ceph.io
>
> --
> Loïc Dachary, Artisan Logiciel Libre
--
Regards
Kefu Chai
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io
--
Loïc Dachary, Artisan Logiciel Libre