Hi Joe,
You remember correctly: I had ideas to refine the tests further and would
have done so if the review phase had been longer. But it went faster than
I expected, which is a good thing :-)
The important part (guarding against regressions) is merged and the improvements can be
added later.
Cheers
On 30/04/2021 17:16, Joe Mario wrote:
Hi Loïc:
Great to see this is moving forward.
A few questions and comments:
1) I do see the test code measuring the runtime differences between the sharding
and non-sharding cases. Were you going to add test code to analyze the output of
the perf c2c runs (for the hottest cachelines and long load latencies)? That can
be challenging to do effectively, given all the variables involved with
different test environments.
2) Is there a check to verify the test is being run on an Intel-based system?
3) I see the comment in the mempool.h file:
// Align shard to a cacheline.
//
// It would be possible to retrieve the value at runtime (for instance
// with getconf LEVEL1_DCACHE_LINESIZE or grep -m1 cache_alignment
// /proc/cpuinfo). It is easier to hard code the largest cache
// linesize for all known processors (128 bytes). If the actual cache
// linesize is smaller on a given processor, it will just waste a few
// bytes.
//
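A runtime lookup like the one the comment mentions could be sketched as follows
(a minimal illustration, not the actual Ceph code; the function name and
fallback value are my own):

```cpp
#include <unistd.h>

// Ask the kernel for the L1 data-cache line size, as
// "getconf LEVEL1_DCACHE_LINESIZE" would. Some kernels and VMs report
// 0 or -1 here, so fall back to the conservative 128-byte value that
// the comment hard codes.
inline long cacheline_size() {
  long sz = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  return sz > 0 ? sz : 128;
}
```

Note that alignas requires a compile-time constant, so a runtime value could
only be used for dynamic allocation, not for aligning a struct; that is another
reason the hard-coded constant is the simpler choice.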
The comment is slightly incorrect about wasting a few bytes. On Intel platforms
where the cacheline size is 64 bytes, it's smart to pad your hot locks out to
128 bytes. (I briefly mentioned this earlier.)
The reason is the hardware prefetchers. When the prefetchers fetch a cacheline,
they anticipate that your program will also need the next cacheline, so they
gratuitously fetch that one as well, loading 128 bytes instead of the requested
64. If that next cacheline is in a critical code path, any users of that line
will have their cacheline copy marked invalid because the prefetch for the
earlier cacheline just overwrote it.
The performance problems we see from this are rare, but they are real and
difficult to diagnose. The Linux kernel and some big database vendors pad all
their hot locks and variables out to 128 bytes for exactly this reason.
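The padding Joe describes can be expressed directly with alignas; here is a
minimal sketch (the struct and constant names are illustrative, not from
mempool.h):

```cpp
#include <atomic>
#include <cstddef>

// Size of the 128-byte "prefetch pair": the adjacent-line prefetcher pulls
// cachelines two at a time, so hot data is padded to 128 bytes even though
// a single Intel cacheline is 64 bytes.
constexpr std::size_t prefetch_pair = 128;

// alignas rounds sizeof up to a multiple of the alignment, so consecutive
// locks in an array can never share a 128-byte prefetch pair.
struct alignas(prefetch_pair) padded_lock {
  std::atomic<bool> flag{false};
};

static_assert(sizeof(padded_lock) == prefetch_pair,
              "each lock occupies a full prefetch pair");
```

C++17 also exposes std::hardware_destructive_interference_size for exactly this
purpose, though its value is implementation-defined and may be 64 rather than
128 on a given toolchain.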
4) I see the test code retained all the commands I put into the c2c.sh file so
that I could understand what the system environment was like. You only need to
keep them, and the kernel perf c2c commands, if you feel you need them.
Nice to see this work is happening.
Thanks,
Joe
On Fri, Apr 30, 2021 at 10:04 AM Loïc Dachary <loic@dachary.org> wrote:
Hi Joe,
With Josh, Kefu & Nathan's help the minimal c2c test is now in Ceph[0] and
runs on CentOS, RHEL & Ubuntu.
It will help catch regressions and diagnose them: thanks a lot for your invaluable
help in making this happen.
The next, more ambitious, step is to run c2c on a Ceph cluster under load and analyze
the output of "perf c2c" to figure out if and how cacheline contention can be
optimized.
Cheers
[0] https://github.com/ceph/ceph/pull/41014/files
On 29/04/2021 18:23, Loïc Dachary wrote:
> Hi Kefu,
>
> On 29/04/2021 18:14, kefu chai wrote:
>>
>>
>> Loïc Dachary <loic@dachary.org> wrote on Wed, 28 Apr 2021 at 18:12:
>>
>> Hi Nathan,
>>
>> Josh noticed that one line could be removed[0] from the test script. I did
>> it and repushed[1]. Would you be so kind as to push the change to GitHub?
>>
>>
>> Loïc and Nathan, when testing the change, I ran into an error like:
>>
>> Traceback (most recent call last):
>>   File "/home/kchai/teuthology/virtualenv/bin/teuthology-suite", line 33, in <module>
>>     sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-suite')())
>>   File "/home/kchai/teuthology/scripts/suite.py", line 189, in main
>>     return teuthology.suite.main(args)
>>   File "/home/kchai/teuthology/teuthology/suite/__init__.py", line 143, in main
>>     run.prepare_and_schedule()
>>   File "/home/kchai/teuthology/teuthology/suite/run.py", line 397, in prepare_and_schedule
>>     num_jobs = self.schedule_suite()
>>   File "/home/kchai/teuthology/teuthology/suite/run.py", line 615, in schedule_suite
>>     self.args.newest, job_limit)
>>   File "/home/kchai/teuthology/teuthology/suite/run.py", line 467, in collect_jobs
>>     self.package_versions
>>   File "/home/kchai/teuthology/teuthology/suite/util.py", line 394, in get_package_versions
>>     distro_version=os_version,
>>   File "/home/kchai/teuthology/teuthology/suite/util.py", line 274, in package_version_for_hash
>>     sha1=hash,
>>   File "/home/kchai/teuthology/teuthology/packaging.py", line 853, in __init__
>>     super(ShamanProject, self).__init__(project, job_config, ctx, remote)
>>   File "/home/kchai/teuthology/teuthology/packaging.py", line 462, in __init__
>>     self._init_from_config()
>>   File "/home/kchai/teuthology/teuthology/packaging.py", line 497, in _init_from_config
>>     OS.version_codename(self.os_type, self.os_version)
>>   File "/home/kchai/teuthology/teuthology/orchestra/opsys.py", line 200, in version_codename
>>     (version_or_codename, name))
>> KeyError: '8.3 not a ubuntu version or codename'
>>
>> I think the root cause is that the rados/standalone test suite includes
>> its own facets for choosing a random distro, and my test happened to pick
>> rhel 8.3, but the distro name was overridden by the one specified in
>> c2c.yaml. That's why I ended up with the impossible combination of
>> Ubuntu 8.3. I took the liberty of pushing another commit to the pull
>> request in the hope of testing sooner. If it looks sane to you, could you
>> include it in your commit? Or I can do this with your permission.
> This is perfect! I had doubts about running this against something other
> than Ubuntu and was not sure which package to include. You have my
> permission (and gratitude) to squash the commits together.
>
> Cheers
>>
>>
>>
>> Thanks for your help!
>>
>> [0]
https://github.com/ceph/ceph/pull/41014#pullrequestreview-645521549
<https://github.com/ceph/ceph/pull/41014#pullrequestreview-645521549>
<https://github.com/ceph/ceph/pull/41014#pullrequestreview-645521549
<https://github.com/ceph/ceph/pull/41014#pullrequestreview-645521549>>
>> [1]
https://lab.fedeproxy.eu/ceph/ceph/-/tree/wip-mempool-cacheline-49781
<https://lab.fedeproxy.eu/ceph/ceph/-/tree/wip-mempool-cacheline-49781>
<https://lab.fedeproxy.eu/ceph/ceph/-/tree/wip-mempool-cacheline-49781
<https://lab.fedeproxy.eu/ceph/ceph/-/tree/wip-mempool-cacheline-49781>>
>>
>> On 25/04/2021 17:10, Loïc Dachary wrote:
>> > Great! Thank you :-)
>> >
>> > On 25/04/2021 10:39, Nathan Cutler wrote:
>> >> On Sat, Apr 24, 2021 at 11:13:39PM +0200, Loïc Dachary wrote:
>> >>> Thanks for pushing the branch. I amended it a little and the teuthology
>> >>> run now passes[0]. There are still issues, I'm sure, but it's probably
>> >>> good enough for a pull request. Would you be so kind as to create one
>> >>> based on my branch[1] with the following cover? Thanks again for your
>> >>> help :-)
> Sure, here you go:
>
> https://github.com/ceph/ceph/pull/41014
_______________________________________________
Dev mailing list -- dev@ceph.io
To unsubscribe send an email to dev-leave@ceph.io
--
Loïc Dachary, Artisan Logiciel Libre
--
Regards
Kefu Chai