Hi,
I discovered this back in June and reported the issue in [1].
The solution (which for me is only a workaround) was to set
"--ulimit nofile=1024:4096" on the docker run calls in ceph-ansible [2][3].
This is also implemented in the ceph/daemon container images because we're
using ceph-volume/ceph-disk commands before running the ceph-osd process
[4].
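A minimal sketch of the flag in question (the image and command are placeholders, not the actual ceph-ansible invocation):

```shell
# Cap the container's RLIMIT_NOFILE (soft:hard) so code that iterates
# over possible file descriptors, such as close_fds handling, stays fast.
docker run --rm --ulimit nofile=1024:4096 <image> <command>
```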
Note that the issue is also present on non-containerized deployments, but
there the default max open files limits are already set to 1024:4096,
whereas in a container they are set to 1048576. If you raise the limit on
a non-containerized deployment, you will see the same behaviour.
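You can check the limits that apply to the current process like this (the values shown depend on where you run it; inside a default Docker container both are often 1048576, on a bare host typically 1024 and 4096):

```shell
# Print the soft and hard open-file limits for the current shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "nofile soft=$soft hard=$hard"
```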
[1]
Regards,
Dimitri
On Wed, Nov 13, 2019 at 8:52 AM Sebastien Han <shan(a)redhat.com> wrote:
I think Dimitri found that weeks ago and did some changes in
ceph-ansible to speed that up (along the same lines IIRC)
Dim, can you share what you did?
Thanks!
–––––––––
Sébastien Han
Principal Software Engineer, Storage Architect
"Always give 100%. Unless you're giving blood."
On Wed, Nov 13, 2019 at 2:46 PM Sage Weil <sweil(a)redhat.com> wrote:
On Wed, 13 Nov 2019, Paul Cuzner wrote:
> Hi Sage,
>
> So I tried switching out the udev calls to pyudev, and shaved a
> whopping 1sec from the timings. Looking deeper I found that the issue
> is related to *ALL* subprocess.Popen calls (of which there are many!)
> - they all use close_fds=True.
>
> My suspicion is that when running in a container, close_fds sees fd's
> from the host too - so it tries to tidy up more than it should. If you
> set ulimit -n 1024 or something and then try a ceph-volume inventory,
> it should just fly through! (at least it did for me)
>
> Let me know if this works for you.
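A small sketch of the effect Paul describes. Note this is a demonstration under an assumption: the fd-closing cost scales with RLIMIT_NOFILE only on interpreters that close fds in a loop up to the soft limit (Python 2, and some fallback paths); modern CPython scans /proc/self/fd or uses close_range(), so the slowdown may not reproduce everywhere.

```python
import resource
import subprocess
import time

# Show the limit that bounds any close-all-fds loop, then time a single
# Popen call with close_fds=True. With a very large soft limit (e.g.
# 1048576 in a container) older interpreters spend the bulk of the time
# in the fd-closing loop.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("RLIMIT_NOFILE soft=%d hard=%d" % (soft, hard))

start = time.monotonic()
subprocess.Popen(["true"], close_fds=True).wait()
elapsed = time.monotonic() - start
print("Popen(close_fds=True) took %.4fs" % elapsed)
```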
Yes... that speeds things up significantly! 1.5s -> 0.2s in my case. I
can't say that I understand why, though... it seems like ulimit -n would
make file open attempts fail, but I don't see any failures.
Can we drop the close_fds arg?
sage
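One data point on the close_fds question: on Python 3.2+ close_fds already defaults to True, so dropping the explicit argument alone would not change the closing behaviour (on Python 2 the default was False). A quick check:

```python
import inspect
import subprocess

# Inspect the default value of close_fds on this interpreter; on
# Python 3.2+ omitting the argument is equivalent to close_fds=True.
sig = inspect.signature(subprocess.Popen)
default = sig.parameters["close_fds"].default
print("close_fds default:", default)
```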
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io