Hi,

I discovered this back in June and reported the issue in [1].

The solution (for me it's only a workaround) was to set "--ulimit nofile=1024:4096" on the docker run calls in ceph-ansible [2][3].
The same limit is also set in the ceph/daemon container images, because
we're using ceph-volume/ceph-disk commands there before running the
ceph-osd process [4].
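
To make the workaround concrete, here is a rough Python sketch of what such
a wrapped call boils down to. The function name, image tag and the reduced
set of docker flags are made up for illustration (the real tasks behind
[2][3] pass more options); only the --ulimit part is the actual change:

    # Hypothetical sketch, not the actual ceph-ansible code behind [2][3].
    # It only shows where the --ulimit flag ends up on the "docker run"
    # command line that wraps ceph-volume.
    import subprocess

    def run_containerized_ceph_volume(image, ceph_volume_args):
        cmd = [
            "docker", "run", "--rm",
            # the workaround: cap the open-files limit inside the container
            # instead of inheriting the 1048576 default
            "--ulimit", "nofile=1024:4096",
            "--entrypoint", "ceph-volume",
            image,
        ] + ceph_volume_args
        return subprocess.run(cmd, check=True, capture_output=True, text=True)

    # e.g. run_containerized_ceph_volume("ceph/daemon:latest", ["inventory"])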

Note that the issue is also present on non-containerized deployments, but
there the default max open files limits are already 1024 (soft) / 4096
(hard), whereas in containers the limit is 1048576.
If you increase this value on a non-containerized deployment, you will see
the same behaviour.
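
To see why the limit matters at all, here is a small self-contained Python
sketch (not ceph-volume code; the helper and the printed numbers are mine)
that mimics the brute-force close loop discussed in the quoted thread below.
Older subprocess implementations (Python 2, and the Python 3 fallback when
/proc/self/fd cannot be read) attempt one close() per possible descriptor up
to the nofile limit on every Popen(close_fds=True):

    # Run in a throwaway interpreter: it really closes this process's fds >= 3.
    import os
    import resource
    import time

    def brute_force_close_cost(limit):
        """Time one close() attempt per candidate fd, like the old close_fds loop."""
        start = time.monotonic()
        for fd in range(3, limit):
            try:
                os.close(fd)      # nearly all of these just fail with EBADF
            except OSError:
                pass
        return time.monotonic() - start

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("nofile=1024    -> %.3fs per call" % brute_force_close_cost(1024))
    print("nofile=%d -> %.3fs per call" % (soft, brute_force_close_cost(soft)))

With the container default of 1048576, the second number is the overhead
every subprocess call in ceph-volume pays, which is why capping the limit
(or dropping close_fds where it isn't needed) makes such a difference.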

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1722562
[2] https://github.com/ceph/ceph-ansible/blob/master/library/ceph_volume.py#L192
[3] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-osd/tasks/start_osds.yml#L24
[4] https://github.com/ceph/ceph-container/blob/master/src/daemon/osd_scenarios/osd_volume_activate.sh#L66-L67

Regards,

Dimitri

On Wed, Nov 13, 2019 at 8:52 AM Sebastien Han <shan@redhat.com> wrote:
I think Dimitri found that weeks ago and did some changes in
ceph-ansible to speed that up (along the same lines, IIRC).
Dim, can you share what you did?

Thanks!
–––––––––
Sébastien Han
Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."

On Wed, Nov 13, 2019 at 2:46 PM Sage Weil <sweil@redhat.com> wrote:
>
> On Wed, 13 Nov 2019, Paul Cuzner wrote:
> > Hi Sage,
> >
> > So I tried switching out the udev calls to pyudev, and shaved a whopping
> > 1 sec from the timings. Looking deeper, I found that the issue is related
> > to *ALL* subprocess.Popen calls (of which there are many!) - they all use
> > close_fds=True.
> >
> > My suspicion is that when running in a container, close_fds sees fds from
> > the host too - so it tries to tidy up more than it should. If you set
> > ulimit -n 1024 or something and then try a ceph-volume inventory, it should
> > just fly through! (at least it did for me)
> >
> > Let me know if this works for you.
>
> Yes.. that speeds things up significantly!  1.5s -> 0.2s in my case.  I
> can't say that I understand why, though... it seems like ulimit -n will
> make file open attempts fail, but I don't see any failures.
>
> Can we drop the close_fds arg?
>
> sage
> _______________________________________________
> Dev mailing list -- dev@ceph.io
> To unsubscribe send an email to dev-leave@ceph.io
>