Trying to upgrade a containerized setup from 16.2.10 to 16.2.11 gave us two
big surprises, which I wanted to share in case anyone else runs into the same.
I don't see any nice solution to this apart from a new release that fixes the
performance regression, which completely breaks the container setup in
cephadm due to timeouts:
After some digging, we found that it was the "ceph-volume" command that
kept timing out, and after a ton more digging, found that it does so because
of
https://github.com/ceph/ceph/commit/bea9f4b643ce32268ad79c0fc257b25ff2f8333…
which was introduced in 16.2.11.
Unfortunately, the vital fix for this,
https://github.com/ceph/ceph/commit/8d7423c3e75afbe111c91e699ef3cb1c0beee61b,
was not included in 16.2.11.
So, in a setup like ours, with *many* devices, a simple "ceph-volume raw
list" now takes over 10 minutes to run (instead of 5 seconds in 16.2.10).
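To check whether a given host is affected before (or after) upgrading, you could simply time the listing command. Below is a minimal Python sketch, assuming "ceph-volume" is on PATH and is run with sufficient privileges on the host or container where it normally runs. The helper name time_command is hypothetical, and the 900-second cutoff is an arbitrary choice for illustration, not a cephadm setting.

```python
import shutil
import subprocess
import time
from typing import Optional, Sequence


def time_command(cmd: Sequence[str], timeout_s: float) -> Optional[float]:
    """Run cmd and return its wall-clock duration in seconds.

    Returns None if the command does not finish within timeout_s
    (subprocess.run kills the child process in that case).
    """
    start = time.monotonic()
    try:
        subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,  # we only care about duration, not output
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,  # a non-zero exit still gives a timing
        )
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


if __name__ == "__main__":
    # Assumption: ceph-volume is installed where this script runs.
    if shutil.which("ceph-volume") is None:
        print("ceph-volume not found on PATH")
    else:
        # Hypothetical cutoff: 900 s, chosen only for this sketch.
        elapsed = time_command(["ceph-volume", "raw", "list"], timeout_s=900)
        if elapsed is None:
            print("ceph-volume raw list did not finish within 900 s")
        else:
            print(f"ceph-volume raw list took {elapsed:.1f} s")
```

On an unaffected host this should finish in seconds; on a host with many devices running 16.2.11, expect it to run for many minutes or hit the cutoff.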