On Tue, May 11, 2021 at 10:50 AM Konstantin Shalygin <k0ste(a)k0ste.ru> wrote:
> Hi Ilya,
> On 3 May 2021, at 14:15, Ilya Dryomov <idryomov(a)gmail.com> wrote:
>> I don't think empty directories matter at this point. You may not have
>> had 12 OSDs at any point in time, but the max_osd value appears to have
>> gotten bumped when you were replacing those disks.
>> Note that max_osd being greater than the number of OSDs is not a big
>> problem by itself. The osdmap is going to be larger and require more
>> memory, but that's it. You can test by setting it back to 12 and trying
>> to mount -- it should work. The issue is specific to how those OSDs
>> were replaced -- something went wrong and the osdmap somehow ended up
>> with rather bogus addrvec entries. Not sure if it's ceph-deploy's
>> fault, something weird in ceph.conf (back then), or an actual ceph
>> bug.
> What actually is the bug? Is it when max_osds > total_osds_in?
No, as mentioned above max_osds being greater is not a problem per se.
Having max_osds set to 10000 when you only have a few dozen is going to
waste a lot of memory and network bandwidth, but if it is just slightly
bigger it's not something to worry about. Normally these "spare" slots
are ignored, but in Magnus' case they looked rather weird and the kernel
refused the osdmap. See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
for details.
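That spare-slot arithmetic is easy to check from a map dump. A minimal sketch, assuming the JSON shape of `ceph osd dump --format json` (a top-level "max_osd" and an "osds" array with per-OSD "osd" ids); the sample data at the bottom is made up:

```python
import json  # in practice you would json.load() the output of `ceph osd dump --format json`

def spare_slots(osd_dump: dict) -> int:
    """Return the number of osdmap slots above the highest existing OSD id.

    Spare slots are normally harmless -- they just cost a little memory
    and bandwidth -- unless they carry bogus addrvec entries.
    """
    max_osd = osd_dump["max_osd"]
    existing = [o["osd"] for o in osd_dump["osds"]]
    highest = max(existing) if existing else -1
    return max_osd - (highest + 1)

# Hypothetical dump: 12 OSDs (ids 0-11) but max_osd got bumped to 16.
sample = {"max_osd": 16, "osds": [{"osd": i} for i in range(12)]}
print(spare_slots(sample))  # 4
```

If the count comes back larger than expected, `ceph osd setmaxosd <n>` shrinks the map back down.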
> Which kernels were affected?
5.11 and 5.12, backports are on the way.
> For example, max_osds is 132, total_osds_in is 126, and the highest OSD number is 131 -- is that affected?
No, max_osds alone is not enough to trigger it.
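Plugging those numbers in (arithmetic only, not a diagnosis of any particular cluster): with the highest OSD id at 131 every one of the 132 slots is backed by an existing OSD, and the 126-vs-132 gap is just OSDs that exist but are marked "out", which is a normal state:

```python
max_osd = 132          # osdmap slot count: ids 0..131
highest_osd_id = 131   # every slot is backed by an existing OSD
total_osds_in = 126    # the remaining six OSDs exist but are "out"

spare_slots = max_osd - (highest_osd_id + 1)
out_osds = (highest_osd_id + 1) - total_osds_in
print(spare_slots, out_osds)  # 0 6
```

Zero spare slots means there are no nonexistent-OSD entries for the kernel to choke on in the first place.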
Thanks,
Ilya