Yeah, annoyingly `ms_bind_ipv4` is set to true by default, so if you just
set `ms_bind_ipv6` without turning off IPv4 you end up in dual-stack mode.
I've created a PR to fix this with at least a warning _and_ to properly
mention this in the documentation:
https://github.com/ceph/ceph/pull/36536
Hopefully it'll land at some point :)
Matt
On Fri, Sep 4, 2020 at 12:12 AM Wido den Hollander <wido(a)42on.com> wrote:
Hi,
Last night I spent a couple of hours debugging an issue where OSDs
would be marked as 'up', but PGs stayed in the 'peering' state.
Looking through the admin socket I saw these OSDs were in the 'booting'
state.
Looking at the OSDMap I saw this:
osd.3 up in weight 1 up_from 26 up_thru 700 down_at 0
last_clean_interval [0,0)
[v2:[2a05:xx0:700:2::7]:6816/7923,v1:[2a05:xx:700:2::7]:6817/7923,v2:
0.0.0.0:6818/7923,v1:0.0.0.0:6819/7923]
[v2:[2a05:xx:700:2::7]:6820/7923,v1:[2a05:1500:700:2::7]:6821/7923,v2:
0.0.0.0:6822/7923,v1:0.0.0.0:6823/7923]
exists,up 786d3e9d-047f-4b09-b368-db9e8dc0805d
In ceph.conf this was set:
ms_bind_ipv6 = true
public_addr = 2a05:xx:700:2::6
On true IPv6-only nodes this works fine. But on nodes where IPv4 is
also present this can (and will?) cause problems.
I did not use tcpdump/wireshark to investigate, but it seems the
OSDs tried to contact each other using the 0.0.0.0 IPv4 address.
After adding these settings the problems were resolved:
ms_bind_msgr1 = false
ms_bind_ipv4 = false
This also disables msgr v1, which we didn't need here: the cluster and
all clients run Octopus.
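For reference, a minimal sketch of the combined ceph.conf settings that gave a working IPv6-only setup, based on the values in this thread (the public_addr below is the masked address from the mail and stands in for your own IPv6 address):

```ini
[global]
# Bind only to IPv6. ms_bind_ipv4 defaults to true, so it must be
# disabled explicitly to avoid ending up in dual-stack mode.
ms_bind_ipv6 = true
ms_bind_ipv4 = false

# Optional: disable the legacy msgr v1 protocol. Only do this when the
# cluster and all clients speak msgr v2 (Nautilus or newer).
ms_bind_msgr1 = false

# The daemon's public IPv6 address (masked example from this thread).
public_addr = 2a05:xx:700:2::6
```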
The OSDMap now showed:
osd.3 up in weight 1 up_from 704 up_thru 712 down_at 702
last_clean_interval [26,701) v2:[2a05:xx:700:2::7]:6804/791503
v2:[2a05:xx:700:2::7]:6805/791503 exists,up
786d3e9d-047f-4b09-b368-db9e8dc0805d
OSDs came back right away, PGs peered and the problems were resolved.
Wido
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io