While poking through one of our Nautilus clusters, I noticed that OSDs
have heartbeat (HB) peers with which they share no PGs.
Nautilus added OSDMap::get_random_up_osds_by_subtree() to select
random up OSDs from subtrees of type mon_osd_reporter_subtree_level,
even if mon_osd_min_down_reporters is already met.
If you have multiple types of hardware mapped to different pools, OSDs
in different pools will heartbeat each other, which is not necessarily
expected from an operations point of view. This also has the potential
to wrongly mark OSDs down if one type of hardware is having issues.
The more HB peers the better, but couldn't we instead increase the
default for mon_osd_min_down_reporters and call
get_random_up_osds_by_subtree() only if it is not met? I initially made
a patch to exclude any OSD that is not part of the same CRUSH root, but
this wouldn't work in general, since a CRUSH rule can span multiple
trees. I'm not sure what other alternatives there are.
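
To make the proposal concrete, here is a rough toy model of the selection
logic I have in mind. It is not Ceph code: pick_hb_peers, subtree_of and
min_reporters are hypothetical names, min_reporters stands in for
mon_osd_min_down_reporters, and skipping the OSD's own subtree is my
assumption about the desired behaviour.

```python
import random


def pick_hb_peers(whoami, pg_peers, up_osds, subtree_of, min_reporters, rng=None):
    """Toy model of the proposal (not Ceph code; all names are hypothetical).

    pg_peers:      OSD ids that share at least one PG with `whoami`
    up_osds:       all OSD ids currently up
    subtree_of:    OSD id -> name of its reporter subtree (e.g. its host)
    min_reporters: stand-in for mon_osd_min_down_reporters
    """
    rng = rng or random.Random()
    own = subtree_of[whoami]
    peers = set(pg_peers) - {whoami}
    # Subtrees other than our own that the PG peers already cover.
    covered = {subtree_of[o] for o in peers} - {own}
    if len(covered) >= min_reporters:
        return peers  # PG peers suffice: no unrelated OSDs are added

    # Otherwise top up with random up OSDs from subtrees not yet covered.
    candidates = sorted(
        o for o in up_osds
        if o != whoami and o not in peers and subtree_of[o] != own
    )
    rng.shuffle(candidates)
    for osd in candidates:
        if len(covered) >= min_reporters:
            break
        if subtree_of[osd] not in covered:
            peers.add(osd)
            covered.add(subtree_of[osd])
    return peers


subtree_of = {0: "hostA", 1: "hostA", 2: "hostB", 3: "hostB", 4: "hostC", 5: "hostD"}
# PG peers already span two other hosts, so nothing random is added.
print(pick_hb_peers(0, {2, 4}, range(6), subtree_of, min_reporters=2))
# Only one other host is covered, so one random OSD from hostC or hostD is added.
print(pick_hb_peers(0, {2}, range(6), subtree_of, min_reporters=2))
```

With this shape, OSDs from unrelated hardware only become HB peers when
the PG peers alone cannot provide enough distinct reporters.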
Another bit, from pre-Nautilus: the OSDs with id-1 and id+1 are added
to the HB peers in order to have a "fully-connected set". I'm not sure I
understand that comment. Could somebody briefly explain how it creates
a fully connected set, and which set we're talking about?
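
My tentative reading, as a toy model: if every OSD peers with the previous
and next up OSD by id (wrapping around at the ends), all up OSDs end up
linked in one ring, so no OSD is left without HB peers. This assumes that
"id-1 and +1" means the nearest up neighbours and that "fully-connected"
means a connected graph rather than every OSD peering with every other.
The function names are hypothetical and I may well be misreading the
intent.

```python
from collections import deque


def ring_neighbours(whoami, up_osds):
    """Previous and next up OSD ids around `whoami`, wrapping at the ends."""
    ups = sorted(up_osds)
    i = ups.index(whoami)
    return ups[i - 1], ups[(i + 1) % len(ups)]


def is_connected(up_osds):
    """True if the heartbeat graph built from ring links alone is connected."""
    ups = sorted(up_osds)
    seen, queue = {ups[0]}, deque([ups[0]])
    while queue:
        for peer in ring_neighbours(queue.popleft(), ups):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen == set(ups)


up = [0, 1, 3, 4, 7]  # ids 2, 5 and 6 are down or do not exist
print(ring_neighbours(3, up))  # (1, 4)
print(ring_neighbours(0, up))  # (7, 1)
print(is_connected(up))        # True
```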