Recently, one of our online clusters ran into a problem. When we
tried to add machines to an existing CRUSH root, multiple PGs got
stuck in the unknown, activating, or peering state. For now, we have
worked around the problem by restarting the OSDs related to the
inactive PGs, but the root cause is still unknown.
Separately, we found that our online configuration contains the entry
"bluestore_min_alloc_size_hdd = 262144" (256K) but does not set
"bluefs_shared_alloc_size", which means the latter is at its default
value of 64K. Normally, this combination would trigger an error when
creating OSDs. However, our online systems run version 14.2.4, and
that version does not raise the error.
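To make the mismatch concrete, here is a small Python sketch of the
arithmetic. It assumes that the creation-time check requires
bluefs_shared_alloc_size to be a whole multiple of the BlueStore
min alloc size; that condition is my assumption, not something taken
from the Ceph source, and the helper name is hypothetical.

```python
# Hypothetical helper, NOT Ceph code: it only illustrates the size
# relationship that the creation-time check is assumed to enforce.
KIB = 1024


def is_whole_multiple(shared_alloc_size: int, min_alloc_size: int) -> bool:
    """Return True if shared_alloc_size is a positive whole multiple of min_alloc_size."""
    return (
        shared_alloc_size >= min_alloc_size
        and shared_alloc_size % min_alloc_size == 0
    )


# Value from our configuration (256 KiB).
bluestore_min_alloc_size_hdd = 262144
# Not set in our configuration, so the 64K default applies.
bluefs_shared_alloc_size = 64 * KIB

compatible = is_whole_multiple(bluefs_shared_alloc_size, bluestore_min_alloc_size_hdd)
print(compatible)
```

Under that assumption, 64K is smaller than 256K and cannot be a
multiple of it, so our two values would fail such a check.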
My question is: could this misconfiguration be the root cause of the
problem described above? Thanks :-)