Slow cluster and incorrect peers - ceph-users

22 Sep 2020

Hello again,

following up on the previous mail: one of our clusters is rather slow at
the moment, and we have spotted something "funny":

When checking ceph pg dump, we see that some OSDs have heartbeat (HB)
peers among OSDs that they should not have any PG in common with.

When restarting one of the affected OSDs, we get the following message:

mon_cmd_maybe_osd_create fail: 'osd.12 has already bound to class
'xxx-ssd', can not reset class to 'hdd'; use 'ceph osd crush
rm-device-class <id>' to remove old class first': (16) Device or
resource busy

When checking the output of ceph osd tree, it seems to be in the correct
class:

 12 xxx-ssd   0.21767         osd.12        up  1.00000 1.00000

Is it possible that the OSD has "multiple" classes, i.e. that the cluster
remembers a class that was set on osd.12 when it used to be an HDD?

The output of ceph pg dump includes this at the bottom:

OSD_STAT USED    AVAIL  USED_RAW TOTAL   HB_PEERS                       PG_SUM PRIMARY_PG_SUM
12       150 GiB 72 GiB 151 GiB  223 GiB [3,11,13,25,36,43,54,64,71,82] 128    35

This is wrong, because osd.12 should only peer with osd.3 and osd.25,
which are the only other OSDs in the same pool, the one whose replicated
rule is set to match on xxx-ssd.
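
To make the mismatch explicit, here is a minimal Python sketch that takes
the OSD_STAT row quoted above and lists the HB peers that are not in the
expected set. The function names are made up for illustration, the expected
set {3, 25} is simply our own expectation from the pool layout, and the
parsing assumes the plain-text row format shown in this mail (other Ceph
releases may print different columns).

```python
def parse_osd_stat_row(row):
    """Return (osd_id, hb_peers) from one OSD_STAT row of `ceph pg dump`.

    Assumes the plain-text layout quoted in this mail: the OSD id is the
    first field and the HB peers are the only bracketed list in the row.
    """
    osd_id = int(row.split()[0])
    start, end = row.index("["), row.index("]")
    peers = {int(p) for p in row[start + 1:end].split(",") if p.strip()}
    return osd_id, peers


def unexpected_peers(row, expected):
    """Return the sorted HB peers of the row that are not in `expected`."""
    _, peers = parse_osd_stat_row(row)
    return sorted(peers - set(expected))


# Row for osd.12 as quoted above.
ROW = "12  150 GiB  72 GiB  151 GiB  223 GiB  [3,11,13,25,36,43,54,64,71,82]  128  35"

# Assumption: osd.3 and osd.25 are the only legitimate peers of osd.12.
EXPECTED = {3, 25}

extra = unexpected_peers(ROW, EXPECTED)
print("unexpected HB peers of osd.12:", extra)
```

For the row above this reports eight peers (11, 13, 36, 43, 54, 64, 71 and
82) that osd.12 should not be talking to.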

And the obvious question: how do we fix this?

At the moment we see around 75 PGs peering and 39 activating. Most of
them are in a pool with slower SSDs, but the peering also seems to
affect another pool that should be on faster SSDs.

Best regards,

Nico

--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch