Hi!
I have a nautilus ceph cluster, and today I restarted one of the osd daemons
and spent some time trying to debug an error I was seeing in the log, though it
seems the osd is actually working.
The error I was seeing is:
```
Nov 25 09:07:43 osd15 systemd[1]: Starting Ceph object storage daemon osd.44...
Nov 25 09:07:43 osd15 systemd[1]: Started Ceph object storage daemon osd.44.
Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.846 7f55395fbc80 -1 osd.44
106947 log_to_monitors {default=true}
Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.850 7f55395fbc80 -1 osd.44
106947 mon_cmd_maybe_osd_create fail: 'osd.44 has already bound to class
'ssd', can not reset class to 'hdd'; use 'ceph osd crush
rm-device-class <id>' to remove old class first': (16) Device or resource
busy
```
There are no other messages in the journal, so at first I thought that the osd
had failed to start.
But it seems to be up and working correctly anyway.
There's no "hdd" class in my crush map:
```
# ceph osd crush class ls
[
"ssd"
]
```
And that osd is actually of the correct class:
```
# ceph osd crush get-device-class osd.44
ssd
```
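For reference, the remedy the error message suggests would look like the
following (I have not run this, since osd.44 already reports the expected
class; the id and class name here are just the ones from the output above):

```shell
# Check which class the OSD is currently bound to.
ceph osd crush get-device-class osd.44

# The error's suggested fix: remove the bound class first,
# then assign the desired one.
ceph osd crush rm-device-class osd.44
ceph osd crush set-device-class ssd osd.44
```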
```
# uname -a
Linux osd15 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
# ceph --version
ceph version 14.2.5-1-g23e76c7aa6 (23e76c7aa6e15817ffb6741aafbc95ca99f24cbb) nautilus
(stable)
```
The osd shows up in the cluster and is receiving load, so there seems to be
no actual problem, but does anyone know what that error is about?
Thanks!
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."