I ended up in the same situation while playing around with a test cluster. The SUSE team
has an article [1] for this case; the following steps resolved the issue for me. I had three
different OSD specs in place for the same three nodes:
NAME                 PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
osd                              3  <deleting>   3w   nautilus2;nautilus3
osd.osd-hdd-ssd                  3  2m ago       2w   nautilus;nautilus2;nautilus3
osd.osd-hdd-ssd-mix              3  2m ago       -    <unmanaged>
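To find every OSD daemon still pointing at a stale spec, a grep across the local OSD
directories should do (just a sketch; the path assumes the cluster fsid shown in the
output further below):

grep service_name /var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.*/unit.meta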
I replaced the "service_name" with the more suitable value
("osd.osd-hdd-ssd") in the unit.meta file of each OSD containing the invalid
spec, then restarted each affected OSD. It probably wasn't necessary, but since I
wanted to see the effect immediately, I also failed over the mgr (ceph mgr fail). Now
only one valid OSD spec is left.
# before
nautilus3:~ # grep service_name /var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.3/unit.meta
"service_name": "osd",

# after
nautilus3:~ # grep service_name /var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.3/unit.meta
"service_name": "osd.osd-hdd-ssd",
nautilus3:~ # ceph orch ls osd
NAME             PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.osd-hdd-ssd               9  10m ago    2w   nautilus;nautilus2;nautilus3
Regards,
Eugen
[1] https://www.suse.com/support/kb/doc/?id=000020667