Oh wait,
I might have been too impatient:
1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06
1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
Although it doesn't show the NVMe as wal/db yet, I will let it settle
into a clean state before I do anything further.
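
Once osd.232 is fully deployed, I assume I can verify the DB placement
with something along these lines (the metadata keys are what I see on my
other bluestore OSDs, so treat this as a guess):

ceph osd metadata 232 | grep -E 'bluefs_dedicated_db|bluefs_db_devices'

If bluefs_dedicated_db is 1 and bluefs_db_devices points at the NVMe,
the wal/db placement worked.
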
On 30.01.23 16:42, mailing-lists wrote:
> root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
> destroyed osd.232
>
>
> OSD 232 shows now as destroyed and out in the dashboard.
>
>
> root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
> --> Zapping: /dev/sdm
> --> --destroy was not specified, but zapping a whole device will
> remove the partition table
> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10
> conv=fsync
> stderr: 10+0 records in
> 10+0 records out
> stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
> --> Zapping successful for: <Raw Device: /dev/sdm>
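>
> (I left out --destroy; as far as I understand, "ceph-volume lvm zap
> /dev/sdm --destroy" would additionally wipe any leftover LVs/VGs, but
> the plain zap seems to have been enough here.)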
>
>
> root@ceph-a2-01:/# ceph orch device ls
>
> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 21m ago
> *locked*
>
>
> It shows as locked and is not automatically added now, which is good, I
> think? Otherwise it would probably have become a new OSD 307.
>
>
> root@ceph-a2-01:/# ceph orch osd rm status
> No OSD remove/replace operations reported
>
> root@ceph-a2-01:/# ceph orch osd rm 232 --replace
> Unable to find OSDs: ['232']
>
>
> Unfortunately it is still not replacing.
>
>
> It is so weird; I tried exactly this procedure in my virtual Ceph
> environment and it just worked. The real cluster is acting up now. -.-
>
>
> Do you have more hints for me?
>
> Thank you for your help so far!
>
>
> Best
>
> Ken
>
>
> On 30.01.23 15:46, David Orman wrote:
>> The 'down' status is why it's not being replaced; it would need to be
>> 'destroyed', which would allow the replacement. I'm not sure why
>> --replace led to that scenario, but you will probably need to mark it
>> destroyed for it to be replaced.
>>
>>
>> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-…
>> has instructions on the non-orch way of doing that. You only need 1/2.
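>>
>> For the destroy part, that boils down to something like (double-check
>> the ID first):
>>
>> ceph osd destroy 232 --yes-i-really-mean-it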
>>
>> You should look through your logs to see why the OSD ended up marked
>> down rather than destroyed. Obviously, make sure you understand the
>> ramifications before running any commands. :)
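>>
>> (Something like "ceph log last 500 info cluster" on an admin node, or
>> the osd.232 log on the host itself, should show when and why it was
>> marked down; I'm not sure offhand which one will have the useful bit.)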
>>
>> David
>>
>> On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
>>> # ceph orch osd rm status
>>> No OSD remove/replace operations reported
>>> # ceph orch osd rm 232 --replace
>>> Unable to find OSDs: ['232']
>>>
>>> It is not finding 232 anymore. It is still shown as down and out in the
>>> Ceph-Dashboard.
>>>
>>>
>>> pgs: 3236 active+clean
>>>
>>>
>>> This is the new disk, shown as locked (because it is not zapped yet).
>>>
>>> # ceph orch device ls
>>>
>>> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 9m ago
>>> locked
>>>
>>>
>>> Best
>>>
>>> Ken
>>>
>>>
>>> On 29.01.23 18:19, David Orman wrote:
>>>> What does "ceph orch osd rm status" show before you try the
>>>> zap? Is
>>>> your cluster still backfilling to the other OSDs for the PGs that were
>>>> on the failed disk?
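>>>>
>>>> (A quick "ceph -s" or "ceph pg stat" should tell you whether any PGs
>>>> are still backfilling/recovering or everything is active+clean.)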
>>>>
>>>> David
>>>>
>>>> On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
>>>>> Dear Ceph-Users,
>>>>>
>>>>> I am struggling to replace a disk. My Ceph cluster is not replacing
>>>>> the old OSD even though I did:
>>>>>
>>>>> ceph orch osd rm 232 --replace
>>>>>
>>>>> OSD 232 is still shown in the OSD list, but the new HDD will be
>>>>> placed as a new OSD. This wouldn't bother me much if the new OSD
>>>>> were also placed on the BlueStore DB / NVMe, but it isn't.
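>>>>>
>>>>> For context, the OSDs were originally created from the dashboard
>>>>> drivegroup; as far as I understand, that corresponds to an OSD
>>>>> service spec roughly like this (a sketch, not my exact spec), with
>>>>> the NVMe as db_devices:
>>>>>
>>>>> service_type: osd
>>>>> service_id: dashboard-admin-1661788934732
>>>>> placement:
>>>>>   host_pattern: '*'
>>>>> spec:
>>>>>   data_devices:
>>>>>     rotational: 1
>>>>>   db_devices:
>>>>>     rotational: 0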
>>>>>
>>>>>
>>>>> My steps:
>>>>>
>>>>> "ceph orch osd rm 232 --replace"
>>>>>
>>>>> Remove the failed HDD.
>>>>>
>>>>> Add the new one.
>>>>>
>>>>> Convert the disk within the server's BIOS, so that the node has
>>>>> direct access to it.
>>>>>
>>>>> It shows up as /dev/sdt.
>>>>>
>>>>> Enter maintenance mode.
>>>>>
>>>>> Reboot the server.
>>>>>
>>>>> The drive is now /dev/sdm (the name the old drive had).
>>>>>
>>>>> "ceph orch device zap node-x /dev/sdm"
>>>>>
>>>>> A new OSD is placed on the cluster.
>>>>>
>>>>>
>>>>> Can you give me a hint where I took a wrong turn? Why is the disk
>>>>> not being used as OSD 232?
>>>>>
>>>>>
>>>>> Best
>>>>>
>>>>> Ken
>>>>>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io