OK, the OSD is filled again. It is in and up, but it is no longer using
the NVMe WAL/DB.
And it looks like the LVM volume group of the old OSD is still on the
NVMe drive. I suspect this because the two NVMe drives still have 9 LVM
groups each: 18 groups in total, but only 17 OSDs are using the NVMe
(as shown in the dashboard).
Do you have a hint on how to fix this?
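
For reference, I assume checking and cleaning this up would look roughly
like the following (just a sketch; the VG/LV names are placeholders, not
taken from my cluster):

root@ceph-a1-06:/# ceph-volume lvm list
root@ceph-a1-06:/# lvs -o lv_name,vg_name,lv_tags | grep 'ceph.osd_id=232'
root@ceph-a1-06:/# ceph-volume lvm zap --destroy /dev/ceph-db-vg/osd-db-lv

But I would rather be sure before destroying anything on the NVMe.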
Best
Ken
On 30.01.23 16:50, mailing-lists wrote:
> Oh wait,
>
> I might have been too impatient:
>
>
> 1/30/23 4:43:07 PM [INF] Deploying daemon osd.232 on ceph-a1-06
>
> 1/30/23 4:42:26 PM [INF] Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:42:26 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:42:19 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:41:01 PM [INF] Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:41:01 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:41:01 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:41:00 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:39:34 PM [INF] Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:39:34 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
> 1/30/23 4:39:34 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>
>
>
> Although it doesn't show the NVMe as WAL/DB yet, I will let it proceed
> to a clean state before I do anything further.
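>
> Once it is active, I assume I can verify the WAL/DB placement with
> something like this (untested on my side; the grep is just a guess at
> the relevant fields):
>
> root@ceph-a2-01:/# ceph osd metadata 232 | grep -E 'bluefs_db|devices'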
>
>
> On 30.01.23 16:42, mailing-lists wrote:
>> root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
>> destroyed osd.232
>>
>>
>> OSD 232 now shows as destroyed and out in the dashboard.
>>
>>
>> root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
>> --> Zapping: /dev/sdm
>> --> --destroy was not specified, but zapping a whole device will
>> remove the partition table
>> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10
>> conv=fsync
>> stderr: 10+0 records in
>> 10+0 records out
>> stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
>> --> Zapping successful for: <Raw Device: /dev/sdm>
>>
>>
>> root@ceph-a2-01:/# ceph orch device ls
>>
>> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 21m ago *locked*
>>
>>
>> It shows as locked and is not automatically added now, which is good, I
>> think? Otherwise it would probably have become a new OSD 307.
>>
>>
>> root@ceph-a2-01:/# ceph orch osd rm status
>> No OSD remove/replace operations reported
>>
>> root@ceph-a2-01:/# ceph orch osd rm 232 --replace
>> Unable to find OSDs: ['232']
>>
>>
>> Unfortunately it is still not replacing.
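>>
>> I assume the OSD service spec is what should trigger the redeployment;
>> to double-check which drivegroup covers this host, I would look at
>> something like this (just a guess at the right place to look):
>>
>> root@ceph-a2-01:/# ceph orch ls osd --export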
>>
>>
>> It is so weird; I tried this exact procedure in my virtual Ceph
>> environment and it just worked. The real scenario is acting up now. -.-
>>
>>
>> Do you have more hints for me?
>>
>> Thank you for your help so far!
>>
>>
>> Best
>>
>> Ken
>>
>>
>> On 30.01.23 15:46, David Orman wrote:
>>> The 'down' status is why it's not being replaced; a 'destroyed' status
>>> would allow the replacement. I'm not sure why --replace led to that
>>> scenario, but you will probably need to mark it destroyed for it to be
>>> replaced.
>>>
>>>
>>> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-…
>>> has instructions on the non-orch way of doing that. You only need 1/2.
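>>>
>>> Roughly something like this (untested here, please verify against the
>>> docs first):
>>>
>>> ceph osd destroy 232 --yes-i-really-mean-it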
>>>
>>> You should look through your logs to see why the OSD was marked down
>>> rather than destroyed. Obviously, make sure you understand the
>>> ramifications before running any commands. :)
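>>>
>>> For the orchestrator side, something along these lines should show what
>>> cephadm decided to do (from memory, double-check the exact syntax):
>>>
>>> ceph log last cephadm
>>> ceph -W cephadm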
>>>
>>> David
>>>
>>> On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
>>>> # ceph orch osd rm status
>>>> No OSD remove/replace operations reported
>>>> # ceph orch osd rm 232 --replace
>>>> Unable to find OSDs: ['232']
>>>>
>>>> It is not finding 232 anymore. It is still shown as down and out in
>>>> the Ceph dashboard.
>>>>
>>>>
>>>> pgs: 3236 active+clean
>>>>
>>>>
>>>> This is the new disk, shown as locked (because it has not been zapped yet).
>>>>
>>>> # ceph orch device ls
>>>>
>>>> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 9m ago locked
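>>>>
>>>> I assume the reason for the "locked" state could be checked with
>>>> something like the following, which should show whether the device is
>>>> considered available and why not (untested guess on my side):
>>>>
>>>> root@ceph-a1-06:/# ceph-volume inventory /dev/sdm
>>>> root@ceph-a2-01:/# ceph orch device ls --wide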
>>>>
>>>>
>>>> Best
>>>>
>>>> Ken
>>>>
>>>>
>>>> On 29.01.23 18:19, David Orman wrote:
>>>>> What does "ceph orch osd rm status" show before you try the zap? Is
>>>>> your cluster still backfilling to the other OSDs for the PGs that
>>>>> were on the failed disk?
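>>>>>
>>>>> E.g., something along these lines as a starting point:
>>>>>
>>>>> ceph orch osd rm status
>>>>> ceph -s
>>>>> ceph pg stat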
>>>>>
>>>>> David
>>>>>
>>>>> On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
>>>>>> Dear Ceph-Users,
>>>>>>
>>>>>> I am struggling to replace a disk. My Ceph cluster is not replacing
>>>>>> the old OSD even though I did:
>>>>>>
>>>>>> ceph orch osd rm 232 --replace
>>>>>>
>>>>>> OSD 232 is still shown in the OSD list, but the new HDD will be
>>>>>> placed as a new OSD. This wouldn't bother me much if the new OSD
>>>>>> were also placed on the BlueStore DB / NVMe, but it isn't.
>>>>>>
>>>>>>
>>>>>> My steps:
>>>>>>
>>>>>> "ceph orch osd rm 232 --replace"
>>>>>>
>>>>>> Remove the failed HDD.
>>>>>>
>>>>>> Add the new one.
>>>>>>
>>>>>> Convert the disk within the server's BIOS, so that the node has
>>>>>> direct access to it.
>>>>>>
>>>>>> It shows up as /dev/sdt.
>>>>>>
>>>>>> Enter maintenance mode.
>>>>>>
>>>>>> Reboot the server.
>>>>>>
>>>>>> The drive is now /dev/sdm (the name the old drive had).
>>>>>>
>>>>>> "ceph orch device zap node-x /dev/sdm"
>>>>>>
>>>>>> A new OSD is placed on the cluster.
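>>>>>>
>>>>>> (For comparison, my understanding of the intended flow was roughly:
>>>>>>
>>>>>> ceph orch osd rm 232 --replace
>>>>>> ceph orch osd rm status          # wait until draining has finished
>>>>>> ceph orch device zap ceph-a1-06 /dev/sdm --force
>>>>>>
>>>>>> after which the existing OSD spec should redeploy the disk as
>>>>>> osd.232. That is obviously not what happened here.)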
>>>>>>
>>>>>>
>>>>>> Can you give me a hint where I took a wrong turn? Why is the disk
>>>>>> not being used as OSD 232?
>>>>>>
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Ken
>>>>>>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-leave@ceph.io