Did your db/wal device show as having free space prior to the OSD creation?
Yes.
root@ceph-a1-06:~# pvs
PV           VG                                        Fmt  Attr PSize PFree
/dev/nvme0n1 ceph-3a336b8e-ed39-4532-a199-ac6a3730840b lvm2 a--  5.82t 2.91t
/dev/nvme1n1 ceph-b38117e8-8e50-48dd-95f2-b4226286bfde lvm2 a--  5.82t 2.91t
Although:
ceph-a1-06 /dev/nvme0n1 ssd Dell_Ent_NVMe_AGN_MU_AIC_6.4TB_S61MNE0R900788 6401G 102s ago LVM detected, *locked*
ceph-a1-06 /dev/nvme1n1 ssd Dell_Ent_NVMe_AGN_MU_AIC_6.4TB_S61MNE0R900777 6401G 102s ago LVM detected, *locked*
What does your OSD service specification look like?
I am not sure; I didn't find it... It should be stored somewhere, right? I
used the dashboard to create the OSD service.
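
If cephadm kept the spec the dashboard generated, I assume something like
this would dump it (an untested guess on my side):

   ceph orch ls osd --export

The drivegroup name in the logs below (dashboard-admin-1661788934732) is
presumably the service name it would show.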
Best
Ken
On 31.01.23 12:35, David Orman wrote:
> What does your OSD service specification look like? Did your db/wal device
> show as having free space prior to the OSD creation?
>
> On Tue, Jan 31, 2023, at 04:01, mailing-lists wrote:
>> OK, the OSD is backfilled again. It is in and up, but it is no longer
>> using the NVMe for WAL/DB.
>>
>> It also looks like the LVM volume of the old OSD is still on the NVMe
>> drive. I suspect this because the two NVMe drives still hold 9 LVM
>> volumes each, i.e. 18 volumes, but only 17 OSDs are using the NVMe
>> (as shown in the dashboard).
>>
>>
>> Do you have a hint on how to fix this?
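>>
>> My first thought would be to list the OSD-to-LV mapping on the host with
>>
>>    ceph-volume lvm list
>>
>> and compare it against lvs -o lv_name,vg_name,lv_tags; a db LV on the
>> NVMe whose ceph.osd_id tag no longer matches a live OSD should be the
>> leftover, which could then be removed with lvremove <vg>/<lv>
>> (placeholder names). But I don't want to remove anything blindly.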
>>
>>
>>
>> Best
>>
>> Ken
>>
>>
>>
>> On 30.01.23 16:50, mailing-lists wrote:
>>> Oh wait,
>>>
>>> I might have been too impatient:
>>>
>>>
>>> 1/30/23 4:43:07 PM [INF] Deploying daemon osd.232 on ceph-a1-06
>>> 1/30/23 4:42:26 PM [INF] Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:42:26 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:42:19 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:41:01 PM [INF] Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:41:01 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:41:01 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:41:00 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:39:34 PM [INF] Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:39:34 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>> 1/30/23 4:39:34 PM [INF] Found osd claims -> {'ceph-a1-06': ['232']}
>>>
>>>
>>>
>>> It doesn't show the NVMe as WAL/DB yet, though; I will let it reach a
>>> clean state before I do anything further.
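>>>
>>> Once it is back up, I suppose the OSD metadata will show where the DB
>>> really ended up, e.g.:
>>>
>>>    ceph osd metadata 232 | grep bluefs
>>>
>>> which should list a dedicated db device if the NVMe was used.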
>>>
>>>
>>> On 30.01.23 16:42, mailing-lists wrote:
>>>> root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
>>>> destroyed osd.232
>>>>
>>>>
>>>> OSD 232 now shows as destroyed and out in the dashboard.
>>>>
>>>>
>>>> root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
>>>> --> Zapping: /dev/sdm
>>>> --> --destroy was not specified, but zapping a whole device will
>>>> remove the partition table
>>>> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10
>>>> conv=fsync
>>>> stderr: 10+0 records in
>>>> 10+0 records out
>>>> stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
>>>> --> Zapping successful for: <Raw Device: /dev/sdm>
>>>>
>>>>
>>>> root@ceph-a2-01:/# ceph orch device ls
>>>>
>>>> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 21m ago *locked*
>>>>
>>>>
>>>> It shows as locked and is not automatically added now, which is good, I
>>>> think? Otherwise it would probably have become a new OSD 307.
>>>>
>>>>
>>>> root@ceph-a2-01:/# ceph orch osd rm status
>>>> No OSD remove/replace operations reported
>>>>
>>>> root@ceph-a2-01:/# ceph orch osd rm 232 --replace
>>>> Unable to find OSDs: ['232']
>>>>
>>>>
>>>> Unfortunately it is still not replacing.
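>>>>
>>>> I suppose I should confirm that the destroyed flag survived the zap,
>>>> e.g. with:
>>>>
>>>>    ceph osd tree | grep destroyed
>>>>
>>>> If it is still marked destroyed, maybe the orchestrator only needs the
>>>> service spec to be re-applied rather than another "osd rm"?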
>>>>
>>>>
>>>> It is so weird; I tried exactly this procedure in my virtual Ceph
>>>> environment and it just worked. The real cluster is acting up now. -.-
>>>>
>>>>
>>>> Do you have more hints for me?
>>>>
>>>> Thank you for your help so far!
>>>>
>>>>
>>>> Best
>>>>
>>>> Ken
>>>>
>>>>
>>>> On 30.01.23 15:46, David Orman wrote:
>>>>> The 'down' status is why it's not being replaced, vs. destroyed, which
>>>>> would allow the replacement. I'm not sure why --replace led to that
>>>>> scenario, but you will probably need to mark it destroyed for it to be
>>>>> replaced.
>>>>>
>>>>>
>>>>> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-…
>>>>> has instructions on the non-orchestrator way of doing that. You only
>>>>> need steps 1 and 2.
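>>>>>
>>>>> That is, roughly (destroy keeps the OSD id so it can be reused;
>>>>> substitute your actual id and device):
>>>>>
>>>>>    ceph osd destroy <id> --yes-i-really-mean-it
>>>>>    ceph-volume lvm zap <device>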
>>>>>
>>>>> You should look through your logs to see what happened that the OSD
>>>>> was marked down and not destroyed. Obviously, make sure you
>>>>> understand ramifications before running any commands. :)
>>>>>
>>>>> David
>>>>>
>>>>> On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
>>>>>> # ceph orch osd rm status
>>>>>> No OSD remove/replace operations reported
>>>>>> # ceph orch osd rm 232 --replace
>>>>>> Unable to find OSDs: ['232']
>>>>>>
>>>>>> It is not finding 232 anymore. It is still shown as down and out in
>>>>>> the Ceph dashboard.
>>>>>>
>>>>>>
>>>>>> pgs: 3236 active+clean
>>>>>>
>>>>>>
>>>>>> This is the new disk, shown as locked (because it is not zapped yet):
>>>>>>
>>>>>> # ceph orch device ls
>>>>>>
>>>>>> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 9m ago
>>>>>> locked
>>>>>>
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Ken
>>>>>>
>>>>>>
>>>>>> On 29.01.23 18:19, David Orman wrote:
>>>>>>> What does "ceph orch osd rm status" show before you try the zap? Is
>>>>>>> your cluster still backfilling to the other OSDs for the PGs that
>>>>>>> were on the failed disk?
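>>>>>>>
>>>>>>> A quick way to check the latter is something like:
>>>>>>>
>>>>>>>    ceph pg stat
>>>>>>>
>>>>>>> Anything still in backfill/recovery states means the drain of the
>>>>>>> old OSD has not finished yet.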
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
>>>>>>>> Dear Ceph-Users,
>>>>>>>>
>>>>>>>> I am struggling to replace a disk. My Ceph cluster is not replacing
>>>>>>>> the old OSD even though I did:
>>>>>>>>
>>>>>>>> ceph orch osd rm 232 --replace
>>>>>>>>
>>>>>>>> OSD 232 is still shown in the OSD list, but the new HDD gets placed
>>>>>>>> as a new OSD. This wouldn't bother me much if the new OSD were also
>>>>>>>> placed on the BlueStore DB NVMe, but it isn't.
>>>>>>>>
>>>>>>>>
>>>>>>>> My steps:
>>>>>>>>
>>>>>>>> "ceph orch osd rm 232 --replace"
>>>>>>>>
>>>>>>>> remove the failed hdd.
>>>>>>>>
>>>>>>>> add the new one.
>>>>>>>>
>>>>>>>> Convert the disk within the server's BIOS, so that the node has
>>>>>>>> direct access to it.
>>>>>>>>
>>>>>>>> It shows up as /dev/sdt,
>>>>>>>>
>>>>>>>> enter maintenance mode
>>>>>>>>
>>>>>>>> reboot server
>>>>>>>>
>>>>>>>> The drive is now /dev/sdm (the name the old drive had).
>>>>>>>>
>>>>>>>> "ceph orch device zap node-x /dev/sdm"
>>>>>>>>
>>>>>>>> A new OSD is placed on the cluster.
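>>>>>>>>
>>>>>>>> For reference, the sequence I expected to work, based on the docs
>>>>>>>> (hostname/device as in my case):
>>>>>>>>
>>>>>>>>    ceph orch osd rm 232 --replace
>>>>>>>>    ceph orch device zap node-x /dev/sdm --force
>>>>>>>>
>>>>>>>> with the orchestrator then recreating OSD 232 on the new disk,
>>>>>>>> reusing its DB slot on the NVMe.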
>>>>>>>>
>>>>>>>>
>>>>>>>> Can you give me a hint where I took a wrong turn? Why is the disk
>>>>>>>> not being used as OSD 232?
>>>>>>>>
>>>>>>>>
>>>>>>>> Best
>>>>>>>>
>>>>>>>> Ken
>>>>>>>>