[ceph-users] Proper way to replace an OSD with a shared SSD for db/wal

7 Nov 2019

We are running the Mimic release of Ceph (13.2.6), and I would like to know
the proper way to replace a defective OSD disk whose DB and WAL are on a
separate SSD shared with 9 other OSDs. More specifically, the failing disk
for osd.327 is /dev/sdai, and its wal/db are on /dev/sdc, which is divided
into 10 LVs holding the wal/db for osd.320-329. When I deployed it, I used
the pv/vg/lvcreate commands to create a VG named ssd1 with LVs named db320,
db321, and so on. Then I ran ceph-deploy from an admin node
(`ceph-deploy osd create --block-db=ssd1/db327 --data=/dev/sdai <node>`).
My main question is what to do about the separate wal/db data, as this page
(https://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-osds/) does
not seem to address the issue.

1) Do I need to erase the wal/db data on the ssd1/db327 logical volume? If
so, how should I do that?
2) Assuming 1) is taken care of (the "old" OSD has been destroyed and the
"bad" hard drive physically replaced with a new one), does this command
look correct? `ceph-volume lvm create --osd-id 327 --bluestore
--data /dev/sdai --block.db ssd1/db327`
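
For concreteness, the sequence I have in mind can be sketched as the dry run
below, which only prints the commands and does not touch the cluster. The
`ceph-volume lvm zap` step is my assumption about how question 1 might be
handled (it should wipe the LV contents while keeping the LV itself), not
something I have confirmed. The variable and function names are only for
illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned commands instead of executing them.
# Values are taken from the description above.
OSD_ID=327
DATA_DEV=/dev/sdai
DB_LV=ssd1/db327

plan_replace() {
    # Mark the failed OSD as destroyed so that its ID can be reused.
    echo "ceph osd destroy ${OSD_ID} --yes-i-really-mean-it"
    # ASSUMPTION: wipe the old DB/WAL contents from the LV (the LV is kept).
    echo "ceph-volume lvm zap ${DB_LV}"
    # Recreate the OSD on the new disk, reusing the same ID and DB LV.
    echo "ceph-volume lvm create --osd-id ${OSD_ID} --bluestore --data ${DATA_DEV} --block.db ${DB_LV}"
}

plan_replace
```

The first command would be run from a node with admin access, and the last
two on the OSD host, after the physical disk has been swapped.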

*Mami Hayashida*
*Research Computing Associate*
Univ. of Kentucky ITS Research Computing Infrastructure
