Hi Guillaume,
thank you very much for the quick clarification and the detailed workaround.
We’ll check whether manual migration is feasible for our setup, given the time required. Alternatively, we’re looking into completely redeploying all affected OSDs (i.e. shrinking the cluster with ceph-ansible and provisioning all the devices anew).
Thanks as well for the hint about the flags. In both cases it makes sense to prevent unnecessary data migration (by setting noout, norecovery, etc.) during the procedure.
Cheers, Len
Guillaume Abrioux wrote on 2023-01-18:
Hi Len,
Indeed, this is not possible with ceph-ansible.
One option would be to do it manually with `ceph-volume lvm migrate`:
(Note that this can be tedious, since it requires a lot of manual operations, especially for clusters with a large number of OSDs.)
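For nodes with many OSDs, the per-OSD cycle below (stop, migrate, start) can be scripted. A rough sketch, with illustrative values only: the id/fsid/target triples in the here-doc are placeholders taken from this example, and on a real node you would gather them from `ceph-volume lvm list`. It defaults to a dry run that only prints the commands; set `DRY_RUN=0` once the output looks right.

```shell
#!/bin/sh
# Sketch only: repeat the stop/migrate/start cycle for several OSDs.
# The osd-id/osd-fsid/target triples below are placeholders; on a real
# node, take them from `ceph-volume lvm list`.
DRY_RUN=${DRY_RUN:-1}
run() {
  # In dry-run mode, print the command instead of executing it.
  if [ "$DRY_RUN" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

while read -r osd_id osd_fsid target_lv; do
  run systemctl stop "ceph-osd@${osd_id}"
  run ceph-volume lvm migrate --osd-id "$osd_id" --osd-fsid "$osd_fsid" \
      --from db --target "$target_lv"
  run systemctl start "ceph-osd@${osd_id}"
done <<EOF
0 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 vg_db_tmp/db-sdb
EOF
```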
Initial setup:
```
# cat group_vars/all
---
devices:
- /dev/sdb
dedicated_devices:
- /dev/sda
```
```
[root@osd0 ~]# lsblk
NAME                                                                                                  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0  50G  0 disk
`-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--cd34400d--daf2--450f--97d9--d561e7a43d1a    252:1    0  50G  0 lvm
sdb                                                                                                     8:16   0  50G  0 disk
`-ceph--4c77295c--28a5--440a--9561--b9dc4c814e36-osd--block--70fd3b96--7bb2--4ae3--a0f8--4d18748186f9 252:0    0  50G  0 lvm
sdc                                                                                                     8:32   0  50G  0 disk
sdd                                                                                                     8:48   0  50G  0 disk
vda                                                                                                   253:0    0  11G  0 disk
`-vda1                                                                                                253:1    0  10G  0 part /
```
```
[root@osd0 ~]# lvs
  LV                                             VG                                        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 ceph-4c77295c-28a5-440a-9561-b9dc4c814e36 -wi-ao---- <50.00g
  osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a    ceph-8d085f45-939c-4a65-a577-d21fa146d7d6 -wi-ao---- <50.00g
[root@osd0 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
ceph-4c77295c-28a5-440a-9561-b9dc4c814e36 1 1 0 wz--n- <50.00g 0
ceph-8d085f45-939c-4a65-a577-d21fa146d7d6 1 1 0 wz--n- <50.00g 0
```
Create a tmp LV on your new device:
```
[root@osd0 ~]# pvcreate /dev/sdd
Physical volume "/dev/sdd" successfully created.
[root@osd0 ~]# vgcreate vg_db_tmp /dev/sdd
Volume group "vg_db_tmp" successfully created
[root@osd0 ~]# lvcreate -n db-sdb -l 100%FREE vg_db_tmp
Logical volume "db-sdb" created.
```
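One hypothetical sanity check before moving anything (our addition, not from the thread): while the OSD is still running, confirm that the BlueFS DB payload actually fits into the temporary LV and into the smaller final LV. The `bluefs` section of the admin-socket perf counters reports `db_used_bytes`; a minimal sketch, assuming python3 is available on the node:

```shell
# Hypothetical check (not from the thread): read BlueFS DB usage from the
# OSD's admin socket while the OSD is still up; the value should sit well
# below the size of the temporary LV (and of the smaller final LV).
db_used=$(ceph daemon osd.0 perf dump 2>/dev/null \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["bluefs"]["db_used_bytes"])' 2>/dev/null)
echo "osd.0 BlueFS DB uses ${db_used:-unknown} bytes"
```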
Stop the OSD:
```
[root@osd0 ~]# systemctl stop ceph-osd@0
```
Migrate the DB to the tmp LV:
```
[root@osd0 ~]# ceph-volume lvm migrate --osd-id 0 --osd-fsid 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 --from db --target vg_db_tmp/db-sdb
--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block.db'] Target: /dev/vg_db_tmp/db-sdb
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-2
--> Migration successful.
```
Remove the old LV:
```
[root@osd0 ~]# lvremove /dev/ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
Do you really want to remove active logical volume ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a? [y/n]: y
  Logical volume "osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a" successfully removed.
```
Recreate a smaller LV. In my simplified case, I want to go from 1 to 2 DB devices, which means the old LV has to be resized down to half:
```
[root@osd0 ~]# lvcreate -n osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a -l 50%FREE ceph-8d085f45-939c-4a65-a577-d21fa146d7d6
Logical volume "osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a" created.
```
Migrate the DB to the new LV:
```
[root@osd0 ~]# ceph-volume lvm migrate --osd-id 0 --osd-fsid 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 --from db --target ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block.db'] Target: /dev/ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-1
--> Migration successful.
```
Restart the OSD:
```
[root@osd0 ~]# systemctl start ceph-osd@0
```
Remove the tmp LV/VG/PV:
```
[root@osd0 ~]# lvremove /dev/vg_db_tmp/db-sdb
Do you really want to remove active logical volume vg_db_tmp/db-sdb? [y/n]:
y
[root@osd0 ~]# vgremove vg_db_tmp
Volume group "vg_db_tmp" successfully removed
[root@osd0 ~]# pvremove /dev/sdd
Labels on physical volume "/dev/sdd" successfully wiped.
```
Add the new OSD (should be done by re-running the playbook):
```
[root@osd0 ~]# ceph-volume lvm batch --bluestore --yes /dev/sdb /dev/sdc --db-devices /dev/sda
--> passed data devices: 2 physical, 0 LVM
--> relative data size: 1.0
--> passed block_db devices: 1 physical, 0 LVM
... omitted output ...
--> ceph-volume lvm activate successful for osd ID: 1
--> ceph-volume lvm create successful for: /dev/sdc
[root@osd0 ~]#
```
New lsblk output:
```
[root@osd0 ~]# lsblk
NAME                                                                                                  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0  50G  0 disk
|-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--cd34400d--daf2--450f--97d9--d561e7a43d1a    252:0    0  25G  0 lvm
`-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--bb30e5aa--a634--4c52--8b99--a222c03c18e3    252:3    0  25G  0 lvm
sdb                                                                                                     8:16   0  50G  0 disk
`-ceph--4c77295c--28a5--440a--9561--b9dc4c814e36-osd--block--70fd3b96--7bb2--4ae3--a0f8--4d18748186f9 252:1    0  50G  0 lvm
sdc                                                                                                     8:32   0  50G  0 disk
`-ceph--5255bfbb--f133--4954--aaa8--35e2643ed491-osd--block--9e67ea46--2409--45f8--83e1--f66a42a6d9d0 252:2    0  50G  0 lvm
sdd                                                                                                     8:48   0  50G  0 disk
vda                                                                                                   253:0    0  11G  0 disk
`-vda1                                                                                                253:1    0  10G  0 part /
```
If you plan to re-run the playbook, do not forget to update your group_vars to reflect the new topology:
```
# cat group_vars/all
---
devices:
- /dev/sdb
- /dev/sdc
dedicated_devices:
- /dev/sda
```
You might want to use some OSD flags (noout, etc.) in order to avoid unnecessary data migration.
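As a sketch of what that could look like (the exact flag choice is a judgment call and not spelled out in the thread; `noout` alone is often enough when OSDs are only down briefly):

```shell
# Sketch: set maintenance flags before the migration, clear them afterwards.
# Flag selection here (noout, norebalance) is an assumption, adjust to taste.
set_flags()   { for f in noout norebalance; do ceph osd set "$f"; done; }
unset_flags() { for f in noout norebalance; do ceph osd unset "$f"; done; }

# Usage (commented out -- requires a reachable cluster):
# set_flags
# ...stop OSDs, migrate DBs, restart OSDs...
# unset_flags
```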
Regards,
On Tue, 17 Jan 2023 at 18:39, Len Kimms <len.kimms(a)uni-muenster.de> wrote:
> Hello all,
>
> we’ve set up a new Ceph cluster with a number of nodes which are all
> identically configured.
> There is one device vda which should act as WAL device for all other
> devices. Additionally, there are four other devices vdb, vdc, vdd, vde
> which use vda as WAL.
> The whole cluster was set up using ceph-ansible (branch stable-7.0) and
> Ceph version 17.2.0.
> Device configuration in osds.yml looks as follows:
> devices: [/dev/vdb, /dev/vdc, /dev/vdd, /dev/vde]
> bluestore_wal_devices: [/dev/vda]
> As expected, vda contains four logical volumes for WAL, each 1/4 of the
> overall vda disk size (‘ceph-ansible/group_vars/all.yml’ has default
> ‘block_db_size: -1’).
>
> After the initial setup, we’ve added an additional device vdf which should
> become a new OSD. The new OSD should use vda for WAL as well. This means
> the previous four WAL LVs have to be resized down to 1/5 and a new LV has
> to be added.
>
> Is it possible to retroactively add a new device to an already provisioned
> WAL device?
>
> We suspect that this is not possible because the ceph-bluestore-tool does
> not provide any way to shrink an existing BlueFS device. Only expanding is
> currently possible (https://docs.ceph.com/en/quincy/man/8/ceph-bluestore-tool/).
> Simply adding the new device to the devices list and rerunning the
> playbook does nothing, and the same goes for setting only “devices: [/dev/vdf]”
> and “bluestore_wal_devices: [/dev/vda]”. In both cases vda is rejected with
> “Insufficient space (<10 extents) on vgs”, which makes sense because vda is
> already fully used by the previous four OSD WALs.
>
> Thanks for the help and kind regards.
>
>
> Additional notes:
> - We’re testing pre-production on an emulated cluster hence the device
> names vdx and unusually small device sizes.
> - The output of `lsblk` after the initial setup looks as follows:
> ```
> vda                                                                                                   252:0    0   8G  0 disk
> ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--3677c354--8d7d--4db9--a2b7--68aeb8248d40   253:2    0   2G  0 lvm
> ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--52d71122--b573--4077--9633--968c178612fd   253:4    0   2G  0 lvm
> ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--2d7eb467--cfb1--4a00--8a45--273932036599   253:6    0   2G  0 lvm
> └─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--d7b13b79--219c--4002--9e92--370dff7a5376   253:8    0   2G  0 lvm
> vdb                                                                                                   252:16   0   8G  0 disk
> └─ceph--49ddaa8b--5d8f--4267--85f9--5cac608ce53d-osd--block--861a53c7--ee57--4c5f--9546--1dd7cb0185ef 253:1    0   8G  0 lvm
> vdc                                                                                                   252:32   0   5G  0 disk
> └─ceph--1ed9ee91--e071--4ea4--9703--d56d84d9ae0a-osd--block--8aacb66a--e29b--4b7a--8ad5--a9fb1f81c6d6 253:3    0   5G  0 lvm
> vdd                                                                                                   252:48   0   5G  0 disk
> └─ceph--554cdd8b--e722--41a9--8f64--c09c857cc0dc-osd--block--4dee3e1b--b50d--4154--b2ff--80cadb67e2a0 253:5    0   5G  0 lvm
> vde                                                                                                   252:64   0   5G  0 disk
> └─ceph--5d58de32--ca55--4895--8ac7--af94ee07672e-osd--block--3f563f40--0c1e--4cca--9325--d9534cceb711 253:7    0   5G  0 lvm
> vdf                                                                                                   252:80   0   5G  0 disk
> ```
> - Ceph status is happy and healthy:
> ```
> cluster:
> id: ff043ce8-xxxx-xxxx-xxxx-e98d073c9d09
> health: HEALTH_WARN
> mons are allowing insecure global_id reclaim
>
> services:
> mon: 3 daemons, quorum baloo-1,baloo-2,baloo-3 (age 13m)
> mgr: baloo-2(active, since 5m), standbys: baloo-3, baloo-1
> mds: 1/1 daemons up, 1 standby
> osd: 24 osds: 24 up (since 4m), 24 in (since 5m)
> rgw: 1 daemon active (1 hosts, 1 zones)
>
> data:
> volumes: 1/1 healthy
> pools: 7 pools, 177 pgs
> objects: 213 objects, 584 KiB
> usage: 98 MiB used, 138 GiB / 138 GiB avail
> pgs: 177 active+clean
> ```
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
--
*Guillaume Abrioux*
Senior Software Engineer