Hello all,
we’ve set up a new Ceph cluster with a number of nodes which are all identically configured.
On each node there is one device, vda, which should act as the WAL device for the four other devices, vdb, vdc, vdd, and vde.
The whole cluster was set up using ceph-ansible (branch stable-7.0) and Ceph version 17.2.0.
Device configuration in osds.yml looks as follows:
```
devices: [/dev/vdb, /dev/vdc, /dev/vdd, /dev/vde]
bluestore_wal_devices: [/dev/vda]
```
As expected, vda contains four logical volumes for the WALs, each 1/4 of the overall vda disk size (‘ceph-ansible/group_vars/all.yml’ has the default ‘block_db_size: -1’).
After the initial setup, we added an additional device, vdf, which should become a new OSD and use vda for its WAL as well. This means the previous four WAL LVs would have to be resized down to 1/5 of the disk and a fifth LV added.
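(To make the problem concrete, this is roughly what would have to happen at the LVM layer; a hypothetical sketch only, using placeholder VG/LV names, and not a working procedure, for exactly the reason described below:)
```
# each of the four existing 2G WAL LVs would have to shrink to 8G/5 = ~1.6G;
# shrinking an LV under a live BlueFS WAL would corrupt it, so this is
# illustration only, not something to run
lvreduce -L 1.6G ceph-<vg>/osd-wal-<uuid>    # per existing WAL LV
# ...and a fifth LV of the same size would be created for vdf
lvcreate -L 1.6G -n osd-wal-<new> ceph-<vg>
```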
Is it possible to retroactively add a new device to an already provisioned WAL device?
We suspect that this is not possible, because ceph-bluestore-tool does not provide any way to shrink an existing BlueFS device; only expanding is currently supported (https://docs.ceph.com/en/quincy/man/8/ceph-bluestore-tool/).
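(For reference, the expand direction from that man page looks like this, run against a stopped OSD; the OSD id is a placeholder:)
```
# after enlarging the underlying LV, BlueFS can grow into the new space,
# but there is no shrink counterpart
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>
```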
Simply adding the new device to the devices list and rerunning the playbook does nothing, and neither does setting only “devices: [/dev/vdf]” and “bluestore_wal_devices: [/dev/vda]”. In both cases vda is rejected with “Insufficient space (<10 extents) on vgs”, which makes sense because vda is already fully used by the WALs of the four existing OSDs.
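(The rejection can be confirmed at the LVM level with something like the following; the ceph-<uuid> VG names are the ones visible in the lsblk output below:)
```
# the WAL volume group on vda should report 0 free extents,
# matching ceph-volume's "Insufficient space (<10 extents)" rejection
vgs -o vg_name,vg_size,vg_free,vg_free_count
lvs -o lv_name,lv_size,vg_name
```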
Thanks for the help and kind regards.
Additional notes:
- We’re testing pre-production on an emulated cluster hence the device names vdx and unusually small device sizes.
- The output of `lsblk` after the initial setup looks as follows:
```
vda 252:0 0 8G 0 disk
├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--3677c354--8d7d--4db9--a2b7--68aeb8248d40 253:2 0 2G 0 lvm
├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--52d71122--b573--4077--9633--968c178612fd 253:4 0 2G 0 lvm
├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--2d7eb467--cfb1--4a00--8a45--273932036599 253:6 0 2G 0 lvm
└─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--d7b13b79--219c--4002--9e92--370dff7a5376 253:8 0 2G 0 lvm
vdb 252:16 0 8G 0 disk
└─ceph--49ddaa8b--5d8f--4267--85f9--5cac608ce53d-osd--block--861a53c7--ee57--4c5f--9546--1dd7cb0185ef 253:1 0 8G 0 lvm
vdc 252:32 0 5G 0 disk
└─ceph--1ed9ee91--e071--4ea4--9703--d56d84d9ae0a-osd--block--8aacb66a--e29b--4b7a--8ad5--a9fb1f81c6d6 253:3 0 5G 0 lvm
vdd 252:48 0 5G 0 disk
└─ceph--554cdd8b--e722--41a9--8f64--c09c857cc0dc-osd--block--4dee3e1b--b50d--4154--b2ff--80cadb67e2a0 253:5 0 5G 0 lvm
vde 252:64 0 5G 0 disk
└─ceph--5d58de32--ca55--4895--8ac7--af94ee07672e-osd--block--3f563f40--0c1e--4cca--9325--d9534cceb711 253:7 0 5G 0 lvm
vdf 252:80 0 5G 0 disk
```
- Ceph status is happy and healthy:
```
  cluster:
    id:     ff043ce8-xxxx-xxxx-xxxx-e98d073c9d09
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum baloo-1,baloo-2,baloo-3 (age 13m)
    mgr: baloo-2(active, since 5m), standbys: baloo-3, baloo-1
    mds: 1/1 daemons up, 1 standby
    osd: 24 osds: 24 up (since 4m), 24 in (since 5m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 177 pgs
    objects: 213 objects, 584 KiB
    usage:   98 MiB used, 138 GiB / 138 GiB avail
    pgs:     177 active+clean
```
Good morning ceph community,
for quite some time I have been wondering whether it would make sense to add an
iftop-like interface to Ceph that shows network traffic / IOPS on a per-IP basis.
I am aware of "rbd perf image iotop", however I am much more interested
in a combined metric featuring 1) which clients read/write to where,
and 2) inter-OSD traffic, to see the total load on the cluster and be
able to drill down.
For example, the metric could look like this:
```
--------------------------------------------------------------------------------
FROM             TO              Bytes/s   Packets/s
osd.0 [IP]   ->  [IP] osd.10     ..        ..
osd.0 [IP]   ->  [IP] client     ..        ..
--------------------------------------------------------------------------------
```
Given that this table would be sortable by from/to and min-or-max
bytes / min-or-max packets, it would allow spotting the heaviest flows.
And maybe a summarised view such as:
```
--------------------------------------------------------------------------------
FROM          IN Bytes/s   OUT Bytes/s   IN Packets/s   OUT Packets/s
osd.0 [IP]
osd.10 [IP]
--------------------------------------------------------------------------------
```
This way it would be easy to identify high load. If it were combined with
average/current latency, it could potentially also help to find the
bottlenecks in the cluster. From my perspective, being able to easily
combine client and intra-cluster traffic would be very helpful.
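(As a crude stopgap, plain iftop pointed at the OSD port range gives at least the per-IP half of this; the interface name is an example, and 6800-7300 is the default ms_bind_port_min/ms_bind_port_max range:)
```
# per-IP traffic for everything in the default OSD port range on this host
iftop -i eth0 -f "portrange 6800-7300"
```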
What do you think: does this make sense, does it already exist, or how do
you approach this?
Best regards,
Nico
--
Sustainable and modern Infrastructures by ungleich.ch
Hi everyone,
From November into January, we experienced a series of outages with the
Ceph Community Infrastructure and its services:
- Mailing lists
  - https://lists.ceph.io
- Sepia (testing infrastructure)
  - https://wiki.sepia.ceph.com
  - https://pulpito.ceph.com
  - https://chacra.ceph.com
  - https://shaman.ceph.com
- VPN to access testing services
- Etherpad
  - https://pad.ceph.com
- Images
  - https://quay.ceph.io
- Git mirror
  - https://git.ceph.com
- https://ceph.io
- Telemetry (https://telemetry-public.ceph.com/)
These services are now mostly restored, but we did experience some data
loss, notably in our mailing lists. We have restored them from backups, but
subscription changes after July 2021 need to be repeated. If you subscribed
or unsubscribed since then, please check your settings with the appropriate
list at https://lists.ceph.io. If your posts to our mailing lists now
require moderator approval, that is also an indication that you need to
re-subscribe to the appropriate lists.
Keep an eye out for emails with subject lines such as “Your message to
ceph-users(a)ceph.io awaits moderator approval”.
When the community infrastructure was first created in late 2014, the VM
cluster management software selected by the team came with the benefit of
being widely entrenched and familiar to the lab administrators but didn't
support Ceph as a storage backend at the time. As services grew, we relied
more and more on its legacy storage solution, which was never migrated to
Ceph. Over the last few months, this legacy storage solution had several
instances of silent data corruption, rendering the VMs unbootable, taking
down various services, and requiring restoration from backups in many cases.
We are moving these services to a more reliable, mostly container-based,
infrastructure backed by Ceph, and planning for longer-term improvements to
monitoring, backups, deployment, and other pieces of the project
infrastructure.
This event highlights the need to better support the infrastructure. A
handful of contributors have stepped up to restore these services, but we
need an invested team focused on maintaining it long-term.
If you or your company is looking for a great way to contribute to the Ceph
community, this could be your opportunity. Please contact council(a)ceph.io
if you can provide time to contribute to the Ceph Community Infrastructure
and would like to join the team. You can also join the upstream #sepia
slack channel to participate in these discussions using this link:
https://join.slack.com/t/ceph-storage/shared_invite/zt-1n1eh6po5-PF9sokUSoo…
Unfortunately, these events have slowed down our upstream development and
releases. We are currently working on publishing the next Pacific point
release. The development freeze and release deadline for the Reef release
will likely be pushed out, with more discussions to follow in the Ceph
Leadership Team meetings.
- The Ceph Leadership Team
Hi folks,
I have a small cluster of three Ceph hosts running on Pacific. I'm
trying to balance resilience and disk usage, so I've set up a k=4 m=2
pool for some bulk storage on HDD devices.
With the correct placement of PGs this should allow me to take any one
host offline for maintenance. I've written this CRUSH rule for that purpose:
```
rule erasure_k4_m2_hdd_rule {
    id 3
    type erasure
    step take default class hdd
    step choose indep 3 type host
    step chooseleaf indep 2 type osd
    step emit
}
```
This should pick three hosts, and then two OSDs from each, which at
least ensures that no single host holds more than two of a PG's shards.
This appears to work correctly, but I'm running into an odd situation
when adding additional OSDs to the cluster: sometimes the hosts flip
order in a PG's set, resulting in unnecessary remapping work.
For example, I have one PG that changed from OSDs [0,13,7,9,3,5] to
[0,13,3,5,7,9]. (Note that the middle pair and the last pair of OSDs have
swapped places.) From a quick perusal of other PGs that are being moved,
the two OSDs within a host never appear to be rearranged, but the set of
hosts that is chosen may be shuffled.
Is there something I'm missing that would make this rule more stable in
the face of OSD additions? (I'm wondering if the host-choosing step
should be "firstn" rather than "indep", even though the discussion at
https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#crushmapr…
implies indep is preferable for EC pools.)
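(One way to study this offline is to extract the CRUSH map and compare the mappings the rule produces before and after an OSD addition; rule id 3 as defined above:)
```
# dump the CRUSH map and print the 6-shard mapping (k=4, m=2) for every PG
ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 3 --num-rep 6 --show-mappings
```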
I don't have current plans to expand beyond a three-host cluster, but if
there's an alternative way to express "not more than two OSDs per host",
that could be helpful as well.
Any insights or suggestions would be appreciated.
Thanks,
aschmitz
Hi all,
I have a working pair of clusters configured with rbd-mirror. The Master
cluster is production, the Backup cluster is DR. Right now everything is
working well, with Master configured as "tx-only" and Backup as "rx-tx".
I'd like to modify Master's direction to rx-tx so I'm already prepared for
a failover after a disaster happens, but while doing so I hit the following
error and I'm stuck:
ceph version 15.2.17 (694d03a6f6c6e9f814446223549caf9a9f60dba0) octopus
(stable)
The Ceph user able to operate on Master is rbd-mirror.master, while on
Backup it is rbd-mirror.backup.
On the Master cluster I have ceph.conf and backup.conf, and on the Backup
cluster I have ceph.conf and master.conf.
Keyrings have been copied correctly.
I've changed the direction without any problem, but when I try to configure
the peer with this command, I receive the following error:
```
root@master# rbd mirror pool peer add <my_pool> client.rbd-mirror.backup@backup
rbd: multiple RX peers are not currently supported
```
And when I check my pool info, the "Client:" section is empty (while the
one on my DR cluster is populated with client.rbd-mirror.master).
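(In case it helps with reproduction: the configured peers, their UUIDs, and the mirroring direction can be inspected on each side with something like the following; <my_pool> as above, and the --all flag is assumed to be available in this Octopus release:)
```
# show mirror mode, site name, and configured peers for the pool
rbd mirror pool info <my_pool> --all
```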
Can someone lend me a hand?
Is this something I can't do, or am I simply using the wrong commands?
Thanks in advance!
Elia
Hi everyone,
This month's Ceph User + Dev Monthly meetup is on January 19, 15:00-16:00
UTC. There are some topics on the agenda regarding RGW backports; please
feel free to add other topics to
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes.
Hope to see you there!
Thanks,
Neha
Hi,
The dashboard has a simple CephFS browser where we can set
quotas and snapshots for directories.
When a directory has the "other" permission bits unset, i.e.
access only for user and group, the dashboard displays an error:
```
Failed to execute CephFS
opendir failed at /path/to/dir/.snap: Permission denied [Errno 13]
```
It can be reproduced in Ceph 17.2.5 by creating a directory
and using "chmod o= /path/to/dir" to disallow access for "other".
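(A minimal reproduction sketch, assuming a mounted CephFS; the mount point and directory name are examples:)
```
# create a directory on any CephFS mount and strip the "other" bits
mkdir /mnt/cephfs/testdir
chmod o= /mnt/cephfs/testdir
# browsing to this directory in the dashboard's CephFS view then fails
# with "opendir failed ... Permission denied"
```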
How does the dashboard access the contents of the CephFS?
It looks like the MGR uses something like the nobody account.
Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de
Tel: 030-405051-43
Fax: 030-405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin
Hi Guys,
I've got a funny one I'm hoping someone can point me in the right direction with:
We've got three identical(?) Ceph nodes, each running 4 OSDs, a Mon, a Mgr, and an iSCSI gateway (we're only a small shop), on Rocky Linux 8 / Ceph Quincy. Everything is running fine, no bottlenecks (as far as we can see), and the cluster is holding up very well.
However, one of the boxes is constantly running out of space on the /var mount. It's 16 GiB in size, and it only takes a day or three to fill up, thus taking its monitor service out of quorum.
The thing is, I can't find *what's* taking up all the space. At first we thought it was an overly large log file, but I've done searches for the largest files, etc., and nothing is showing up (that I can find). The log files on this box are comparable with those on the other two boxes, and those boxes are sitting at around 10% full (via df -H), while the problem box is at around 85% and growing (at time of posting).
Another interesting point is that the problem box was recently rebooted for an unrelated issue, and when it came back online the space issue was gone, i.e. the /var mount was back down to around the 10% mark.
This suggests to me it's some sort of "temporary" journal/log/dump that was "reset" (cleaned up?) by the reboot.
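(Two guesses worth checking, not a diagnosis: per-directory usage under /var, and space pinned by deleted-but-still-open files, which no file search will find but which is released by a reboot, exactly as observed:)
```
# largest directories under /var (-x stays on the same filesystem)
du -xh --max-depth=2 /var | sort -h | tail -20
# deleted files still held open by a process keep consuming space;
# +L1 lists open files with a link count below 1
lsof -nP +L1
```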
I've had a look at the logs, but I'm not sure what I should be looking for, so I don't even know if I'm looking in the *correct* logs...
Anyone got any ideas? Rebooting the server every couple of days is not really a practical solution, and neither is turning off the monitor service on the box; increasing the size of the /var mount just seems like it would postpone the issue.
Any help would be greatly appreciated.
Cheers
Dulux-Oz
Hi all,
on a latest-Octopus cluster I see a lot of these log messages:
```
Jan 13 20:00:25 ceph-21 journal: 2023-01-13T20:00:25.366+0100 7f47702b8700 -1 --2- [v2:192.168.16.96:6826/5724,v1:192.168.16.96:6827/5724] >> [v2:192.168.16.93:6928/3503064,v1:192.168.16.93:6929/3503064] conn(0x55c867624400 0x55c7e9dfa800 unknown :-1 s=BANNER_CONNECTING pgs=22826 cs=73364 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:192.168.16.93:6928/3503064,v1:192.168.16.93:6929/3503064] is using msgr V1 protocol
```
These addresses are on the replication network and both hosts are OSD hosts.
What is the reason for these messages and how can I fix it?
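(A guess at where to start, not a confirmed fix: check whether every OSD has registered both a v2 and a v1 address, and whether msgr2 binding is enabled:)
```
# each OSD should list both v2: and v1: addresses here
ceph osd dump | grep '^osd\.'
# should be true (the default on Octopus)
ceph config get osd ms_bind_msgr2
```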
Thanks a lot!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14