Hello all,
we’ve set up a new Ceph cluster with a number of nodes which are all identically configured.
On each node there is one device, vda, which should act as the WAL device for the four other devices, vdb, vdc, vdd, and vde.
The whole cluster was set up using ceph-ansible (branch stable-7.0) and Ceph version 17.2.0.
Device configuration in osds.yml looks as follows:
```
devices: [/dev/vdb, /dev/vdc, /dev/vdd, /dev/vde]
bluestore_wal_devices: [/dev/vda]
```
As expected, vda contains four logical volumes for the WALs, each 1/4 of the overall vda disk size (‘ceph-ansible/group_vars/all.yml’ has the default ‘block_db_size: -1’).
After the initial setup, we added an additional device, vdf, which should become a new OSD and use vda for its WAL as well. This means the previous four WAL LVs would have to be resized down to 1/5 of the disk and a fifth LV added.
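(To make the problem concrete, this is roughly what would have to happen at the LVM layer; a hypothetical sketch only, using placeholder VG/LV names, and not a working procedure, for exactly the reason described below:)
```
# each of the four existing 2G WAL LVs would have to shrink to 8G/5 = ~1.6G;
# shrinking an LV under a live BlueFS WAL would corrupt it, so this is
# illustration only, not something to run
lvreduce -L 1.6G ceph-<vg>/osd-wal-<uuid>    # per existing WAL LV
# ...and a fifth LV of the same size would be created for vdf
lvcreate -L 1.6G -n osd-wal-<new> ceph-<vg>
```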
Is it possible to retroactively add a new device to an already provisioned WAL device?
We suspect that this is not possible, because ceph-bluestore-tool does not provide any way to shrink an existing BlueFS device; only expanding is currently supported (https://docs.ceph.com/en/quincy/man/8/ceph-bluestore-tool/).
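(For reference, the expand direction from that man page looks like this, run against a stopped OSD; the OSD id is a placeholder:)
```
# after enlarging the underlying LV, BlueFS can grow into the new space,
# but there is no shrink counterpart
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>
```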
Simply adding the new device to the devices list and rerunning the playbook does nothing, and neither does setting only “devices: [/dev/vdf]” and “bluestore_wal_devices: [/dev/vda]”. In both cases vda is rejected with “Insufficient space (<10 extents) on vgs”, which makes sense because vda is already fully used by the WALs of the four existing OSDs.
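(The rejection can be confirmed at the LVM level with something like the following; the ceph-<uuid> VG names are the ones visible in the lsblk output below:)
```
# the WAL volume group on vda should report 0 free extents,
# matching ceph-volume's "Insufficient space (<10 extents)" rejection
vgs -o vg_name,vg_size,vg_free,vg_free_count
lvs -o lv_name,lv_size,vg_name
```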
Thanks for the help and kind regards.
Additional notes:
- We’re testing pre-production on an emulated cluster hence the device names vdx and unusually small device sizes.
- The output of `lsblk` after the initial setup looks as follows:
```
vda 252:0 0 8G 0 disk
├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--3677c354--8d7d--4db9--a2b7--68aeb8248d40 253:2 0 2G 0 lvm
├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--52d71122--b573--4077--9633--968c178612fd 253:4 0 2G 0 lvm
├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--2d7eb467--cfb1--4a00--8a45--273932036599 253:6 0 2G 0 lvm
└─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--d7b13b79--219c--4002--9e92--370dff7a5376 253:8 0 2G 0 lvm
vdb 252:16 0 8G 0 disk
└─ceph--49ddaa8b--5d8f--4267--85f9--5cac608ce53d-osd--block--861a53c7--ee57--4c5f--9546--1dd7cb0185ef 253:1 0 8G 0 lvm
vdc 252:32 0 5G 0 disk
└─ceph--1ed9ee91--e071--4ea4--9703--d56d84d9ae0a-osd--block--8aacb66a--e29b--4b7a--8ad5--a9fb1f81c6d6 253:3 0 5G 0 lvm
vdd 252:48 0 5G 0 disk
└─ceph--554cdd8b--e722--41a9--8f64--c09c857cc0dc-osd--block--4dee3e1b--b50d--4154--b2ff--80cadb67e2a0 253:5 0 5G 0 lvm
vde 252:64 0 5G 0 disk
└─ceph--5d58de32--ca55--4895--8ac7--af94ee07672e-osd--block--3f563f40--0c1e--4cca--9325--d9534cceb711 253:7 0 5G 0 lvm
vdf 252:80 0 5G 0 disk
```
- Ceph status is happy and healthy:
```
  cluster:
    id:     ff043ce8-xxxx-xxxx-xxxx-e98d073c9d09
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum baloo-1,baloo-2,baloo-3 (age 13m)
    mgr: baloo-2(active, since 5m), standbys: baloo-3, baloo-1
    mds: 1/1 daemons up, 1 standby
    osd: 24 osds: 24 up (since 4m), 24 in (since 5m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 177 pgs
    objects: 213 objects, 584 KiB
    usage:   98 MiB used, 138 GiB / 138 GiB avail
    pgs:     177 active+clean
```
Good morning ceph community,
for quite some time I have been wondering whether it would make sense to add an
iftop-like interface to Ceph that shows network traffic / IOPS on a per-IP basis.
I am aware of "rbd perf image iotop", however I am much more interested
in a combined metric featuring 1) which clients read/write to where,
and 2) inter-OSD traffic, to see the total load on the cluster and be
able to drill down.
For example, the metric could look like this:
```
--------------------------------------------------------------------------------
FROM             TO              Bytes/s   Packets/s
osd.0 [IP]   ->  [IP] osd.10     ..        ..
osd.0 [IP]   ->  [IP] client     ..        ..
--------------------------------------------------------------------------------
```
Given that this table would be sortable by from/to and min-or-max
bytes / min-or-max packets, it would allow spotting the heaviest flows.
And maybe a summarised view such as:
```
--------------------------------------------------------------------------------
FROM          IN Bytes/s   OUT Bytes/s   IN Packets/s   OUT Packets/s
osd.0 [IP]
osd.10 [IP]
--------------------------------------------------------------------------------
```
This way it would be easy to identify high load. If it were combined with
average/current latency, it could potentially also help to find the
bottlenecks in the cluster. From my perspective, being able to easily
combine client and intra-cluster traffic would be very helpful.
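(As a crude stopgap, plain iftop pointed at the OSD port range gives at least the per-IP half of this; the interface name is an example, and 6800-7300 is the default ms_bind_port_min/ms_bind_port_max range:)
```
# per-IP traffic for everything in the default OSD port range on this host
iftop -i eth0 -f "portrange 6800-7300"
```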
What do you think: does this make sense, does it already exist, or how do
you approach this?
Best regards,
Nico
--
Sustainable and modern Infrastructures by ungleich.ch
Hi everyone,
From November into January, we experienced a series of outages with the
Ceph Community Infrastructure and its services:
- Mailing lists
  - https://lists.ceph.io
- Sepia (testing infrastructure)
  - https://wiki.sepia.ceph.com
  - https://pulpito.ceph.com
  - https://chacra.ceph.com
  - https://shaman.ceph.com
- VPN to access testing services
- Etherpad
  - https://pad.ceph.com
- Images
  - https://quay.ceph.io
- Git mirror
  - https://git.ceph.com
- https://ceph.io
- Telemetry (https://telemetry-public.ceph.com/)
These services are now mostly restored, but we did experience some data
loss, notably in our mailing lists. We have restored them from backups, but
subscription changes after July 2021 need to be repeated. If you subscribed
or unsubscribed since then, please check your settings with the appropriate
list at https://lists.ceph.io. If your posts to our mailing lists now
require moderator approval, that is also an indication that you need to
re-subscribe to the appropriate lists.
Keep an eye out for emails with subject lines such as “Your message to
ceph-users(a)ceph.io awaits moderator approval”.
When the community infrastructure was first created in late 2014, the VM
cluster management software selected by the team came with the benefit of
being widely entrenched and familiar to the lab administrators but didn't
support Ceph as a storage backend at the time. As services grew, we relied
more and more on its legacy storage solution, which was never migrated to
Ceph. Over the last few months, this legacy storage solution had several
instances of silent data corruption, rendering the VMs unbootable, taking
down various services, and requiring restoration from backups in many cases.
We are moving these services to a more reliable, mostly container-based,
infrastructure backed by Ceph, and planning for longer-term improvements to
monitoring, backups, deployment, and other pieces of the project
infrastructure.
This event highlights the need to better support the infrastructure. A
handful of contributors have stepped up to restore these services, but we
need an invested team focused on maintaining it long-term.
If you or your company is looking for a great way to contribute to the Ceph
community, this could be your opportunity. Please contact council(a)ceph.io
if you can provide time to contribute to the Ceph Community Infrastructure
and would like to join the team. You can also join the upstream #sepia
slack channel to participate in these discussions using this link:
https://join.slack.com/t/ceph-storage/shared_invite/zt-1n1eh6po5-PF9sokUSoo…
Unfortunately, these events have slowed down our upstream development and
releases. We are currently working on publishing the next Pacific point
release. The development freeze and release deadline for the Reef release
will likely be pushed out, with more discussions to follow in the Ceph
Leadership Team meetings.
- The Ceph Leadership Team
Hi folks,
I have a small cluster of three Ceph hosts running on Pacific. I'm
trying to balance resilience and disk usage, so I've set up a k=4 m=2
pool for some bulk storage on HDD devices.
With the correct placement of PGs this should allow me to take any one
host offline for maintenance. I've written this CRUSH rule for that purpose:
```
rule erasure_k4_m2_hdd_rule {
    id 3
    type erasure
    step take default class hdd
    step choose indep 3 type host
    step chooseleaf indep 2 type osd
    step emit
}
```
This should pick three hosts, and then two OSDs from each, which at
least ensures that no single host holds more than two of a PG's shards.
This appears to work correctly, but I'm running into an odd situation
when adding additional OSDs to the cluster: sometimes the hosts flip
order in a PG's set, resulting in unnecessary remapping work.
For example, I have one PG that changed from OSDs [0,13,7,9,3,5] to
[0,13,3,5,7,9]. (Note that the middle pair and the last pair of OSDs have
swapped places.) From a quick perusal of other PGs that are being moved,
the two OSDs within a host never appear to be rearranged, but the set of
hosts that is chosen may be shuffled.
Is there something I'm missing that would make this rule more stable in
the face of OSD additions? (I'm wondering if the host-choosing step
should be "firstn" rather than "indep", even though the discussion at
https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#crushmapr…
implies indep is preferable for EC pools.)
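(One way to study this offline is to extract the CRUSH map and compare the mappings the rule produces before and after an OSD addition; rule id 3 as defined above:)
```
# dump the CRUSH map and print the 6-shard mapping (k=4, m=2) for every PG
ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 3 --num-rep 6 --show-mappings
```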
I don't have current plans to expand beyond a three-host cluster, but if
there's an alternative way to express "not more than two OSDs per host",
that could be helpful as well.
Any insights or suggestions would be appreciated.
Thanks,
aschmitz
Hi all,
I have a working pair of clusters configured with rbd-mirror. The Master
cluster is production, the Backup cluster is DR. Right now everything is
working well, with Master configured as "tx-only" and Backup as "rx-tx".
I'd like to modify Master's direction to rx-tx so I'm already prepared for
a failover after a disaster happens, but while doing so I hit the following
error and I'm stuck:
ceph version 15.2.17 (694d03a6f6c6e9f814446223549caf9a9f60dba0) octopus
(stable)
The Ceph user able to operate on Master is rbd-mirror.master, while on
Backup it is rbd-mirror.backup.
On the Master cluster I have ceph.conf and backup.conf, and on the Backup
cluster I have ceph.conf and master.conf.
Keyrings have been copied correctly.
I've changed the direction without any problem, but when I try to configure
the peer with this command, I receive the following error:
```
root@master# rbd mirror pool peer add <my_pool> client.rbd-mirror.backup@backup
rbd: multiple RX peers are not currently supported
```
And when I check my pool info, the "Client:" section is empty (while the
one on my DR cluster is populated with client.rbd-mirror.master).
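(In case it helps with reproduction: the configured peers, their UUIDs, and the mirroring direction can be inspected on each side with something like the following; <my_pool> as above, and the --all flag is assumed to be available in this Octopus release:)
```
# show mirror mode, site name, and configured peers for the pool
rbd mirror pool info <my_pool> --all
```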
Can someone lend me a hand?
Is this something I can't do, or am I simply using the wrong commands?
Thanks in advance!
Elia
Hi everyone,
This month's Ceph User + Dev Monthly meetup is on January 19, 15:00-16:00
UTC. There are some topics on the agenda regarding RGW backports; please
feel free to add other topics to
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes.
Hope to see you there!
Thanks,
Neha
Hi,
The dashboard has a simple CephFS browser where we can set
quotas and snapshots for directories.
When a directory has the "other" permission bits unset, i.e.
access only for user and group, the dashboard displays an error:
```
Failed to execute CephFS
opendir failed at /path/to/dir/.snap: Permission denied [Errno 13]
```
It can be reproduced in Ceph 17.2.5 by creating a directory
and using "chmod o= /path/to/dir" to disallow access for "other".
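(A minimal reproduction sketch, assuming a mounted CephFS; the mount point and directory name are examples:)
```
# create a directory on any CephFS mount and strip the "other" bits
mkdir /mnt/cephfs/testdir
chmod o= /mnt/cephfs/testdir
# browsing to this directory in the dashboard's CephFS view then fails
# with "opendir failed ... Permission denied"
```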
How does the dashboard access the contents of the CephFS?
It looks like the MGR uses something like the nobody account.
Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de
Tel: 030-405051-43
Fax: 030-405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin
Hi Guys,
I've got a funny one I'm hoping someone can point me in the right direction with:
We've got three identical(?) Ceph nodes, each running 4 OSDs, a Mon, a Mgr, and an iSCSI gateway (we're only a small shop), on Rocky Linux 8 / Ceph Quincy. Everything is running fine, no bottlenecks (as far as we can see), and the cluster is holding up very well.
However, one of the boxes is constantly running out of space on the /var mount. It's 16 GiB in size, and it only takes a day or three to fill up, thus taking its monitor service out of quorum.
The thing is, I can't find *what's* taking up all the space. At first we thought it was an overly large log file, but I've done searches for the largest files, etc., and nothing is showing up (that I can find). The log files on this box are comparable with those on the other two boxes, and those boxes are sitting at around 10% full (via df -H), while the problem box is at around 85% and growing (at time of posting).
Another interesting point is that the problem box was recently rebooted for an unrelated issue, and when it came back online the space issue was gone, i.e. the /var mount was back down to around the 10% mark.
This suggests to me it's some sort of "temporary" journal/log/dump that was "reset" (cleaned up?) by the reboot.
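(Two guesses worth checking, not a diagnosis: per-directory usage under /var, and space pinned by deleted-but-still-open files, which no file search will find but which is released by a reboot, exactly as observed:)
```
# largest directories under /var (-x stays on the same filesystem)
du -xh --max-depth=2 /var | sort -h | tail -20
# deleted files still held open by a process keep consuming space;
# +L1 lists open files with a link count below 1
lsof -nP +L1
```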
I've had a look at the logs, but I'm not sure what I should be looking for, so I don't even know if I'm looking in the *correct* logs...
Anyone got any ideas? Rebooting the server every couple of days is not really a practical solution, and neither is turning off the monitor service on the box; increasing the size of the /var mount just seems like it would postpone the issue.
Any help would be greatly appreciated.
Cheers
Dulux-Oz
Hi all,
on a latest-Octopus cluster I see a lot of these log messages:
```
Jan 13 20:00:25 ceph-21 journal: 2023-01-13T20:00:25.366+0100 7f47702b8700 -1 --2- [v2:192.168.16.96:6826/5724,v1:192.168.16.96:6827/5724] >> [v2:192.168.16.93:6928/3503064,v1:192.168.16.93:6929/3503064] conn(0x55c867624400 0x55c7e9dfa800 unknown :-1 s=BANNER_CONNECTING pgs=22826 cs=73364 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:192.168.16.93:6928/3503064,v1:192.168.16.93:6929/3503064] is using msgr V1 protocol
```
These addresses are on the replication network and both hosts are OSD hosts.
What is the reason for these messages and how can I fix it?
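(A guess at where to start, not a confirmed fix: check whether every OSD has registered both a v2 and a v1 address, and whether msgr2 binding is enabled:)
```
# each OSD should list both v2: and v1: addresses here
ceph osd dump | grep '^osd\.'
# should be true (the default on Octopus)
ceph config get osd ms_bind_msgr2
```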
Thanks a lot!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14