Hi everyone,
The User + Dev Monthly Meeting is happening tomorrow, July 20th at 2:00 PM
UTC at this link:
https://meet.jit.si/ceph-user-dev-monthly
Please add any topics you'd like to discuss to the agenda:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
Thanks,
Laura Flores
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com
M: +17087388804
Hi folks,
Today we discussed:
- Reef is almost ready! The remaining issues are tracked in [1]. In
particular, an epel9 package is holding back the release.
- Vincent Hsu, Storage Group CTO of IBM, presented a proposal outline
for a Ceph Foundation Client Council. This council would be composed
of 10-25 invited significant operators or users of Ceph. The function
of the council is to provide essential feedback on use-cases,
pain-points, and successes arising during their use of Ceph. This
feedback will be used to steer development and initiatives. More
information on this will be forthcoming once the proposal is
finalized.
The monthly user <-> dev meeting will be reevaluated in light of
this, possibly continuing on as usual.
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi,
I noticed an incredibly high performance drop with mkfs.ext4 (as well as mkfs.xfs) when setting (almost) "any" value for rbd_qos_write_bps_limit (or rbd_qos_bps_limit).
Baseline: 4TB rbd volume rbd_qos_write_bps_limit = 0
mkfs.ext4:
real 0m6.688s
user 0m0.000s
sys 0m0.006s
50GB/s: 4TB rbd volume rbd_qos_write_bps_limit = 53687091200
mkfs.ext4:
real 1m22.217s
user 0m0.009s
sys 0m0.000s
5GB/s: 4TB rbd volume rbd_qos_write_bps_limit = 5368709120
mkfs.ext4:
real 13m39.770s
user 0m0.008s
sys 0m0.034s
500MB/s: 4TB rbd volume rbd_qos_write_bps_limit = 524288000
mkfs.ext4:
test still running... I can provide the result if needed.
The tests are running on a client vm (Ubuntu 22.04) using Qemu/libvirt.
Using the same values with Qemu/libvirt QoS does not affect mkfs performance.
https://libvirt.org/formatdomain.html#block-i-o-tuning
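For reference, the rbd-side limit was applied per image roughly like this (a sketch; the pool/image names are placeholders, and the limit could equally have been set at the pool level or in ceph.conf):
# 5 GB/s example against a single image
rbd config image set libvirt-pool/vm-disk-1 rbd_qos_write_bps_limit 5368709120
# verify the effective value
rbd config image get libvirt-pool/vm-disk-1 rbd_qos_write_bps_limit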
Ceph Version: 16.2.11
Qemu: 6.2.0
Libvirt: 8.0.0
Kernel (hypervisor host): 5.19.0-35-generic
librbd1 (hypervisor host): 17.2.5
Could anyone please confirm and explain what's going on?
All the best,
Florian
Starting on Friday, as part of adding a new pod of 12 servers, we initiated a reweight on roughly 384 drives, from 0.1 to 0.25. Something about the resulting large backfill is causing librbd to hang, requiring server restarts. The volumes show buffer I/O errors when this happens.
We are currently using hybrid OSDs with both SSD and traditional spinning disks. The current status of the cluster is:
ceph --version
ceph version 14.2.22
Cluster Kernel 5.4.49-200
{
    "mon": {
        "ceph version 14.2.22 nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.22 nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.21 nautilus (stable)": 368,
        "ceph version 14.2.22 nautilus (stable)": 2055
    },
    "mds": {},
    "rgw": {
        "ceph version 14.2.22 nautilus (stable)": 7
    },
    "overall": {
        "ceph version 14.2.21 nautilus (stable)": 368,
        "ceph version 14.2.22 nautilus (stable)": 2068
    }
}
HEALTH_WARN noscrub,nodeep-scrub flag(s) set
pgs: 6815703/11016906121 objects degraded (0.062%)
     2814059622/11016906121 objects misplaced (25.543%)
The client servers are on kernel 3.10.0-1062.1.2.el7.x86_64.
We have found a couple of issues that look relevant:
https://tracker.ceph.com/issues/19385
https://tracker.ceph.com/issues/18807
Has anyone experienced anything like this before? Does anyone have any recommendations as to settings that can help alleviate this while the backfill completes?
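For what it's worth, the knobs we have been looking at on our side are roughly these (a sketch; the values are examples only, not something we have validated):
# throttle backfill/recovery on the osds
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_recovery_sleep_hdd 0.1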
An example of the buffer I/O errors:
Jul 17 06:36:08 host8098 kernel: buffer_io_error: 22 callbacks suppressed
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 3, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-5, logical block 511984, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657728, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657729, async page read
Hi,
The typical EOL date (2023-06-01) has already passed for Pacific. Just
wondering if there's going to be another Pacific point release (16.2.14) in
the pipeline.
--
Regards,
Ponnuvel P
Hello ceph users,
my ceph configuration is
- ceph version 17.2.5 on ubuntu 20.04
- stretch mode
- 2 rooms with OSDs and monitors + additional room for the tiebreaker monitor
- 4 OSD servers in each room
- 6 OSDs per OSD server
- ceph installation/administration is manual (without ansible, orch... or any other tool like this)
Ceph health is currently OK.
Raw usage is around 60% and pool usage is below 75%.
I need to replace all OSD disks in the cluster with larger capacity disks (500G to 1000G). So the eventual configuration will contain the same number of OSDs and servers.
I understand I can replace OSDs one by one, following the documented procedure (removing the old OSD and adding the new one to the configuration) and waiting for health OK after each step. But in that case, ceph will probably copy data around like crazy after every step. So, my question is:
What is the recommended procedure for replacing ALL disks while keeping ceph operational during the replacement?
In particular:
Should I use any of the "nobackfill, norebalance, norecover..." flags during the process? If yes, which ones?
Should I do one OSD at a time, one server at a time, or even one room at a time? (A sketch of the per-OSD steps I have in mind follows.)
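For context, the per-OSD replacement I have in mind looks roughly like this (a sketch based on the manual procedure; the osd id and device path are placeholders):
# take the old OSD out and let the cluster rebalance
ceph osd out osd.12
# once health is OK again, stop and remove it
systemctl stop ceph-osd@12
ceph osd purge 12 --yes-i-really-mean-it
# create the replacement OSD on the new, larger disk
ceph-volume lvm create --data /dev/sdX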
Thanks for the suggestions.
regards,
Zoran
Hi Experts,
We plan to set up a Ceph object store to support an S3 workload that will need to
delete 100M files daily via lifecycle.
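For context, the expiration rule we have in mind would look roughly like this (a sketch; the bucket name, prefix, and one-day expiry are placeholders):
lifecycle.json:
{
  "Rules": [
    {
      "ID": "daily-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "tmp/" },
      "Expiration": { "Days": 1 }
    }
  ]
}
applied against the RGW endpoint with, for example, the AWS CLI:
aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-lifecycle-configuration --bucket mybucket --lifecycle-configuration file://lifecycle.json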
Appreciate your suggestions and settings to handle this kind of scenario.
Best Regards,
Ha
We did have a peering storm; we're past that portion of the backfill and are still experiencing new instances of rbd volumes hanging. It is definitely not just the peering storm.
We've got 22.184% of objects still misplaced, with a bunch of pgs left to backfill (around 75k). Our rbd pool is using about 1.7 PiB of storage, so we're looking at roughly 370 TiB left to backfill. This pool is replicated with size=3.
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 21 PiB 11 PiB 10 PiB 10 PiB 48.73
TOTAL 21 PiB 11 PiB 10 PiB 10 PiB 48.73
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
pool1 4 32768 574 TiB 147.16M 1.7 PiB 68.87 260 TiB
We did see a lot of rbd volumes hang, often giving the buffer I/O errors sent previously - whether that was the peering storm or the backfills is uncertain. As suggested, we've already been detaching/reattaching the rbd volumes, pushing the primary active osd for a pg over to another osd, and sometimes rebooting the kernel on the vm to clear the io queue. A combination of those brings the rbd volume block device back for a while.
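For reference, "bouncing the primary" is done roughly like this (a sketch; the pg and osd ids are placeholders):
# find the acting set and primary for an affected pg
ceph pg map 4.3fa
# lower the primary's affinity so another osd in the set takes over as primary
ceph osd primary-affinity osd.123 0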
We're no longer in a peering storm, and we're seeing the rbd volumes go into an unresponsive state again - including volumes that were unresponsive, that we brought back, and that then went unresponsive once more. All pgs are in an active state, some active+remapped+backfilling, some active+undersized+remapped+backfilling, etc.
We also run the object gateway off the same cluster with the same backfill, and the object gateway is not experiencing issues. The osds participating in the backfill are also not saturated with i/o, nor seeing abnormal load for our usual backfill operations.
But with the continuing backfill, we're seeing rbd volumes on active pgs going back into a blocked state. We can do about the same with detaching the volume / bouncing the pg to a new primary acting osd, but we'd rather have these stop going unresponsive in the first place. Any suggestions in that direction are greatly appreciated.
Hi,
After having set up RadosGW with keystone authentication the cluster shows this warning:
# ceph health detail
HEALTH_WARN Failed to set 1 option(s)
[WRN] CEPHADM_FAILED_SET_OPTION: Failed to set 1 option(s)
Failed to set rgw.fra option rgw_keystone_implicit_tenants: config set failed: error parsing value: 'True' is not one of the permitted values: false, true, swift, s3, both, 0, 1, none retval: -22
I might have made a typo but that has been corrected.
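For reference, the value was corrected per RGW daemon roughly like this (the daemon name is taken from the dump below; the same was done for the other instances):
# ceph config set client.rgw.fra.controller1.lushdc rgw_keystone_implicit_tenants true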
# ceph config dump |grep rgw_keystone_implicit_tenants
client.rgw.fra.controller1.lushdc advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller1.pznmuf advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller1.tdrqot advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller2.ndxaet advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller2.rodqxh advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller2.wyhjuk advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller3.boasgr advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller3.lkczbl advanced rgw_keystone_implicit_tenants true *
client.rgw.fra.controller3.qxctee advanced rgw_keystone_implicit_tenants true *
# ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Any idea on how to fix this issue?
Thanks,
Arnoud.