Hi everyone, quick question regarding radosgw zone data-pool.
I'm currently planning to migrate an old data pool that was created with an
inappropriate failure domain to a newly created pool with the appropriate
failure domain.
If I’m doing something like:
radosgw-admin zone modify --rgw-zone default --data-pool <new_pool>
Will data from the old pool be migrated to the new one, or do I need to do
something else to migrate that data out of the old pool? I've read a lot of
mail archive threads from people wanting to do this, but I couldn't get a
clear answer from those archives.
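In case it matters, the full sequence I was planning to run is roughly the
following (I'm assuming a period commit and an RGW restart are needed
afterwards; please correct me if not):
radosgw-admin zone modify --rgw-zone default --data-pool <new_pool>
radosgw-admin period update --commit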
I'm running on the Nautilus release, if that helps.
Thanks a lot!
PS: This mail is a redo of the old one as I’m not sure the former one
worked (missing tags).
Hi,
I have a Ceph 16.2.12 cluster with hybrid OSDs (HDD block storage, DB/WAL
on NVMe). All OSD settings are at their defaults except for the
cache-related settings, which are as follows:
osd.14   dev       bluestore_cache_autotune          true
osd.14   dev       bluestore_cache_size_hdd          4294967296
osd.14   dev       bluestore_cache_size_ssd          4294967296
osd.14   advanced  bluestore_default_buffered_write  false
osd.14   dev       osd_memory_cache_min              2147483648
osd.14   basic     osd_memory_target                 17179869184
Other settings such as bluestore_cache_kv_ratio,
bluestore_cache_meta_ratio, etc. are at their defaults. That is, the OSD
memory target is set to 16 GB, the BlueStore cache is set to 4 GB for both
HDDs and SSDs, and the minimum cache size is 2 GB.
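For reference, I believe these were set with commands along these lines
(values in bytes; they may have been applied at the global osd level rather
than per OSD):
ceph config set osd.14 osd_memory_target 17179869184
ceph config set osd.14 osd_memory_cache_min 2147483648
ceph config set osd.14 bluestore_cache_size_hdd 4294967296
ceph config set osd.14 bluestore_cache_size_ssd 4294967296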
When I dump the memory pools of the OSDs, the BlueStore cache doesn't seem
to be actively used (https://pastebin.com/EpfFp85C), despite there being
plenty of memory: although the memory target is 16 GB, the memory pools are
around 2 GB and the total RSS of the OSD process is ~4.8 GB.
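For context, the mempool numbers in the pastebin were collected with
something like the following, run on the OSD host (osd.14 is just one
example):
ceph daemon osd.14 dump_mempools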
There are 66 OSDs in the cluster and the situation is very similar with all
of them. The OSDs are being used quite actively for both reads and writes,
and I guess they could benefit from using more memory for
caching, especially considering that we have lots of RAM available on each
host.
Is there a way to increase and/or tune OSD cache memory usage? I would
appreciate any advice or pointers.
Best regards,
Zakhar
We have a large cluster on Quincy 17.2.3 with a bucket holding 8.9 million small (15~20 MiB) objects.
All the objects were multipart uploads from scripts using `aws s3 cp`
The data is static (write-once, read-many) with no manual deletions and no new writes for months.
We recently found 3 objects in this bucket that cannot be retrieved.
The symptom is exactly the same as https://tracker.ceph.com/issues/47866 and https://bugzilla.redhat.com/show_bug.cgi?id=1892644 which were fixed a long time ago.
Any form of listing (`aws s3 ls`, radosgw-admin object stat, radoslist, an HTTP HEAD request, etc.) returns good data, but the objects cannot be retrieved, and listing the data pool with `rados ls` shows the object data is missing.
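For concreteness, the checks we ran look roughly like this (bucket, key and pool names are placeholders):
radosgw-admin object stat --bucket=<bucket> --object=<key>
radosgw-admin bucket radoslist --bucket=<bucket> | grep <key>
rados -p <data_pool> ls | grep <object_marker>
aws s3api head-object --bucket <bucket> --key <key>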
Any suggestions on how to troubleshoot this further?
I have two OSDs which are used for the RGW index pool. After a lot of
stress tests, these two OSDs were filled to 99.90%. The full ratio
(95%) did not take effect? I don't know much about this. Could it be
that when an OSD is full of omap data, it cannot be limited by the
full ratio?
I also used ceph-bluestore-tool to expand the OSD after first adding a
partition, but it failed and I don't know why.
In my cluster every OSD has 55 GB (DB, WAL and data on the same
device), and ceph -v is 14.2.5. Can anyone give me some ideas to fix
it?
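The expand attempt was roughly the following (run after adding the
partition; the OSD id is a placeholder and I may have missed a step):
systemctl stop ceph-osd@<id>
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>
systemctl start ceph-osd@<id>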
# ceph windows tests
PR check will be made required once regressions are fixed
windows build currently depends on gcc11 which limits use of c++20
features. investigating newer gcc or clang toolchain
# 16.2.13 release
final testing in progress
# prometheus metric regressions
https://tracker.ceph.com/issues/59505
related to previous discussion on 4/12 about quincy backports
integration test coverage needed for ceph-exporter and the mgr module
# lab update
centos/rhel tests were failing due to problematic mirrorlists
fixed in https://github.com/ceph/ceph-cm-ansible/pull/731
more sanity checks in progress at
https://github.com/ceph/ceph-cm-ansible/pull/733
# cephalocon feedback
dev summit etherpads: https://pad.ceph.com/p/cephalocon-dev-summit-2023
collect more notes here: https://pad.ceph.com/p/cephalocon-2023-brainstorm
request for dev-focused longer term discussion
could have specific user-focused and dev-focused sessions
dense conference, hard to fit everything in 3 days
could have longer component updates during conf, with time for questions
perhaps 3 days of conf, dev-specific discussions a day before (no cfp,
one big room, then option for breakout), user-feedback sessions during
the normal con
Hello all,
today I moved ceph to HEALTH_OK state :-)
1) I had to restart the MGR node; then my old c-osdx hostnames finally
went away and all of the OSDs from the old machines are now
orchestrated by the 'ceph orch' command.
2) I've updated the ceph* packages on the osd2 node to version
17.2.6, then I tried the 'cephadm adopt' command once more and voila!
It works like a charm.
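For the record, the adopt invocation was along these lines (run once
per OSD daemon; the exact ids are omitted here):
cephadm adopt --style legacy --name osd.<id>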
I will try to configure the OSDs on node 1 to adopt the WAL and DB
from the prepared LVM... Maybe after the upgrade to a newer version of
Ceph it will be OK?
Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
Hi all,
Over the last 2 weeks we have experienced several OSD_TOO_MANY_REPAIRS errors that we struggle to handle in a non-intrusive manner. Restarting the MDS plus the hypervisor that accessed the object in question seems to be the only way we can clear the error so that we can repair the PG and recover access. Any pointers on how to handle this issue more gently than rebooting the hypervisor and failing the MDS would be welcome!
The problem seems to affect only one specific pool (id 42), which is used for cephfs_data. This pool is our second CephFS data pool in this cluster. The data in the pool is accessed via Samba from an LXC container, which has the CephFS filesystem bind-mounted from the hypervisor.
Ceph was recently updated to version 16.2.11 (Pacific) -- the kernel version is 5.13.19-6-pve on the OSD hosts/Samba containers and 5.19.17-2-pve on the MDS hosts.
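For clarity, 'failing the MDS' above means roughly the following (MDS name taken from the health output below):
ceph mds fail hk-cephnode-65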
The following warnings are issued:
$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release; Too many repaired reads on 1 OSDs; Degraded data redundancy: 1/2648430090 objects degraded (0.000%), 1 pg degraded; 1 slow ops, oldest one blocked for 608 sec, osd.34 has slow ops
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
mds.hk-cephnode-65(mds.0): Client hk-cephnode-56 failing to respond to capability release client_id: 9534859837
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 1 OSDs
osd.34 had 9936 reads repaired
[WRN] PG_DEGRADED: Degraded data redundancy: 1/2648430090 objects degraded (0.000%), 1 pg degraded
pg 42.e2 is active+recovering+degraded+repair, acting [34,275,284]
[WRN] SLOW_OPS: 1 slow ops, oldest one blocked for 608 sec, osd.34 has slow ops
The logs for OSD.34 are flooded with these messages:
root@hk-cephnode-53:~# tail /var/log/ceph/ceph-osd.34.log
2023-04-26T11:41:00.760+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 missing primary copy of 42:4703efac:::10003d86a99.00000001:head, will try copies on 275,284
2023-04-26T11:41:00.784+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head
2023-04-26T11:41:00.812+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head
2023-04-26T11:41:00.812+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 missing primary copy of 42:4703efac:::10003d86a99.00000001:head, will try copies on 275,284
2023-04-26T11:41:00.824+0200 7f03a821f700 -1 osd.34 1352563 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.9534859837.0:20412906 42.e2 42:4703efac:::10003d86a99.00000001:head [read 0~1048576 [307@0] out=1048576b] snapc 0=[] RETRY=5 ondisk+retry+read+known_if_redirected e1352553)
2023-04-26T11:41:00.824+0200 7f03a821f700 0 log_channel(cluster) log [WRN] : 1 slow requests (by type [ 'delayed' : 1 ] most affected pool [ 'qa-cephfs_data' : 1 ])
2023-04-26T11:41:00.840+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head
2023-04-26T11:41:00.864+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head
2023-04-26T11:41:00.864+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 missing primary copy of 42:4703efac:::10003d86a99.00000001:head, will try copies on 275,284
2023-04-26T11:41:00.888+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head
We have tried the following:
- Restarting the OSD in question clears the error for a few seconds, but then we also get OSD_TOO_MANY_REPAIRS on other OSDs with PGs that hold the object whose I/O is blocked.
- Trying to repair the PG seems to restart every 10 seconds without actually making progress. (Is there a way to check repair progress?)
- Restarting the MDS and hypervisor clears the error (the hypervisor hangs for several minutes before timing out). However, if the object is requested again the error reoccurs. If we don't access the object we are eventually able to repair the PG.
- Occasionally, setting the primary-affinity to 0 for the primary OSD in the PG clears the error after restarting all affected OSDs, and we are able to repair the PG (unless the object is accessed during recovery); access to the object is OK afterwards. (See the command sketch after this list.)
- Finding and deleting the file pointing to the object (10003d86a99) and restarting OSDs will clear the error.
- Killing the Samba process that accessed the object does not clear the SLOW_OPS, and hence the error prevails.
- Normal scrubs have revealed a handful of other PGs in the same pool (id 42) that are damaged, and we are repairing those without any problems.
- We believe the MDS_CLIENT_LATE_RELEASE and SLOW_OPS errors are symptoms of the fact that the I/O is blocked.
- We have verified that there are no SMART errors of any kind on any of our disks in the cluster.
- If we don't handle this issue rather promptly, we experience a full lockup of the Samba container and rebooting the hypervisor seems to be the only cure. Trying to force unmount and remount CephFS does not help.
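For reference, the primary-affinity workaround mentioned above looks roughly like this (OSD and PG ids taken from the example above; the exact restart order varies):
ceph osd primary-affinity osd.34 0
systemctl restart ceph-osd@34        # on the OSD host; repeat for the other OSDs in the PG
ceph pg repair 42.e2
ceph osd primary-affinity osd.34 1   # restore affinity once the PG is clean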
This has now happened 6-7 times over the last 2 weeks, and we suspect that a hardware or memory error on one of our nodes may have caused the objects to be written to disk with bad checksums. We have replaced the mainboard in the node we think might be the culprit and are currently testing its memory. Could these random checksum errors be caused by anything else that we should investigate? It's a bit suspicious that the error only occurs in one specific pool, isn't it? If the mainboard were to blame, shouldn't we have seen these errors in more pools by now?
Regardless, we are stumped by how Ceph handles this error. Checksum errors should not leave clients hanging like this, should they? Should this be considered a bug? Is there a way to cancel the blocking I/O request to clear the error? And why is the PG flapping between active+recovering+degraded+repair, active+recovering+repair and active+clean+repair every few seconds?
Any ideas on how to gracefully battle this problem? Thanks!
--thomas
Thomas Hukkelberg
thomas(a)hovedkvarteret.no
Dear Ceph users,
my cluster is made of very old machines on Gbit Ethernet. I see that
sometimes some OSDs are marked down due to slow networking, especially
under heavy network load such as during recovery. This causes problems:
for example, PGs keep being deactivated and activated as the OSDs are
marked down and up (at least to my best understanding). So I'd need to
know if there is some way to increase the timeout after which an OSD is
marked down, to cope with my slow network.
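From the docs it looks like osd_heartbeat_grace might be the relevant
knob, but I'm not sure; I would try something like the following (I
believe it has to be set for both the OSDs and the mons):
ceph config set osd osd_heartbeat_grace 40
ceph config set mon osd_heartbeat_grace 40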
Thanks,
Nicola
Hi to all
Using Ceph 17.2.5 I have 3 PGs in a stuck state.
ceph pg map 8.2a6
osdmap e32862 pg 8.2a6 (8.2a6) -> up [88,100,59] acting [59,100]
Looking at OSDs 88, 100 and 59 I got this:
ceph pg ls-by-osd osd.100 | grep 8.2a6
8.2a6 211004 209089 0 0 174797925205 0 0 7075 active+undersized+degraded+remapped+backfilling 21m 32862'1540291 32862:3387785 [88,100,59]p88 [59,100]p59 2023-03-12T08:08:00.903727+0000 2023-03-12T08:08:00.903727+0000 6839 queued for deep scrub
ceph pg ls-by-osd osd.59 | grep 8.2a6
8.2a6 211005 209084 0 0 174798941087 0 0 7076 active+undersized+degraded+remapped+backfilling 22m 32862'1540292 32862:3387798 [88,100,59]p88 [59,100]p59 2023-03-12T08:08:00.903727+0000 2023-03-12T08:08:00.903727+0000 6839 queued for deep scrub
BUT
ceph pg ls-by-osd osd.88 | grep 8.2a6 ---> NONE
It is missing... How should I proceed?
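In case it is useful, I can also post the output of a PG query, e.g.:
ceph pg 8.2a6 query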
Best regards