Hello everyone,
We have a Ceph cluster which was recently upgraded from Octopus (15.2.12) to
Pacific (16.2.13). Since the upgrade there has been a problem with multipart
uploads: when UPLOAD_PART_COPY is performed from a valid, previously uploaded
part, it gets a 403, but *only* when it is called by the service-user. The
same scenario gets a 200 response with a full-access sub-user, and both the
sub-user and the service-user get 200 in the Octopus version. The policy
for service-user access is as below:
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::user/wid:suserid"
      },
      "Action": "*",
      "Resource": [
        "arn:aws:s3:::bucketname",
        "arn:aws:s3:::bucketname/*"
      ]
    }
  ]
}
Note that this very service-user can perform a regular multipart upload
without any problem on both versions; only upload_part_copy, and only on
Pacific, gets a 403, which makes an access problem unlikely. Has anyone
encountered this issue?
I performed the multipart upload using boto3, but the same issue occurs
with other clients as well.
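For reference, the scenario boils down to roughly the following boto3 sketch
(the endpoint, credentials, bucket and object names are placeholders, not the
real ones):

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",          # placeholder RGW endpoint
    aws_access_key_id="SERVICE_USER_ACCESS_KEY",     # service-user credentials
    aws_secret_access_key="SERVICE_USER_SECRET_KEY",
)

# Start a multipart upload for the destination object
mpu = s3.create_multipart_upload(Bucket="bucketname", Key="destination-object")

# Copy a valid, previously uploaded object in as part 1.
# This is the call that gets 403 for the service-user on Pacific.
part = s3.upload_part_copy(
    Bucket="bucketname",
    Key="destination-object",
    UploadId=mpu["UploadId"],
    PartNumber=1,
    CopySource={"Bucket": "bucketname", "Key": "previously-uploaded-object"},
)

# Complete the multipart upload with the copied part
s3.complete_multipart_upload(
    Bucket="bucketname",
    Key="destination-object",
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": [
        {"ETag": part["CopyPartResult"]["ETag"], "PartNumber": 1},
    ]},
)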
regards
I have a PG that hasn't been scrubbed in over a month and
not deep-scrubbed in over two months.
I tried forcing with `ceph pg (deep-)scrub` but with no success.
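That is, something along the lines of (PG id taken from the log excerpt below):

  ceph pg scrub 24.3ea
  ceph pg deep-scrub 24.3ea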
Looking at the logs of that PG's primary OSD, it looks like the OSD
keeps attempting (and apparently failing) to scrub that PG, along with
two others, over and over. For example:
2023-07-19T16:26:07.082 ... 24.3ea scrub starts
2023-07-19T16:26:10.284 ... 27.aae scrub starts
2023-07-19T16:26:11.169 ... 24.aa scrub starts
2023-07-19T16:26:12.153 ... 24.3ea scrub starts
2023-07-19T16:26:13.346 ... 27.aae scrub starts
2023-07-19T16:26:16.239 ... 24.aa scrub starts
...
Lines like that are repeated throughout the log file.
Has anyone seen something similar? How can I debug this?
I am running 17.2.5
Vlad
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
July 26th. If anyone wants to discuss something specific they can add it
to the pad linked below. If you have questions or comments you can
contact me.
This is an informal open call of community members, mostly from
hpc/htc/research environments, where we discuss whatever is on our minds
regarding ceph: updates, outages, features, maintenance, etc. There is
no set presenter, but I do attempt to keep the conversation lively.
NOTE: We have changed to using Jitsi for the meeting; we are no longer
using the BlueJeans meeting links. The ceph calendar event does not yet
reflect this and also has the wrong day.
Pad URL:
https://pad.ceph.com/p/Ceph_Science_User_Group_20230726
Ceph calendar event details:
July 26th, 2023
14:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone:
https://meet.jit.si/ceph-science-wg
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS/TROPICS
Space Science & Engineering Center
University of Wisconsin-Madison
I need some help understanding this. I have configured nfs-ganesha for cephfs using something like this in ganesha.conf:
FSAL {
    Name = CEPH;
    User_Id = "testing.nfs";
    Secret_Access_Key = "AAAAAAAAAAAAAAA==";
}
But I constantly have these messages in the ganesha logs, 6x per User_Id:
auth: unable to find a keyring on /etc/ceph/ceph.client.testing
I thought this was a ganesha authentication order issue, but they[1] say it has to do with Ceph. I am still on Nautilus, so maybe this has been fixed in newer releases, but I still have a hard time understanding why this is an issue of the Ceph (client) libraries.
[1]
https://github.com/nfs-ganesha/nfs-ganesha/issues/974
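For context, what the client libraries are apparently searching for is a plain Ceph keyring file, e.g. something like this (path and entity name are my assumption, based on the User_Id above and the default keyring search path):

# /etc/ceph/ceph.client.testing.nfs.keyring
[client.testing.nfs]
        key = AAAAAAAAAAAAAAA==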
Hi all,
Has anyone else noticed any p99.99+ tail latency regression for RBD
workloads in Quincy vs. pre-Pacific, i.e., before the kv_onode cache
existed?
Some notes from what I have seen thus far:
* Restarting OSDs temporarily resolves the problem... then as activity
accrues over time, the problem becomes appreciably worse
* In comparing profiles of running OSDs, I've noticed that the
bluestore block allocators are comparatively more active than in old
releases (even though the fragmentation scores of the Quincy OSDs are
far better in this case)
* The new kv_onode cache often looks like it is bursting at the
seams, whereas the kv/meta/data caches have breathing room
I am becoming increasingly confident that these observations are
related, though I have not dug deep enough into bluestore to reason
about how/when onodes are allocated on disk and complete the circle.
Anyway, I am posting this to see if perhaps the defaults for the
priority cache for the new kv_onode slab need a slight nudge. You can
observe them on OSDs by setting debug_bluestore to 20/20 for a second
and grepping for cache_size.
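Concretely, something like the following (osd.0 and the default log path
are just examples; containerized/cephadm log locations will differ):

  ceph tell osd.0 config set debug_bluestore 20/20
  sleep 2
  ceph tell osd.0 config set debug_bluestore 1/5
  grep cache_size /var/log/ceph/ceph-osd.0.log | tail -n 20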
Cheers,
Tyler
Hi,
I am using ceph 17.2.6 on rocky linux 8.
I got a large omap object warning today.
OK, so I tracked it down to a shard for a bucket in the index pool used for S3.
However, when listing the omap keys with:
# rados -p pool.index listomapkeys .dir.zone.bucketid.xx.indexshardnumber
it is clear that the problem is caused by many omapkeys with the following name format:
<80>0_00004771163.3444695458.6
A hex dump of the output of the listomapkeys command above indicates that the first 'character' is indeed hex 80, but as there is no ASCII equivalent for hex 80, I am not sure how to 'get at' those keys to see their values, delete them, etc. The index keys not of the format above appear to be fine, showing S3 object names as expected.
The rest of the index shards for the bucket are reasonable and have fewer than osd_deep_scrub_large_omap_object_key_threshold index entries, and the overall total of objects in the bucket is way less than osd_deep_scrub_large_omap_object_key_threshold * num_shards.
These weird keys seem to be created occasionally, which I cannot explain. (Yes, the bucket is used heavily.)
Any advice here?
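In particular, would something like the following be the right way to 'get at' a binary key? This is untested and based on my reading of the --omap-key-file option in the rados manpage (file paths are just examples):

# write the raw key to a file ('\200' is the literal 0x80 byte, in octal)
printf '\2000_00004771163.3444695458.6' > /tmp/omapkey.bin
# then fetch (or remove) that exact key by passing the key as a file
rados -p pool.index getomapval .dir.zone.bucketid.xx.indexshardnumber --omap-key-file /tmp/omapkey.bin /tmp/omapval.bin
rados -p pool.index rmomapkey .dir.zone.bucketid.xx.indexshardnumber --omap-key-file /tmp/omapkey.bin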
-Chris
Hey ceph-users,
I set up multisite sync between two freshly installed Octopus clusters.
In the first cluster I created a bucket with some data just to test the
replication of actual data later.
I then followed the instructions on
https://docs.ceph.com/en/octopus/radosgw/multisite/#migrating-a-single-site…
to add a second zone.
Things went well and both zones are now happily reaching each other and
the API endpoints are talking.
Also the metadata is in sync already - both sides are happy and I can
see bucket listings and users are "in sync":
> # radosgw-admin sync status
> realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
> zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
> zone 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
> metadata sync no sync (zone is master)
> data sync source: c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
> init
> full sync: 128/128 shards
> full sync: 0 buckets to sync
> incremental sync: 0/128 shards
> data is behind on 128 shards
> behind shards: [0...127]
>
and on the other side ...
> # radosgw-admin sync status
> realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
> zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
> zone c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
> metadata sync syncing
> full sync: 0/64 shards
> incremental sync: 64/64 shards
> metadata is caught up with master
> data sync source: 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
> init
> full sync: 128/128 shards
> full sync: 0 buckets to sync
> incremental sync: 0/128 shards
> data is behind on 128 shards
> behind shards: [0...127]
>
Also, newly created buckets (read: their metadata) are synced.
What is apparently not working is the sync of actual data.
Upon startup the radosgw on the second site shows:
> 2021-06-25T16:15:06.445+0000 7fe71eff5700 1 RGW-SYNC:meta: start
> 2021-06-25T16:15:06.445+0000 7fe71eff5700 1 RGW-SYNC:meta: realm
> epoch=2 period id=f4553d7c-5cc5-4759-9253-9a22b051e736
> 2021-06-25T16:15:11.525+0000 7fe71dff3700 0
> RGW-SYNC:data:sync:init_data_sync_status: ERROR: failed to read remote
> data log shards
>
Also, when issuing
# radosgw-admin data sync init --source-zone obst-rgn
it throws:
> 2021-06-25T16:20:29.167+0000 7f87c2aec080 0
> RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data
> log shards
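In case it is useful, radosgw-admin accepts the usual verbosity flags, so re-running the command above with something like --debug-rgw=20 --debug-ms=1 should show the underlying requests made against the source zone:
# radosgw-admin data sync init --source-zone obst-rgn --debug-rgw=20 --debug-ms=1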
Does anybody have any hints on where to look for what could be broken here?
Thanks a bunch,
Regards
Christian
Hi,
We are running a Ceph cluster managed with cephadm, v16.2.13. Recently we needed to change a disk, and we replaced it with:
ceph orch osd rm 37 --replace
It worked fine: the disk was drained and the OSD was marked as destroyed.
However, after changing the disk, no OSD was created. Looking at the DB device, the DB partition for OSD 37 was still there, so we destroyed it using:
ceph-volume lvm zap --osd-id=37 --destroy
But we still have no OSD redeployed.
Here we have our spec:
---
service_type: osd
service_id: osd-hdd
placement:
  label: osds
spec:
  data_devices:
    rotational: 1
  encrypted: true
  db_devices:
    size: '1TB:2TB'
  db_slots: 12
And the disk looks good:
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
node05 /dev/nvme2n1 ssd SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600357 1600G 12m ago LVM detected, locked
node05 /dev/sdk hdd SEAGATE_ST10000NM0206_ZA21G2170000C7240KPF 10.0T Yes 12m ago
And the VG on the DB device looks to have enough space:
ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0 1 11 0 wz--n- <1.46t 173.91g
If I remove the db_devices and db_slots from the specs, and do a dry run, the orchestrator seems to see the new disk as available:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+---------+-------------------------+----------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+---------+-------------------------+----------+----+-----+
|osd |osd-hdd |node05 |/dev/sdk |- |- |
+---------+---------+-------------------------+----------+----+-----+
But as soon as I add db_devices back, the orchestrator acts as if there is nothing to do:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+------+------+------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+------+------+------+----+-----+
I do not know why Ceph will not use this disk, and I do not know where to look; the logs do not seem to say anything. And the weirdest thing: another disk was replaced on the same machine, and that went without any issues.
Luis Domingues
Proton AG