Hello everyone,
We have a Ceph cluster which was recently upgraded from Octopus (15.2.12) to
Pacific (16.2.13). Since the upgrade there has been a problem with multipart
uploads: when UPLOAD_PART_COPY is performed from a valid, previously uploaded
part, it gets a 403, but *only* when it is called by the service-user. The
same scenario gets a 200 response with a full-access sub-user, and both the
sub-user and the service-user get 200 in the Octopus version. The policy
for service-user access is as below:
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::user/wid:suserid"
      },
      "Action": "*",
      "Resource": [
        "arn:aws:s3:::bucketname",
        "arn:aws:s3:::bucketname/*"
      ]
    }
  ]
}
Note that this very service-user can perform a regular multipart upload
without any problem on both versions; only upload_part_copy, and only on
Pacific, gets a 403, which makes an access problem unlikely. Has anyone
encountered this issue?
I performed the multipart upload using boto3, but the same issue occurs
with other clients as well.
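For reference, the scenario boils down to roughly the following boto3 sketch
(the endpoint, credentials, bucket and object names are placeholders, not the
real ones):

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",          # placeholder RGW endpoint
    aws_access_key_id="SERVICE_USER_ACCESS_KEY",     # service-user credentials
    aws_secret_access_key="SERVICE_USER_SECRET_KEY",
)

# Start a multipart upload for the destination object
mpu = s3.create_multipart_upload(Bucket="bucketname", Key="destination-object")

# Copy a valid, previously uploaded object in as part 1.
# This is the call that gets 403 for the service-user on Pacific.
part = s3.upload_part_copy(
    Bucket="bucketname",
    Key="destination-object",
    UploadId=mpu["UploadId"],
    PartNumber=1,
    CopySource={"Bucket": "bucketname", "Key": "previously-uploaded-object"},
)

# Complete the multipart upload with the copied part
s3.complete_multipart_upload(
    Bucket="bucketname",
    Key="destination-object",
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": [
        {"ETag": part["CopyPartResult"]["ETag"], "PartNumber": 1},
    ]},
)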
regards
I have a PG that hasn't been scrubbed in over a month and
not deep-scrubbed in over two months.
I tried forcing with `ceph pg (deep-)scrub` but with no success.
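That is, something along the lines of (PG id taken from the log excerpt below):

  ceph pg scrub 24.3ea
  ceph pg deep-scrub 24.3ea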
Looking at the logs of that PG's primary OSD, it looks like the OSD
keeps attempting (and apparently failing) to scrub that PG, along with
two others, over and over. For example:
2023-07-19T16:26:07.082 ... 24.3ea scrub starts
2023-07-19T16:26:10.284 ... 27.aae scrub starts
2023-07-19T16:26:11.169 ... 24.aa scrub starts
2023-07-19T16:26:12.153 ... 24.3ea scrub starts
2023-07-19T16:26:13.346 ... 27.aae scrub starts
2023-07-19T16:26:16.239 ... 24.aa scrub starts
...
Lines like that are repeated throughout the log file.
Has anyone seen something similar? How can I debug this?
I am running 17.2.5
Vlad
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
July 26th. If anyone wants to discuss something specific they can add it
to the pad linked below. If you have questions or comments you can
contact me.
This is an informal open call of community members, mostly from
hpc/htc/research environments, where we discuss whatever is on our minds
regarding ceph: updates, outages, features, maintenance, etc. There is
no set presenter, but I do attempt to keep the conversation lively.
NOTE: We have changed to using Jitsi for the meeting; we are no longer
using the BlueJeans meeting links. The ceph calendar event does not yet
reflect this and also has the wrong day.
Pad URL:
https://pad.ceph.com/p/Ceph_Science_User_Group_20230726
Ceph calendar event details:
July 26th, 2023
14:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone:
https://meet.jit.si/ceph-science-wg
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS/TROPICS
Space Science & Engineering Center
University of Wisconsin-Madison
I need some help understanding this. I have configured nfs-ganesha for cephfs using something like this in ganesha.conf:
FSAL {
    Name = CEPH;
    User_Id = "testing.nfs";
    Secret_Access_Key = "AAAAAAAAAAAAAAA==";
}
But I constantly have these messages in the ganesha logs, 6x per User_Id:
auth: unable to find a keyring on /etc/ceph/ceph.client.testing
I thought this was a ganesha authentication order issue, but they[1] say it has to do with Ceph. I am still on Nautilus, so maybe this has been fixed in newer releases, but I still have a hard time understanding why this is an issue of the Ceph (client) libraries.
[1]
https://github.com/nfs-ganesha/nfs-ganesha/issues/974
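For context, what the client libraries are apparently searching for is a plain Ceph keyring file, e.g. something like this (path and entity name are my assumption, based on the User_Id above and the default keyring search path):

# /etc/ceph/ceph.client.testing.nfs.keyring
[client.testing.nfs]
        key = AAAAAAAAAAAAAAA==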
Hi all,
Has anyone else noticed any p99.99+ tail latency regression for RBD
workloads in Quincy vs. pre-Pacific, i.e., before the kv_onode cache
existed?
Some notes from what I have seen thus far:
* Restarting OSDs temporarily resolves the problem... then as activity
accrues over time, the problem becomes appreciably worse
* In comparing profiles of running OSDs, I've noticed that the
bluestore block allocators are comparatively more active than in old
releases (even though the fragmentation scores of the Quincy OSDs are
far better in this case)
* The new kv_onode cache often looks like it is bursting at the
seams, whereas the kv/meta/data caches have breathing room
I am becoming increasingly confident that these observations are
related, though I have not dug deep enough into bluestore to reason
about how/when onodes are allocated on disk and complete the circle.
Anyway, I am posting this to see if perhaps the defaults for the
priority cache for the new kv_onode slab need a slight nudge. You can
observe them on OSDs by setting debug_bluestore to 20/20 for a second
and grepping for cache_size.
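Concretely, something like the following (osd.0 and the default log path
are just examples; containerized/cephadm log locations will differ):

  ceph tell osd.0 config set debug_bluestore 20/20
  sleep 2
  ceph tell osd.0 config set debug_bluestore 1/5
  grep cache_size /var/log/ceph/ceph-osd.0.log | tail -n 20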
Cheers,
Tyler
Hi,
I am using ceph 17.2.6 on rocky linux 8.
I got a large omap object warning today.
OK, so I tracked it down to a shard for a bucket in the index pool used for S3.
However, when listing the omap keys with:
# rados -p pool.index listomapkeys .dir.zone.bucketid.xx.indexshardnumber
it is clear that the problem is caused by many omapkeys with the following name format:
<80>0_00004771163.3444695458.6
A hex dump of the output of the listomapkeys command above indicates that the first 'character' is indeed hex 80, but as there is no ASCII equivalent for hex 80, I am not sure how to 'get at' those keys to see their values, delete them, etc. The index keys not of the format above appear to be fine, showing S3 object names as expected.
The rest of the index shards for the bucket are reasonable and have fewer than osd_deep_scrub_large_omap_object_key_threshold index entries, and the overall total of objects in the bucket is way less than osd_deep_scrub_large_omap_object_key_threshold * num_shards.
These weird keys seem to be created occasionally, which I cannot explain. (Yes, the bucket is used heavily.)
Any advice here?
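In particular, would something like the following be the right way to 'get at' a binary key? This is untested and based on my reading of the --omap-key-file option in the rados manpage (file paths are just examples):

# write the raw key to a file ('\200' is the literal 0x80 byte, in octal)
printf '\2000_00004771163.3444695458.6' > /tmp/omapkey.bin
# then fetch (or remove) that exact key by passing the key as a file
rados -p pool.index getomapval .dir.zone.bucketid.xx.indexshardnumber --omap-key-file /tmp/omapkey.bin /tmp/omapval.bin
rados -p pool.index rmomapkey .dir.zone.bucketid.xx.indexshardnumber --omap-key-file /tmp/omapkey.bin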
-Chris
Hey ceph-users,
I set up multisite sync between two freshly installed Octopus clusters.
In the first cluster I created a bucket with some data just to test the
replication of actual data later.
I then followed the instructions on
https://docs.ceph.com/en/octopus/radosgw/multisite/#migrating-a-single-site…
to add a second zone.
Things went well and both zones are now happily reaching each other and
the API endpoints are talking.
Also the metadata is in sync already - both sides are happy and I can
see bucket listings and users are "in sync":
> # radosgw-admin sync status
> realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
> zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
> zone 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
> metadata sync no sync (zone is master)
> data sync source: c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
> init
> full sync: 128/128 shards
> full sync: 0 buckets to sync
> incremental sync: 0/128 shards
> data is behind on 128 shards
> behind shards: [0...127]
>
and on the other side ...
> # radosgw-admin sync status
> realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
> zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
> zone c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
> metadata sync syncing
> full sync: 0/64 shards
> incremental sync: 64/64 shards
> metadata is caught up with master
> data sync source: 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
> init
> full sync: 128/128 shards
> full sync: 0 buckets to sync
> incremental sync: 0/128 shards
> data is behind on 128 shards
> behind shards: [0...127]
>
Also, newly created buckets (read: their metadata) are synced.
What is apparently not working is the sync of actual data.
Upon startup the radosgw on the second site shows:
> 2021-06-25T16:15:06.445+0000 7fe71eff5700 1 RGW-SYNC:meta: start
> 2021-06-25T16:15:06.445+0000 7fe71eff5700 1 RGW-SYNC:meta: realm
> epoch=2 period id=f4553d7c-5cc5-4759-9253-9a22b051e736
> 2021-06-25T16:15:11.525+0000 7fe71dff3700 0
> RGW-SYNC:data:sync:init_data_sync_status: ERROR: failed to read remote
> data log shards
>
Also, when issuing
# radosgw-admin data sync init --source-zone obst-rgn
it throws:
> 2021-06-25T16:20:29.167+0000 7f87c2aec080 0
> RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data
> log shards
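In case it is useful, radosgw-admin accepts the usual verbosity flags, so re-running the command above with something like --debug-rgw=20 --debug-ms=1 should show the underlying requests made against the source zone:
# radosgw-admin data sync init --source-zone obst-rgn --debug-rgw=20 --debug-ms=1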
Does anybody have any hints on where to look for what could be broken here?
Thanks a bunch,
Regards
Christian
Hi,
We are running a Ceph cluster managed with cephadm, v16.2.13. Recently we needed to change a disk, and we replaced it with:
ceph orch osd rm 37 --replace
It worked fine: the disk was drained and the OSD was marked as destroyed.
However, after changing the disk, no OSD was created. Looking at the DB device, the DB partition for OSD 37 was still there, so we destroyed it using:
ceph-volume lvm zap --osd-id=37 --destroy
But we still have no OSD redeployed.
Here we have our spec:
---
service_type: osd
service_id: osd-hdd
placement:
  label: osds
spec:
  data_devices:
    rotational: 1
  encrypted: true
  db_devices:
    size: '1TB:2TB'
  db_slots: 12
And the disk looks good:
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
node05 /dev/nvme2n1 ssd SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600357 1600G 12m ago LVM detected, locked
node05 /dev/sdk hdd SEAGATE_ST10000NM0206_ZA21G2170000C7240KPF 10.0T Yes 12m ago
And the VG on the DB device looks to have enough space:
ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0 1 11 0 wz--n- <1.46t 173.91g
If I remove the db_devices and db_slots from the specs, and do a dry run, the orchestrator seems to see the new disk as available:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+---------+-------------------------+----------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+---------+-------------------------+----------+----+-----+
|osd |osd-hdd |node05 |/dev/sdk |- |- |
+---------+---------+-------------------------+----------+----+-----+
But as soon as I add db_devices back, the orchestrator acts as if there is nothing to do:
ceph orch apply -i osd_specs.yml --dry-run
WARNING! Dry-Runs are snapshots of a certain point in time and are bound
to the current inventory setup. If any of these conditions change, the
preview will be invalid. Please make sure to have a minimal
timeframe between planning and applying the specs.
####################
SERVICESPEC PREVIEWS
####################
+---------+------+--------+-------------+
|SERVICE |NAME |ADD_TO |REMOVE_FROM |
+---------+------+--------+-------------+
+---------+------+--------+-------------+
################
OSDSPEC PREVIEWS
################
+---------+------+------+------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+------+------+------+----+-----+
I do not know why Ceph will not use this disk, and I do not know where to look; the logs do not seem to say anything. And the weirdest thing: another disk was replaced on the same machine, and that went without any issues.
Luis Domingues
Proton AG