Hello,
For some time now I've been struggling with the time it takes CompleteMultipartUpload to finish on one of my RGW clusters.
I have a customer with ~8M objects in one bucket uploading quite large files, from 100 GB to around 800 GB.
I've noticed that when they upload ~200 GB files, the requests start timing out on the LB we have in front of the RGW.
When I went through the logs, I noticed that a CompleteMultipartUpload request took around 700 s to finish, which seemed OK-ish, though the number is quite large.
However, when they started uploading 750 GB files, the time to complete the multipart upload grew to around 2500 s - more than 40 minutes - which seems like way too much.
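For reference, one way to measure this from the client side is to time the completion call itself; a minimal sketch assuming awscli, where the endpoint, bucket, key, upload ID, and parts file are all hypothetical:

time aws s3api complete-multipart-upload \
    --endpoint-url https://rgw.example.com \
    --bucket my-bucket --key big-file.bin \
    --upload-id "$UPLOAD_ID" \
    --multipart-upload file://parts.json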
Do you have similar experiences? Is there anything we can do to improve this? How much time does CompleteMultipartUpload take on your clusters?
The cluster is running on version 17.2.6.
Regards,
Ondrej
Hello Anthony,
The replicated index pool has about 20TiB of free space and we are using Intel P5510 NVMe Enterprise SSDs so I guess the HW shouldn’t be the issue.
Yes, I'm able to change the timeout on our LB, but I'm not sure I want to set it to 40 minutes or more…
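For what it's worth, raising such a timeout is a small change on the LB side; a sketch assuming HAProxy (the thread doesn't say which LB is in use), with purely illustrative values:

defaults
    timeout server 45m    # allow long-running CompleteMultipartUpload calls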
Ondrej
> On 5. 2. 2024, at 20:09, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
>
> Do you have sufficient capacity in the non-ec pool? Is it on fast media?
>
> You should be able to increase the timeout on your LB.
~~~
Hello,
I think the /dev/rbd* devices are filtered "out", or not filtered "in", by the filter
option in the devices section of /etc/lvm/lvm.conf.
So pvscan (and pvs, vgs and lvs) doesn't look at your devices.
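For example, an explicit accept rule for rbd devices in that section might look like this (a sketch; the stock default accepts everything, so a rule like this only matters when something more restrictive is in place):

filter = [ "a|^/dev/rbd|", "a|^/dev/sd|", "r|.*|" ]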
~~~
Hi Gilles,
So the LVM filter in the lvm.conf file is set to the default of `filter = [ "a|.*|" ]`, which accepts every block device, so no luck there :-(
~~~
For Ceph-based LVM volumes, you would do this to import:
1. Map every one of the RBDs to the host.
2. Include this in /etc/lvm/lvm.conf:
   types = [ "rbd", 1024 ]
3. Run:
   pvscan
   vgscan
   pvs
   vgs
4. If you see the VG:
   vgimportclone -n <make a name for VG> /dev/rbd0 /dev/rbd1 ... --import
Now you should be able to vgchange -a y <your VG> and see the LVs.
~~~
Hi Alex,
I did the above as you suggested - the RBD devices (3 of them, none of which were originally part of an LVM setup on the Ceph servers - at least, not one set up manually by me) still do not show up in pvscan, etc.
So I still can't mount any of them (not without re-creating a filesystem, anyway, and thus losing the data I'm trying to read/import) - they all return the same error message (see original post).
Anyone got any other ideas? <hopeful tone in voice> :-)
Cheers
Dulux-Oz
Hi,
I have a small cluster with some faulty disks in it, and I want to clone
the data from the faulty disks onto new ones.
The cluster is currently down, and I am unable to run things like
ceph-bluestore-tool fsck, but ceph-bluestore-tool bluefs-export does
appear to be working.
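For reference, the working invocation is along these lines (a sketch; the OSD data path and output directory here are hypothetical, and the OSD must be stopped first):

ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-12 --out-dir /mnt/bluefs-export-12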
Any help would be appreciated
Many thanks
Carl
Hello, Ceph users,
I would like to use my secondary Ceph cluster for backing up RBD OpenNebula
volumes from my primary cluster using mirroring in image+snapshot mode.
Because it is for backups only, not a cold standby, I would like to use
erasure coding on the secondary side to save disk space.
Is it supported at all?
I tried to create a pool:
secondary# ceph osd pool create one-mirror erasure k6m2
secondary# ceph osd pool set one-mirror allow_ec_overwrites true
set pool 13 allow_ec_overwrites to true
secondary# rbd mirror pool enable --site-name secondary one-mirror image
2024-02-02T11:00:34.123+0100 7f95070ad5c0 -1 librbd::api::Mirror: mode_set: failed to allocate mirroring uuid: (95) Operation not supported
When I created a replicated pool instead, this step worked:
secondary# ceph osd pool create one-mirror-repl replicated
secondary# rbd mirror pool enable --site-name secondary one-mirror-repl image
secondary#
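For comparison, the usual way to combine RBD with erasure coding is to keep the image metadata in a replicated pool and place only the data in the EC pool; a sketch of that pattern (the pool and image names here are made up, and I haven't verified how it interacts with mirroring):

secondary# ceph osd pool create one-mirror-meta replicated
secondary# ceph osd pool create one-mirror-data erasure k6m2
secondary# ceph osd pool set one-mirror-data allow_ec_overwrites true
secondary# rbd create --size 10G --data-pool one-mirror-data one-mirror-meta/test-image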
So, is RBD mirroring supported with erasure-coded pools at all? Thanks!
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
Hi,
Can anyone shed light on this please?
Our cluster crashed, and I have now managed to get everything back up
and running; the OSDs have nearly rebalanced, but I am seeing issues with RGW.
2024-02-05T01:29:56.272+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=-2 bl.length=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 realm
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=-2 bl.length=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 4 RGWPeriod::init failed to init realm id : (2) No such file or directory
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=-2 bl.length=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados_obj.operate() r=0 bl.length=17
2024-02-05T01:29:56.276+0000 7f7237e75f40 20 rados->read ofs=0 len=0
The .rgw.root and .rgw.index pools are both marked incomplete, and one PG in each
was restored from a bad disk. Both pools are now showing a status
of peering_blocked_by_history_les_bound.
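For what it's worth, the option often brought up for peering_blocked_by_history_les_bound is osd_find_best_info_ignore_history_les; this is a sketch only (the OSD ID is hypothetical), and the flag can discard writes, so treat it strictly as a last resort on the affected PG's primary:

ceph config set osd.12 osd_find_best_info_ignore_history_les true
ceph osd down 12    # force the stuck PGs to re-peer
# once the PGs go active, remove the override again:
ceph config rm osd.12 osd_find_best_info_ignore_history_les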
I do have some other PGs with important data that can be recovered from the
disks, but it is not essential that this is done straight away. I need to get
RGW running so I can delete old data and free up some space to allow
backfilling to complete.
Version is 18.2.1 running under cephadm
  data:
    pools:   19 pools, 801 pgs
    objects: 9.23M objects, 4.7 TiB
    usage:   9.8 TiB used, 2.7 TiB / 12 TiB avail
    pgs:     2.122% pgs not active
             435947/18456424 objects degraded (2.362%)
             559225/18456424 objects misplaced (3.030%)
             758 active+clean
             17  incomplete
             12  active+undersized+degraded+remapped+backfill_toofull
             12  active+remapped+backfill_toofull
             2   active+clean+scrubbing+deep
If anyone can suggest a known way of recovering from this your advice would
be appreciated.
Kind regards
Carl.
Hi All,
I am in the process of implementing multi-site RGW instance and have successfully set up a POC and confirmed the functionality.
I am working on metrics and alerting for this service, and I am not seeing metrics available for the output shown by
radosgw-admin sync status --rgw-realm=<<realm-name>>
Sample output:
[@cepha-cn02 ~]# radosgw-admin sync status --rgw-realm=<<realm-name>>
          realm a207b396-8d1b-408b-851e-10ad545861b7 (realm-name)
      zonegroup 77e8924b-05e3-4d86-b887-aedd7fe5306c (zonegroup-name)
           zone a26c27b2-d6ac-4eab-a4ce-1036ce2d37dc (zone-name)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 8c7d69db-85ae-45f4-b4ec-f712fad4af07 (zone-name)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
I'd like to measure, track, and alert on shard status during sync operations.
Is there a way to expose these metrics? I'm struggling to find guidance or details.
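In the meantime, one workaround is to scrape the CLI output into node_exporter's textfile collector; a rough sketch (the realm name, metric names, and output path are all made up for illustration):

#!/bin/sh
# Hedged sketch: count sync sections reporting shards behind or recovering
# and expose them as Prometheus textfile metrics.
OUT=$(radosgw-admin sync status --rgw-realm=realm-name)
BEHIND=$(printf '%s\n' "$OUT" | grep -c 'behind')
RECOVERING=$(printf '%s\n' "$OUT" | grep -c 'recovering')
{
  echo "rgw_sync_sections_behind $BEHIND"
  echo "rgw_sync_sections_recovering $RECOVERING"
} > /var/lib/node_exporter/textfile_collector/rgw_sync.prom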
Thanks in advance
Rhys
Rhys Powell (He/Him)
KORE | Senior Systems Engineer
rpowell(a)korewireless.com
Hi All,
All of this is using the latest version of RL and Ceph Reef
I've got an existing RBD image (with data on it - not "critical", as I've
got a backup, but it's rather large, so I was hoping to avoid the restore
scenario).
The RBD image used to be served out via a (Ceph) iSCSI Gateway, but we
are now looking to use the plain old kernel module.
The RBD image has been mapped (rbd map) to the client's /dev/rbd0 device.
So now I'm trying a straight `mount /dev/rbd0 /mount/old_image/` as a test.
What I'm getting back is `mount: /mount/old_image/: unknown filesystem type 'LVM2_member'.`
All my Google-fu is telling me that to solve this issue I need to
reformat the image with a new filesystem - which would mean "losing"
the data.
So my question is: how can I get to this data using the RBD kernel module
(the iSCSI Gateway is no longer available, so that's not an option), or am I
stuck with the restore option?
Or is there something I'm missing (which would not surprise me in the
least)? :-)
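For what it's worth, the LVM2_member signature suggests the image holds an LVM physical volume rather than a bare filesystem, so the usual activation sequence may apply; a hedged sketch, with the VG/LV names entirely made up:

pvscan --cache /dev/rbd0          # let LVM discover the PV on the mapped device
vgs                               # note which VG the PV belongs to
vgchange -a y old_image_vg        # activate its logical volumes
lvs                               # find the LV carrying the filesystem
mount /dev/old_image_vg/data /mount/old_image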
Thanks in advance (as always, you guys and gals are really, really helpful)
Cheers
Dulux-Oz