Hello,
We are running an older version of Ceph: 14.2.22 (Nautilus).
We have a radosgw/S3 deployment and had some issues with multipart uploads failing to complete.
We used s3cmd to delete the failed uploads and clean out the bucket, but when reviewing the space utilization of our buckets, it seems this one is still consuming space:
[ ~]# radosgw-admin bucket stats --bucket=BUCKETNAME
{
"bucket": "BUCKETNAME",
"num_shards": 32,
"tenant": "",
"zonegroup": "c73e02d6-d479-4cdc-bf86-8b09f0a9f6ba",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "50ee73bc-bc08-4f9f-9d5b-4492cb4c5e77.1689003.1695",
"marker": "50ee73bc-bc08-4f9f-9d5b-4492cb4c5e77.1689003.1695",
"index_type": "Normal",
"owner": "BUCKETNAME",
"ver": "0#47066,1#30480,2#42797,3#36437,4#47308,5#33285,6#37127,7#24292,8#44567,9#34273,10#29402,11#36228,12#48153,13#32665,14#42314,15#21143,16#34319,17#42818,18#39301,19#23897,20#26225,21#50957,22#39706,23#29723,24#49619,25#44974,26#44020,27#22505,28#46702,29#49390,30#27263,31#21515",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0",
"mtime": "2021-02-08 13:06:13.311932Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#",
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 18446744073709551613
},
"rgw.main": {
"size": 34247260247640,
"size_actual": 34247284682752,
"size_utilized": 34247260247640,
"size_kb": 33444590086,
"size_kb_actual": 33444613948,
"size_kb_utilized": 33444590086,
"num_objects": 340627
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
I see under usage.rgw.main that size_kb_actual is 33444613948 KB, or roughly 31 TiB.
When I use the radosgw-admin tool to list objects, I can see many parts left over from failed multipart uploads:
[ ~]# radosgw-admin bucket list --bucket BUCKETNAME | jq '.[] | "\(.name), \(.meta.mtime), \(.meta.size)"'
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~07YXhKKZn2XYy-6F0itVB4tpuBm1q1J.1, 2021-02-10 00:57:08.033082Z, 4194304"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~07YXhKKZn2XYy-6F0itVB4tpuBm1q1J.2, 2021-02-10 00:56:36.463099Z, 8794011"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~b6-C6I3rky3V2Wh4H56jhsfVjvvTMj2.1, 2021-02-10 00:38:44.572199Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~b6-C6I3rky3V2Wh4H56jhsfVjvvTMj2.2, 2021-02-10 00:38:48.680330Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~b6-C6I3rky3V2Wh4H56jhsfVjvvTMj2.3, 2021-02-10 00:38:52.232674Z, 95445231"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.1, 2021-02-11 00:30:55.489965Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.2, 2021-02-11 00:30:58.832752Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.3, 2021-02-11 00:31:01.188868Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.4, 2021-02-11 00:30:53.035172Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.5, 2021-02-11 00:30:21.359861Z, 12448760"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.1, 2021-02-11 00:11:52.163319Z, 4194304"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.2, 2021-02-11 00:11:48.293292Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.3, 2021-02-11 00:11:55.320413Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.4, 2021-02-11 00:11:55.039628Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.5, 2021-02-11 00:11:26.493213Z, 2005541"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-12.tar.gz.2~05JmbiZqt8tvgVmJ3Ef6WEzBa3Jla7L.1, 2021-02-12 00:53:24.453273Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-12.tar.gz.2~05JmbiZqt8tvgVmJ3Ef6WEzBa3Jla7L.2, 2021-02-12 00:54:00.743677Z, 9835956"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-12.tar.gz.2~90wJZ6jaWa6BaQC88e9YdXJwsqyme3u.1, 2021-02-12 00:59:24.943370Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-12.tar.gz.2~90wJZ6jaWa6BaQC88e9YdXJwsqyme3u.10, 2021-02-12 00:56:56.621609Z, 4194304"
...
However, when I try to delete one of these objects via radosgw-admin, I receive an error that the object is not found:
[ ~]# radosgw-admin object rm --bucket BUCKETNAME --object=_multipart_chi-pl-clh-shard-0-0-0-2021-04-17.tar.gz.2~CVL_xbfGjdDckHe_hpJxoUSynjotOtR.18
ERROR: object remove returned: (2) No such file or directory
When I list objects via the S3 API, none are found:
[ minio-binaries]# ./mc ls BUCKETNAME
[2021-02-08 08:06:13 EST] 0B BUCKETNAME/
[ minio-binaries]# ./mc ls BUCKETNAME/FOLDER
[ minio-binaries]# ./mc ls BUCKETNAME/FOLDER
[ minio-binaries]# ./mc ls --incomplete BUCKETNAME/FOLDER
[ minio-binaries]#
I am wondering: if I were to delete the bucket BUCKETNAME (radosgw-admin bucket rm with the --purge-objects option), would that remove the objects?
Are the objects actually there?
How can I confirm that the objects exist?
And if I delete the bucket, how can I confirm the objects are gone?
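In case it helps, here is what I was planning to try next to answer those questions myself. This is only a sketch: it assumes the default data pool name (default.rgw.buckets.data) and uses the bucket marker from the stats output above.

[ ~]# radosgw-admin object stat --bucket BUCKETNAME --object '_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~07YXhKKZn2XYy-6F0itVB4tpuBm1q1J.1'
[ ~]# rados -p default.rgw.buckets.data ls | grep '50ee73bc-bc08-4f9f-9d5b-4492cb4c5e77.1689003.1695' | head
[ ~]# radosgw-admin bucket check --bucket=BUCKETNAME --check-objects

The first command should stat one of the multipart entries through RGW, the second should show whether raw RADOS objects for this bucket still exist, and the third should recheck the bucket index against the actual objects. Does that sound like a sensible approach?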
Any help would be greatly appreciated!
Rhys
Hello!
Yesterday we found errors on some of our cephadm daemons, which are making it impossible to access our HPC cluster:
# ceph health detail
HEALTH_WARN 3 failed cephadm daemon(s); insufficient standby MDS daemons available
[WRN] CEPHADM_FAILED_DAEMON: 3 failed cephadm daemon(s)
daemon mds.cephfs.s1.nvopyf on s1.ceph.infra.ufscar.br is in error state
daemon mds.cephfs.s2.qikxmw on s2.ceph.infra.ufscar.br is in error state
daemon mds.cftv.s2.anybzk on s2.ceph.infra.ufscar.br is in error state
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
have 0; want 1 more
From searching online we found that we should remove the failed MDS daemons, but the data in this cluster is relatively important. We would like to know whether we really need to remove them or whether they can be fixed, and, if we do have to remove them, whether the data will be lost.
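In case it is useful, this is what we were considering running first (a sketch based on the cephadm docs, nothing executed yet):

# ceph orch ps --daemon-type mds
# cephadm logs --name mds.cephfs.s1.nvopyf
# ceph orch daemon restart mds.cephfs.s1.nvopyf

That is, list the daemon states, inspect the logs of one failed daemon on its host, and then try a simple restart. Would a restart be safe with regard to the data? Please tell me if you need more information.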
Thanks in advance,
André de Freitas Smaira
Federal University of São Carlos - UFSCar
Hi everyone,
Our telemetry service is up and running again.
Thanks Adam Kraitman and Dan Mick for restoring the service.
We thank you for your patience and appreciate your contribution to the
project!
Thanks,
Yaarit
On Tue, Jan 3, 2023 at 3:14 PM Yaarit Hatuka <yhatuka(a)redhat.com> wrote:
> Hi everyone,
>
> We are having some infrastructure issues with our telemetry backend, and
> we are working on fixing it.
> Thanks Jan Horacek for opening this issue [1]. We will update once the
> service is back up.
> We are sorry for any inconvenience you may be experiencing, and appreciate
> your patience.
>
> Thanks,
> Yaarit
>
> [1] https://tracker.ceph.com/issues/58371
>
Hi,
Ceph 16 Pacific introduced a new, smaller default min_alloc_size of 4096 bytes for HDD and SSD OSDs.
How can I get the current min_alloc_size of OSDs that were created with older Ceph versions? Is there a command that shows this info from the on-disk format of a BlueStore OSD?
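What I have tried so far, though I am not sure whether it reflects the on-disk value or just the currently configured default:

# ceph osd metadata 0 | grep alloc
# ceph daemon osd.0 config get bluestore_min_alloc_size_hdd

The first shows a bluestore_min_alloc_size field, if the release reports it in the OSD metadata at all; the second only returns the configured option, which is not necessarily what the OSD was actually created with.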
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Good morning everyone.
Last night we had an incident: someone accidentally renamed the .data pool of a file system, which made it instantly inaccessible. After renaming it back to the correct name it was possible to mount and list the files, but not to read or write: writes returned Read Only, and reads returned Operation not allowed.
After racking my brain for a while I tried mounting with the admin user, and everything worked correctly.
I tried removing the current user's authentication with `ceph auth rm` and created a new user with `ceph fs authorize <fs_name> client.<user> / rw`, but it behaved exactly the same way; I also tried recreating it with `ceph auth get-or-create`, and nothing changed.
After setting `allow *` on mon, mds and osd I was able to mount, read and write again with the new user.
I can understand why the file system stopped working after the pool was renamed; what I don't understand is why users were unable to perform operations on the FS even with rw caps, including newly created users.
What could have happened behind the scenes to prevent I/O even with the correct permissions? Or did I apply incorrect permissions that caused this problem?
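For reference, this is how I compared the users and pools afterwards (a sketch; <user>, <fs_name> and <data_pool> are placeholders):

# ceph auth get client.<user>
# ceph fs status <fs_name>
# ceph osd pool application get <data_pool>

As far as I understand, the OSD cap created by `ceph fs authorize` has the form `allow rw tag cephfs data=<fs_name>`, i.e. it matches the pool's cephfs application tag rather than the pool name, so I don't see how the rename alone would break it.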
Right now everything is working, but I would really like to understand what happened, because I couldn't find anything documented about this type of incident.
Hi,
I am just reading through this document
(https://docs.ceph.com/en/octopus/radosgw/config-ref/) and at the top it states:
> The following settings may be added to the Ceph configuration file (i.e.,
> usually ceph.conf) under the [client.radosgw.{instance-name}] section.
>
And my ceph.conf looks like this:
[client.eu-central-1-s3db3]
> rgw_frontends = beast endpoint=[::]:7482
> rgw_region = eu
> rgw_zone = eu-central-1
>
> [client.eu-central-1-s3db3-old]
> rgw_frontends = beast endpoint=[::]:7480
> rgw_region = eu
> rgw_zone = eu-central-1
>
> [client.eu-customer-1-s3db3]
> rgw_frontends = beast endpoint=[::]:7481
> rgw_region = eu-someother
> rgw_zone = eu-someother-1
>
Do I need to change the section names? It also seems that rgw_region is a
non-existent config value (it might have come from very old RHCS
documentation).
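For what it's worth, this is how I have been checking what the running daemons actually picked up (assuming the default admin socket path for our instance names):

# ceph daemon /var/run/ceph/ceph-client.eu-central-1-s3db3.asok config get rgw_zone
# ceph config help rgw_region

If I understand correctly, the second command should tell me whether rgw_region is a known option at all.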
Would be very nice if someone could help me clarify this.
Cheers and happy weekend
Boris
Hi Xiubo, Randy,
This is due to '<host_ip_address> host.containers.internal' being added to the container's /etc/hosts since Podman 4.1+.
The workaround consists of either downgrading the Podman package to v4.0 (on RHEL 8: dnf downgrade podman-4.0.2-6.module+el8.6.0+14877+f643d2d6) or adding the --no-hosts option to the 'podman run' command in /var/lib/ceph/$(ceph fsid)/iscsi.iscsi.test-iscsi1.xxxxxx/unit.run and restarting the iscsi container service.
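Roughly like this for the --no-hosts variant (a sketch; adjust the fsid and daemon directory to your deployment): locate the 'podman run' line in unit.run, add --no-hosts to it, then restart the unit.

# grep -n 'podman run' /var/lib/ceph/$(ceph fsid)/iscsi.iscsi.test-iscsi1.xxxxxx/unit.run
# systemctl restart ceph-$(ceph fsid)@iscsi.iscsi.test-iscsi1.xxxxxx.service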
[1] and [2] could well have the same cause. The RHCS Block Device Guide [3] quotes RHEL 8.4 as a prerequisite. I don't know which Podman version RHEL 8.4 shipped at the time, but with RHEL 8.7 and Podman 4.2, it's broken.
I'll open a RHCS case today to have it fixed and to have other containers (grafana, prometheus, etc.) checked against this new Podman behavior.
Regards,
Frédéric.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1979449
[2] https://tracker.ceph.com/issues/57018
[3] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-s…
----- On 21 Nov 22, at 6:45, Xiubo Li xiubli(a)redhat.com wrote:
> On 15/11/2022 23:44, Randy Morgan wrote:
>> You are correct, I am using cephadm to create the iscsi portals. The
>> cluster had been one I was learning a lot with, and I wondered if the
>> problem was due to the number of creations and deletions of things, so I
>> rebuilt the cluster; now I am getting this response even when creating my
>> first iscsi target. Here is the output of gwcli ls:
>>
>> sh-4.4# gwcli ls
>> o- / ............................................................................... [...]
>> o- cluster .............................................................. [Clusters: 1]
>> | o- ceph .............................................................. [HEALTH_WARN]
>> | o- pools ................................................................. [Pools: 8]
>> | | o- .rgw.root ................ [(x3), Commit: 0.00Y/71588776M (0%), Used: 1323b]
>> | | o- cephfs_data .............. [(x3), Commit: 0.00Y/71588776M (0%), Used: 1639b]
>> | | o- cephfs_metadata .......... [(x3), Commit: 0.00Y/71588776M (0%), Used: 3434b]
>> | | o- default.rgw.control ...... [(x3), Commit: 0.00Y/71588776M (0%), Used: 0.00Y]
>> | | o- default.rgw.log .......... [(x3), Commit: 0.00Y/71588776M (0%), Used: 3702b]
>> | | o- default.rgw.meta ......... [(x3), Commit: 0.00Y/71588776M (0%), Used: 382b]
>> | | o- device_health_metrics .... [(x3), Commit: 0.00Y/71588776M (0%), Used: 0.00Y]
>> | | o- rhv-ceph-ssd ............. [(x3), Commit: 0.00Y/7868560896K (0%), Used: 511746b]
>> | o- topology ..................................................... [OSDs: 36,MONs: 3]
>> o- disks ............................................................ [0.00Y, Disks: 0]
>> o- iscsi-targets ................................... [DiscoveryAuth: None, Targets: 1]
>>   o- iqn.2001-07.com.ceph:1668466555428 ..................... [Auth: None, Gateways: 1]
>>     o- disks ........................................................... [Disks: 0]
>>     o- gateways ............................................. [Up: 1/1, Portals: 1]
>>     | o- host.containers.internal ......................... [192.168.105.145 (UP)]
>
> Please manually remove this gateway before doing further steps.
>
> This looks like a bug in cephadm; you can raise a tracker issue for it.
>
> Thanks
>
>
>>     o- host-groups ..................................................... [Groups : 0]
>>     o- hosts .............................................. [Auth: ACL_ENABLED, Hosts: 0]
>> sh-4.4#
>>
>> Randy
>>
>> On 11/9/2022 6:36 PM, Xiubo Li wrote:
>>>
>>> On 10/11/2022 02:21, Randy Morgan wrote:
>>>> I am trying to create a second iscsi target and I keep getting an
>>>> error when I create the second target:
>>>>
>>>>
>>>> Failed to update target 'iqn.2001-07.com.ceph:1667946365517'
>>>>
>>>> disk create/update failed on host.containers.internal. LUN
>>>> allocation failure
>>>>
>>> I think you were using cephadm to add the iscsi targets, not gwcli or
>>> the REST APIs directly.
>>>
>>> The other issues we hit before were login failures, which happened because
>>> there were two gateways using the same IP address. Please share your `gwcli
>>> ls` output so we can see the 'host.containers.internal' gateway's config.
>>>
>>> Thanks!
>>>
>>>
>>>> I am running Ceph Pacific: version 16.2.7
>>>> (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable).
>>>>
>>>> All of the information I can find on this problem is from 3 years
>>>> ago and doesn't seem to apply any more. Does anyone know how to
>>>> correct this problem?
>>>>
>>>> Randy
>>>>
>>>
>>
>
Dear Xiubo,
could you explain how to enable kernel debug logs (I assume this is on the
client)?
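Is it the kernel's dynamic debug interface you mean, i.e. something like the following on the client (just my guess)?

# echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
# echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control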
Thanks,
Manuel
On Fri, May 13, 2022 at 9:39 AM Xiubo Li <xiubli(a)redhat.com> wrote:
>
> On 5/12/22 12:06 AM, Stefan Kooman wrote:
> > Hi List,
> >
> > We have quite a few linux kernel clients for CephFS. One of our
> > customers has been running mainline kernels (CentOS 7 elrepo) for the
> > past two years. They started out with 3.x kernels (default CentOS 7),
> > but upgraded to mainline when those kernels would frequently generate
> > MDS warnings like "failing to respond to capability release". That
> > worked fine until 5.14 kernel. 5.14 and up would use a lot of CPU and
> > *way* more bandwidth on CephFS than older kernels (order of
> > magnitude). After the MDS was upgraded from Nautilus to Octopus that
> > behavior is gone (comparable CPU / bandwidth usage as older kernels).
> > However, the newer kernels are now the ones that give "failing to
> > respond to capability release", and worse, clients get evicted
> > (unresponsive as far as the MDS is concerned). Even the latest 5.17
> > kernels have that. No difference is observed between using messenger
> > v1 or v2. MDS version is 15.2.16.
> > Surprisingly the latest stable kernels from CentOS 7 work flawlessly
> > now. Although that is good news, newer operating systems come with
> > newer kernels.
> >
> > Does anyone else observe the same behavior with newish kernel clients?
>
> There are some known bugs which have been fixed, or are being fixed,
> recently, even in mainline, and I am not sure whether they are related,
> such as [1][2][3][4]. For more detail please see the ceph-client repo
> testing branch [5].
>
> I have never seen the "failing to respond to capability release" issue
> myself. If you have the MDS logs (debug_mds = 25 and debug_ms = 1) and
> kernel debug logs, that would help to debug it further; or please provide
> the steps to reproduce it.
>
> [1] https://tracker.ceph.com/issues/55332
> [2] https://tracker.ceph.com/issues/55421
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2063929
> [4] https://tracker.ceph.com/issues/55377
> [5] https://github.com/ceph/ceph-client/commits/testing
>
> Thanks
>
> -- Xiubo
>
> >
> > Gr. Stefan
> >