Hi
On one of our Ceph clusters, some OSDs have been marked as full. Since this is a staging cluster that does not have much data on it, this is strange.
Looking at the full OSDs through “ceph osd df” I figured out that the space is mostly used by metadata:
SIZE: 122 GiB
USE: 118 GiB
DATA: 2.4 GiB
META: 116 GiB
We run mimic, and for the affected OSDs we use a db device (nvme) in addition to the primary device (hdd).
In the logs we see the following errors:
2020-05-12 17:10:26.089 7f183f604700 1 bluefs _allocate failed to allocate 0x400000 on bdev 1, free 0x0; fallback to bdev 2
2020-05-12 17:10:27.113 7f183f604700 1 bluestore(/var/lib/ceph/osd/ceph-8) _balance_bluefs_freespace gifting 0x180a000000~400000 to bluefs
2020-05-12 17:10:27.153 7f183f604700 1 bluefs add_block_extent bdev 2 0x180a000000~400000
We assume it is an issue with RocksDB, as the following call quickly fixes the problem:
ceph daemon osd.8 compact
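For context, this is roughly how we watch it before and after compacting. osd.8 is just an example from our setup, and I am not certain the bluefs counters are the right thing to look at, it is just what we found (both commands are run on the host that carries the OSD):
ceph osd df                          # META vs DATA per OSD
ceph daemon osd.8 perf dump bluefs   # db_used_bytes vs slow_used_bytes, i.e. how much has spilled from the db device onto the HDD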
The question is: why is this happening? I would have thought that “compact” is something that runs automatically from time to time, but I’m not sure.
Is it on us to run this regularly?
Any pointers are welcome. I’m quite new to Ceph :)
Cheers,
Denis
Hi,
I am trying to create a topic so that I can use it to listen for object creation notifications on a bucket.
If I make my API call without supplying AWS authorization headers, the topic creation succeeds, and it can be seen by using a ListTopics call.
However, in order to attach a topic to a bucket, the topic and bucket must have the same owner. So I tried creating a topic using AWS auth.
The credential header I tried was the same as what I use for get/put items to a bucket:
Credential=<access key id>/20200512/us-east-1/s3/aws4_request
However, in this case, rather than succeeding, I get a NotImplemented error.
If I change the service in the credential scope to something other than s3, I instead get a SignatureDoesNotMatch error. What is the right way to authenticate a CreateTopic request?
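In case it clarifies what I am doing: the call itself is just the SNS-style CreateTopic action against the RGW endpoint. With the stock AWS CLI it would look roughly like the following (endpoint, region and topic name are placeholders; note that the CLI signs this with 'sns' rather than 's3' as the service in the credential scope, and I have not been able to confirm which of the two RGW expects):
aws --endpoint-url http://rgw.example.com:8000 --region us-east-1 sns create-topic --name bucket-notify-topic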
Thanks,
Alexis
Hi,
I noticed a strange situation in one of our clusters: the OSD daemons are using too much RAM.
We are running 12.2.12 and have the default osd_memory_target (4 GiB).
Heap dump shows:
osd.2969 dumping heap profile now.
------------------------------------------------
MALLOC: 6381526944 ( 6085.9 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 173373288 ( 165.3 MiB) Bytes in central cache freelist
MALLOC: + 17163520 ( 16.4 MiB) Bytes in transfer cache freelist
MALLOC: + 95339512 ( 90.9 MiB) Bytes in thread cache freelists
MALLOC: + 28995744 ( 27.7 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 6696399008 ( 6386.2 MiB) Actual memory used (physical + swap)
MALLOC: + 218267648 ( 208.2 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 6914666656 ( 6594.3 MiB) Virtual address space used
MALLOC:
MALLOC: 408276 Spans in use
MALLOC: 75 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
IMO "Bytes in use by application" should be less than osd_memory_target. Am I correct?
I checked the heap dump with google-pprof and got the following results:
Total: 149.4 MB
60.5 40.5% 40.5% 60.5 40.5% rocksdb::UncompressBlockContentsForCompressionType
34.2 22.9% 63.4% 34.2 22.9% ceph::buffer::create_aligned_in_mempool
11.9 7.9% 71.3% 12.1 8.1% std::_Rb_tree::_M_emplace_hint_unique
10.7 7.1% 78.5% 71.2 47.7% rocksdb::ReadBlockContents
Does this mean that most of the RAM is used by RocksDB?
How can I take a deeper look into memory usage?
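For completeness, besides the heap profiler the only other things I know to check are the OSD mempools and the memory target itself (osd.2969 as in the dump above; these go through the admin socket on the OSD's host):
ceph daemon osd.2969 dump_mempools                   # per-mempool usage: bluestore cache, pglog, osdmaps, ...
ceph daemon osd.2969 config get osd_memory_target   # confirm the target actually in effect
ceph tell osd.2969 heap stats                        # same tcmalloc summary as above
ceph tell osd.2969 heap release                      # ask tcmalloc to return freelist memory to the OS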
Regards,
Rafał Wądołowski
Hi,
I am running a small 3-node Ceph Nautilus 14.2.8 cluster on Ubuntu 18.04.
I am testing the cluster to expose a CephFS volume as a Samba v4 share for users
to access from Windows later on.
Samba version is 4.7.6-Ubuntu and mount.cifs version is 6.8.
When I test on the CephFS kernel mount, dd write speed is 600 MB/s and md5sum
read speed is 300-400 MB/s.
I exposed the same volume in Samba using "vfs_ceph" and mounted it
over CIFS on another Ubuntu 18.04 client.
There, dd write speed is still 600 MB/s, but md5sum read speed is only 65 MB/s.
I get a different result when I read the same file using
smbclient: it reads at about 101 MB/s.
Why is there this difference, and what could be the issue?
app_id must match the 'aud' field in the token introspection result
(in the example, the value of 'aud' is 'customer-portal').
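In the role's assume-role policy document that would look roughly like this (the realm URL, role name and audience below are placeholders for your setup; the value in the StringEquals condition is what has to equal 'aud'):
radosgw-admin role create --role-name=S3Access --assume-role-policy-doc='{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": ["arn:aws:iam:::oidc-provider/keycloak.example.com:8080/auth/realms/demo"]},
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"StringEquals": {"keycloak.example.com:8080/auth/realms/demo:app_id": "customer-portal"}}
  }]
}'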
Thanks,
Pritha
On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll <
wyllys.ingersoll(a)keepertech.com> wrote:
>
> Running Nautilus 14.2.9 and trying to follow the STS example given here:
> https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy
> for AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider.
> I am able to see in the rgw debug logs that the token being passed from the
> client is passing the introspection check, but it always ends up failing
> the final authorization to access the requested bucket resource and is
> rejected with a 403 status "AccessDenied".
>
> I configured my policy as described in the 2nd example on the STS page
> above. I suspect the problem is with the "StringEquals" condition statement
> in the AssumeRolePolicy document (I could be wrong though).
>
> The example shows using the keycloak URI followed by ":app_id" matching
> with the name of the keycloak client application ("customer-portal" in the
> example). My keycloak setup does not have any such field in the
> introspection result and I can't seem to figure out how to make this all
> work.
>
> I cranked up the logging to 20/20 and still did not see any hints as to
> what part of the policy is causing the access to be denied.
>
> Any suggestions?
>
> -Wyllys Ingersoll
>
Hi,
I deployed a multisite setup in order to sync data from a mimic cluster zone
to a nautilus cluster zone. The data is syncing well at present. However,
when I check the cluster status I find something strange: the data in my
new cluster seems larger than in the old one. The sync is far from
complete, yet the space used is nearly the same. Is that normal?
'ceph df ' on old cluster:
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
82 TiB 41 TiB 41 TiB 50.37
POOLS:
    NAME                          ID     USED        %USED     MAX AVAIL     OBJECTS
    .rgw.root                      1     6.0 KiB         0        10 TiB          19
    default.rgw.control            2         0 B         0        10 TiB           8
    default.rgw.meta               3     3.5 KiB         0        10 TiB          19
    default.rgw.log                4     8.4 KiB         0        10 TiB        1500
    default.rgw.buckets.index      5         0 B         0        10 TiB         889
    default.rgw.buckets.non-ec     6         0 B         0        10 TiB         497
    default.rgw.buckets.data       7      14 TiB     56.96        10 TiB     3968545
    testpool                       8         0 B         0        10 TiB           0
'ceph df ' on new cluster:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 137 TiB 98 TiB 38 TiB 38 TiB 28.02
TOTAL 137 TiB 98 TiB 38 TiB 38 TiB 28.02
POOLS:
    POOL                         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    .rgw.root                     1     6.4 KiB          21     3.8 MiB         0        26 TiB
    shubei.rgw.control           13         0 B           8         0 B         0        26 TiB
    shubei.rgw.meta              14     4.1 KiB          20     3.2 MiB         0        26 TiB
    shubei.rgw.log               15     9.9 MiB       1.64k      47 MiB         0        26 TiB
    default.rgw.meta             16         0 B           0         0 B         0        26 TiB
    shubei.rgw.buckets.index     17     2.7 MiB         889     2.7 MiB         0        26 TiB
    shubei.rgw.buckets.data      18      11 TiB       2.90M      33 TiB     29.37        26 TiB
'radosgw-admin sync status' on new cluster:
realm bde4bb56-fbca-4ef8-a979-935dbf109b78 (new-oriental)
zonegroup d25ae683-cdb8-4227-be45-ebaf0aed6050 (beijing)
zone 313c8244-fe4d-4d46-bf9b-0e33e46be041 (shubei)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: f70a5eb9-d88d-42fd-ab4e-d300e97094de (oldzone)
syncing
full sync: 106/128 shards
full sync: 350 buckets to sync
incremental sync: 22/128 shards
data is behind on 115 shards
behind shards:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,23,24,25,26,27,28,29,30,32,35,37,38,39,40,41,42,43,44,45,46,47,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,96,97,98,99,100,101,102,103,104,105,107,108,109,110,111,112,113,114,116,118,119,120,121,122,123,124,125,126,127]
oldest incremental change not applied: 2020-05-11
10:46:41.0.60179s [80]
5 shards are recovering
recovering shards: [21,31,95,104,106]
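One thing I have not compared yet is the data-protection setting of the two bucket data pools. If the old pool is erasure-coded while the new one is 3x replicated, that alone would make USED roughly three times STORED on the new cluster (33 TiB vs 11 TiB above). Something like this on each cluster should show it (pool names as in the listings above):
ceph osd pool ls detail | grep buckets.data    # shows "replicated size N" or the erasure-code profile per pool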
Hello
I had an incident where 3 OSDs crashed completely at once and won't power
up. During recovery, 3 OSDs on another host have somehow become
corrupted. I am running erasure coding with an 8+2 setup, using a CRUSH map that
takes 2 OSDs per host, and after losing the other 2 OSDs I have a few PGs
down. Unfortunately these PGs seem to overlap almost all of the data on the pool,
so I believe the entire pool is mostly lost even though only about 2% of the PGs are
down.
I am running ceph 14.2.9.
OSD 92 log https://pastebin.com/5aq8SyCW
OSD 97 log https://pastebin.com/uJELZxwr
ceph-bluestore-tool repair without --deep showed "success", but the OSDs still
fail with the log above.
Log from ceph-bluestore-tool repair --deep, which is still running; I am not
sure it will actually fix anything, and the log looks pretty bad:
https://pastebin.com/gkqTZpY3
Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op
list" gave me input/output error. But everything in SMART looks OK, and i
see no indication of hardware read error in any logs. Same for both OSD.
The OSD's with corruption have absolutely no bad sectors and likely have
only a minor corruption but at important locations.
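The only idea I have left myself is to try, with the OSDs stopped, exporting the down PGs from the corrupted OSDs and importing them into a healthy OSD, roughly like below (the pgid is a placeholder, for an EC pool it needs the shard suffix, and ceph-42 stands in for any healthy OSD) -- though I am not sure the export would get past the same input/output error that --op list hits:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --pgid 2.1fs0 --op export --file /root/pg.2.1fs0.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 --op import --file /root/pg.2.1fs0.export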
Any ideas on how to recover from this kind of scenario? Any tips would be
highly appreciated.
Best regards,
Kári Bertilsson
There is a general documentation meeting called the "DocuBetter Meeting",
and it is held every two weeks. The next DocuBetter Meeting will be on 13
May 2020 at 0830 PST, and will run for thirty minutes. Everyone with a
documentation-related request or complaint is invited. The meeting will be
held here: https://bluejeans.com/908675367
Send documentation-related requests and complaints to me by replying to
this email and CCing me at zac.dover(a)gmail.com.
The next DocuBetter meeting is scheduled for:
13 May 2020 0830 PST
13 May 2020 1630 UTC
14 May 2020 0230 AEST
Etherpad: https://pad.ceph.com/p/Ceph_Documentation
Meeting: https://bluejeans.com/908675367
Thanks, everyone.
Zac Dover
Hello all,
I'm having an issue with a bucket that refuses to be resharded. For the record, the cluster was recently upgraded from 13.2.4 to 13.2.10.
# radosgw-admin reshard add --bucket foo --num-shards 3300
ERROR: the bucket is currently undergoing resharding and cannot be added to the reshard list at this time
# radosgw-admin reshard list
[]
# radosgw-admin reshard status --bucket=foo
[
    {
        "reshard_status": "not-resharding",
        "new_bucket_instance_id": "",
        "num_shards": -1
    },
    <snip>
# radosgw-admin reshard cancel --bucket foo
ERROR: failed to remove entry from reshard log, oid=reshard.0000000009 tenant= bucket=foo
# radosgw-admin reshard stale-instances list
[]
Is there anything else I should check to troubleshoot this? I was able to reshard another bucket since the upgrade, so I suspect there's something lingering that's blocking this.
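The only other lead I have is the reshard log object named in the error (reshard.0000000009). My assumption, which I have not verified, is that a stale omap entry for the bucket in that object is what makes RGW think it is still mid-reshard. Is it safe to inspect or clean that up directly, e.g.:
# rados -p default.rgw.log ls | grep reshard
# rados -p default.rgw.log listomapkeys reshard.0000000009
(default.rgw.log is an assumption; substitute whatever log pool the zone actually uses.)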
Hello,
I was hoping someone could clear up the difference between these metrics.
In filestore the difference between Apply and Commit Latency was pretty
clear and these metrics gave a good representation of how the cluster was
performing. High commit usually meant our journals were performing poorly
while high apply pointed to an OSD issue.
With BlueStore, Apply and Commit are now tied to the same metric, and it's not
as clear to me what that metric represents.
In addition new metrics such as Read and Write Op Latency have been added.
I'm led to believe that these are similar to what Apply Latency used to
represent but is that actually the case?
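For reference, these are the places I am pulling the numbers from (osd.0 is just an example):
ceph osd perf                     # the per-OSD commit/apply latency columns (identical values on BlueStore)
ceph daemon osd.0 perf dump osd   # the newer counters, e.g. op_latency, op_r_latency, op_w_latency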
If anyone who has a better understanding of this than I do can enlighten me
I'd appreciate it!
Thanks,
John