January 2021 - ceph-users

by Mika Saari

Hi, Using Ceph 15.2.8 installed with cephadm. Trying to get RadosGW to work. I have managed to get the RadosGW working. I can manage it through a dashboard and use aws s3 client to create new buckets etc. When trying to use swift I get errors. Not sure how to continue to track the problem here. Any tips are welcome. Thank you very much, -Mika ------- What I have done and what are the results. Some data changed manually ------- What I have done: At OpenStack Side: 1) openstack user create --domain default --password-prompt swift 2) openstack role add --project service --user swift admin 3) openstack endpoint create --region RegionOne object-store public http://ceph1/swift/v1/AUTH_%\(project_id\)s 4) openstack endpoint create --region RegionOne object-store internal http://ceph1/swift/v1/AUTH_%\(project_id\)s 5) openstack endpoint create --region RegionOne object-store admin http://ceph1/swift/v1 At Ceph side: 1) ceph config set mgr rgw_keystone_api_version 3 2) ceph config set mgr rgw_keystone_url http://controller:5000 3) ceph config set mgr rgw_keystone_accepted_admin_roles admin 4) ceph config set mgr rgw_keystone_admin_user swift 5) ceph config set mgr rgw_keystone_admin_password swift_test 6) ceph config set mgr rgw_keystone_admin_domain default 7) ceph config set mgr rgw_keystone_admin_project service for project I have tested different projects e.g. service and admin Now when testing the API using swift client I get next: 1) swift post test3 --debug DEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://controller:5000/v3/auth/tokens DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): controller:5000 DEBUG:urllib3.connectionpool:http://controller:5000 "POST /v3/auth/tokens HTTP/1.1" 201 7032 . some openstack data here . DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): ceph1:80 DEBUG:urllib3.connectionpool:http://ceph1:80 "POST /swift/v1/AUTH_adsfasdfasdfasdfasdfasdf/test3 HTTP/1.1" 401 12 INFO:swiftclient:REQ: curl -i http://ceph1/swift/v1/AUTH_adsfasdfasdfasdfasdfasdf/test3 -X POST -H "X-Auth-Token: <Token would be here>" -H "Content-Length: 0" INFO:swiftclient:RESP STATUS: 401 Unauthorized and finally I get Container POST failed: http://ceph1/swift/v1/AUTH_adsfasdfasdfasdfasdfasdf/test3 401 Unauthorized b'AccessDenied'

3 years, 3 months

4
12
0 0

OSDs in pool full : can't restart to clean

by Paul Mezzanini

Hey all We landed in a bad place (tm) with our nvme metadata tier. I'll root cause how we got here after it's all back up. I suspect it was a pool got misconfigured and just filled it all up. Short version, the OSDs are all full (or full enough) that I can't get them to spin back up. They crash with enospc. Average fragmentation for block is in the .8 range and bluefs-db is slightly better (using ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-412 free-score). I've tried all sorts of things. I was able to get a few to spin up but once they came up and rejoined they tried to pull MORE data in and crashed out again. I changed the crush_rule for the pool I care about to a much larger (and slower) set of disks. That way if I get anything else to come up I'm not just making it worse. I increased the size of the backing LV for one of the OSDs to see if I could get ceph-bluestore-tool to expand it, but that too crashes out enospc. In theory, there are a few pools I don't care about as much on there and I could delete them to make space, but I can't get them up enough -or- get the offline tools to do it. Some logs from the attempted expansion that fails: [root@ceph-b-07 ceph-412]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-412 bluefs-bdev-expand inferring bluefs devices from bluestore path 1 : device size 0x44aa000000 : own 0x[520000~20000,23e0000~620000,2ae0000~4d20000,78d0000~f30000,8900000~1600000,9fc0000~30000,a000000~5d00000,fe00000~3b00000,139e0000~5420000,19000000~100000, ::snip:: 4f0000~20000,25c17c0000~10000,25c2ea0000~20000,25c9f20000~10000,25d0860000~10000,25d50e0000~20000,25d5170000~10000,25ded20000~20000,25f4fc0000~20000] = 0x59c5b0000 : using 0x58f220000(22 GiB) : bluestore has 0x10260000(258 MiB) available Expanding DB/WAL... Expanding Main... 2021-01-13 16:40:46.481 7f33d1998ec0 -1 bluestore(/var/lib/ceph/osd/ceph-412) allocate_bluefs_freespace failed to allocate on 0x32c70000 min_size 0xf700000 > allocated total 0x1e80000 bluefs_shared_alloc_size 0x10000 allocated 0x1e80000 available 0x 90210000 2021-01-13 16:40:46.482 7f33d1998ec0 -1 bluefs _allocate failed to expand slow device to fit +0xf6f0def 2021-01-13 16:40:46.482 7f33d1998ec0 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0xf6f0def /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.15/rpm/el7/BUILD/ceph-14.2.15/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f33d1998ec0 time 2021-01-13 16:40:46.482978 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.15/rpm/el7/BUILD/ceph-14.2.15/src/os/bluestore/BlueFS.cc: 2351: ceph_abort_msg("bluefs enospc") ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable) The orignal LV under that is 172g and the new LV size is double that. I'm going to keep poking at this, but I'm really hoping for some new info. Either to increase the size of the OSDs to get it back up enough so I can then rebuild them with a different layout, delete some data I don't care about, pull the data off and put it back to defrag... I don't care which so long as I get it back up. Thanks -paul

3 years, 3 months

1
1
0 0

Which version of Ceph fully supports CephFS Snapshot?

by fantastic2085

I would like to use the Cephfs Snapshot feature, which version of Ceph supports it

3 years, 3 months

3
2
0 0

Question about osdmap

by Andrea Bolzonella

Hi all. I'm running version 14.2.16. My cluster is remapping and backfilling after "reweight-by-utilization". This morning I started to have some issues: - all my mgr active and standby crashed - MDS stopped reporting statistics - MON is using lot of memory - Cluster was very slow I restarted everything but osd I managed to get the statistics back, restarting the MDS a couple of times. All above issues are gone but I have this "mon.node1 (mon.0) 1529395 : cluster [DBG] osdmap e??????:" about 1 every 2 second with a different epoc. Question: Is it normal or there is something wrong? Should be the osdmap more "stable" ? What can change the osdmap? Thanks for the help Andrea

3 years, 3 months

1
0
0 0

Re: Global AVAIL vs Pool MAX AVAIL

by Mark Johnson

Thanks Anthony, Shortly after I made that post, I found a Server Fault post where someone had asked the exact same question. The reply was this - "The 'MAX AVAIL' column represents the amount of data that can be used before the first OSD becomes full. It takes into account the projected distribution of data across disks from the CRUSH map and uses the 'first OSD to fill up' as the target." To answer your question, yes we have a rather unbalanced cluster which is something I'm working on. When I saw these figures, I got scared that I had less time to work on it than I thought. There are about 10 pools in the cluster, but we primarily use one for almost all of our storage and it only has 64 pgs & 1 replica across 20 OSDs. So, as data has grown, it works out that each PG in this cluster accounts for about 148GB, and the OSDs are about 1.4TB each, so it's easy to see how it's found itself way out of balance. Anyway, once I've added the OSDs and data has rebalanced, I'm going to start the process of incrementally increasing the PG count for this pool in a staged process to reduce the amount of data per PG and (hopefully) balance out the data distribution better than it is. This is one big learning process - I just wish I wasn't learning in production so much. On Mon, 2021-01-11 at 15:58 -0800, Anthony D'Atri wrote: Either you have multiple CRUSH roots or device classes, or you have unbalanced OSD utilization. What version of Ceph? Do you have any balancing enabled? Do ceph osd df | sort -nk8 | head ceph osd df | sort -nk8 | tail and I’ll bet you have OSDs way more full than others. The STDDEV value that ceph df reports I suspect is accordingly high On Jan 11, 2021, at 2:07 PM, Mark Johnson < <mailto:markj@iovox.com> markj(a)iovox.com > wrote: Can someone please explain to me the difference between the Global "AVAIL" and the "MAX AVAIL" in the pools table when I do a "ceph df detail"? The reason being that we have a total of 14 pools, however almost all of our data exists in one pool. A "ceph df detail" shows the following: GLOBAL: SIZE AVAIL RAW USED %RAW USED OBJECTS 28219G 6840G 19945G 70.68 36112k But the POOLS table from the same output shows the MAX AVAIL for each pool as 498G and the pool with all the data shows 9472G used with a %USED of 95.00. If it matters, the pool size is set to 2 so my guess is the global available figure is raw, meaning I should still have approx. 3.4TB available, but that 95% used has me concerned. I'm going to be adding some OSDs soon but still would like to understand the difference and how much trouble I'm in at this point. _______________________________________________ ceph-users mailing list -- <mailto:ceph-users@ceph.io> ceph-users(a)ceph.io To unsubscribe send an email to <mailto:ceph-users-leave@ceph.io> ceph-users-leave(a)ceph.io

3 years, 3 months

2
1
0 0

Unable to cancel buckets from resharding queue

by Wout van Heeswijk

We are experiencing some issues with the bucket resharding queue in Ceph Mimic at one of our customers. I suspect that some of the issues are related to the upgrades of earlier versions of the cluster/radosgw. 1) When we cancel the resharding of a bucket, the bucket resharding entry is removed from the queue and almost immediately re-added. We confirm the removal by listing the omapkeys of all the reshard.0000## objects. The relevant omap key is temporarily removed. After a short time it is re-added. I haven't yet determined which process adds it back, but I can only think it is one of the two rgws. 2) We see a lot of objects,1265, in the reshard_pool: "default.rgw.log:reshard". Most look like they might be old to buckets markers or something like that. Most of these objects have not been touched in a long time (mtime 2018). Some numbers: Total buckets: 569 Objects in index pool: 649 Objects in default.rgw.log:reshard namespace: 1265 (of which 16 are the ‘rgw_reshard_num_logs’ objects) these are size '0' Buckets in reshard queue: 41 The objects in "default.rgw.log:reshard" have a similar naming schema to the index objects or , but I cannot relate them directly. Obviously there are loads more than the bucket index objects. The pools used for RGW are for an older generation of Radosgw: .rgw.root default.rgw.control default.rgw.data.root default.rgw.gc default.rgw.log default.rgw.users.uid default.rgw.users.keys default.rgw.buckets.index default.rgw.buckets.data default.rgw.users.email default.rgw.buckets.non-ec default.rgw.users.swift default.rgw.usage My two main questions are: 1) What process, other than dynamic resharding, could cause the re-adding of these buckets to the resharding queue? 2) Do more people see lots of objects in the reshard pool/namespace and can somebody help me understand what these object are? If somebody can point me in the direction of some more documentation or a good talk regarding the resharding mechanism that would also be great. Thanks and with kind regards, Wout 42on

3 years, 3 months

1
0
0 0

Global AVAIL vs Pool MAX AVAIL

by Mark Johnson

Can someone please explain to me the difference between the Global "AVAIL" and the "MAX AVAIL" in the pools table when I do a "ceph df detail"? The reason being that we have a total of 14 pools, however almost all of our data exists in one pool. A "ceph df detail" shows the following: GLOBAL: SIZE AVAIL RAW USED %RAW USED OBJECTS 28219G 6840G 19945G 70.68 36112k But the POOLS table from the same output shows the MAX AVAIL for each pool as 498G and the pool with all the data shows 9472G used with a %USED of 95.00. If it matters, the pool size is set to 2 so my guess is the global available figure is raw, meaning I should still have approx. 3.4TB available, but that 95% used has me concerned. I'm going to be adding some OSDs soon but still would like to understand the difference and how much trouble I'm in at this point.

3 years, 3 months

1
0
0 0

bluefs_buffered_io=false performance regression

by Robert Sander

Hi, bluefs_buffered_io was disabled in Ceph version 14.2.11. The cluster started last year with 14.2.5 and got upgraded over the year now running 14.2.16. The performance was OK first but got abysmal bad at the end of 2020. We checked the components and HDDs and SSDs seem to be fine. Single disk benchmarks showed performance according the specs. Today we (re-)enabled bluefs_buffered_io and restarted all OSD processes on 248 HDDs distributed over 12 nodes. Now the benchmarks are fine again: 434MB/s write instead of 60MB/s, 960MB/s read instead of 123MB/s. This setting was disabled in 14.2.11 because "in some test cases it appears to cause excessive swap utilization by the linux kernel and a large negative performance impact after several hours of run time." We have to monitor if this will happen in our cluster. Is there any other negative side effect currently known? Here are the rados bench values, first with bluefs_buffered_io=false, then with bluefs_buffered_io=true: Bench Total Total Write Object Band Stddev Max Min Average Stddev Max Min Average Stddev Max Min time writes Read size width Bandwidth IOPS Latency (s) run reads size (MB/sec) made false write 33,081 490 4194304 4194304 59,2485 71,3829 264 0 14 17,8702 66 0 1,07362 2,83017 20,71 0,0741089 false seq 15,8226 490 4194304 4194304 123,874 30 46,8659 174 0 0,51453 9,53873 0,00343417 false rand 38,2615 2131 4194304 4194304 222,782 55 109,374 415 0 0,28191 12,1039 0,00327948 true write 30,4612 3308 4194304 4194304 434,389 26,0323 480 376 108 6,50809 120 94 0,14683 0,07368 0,99791 0,0751249 true seq 13,7628 3308 4194304 4194304 961,429 240 22,544 280 184 0,06528 0,88676 0,00338191 true rand 30,1007 8247 4194304 4194304 1095,92 273 25,5066 313 213 0,05719 0,99140 0,00325295 Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Zwangsangaben lt. §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Geschäftsführer: Peer Heinlein -- Sitz: Berlin

3 years, 3 months

3
3
0 0

denied reconnect attempt for ceph fs client

by Frank Schilder

Hi all, I'm not 100% sure, but I believe that since the update from mimic-13.2.8 to mimic-13.2.10 I have a strange issue. If a ceph fs client becomes unresponsive, it is evicted, but it cannot reconnect; see ceph.log extract below. In the past, clients would retry after the blacklist period and everything continued fine. I'm wondering why the clients cannot reconnect any more. I see this now every time a client gets thrown out and didn't have this problem before. Any hints to what I might want to change are welcome, and also information what might have changed during the update (eg. is this expected or not). 2021-01-11 19:45:53.839721 [INF] denied reconnect attempt (mds is up:active) from client.30770997 192.168.56.121:0/2325067585 after 1.82224e+06 (allowed interval 45) 2021-01-11 19:45:46.713822 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 2021-01-11 19:45:45.937126 [INF] MDS health message cleared (mds.0): 1 slow requests are blocked > 30 secs 2021-01-11 19:45:39.527180 [INF] Evicting (and blacklisting) client session 30770997 (192.168.56.121:0/2325067585) 2021-01-11 19:45:39.527168 [WRN] evicting unresponsive client HOSTNAME:CLIENT_NAME (30770997), after 62.735 seconds 2021-01-11 19:45:39.522141 [WRN] 1 slow requests, 0 included below; oldest blocked for > 50.371751 secs 2021-01-11 19:45:34.522085 [WRN] 1 slow requests, 0 included below; oldest blocked for > 45.371685 secs 2021-01-11 19:45:29.521991 [WRN] 1 slow requests, 0 included below; oldest blocked for > 40.371604 secs 2021-01-11 19:45:24.521907 [WRN] 1 slow requests, 0 included below; oldest blocked for > 35.371520 secs 2021-01-11 19:45:19.521895 [WRN] slow request 30.371469 seconds old, received at 2021-01-11 19:44:49.150361: client_request(client.30771333:10419033 getattr pAsLsXsFs #0x10014446b0f 2021-01-11 19:44:49.145012 caller_uid=257062, caller_gid=257062{}) currently failed to rdlock, waiting Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

3 years, 3 months

1
0
0 0

DocuBetter Meeting This Week -- 13 Jan 2021 1730 UTC

by John Zachary Dover

Unless an unforeseen crisis arises, the DocuBetter meetings for the next two months will focus on ensuring that we have a smooth and easy-to-understand docs suite for the release of Pacific. Meeting: https://bluejeans.com/908675367 Etherpad: https://pad.ceph.com/p/Ceph_Documentation

3 years, 3 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users January 2021