Hi,
Using Ceph 15.2.8 installed with cephadm. Trying to get RadosGW to work.
I have managed to get RadosGW working: I can manage it through the
dashboard and use the aws s3 client to create new buckets etc. But when I
try to use the Swift API I get errors.
Not sure how to continue tracking the problem from here. Any tips are welcome.
Thank you very much,
-Mika
------- What I have done and what the results are (some data changed
manually) -------
What I have done:
At OpenStack Side:
1) openstack user create --domain default --password-prompt swift
2) openstack role add --project service --user swift admin
3) openstack endpoint create --region RegionOne object-store public
http://ceph1/swift/v1/AUTH_%\(project_id\)s
4) openstack endpoint create --region RegionOne object-store internal
http://ceph1/swift/v1/AUTH_%\(project_id\)s
5) openstack endpoint create --region RegionOne object-store admin
http://ceph1/swift/v1
At Ceph side:
1) ceph config set mgr rgw_keystone_api_version 3
2) ceph config set mgr rgw_keystone_url http://controller:5000
3) ceph config set mgr rgw_keystone_accepted_admin_roles admin
4) ceph config set mgr rgw_keystone_admin_user swift
5) ceph config set mgr rgw_keystone_admin_password swift_test
6) ceph config set mgr rgw_keystone_admin_domain default
7) ceph config set mgr rgw_keystone_admin_project service
For the admin project I have tested different values, e.g. service and admin.
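One thing I am unsure about is whether "mgr" is the right scope for these options, since it is the RGW daemons themselves that talk to Keystone. If they need to be scoped to the RGW daemons instead, the equivalent would be something like the following (an untested sketch, not what I ran; <rgw-daemon> stands in for the actual daemon name from "ceph orch ps --daemon-type rgw"):
ceph config set client.<rgw-daemon> rgw_keystone_api_version 3
ceph config set client.<rgw-daemon> rgw_keystone_url http://controller:5000
ceph config set client.<rgw-daemon> rgw_keystone_accepted_admin_roles admin
ceph config set client.<rgw-daemon> rgw_keystone_admin_user swift
ceph config set client.<rgw-daemon> rgw_keystone_admin_password swift_test
ceph config set client.<rgw-daemon> rgw_keystone_admin_domain default
ceph config set client.<rgw-daemon> rgw_keystone_admin_project service
ceph config dump | grep -i keystone   # shows which section each option actually landed in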
Now when testing the API using the swift client I get the following:
1) swift post test3 --debug
DEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to
http://controller:5000/v3/auth/tokens
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1):
controller:5000
DEBUG:urllib3.connectionpool:http://controller:5000 "POST /v3/auth/tokens
HTTP/1.1" 201 7032
. some openstack data here .
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): ceph1:80
DEBUG:urllib3.connectionpool:http://ceph1:80 "POST
/swift/v1/AUTH_adsfasdfasdfasdfasdfasdf/test3 HTTP/1.1" 401 12
INFO:swiftclient:REQ: curl -i
http://ceph1/swift/v1/AUTH_adsfasdfasdfasdfasdfasdf/test3 -X POST -H
"X-Auth-Token: <Token would be here>" -H "Content-Length: 0"
INFO:swiftclient:RESP STATUS: 401 Unauthorized
and finally I get
Container POST failed:
http://ceph1/swift/v1/AUTH_adsfasdfasdfasdfasdfasdf/test3 401 Unauthorized
b'AccessDenied'
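If it helps with tracking this down, one option would be to turn up RGW debug logging and watch what the gateway does when it tries to validate the Keystone token. A sketch (again, <rgw-daemon> is a placeholder for the actual daemon name):
ceph config set client.<rgw-daemon> debug_rgw 20
ceph config set client.<rgw-daemon> debug_ms 1
# repeat the "swift post test3" request, then read the RGW daemon log on the
# gateway host (journalctl, or "cephadm logs --name <rgw-daemon>") and look for
# the token validation request sent to Keystone and the reason for the 401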
Hey all
We landed in a bad place (tm) with our nvme metadata tier. I'll root-cause how we got here after it's all back up; I suspect a pool got misconfigured and just filled it all up.
Short version: the OSDs are all so full (or full enough) that I can't get them to spin back up. They crash with enospc. Average fragmentation for block is in the .8 range and bluefs-db is slightly better (using ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-412 free-score). I've tried all sorts of things. I was able to get a few to spin up, but once they came up and rejoined they tried to pull MORE data in and crashed out again.
I changed the crush_rule for the pool I care about to a much larger (and slower) set of disks. That way if I get anything else to come up I'm not just making it worse.
I increased the size of the backing LV for one of the OSDs to see if I could get ceph-bluestore-tool to expand it, but that too crashes out enospc.
In theory, there are a few pools I don't care about as much on there and I could delete them to make space, but I can't get them up enough -or- get the offline tools to do it.
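The offline route I mean is ceph-objectstore-tool removing the placement groups of a sacrificial pool while the OSD is stopped, roughly like this (the pgid is just an example, and removing PGs this way is obviously destructive):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-412 --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-412 --op remove --pgid 17.1a --force
but as I said, I can't get the offline tools to run against these OSDs either.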
Some logs from the attempted expansion that fails:
[root@ceph-b-07 ceph-412]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-412 bluefs-bdev-expand
inferring bluefs devices from bluestore path
1 : device size 0x44aa000000 : own 0x[520000~20000,23e0000~620000,2ae0000~4d20000,78d0000~f30000,8900000~1600000,9fc0000~30000,a000000~5d00000,fe00000~3b00000,139e0000~5420000,19000000~100000,
::snip::
4f0000~20000,25c17c0000~10000,25c2ea0000~20000,25c9f20000~10000,25d0860000~10000,25d50e0000~20000,25d5170000~10000,25ded20000~20000,25f4fc0000~20000] = 0x59c5b0000 : using 0x58f220000(22 GiB) : bluestore has 0x10260000(258 MiB) available
Expanding DB/WAL...
Expanding Main...
2021-01-13 16:40:46.481 7f33d1998ec0 -1 bluestore(/var/lib/ceph/osd/ceph-412) allocate_bluefs_freespace failed to allocate on 0x32c70000 min_size 0xf700000 > allocated total 0x1e80000 bluefs_shared_alloc_size 0x10000 allocated 0x1e80000 available 0x 90210000
2021-01-13 16:40:46.482 7f33d1998ec0 -1 bluefs _allocate failed to expand slow device to fit +0xf6f0def
2021-01-13 16:40:46.482 7f33d1998ec0 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0xf6f0def
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.15/rpm/el7/BUILD/ceph-14.2.15/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f33d1998ec0 time 2021-01-13 16:40:46.482978
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.15/rpm/el7/BUILD/ceph-14.2.15/src/os/bluestore/BlueFS.cc: 2351: ceph_abort_msg("bluefs enospc")
ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
The original LV under that is 172g and the new LV size is double that.
I'm going to keep poking at this, but I'm really hoping for some new ideas: either a way to increase the size of the OSDs enough to get them back up so I can rebuild them with a different layout, delete some data I don't care about, or pull the data off and put it back to defrag... I don't care which, so long as I get it back up.
Thanks
-paul
Hi all.
I'm running version 14.2.16.
My cluster is remapping and backfilling after "reweight-by-utilization".
This morning I started to have some issues:
- all my MGRs, active and standby, crashed
- MDS stopped reporting statistics
- MON was using a lot of memory
- Cluster was very slow
I restarted everything except the OSDs.
I managed to get the statistics back, restarting the MDS a couple of times.
All the above issues are gone, but I now see this message about once every 2
seconds, each time with a different epoch:
"mon.node1 (mon.0) 1529395 : cluster [DBG] osdmap e??????:"
Question:
Is this normal or is there something wrong?
Should the osdmap be more "stable"?
What can change the osdmap?
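For reference, the epoch itself can be checked with the standard commands below to see how fast it is advancing:
ceph osd stat             # prints the current osdmap epoch
ceph osd dump | head -1   # first line shows "epoch <n>"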
Thanks for the help
Andrea
Thanks Anthony,
Shortly after I made that post, I found a Server Fault post where someone had asked the exact same question. The reply was this - "The 'MAX AVAIL' column represents the amount of data that can be used before the first OSD becomes full. It takes into account the projected distribution of data across disks from the CRUSH map and uses the 'first OSD to fill up' as the target."
To answer your question, yes we have a rather unbalanced cluster which is something I'm working on. When I saw these figures, I got scared that I had less time to work on it than I thought. There are about 10 pools in the cluster, but we primarily use one for almost all of our storage and it only has 64 pgs & 1 replica across 20 OSDs. So, as data has grown, it works out that each PG in this cluster accounts for about 148GB, and the OSDs are about 1.4TB each, so it's easy to see how it's found itself way out of balance.
Anyway, once I've added the OSDs and data has rebalanced, I'm going to start the process of incrementally increasing the PG count for this pool in a staged process to reduce the amount of data per PG and (hopefully) balance out the data distribution better than it is.
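For concreteness, a staged increase would look something like this (the pool name is a placeholder, and depending on the Ceph version pgp_num may need to be raised separately before data actually starts moving):
ceph osd pool set <poolname> pg_num 128
ceph osd pool set <poolname> pgp_num 128
# wait for backfill to finish and the cluster to settle, then repeat
# towards the target (256, 512, ...)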
This is one big learning process - I just wish I wasn't learning in production so much.
On Mon, 2021-01-11 at 15:58 -0800, Anthony D'Atri wrote:
Either you have multiple CRUSH roots or device classes, or you have unbalanced OSD utilization. What version of Ceph? Do you have any balancing enabled?
Do
ceph osd df | sort -nk8 | head
ceph osd df | sort -nk8 | tail
and I'll bet you have OSDs way more full than others. I suspect the STDDEV value that ceph df reports is accordingly high.
On Jan 11, 2021, at 2:07 PM, Mark Johnson <markj@iovox.com> wrote:
Can someone please explain to me the difference between the Global "AVAIL" and the "MAX AVAIL" in the pools table when I do a "ceph df detail"? The reason being that we have a total of 14 pools, however almost all of our data exists in one pool. A "ceph df detail" shows the following:
GLOBAL:
SIZE AVAIL RAW USED %RAW USED OBJECTS
28219G 6840G 19945G 70.68 36112k
But the POOLS table from the same output shows the MAX AVAIL for each pool as 498G and the pool with all the data shows 9472G used with a %USED of 95.00. If it matters, the pool size is set to 2 so my guess is the global available figure is raw, meaning I should still have approx. 3.4TB available, but that 95% used has me concerned. I'm going to be adding some OSDs soon but still would like to understand the difference and how much trouble I'm in at this point.
We are experiencing some issues with the bucket resharding queue in Ceph Mimic at one of our customers. I suspect that some of the issues are related to upgrades from earlier versions of the cluster/radosgw.
1) When we cancel the resharding of a bucket, the bucket resharding entry is removed from the queue and almost immediately re-added. We confirm the removal by listing the omapkeys of all the reshard.0000## objects. The relevant omap key is temporarily removed. After a short time it is re-added. I haven't yet determined which process adds it back, but I can only think it is one of the two rgws.
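For reference, the checks described above map onto commands like these (bucket name and queue object name are placeholders):
radosgw-admin reshard list
radosgw-admin reshard cancel --bucket=<bucketname>
rados -p default.rgw.log --namespace reshard ls
rados -p default.rgw.log --namespace reshard listomapkeys <reshard.0000## object>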
2) We see a lot of objects (1265) in the reshard namespace "default.rgw.log:reshard". Most look like they might be old bucket markers or something like that. Most of these objects have not been touched in a long time (mtime 2018).
Some numbers:
Total buckets: 569
Objects in index pool: 649
Objects in default.rgw.log:reshard namespace: 1265 (of which 16 are the ‘rgw_reshard_num_logs’ queue objects); these are all size '0'
Buckets in reshard queue: 41
The objects in "default.rgw.log:reshard" have a naming scheme similar to that of the index objects, but I cannot relate them directly. Obviously there are loads more of them than there are bucket index objects.
The pools used for RGW are from an older generation of RadosGW:
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.users.uid
default.rgw.users.keys
default.rgw.buckets.index
default.rgw.buckets.data
default.rgw.users.email
default.rgw.buckets.non-ec
default.rgw.users.swift
default.rgw.usage
My two main questions are:
1) What process, other than dynamic resharding, could cause the re-adding of these buckets to the resharding queue?
2) Do more people see lots of objects in the reshard pool/namespace, and can somebody help me understand what these objects are?
If somebody can point me in the direction of some more documentation or a good talk regarding the resharding mechanism that would also be great.
Thanks and with kind regards,
Wout
42on
Hi,
bluefs_buffered_io was disabled by default in Ceph version 14.2.11.
The cluster started last year with 14.2.5 and got upgraded over the year now running 14.2.16.
The performance was OK at first but got abysmally bad at the end of 2020.
We checked the components, and the HDDs and SSDs seem to be fine. Single-disk benchmarks showed performance according to the specs.
Today we (re-)enabled bluefs_buffered_io and restarted all OSD processes on 248 HDDs distributed over 12 nodes.
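For reference, a sketch of how this can be done via the config database (the restart mechanics depend on how the OSDs are deployed):
ceph config set osd bluefs_buffered_io true
ceph config get osd bluefs_buffered_io    # verify the new value
systemctl restart ceph-osd@<id>           # per OSD, host by host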
Now the benchmarks are fine again: 434MB/s write instead of 60MB/s, 960MB/s read instead of 123MB/s.
This setting was disabled in 14.2.11 because "in some test cases it appears to cause excessive swap utilization by the linux kernel and a large negative performance impact after several hours of run time."
We have to monitor if this will happen in our cluster. Is there any other negative side effect currently known?
Here are the rados bench values, first with bluefs_buffered_io=false, then with bluefs_buffered_io=true:
(All runs used a 4194304-byte write/read size and object size.)

buffered_io  test   time(s)   ops    BW(MB/s)  BW stddev  BW max  BW min  avg IOPS  IOPS stddev  max IOPS  min IOPS  avg lat(s)  lat stddev  max lat(s)  min lat(s)
false        write  33.081     490     59.2485   71.3829     264       0        14     17.8702         66         0     1.07362     2.83017     20.71       0.0741089
false        seq    15.8226    490    123.874          -       -       -        30     46.8659        174         0     0.51453           -      9.53873    0.00343417
false        rand   38.2615   2131    222.782          -       -       -        55    109.374         415         0     0.28191           -     12.1039     0.00327948
true         write  30.4612   3308    434.389    26.0323     480     376       108      6.50809       120        94     0.14683     0.07368      0.99791    0.0751249
true         seq    13.7628   3308    961.429          -       -       -       240     22.544         280       184     0.06528           -      0.88676    0.00338191
true         rand   30.1007   8247   1095.92           -       -       -       273     25.5066        313       213     0.05719           -      0.99140    0.00325295
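For reference, these numbers look like standard roughly 30-second rados bench runs; a typical invocation (pool name is a placeholder, not necessarily what we used) would be:
rados bench -p <testpool> 30 write --no-cleanup
rados bench -p <testpool> 30 seq
rados bench -p <testpool> 30 rand
rados -p <testpool> cleanup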
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
Hi all,
I'm not 100% sure, but I believe that since the update from mimic-13.2.8 to mimic-13.2.10 I have a strange issue. If a ceph fs client becomes unresponsive, it is evicted, but it cannot reconnect; see ceph.log extract below. In the past, clients would retry after the blacklist period and everything continued fine. I'm wondering why the clients cannot reconnect any more. I see this now every time a client gets thrown out and didn't have this problem before.
Any hints as to what I might want to change are welcome, as well as information on what might have changed during the update (e.g. is this expected or not).
2021-01-11 19:45:53.839721 [INF] denied reconnect attempt (mds is up:active) from client.30770997 192.168.56.121:0/2325067585 after 1.82224e+06 (allowed interval 45)
2021-01-11 19:45:46.713822 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2021-01-11 19:45:45.937126 [INF] MDS health message cleared (mds.0): 1 slow requests are blocked > 30 secs
2021-01-11 19:45:39.527180 [INF] Evicting (and blacklisting) client session 30770997 (192.168.56.121:0/2325067585)
2021-01-11 19:45:39.527168 [WRN] evicting unresponsive client HOSTNAME:CLIENT_NAME (30770997), after 62.735 seconds
2021-01-11 19:45:39.522141 [WRN] 1 slow requests, 0 included below; oldest blocked for > 50.371751 secs
2021-01-11 19:45:34.522085 [WRN] 1 slow requests, 0 included below; oldest blocked for > 45.371685 secs
2021-01-11 19:45:29.521991 [WRN] 1 slow requests, 0 included below; oldest blocked for > 40.371604 secs
2021-01-11 19:45:24.521907 [WRN] 1 slow requests, 0 included below; oldest blocked for > 35.371520 secs
2021-01-11 19:45:19.521895 [WRN] slow request 30.371469 seconds old, received at 2021-01-11 19:44:49.150361: client_request(client.30771333:10419033 getattr pAsLsXsFs #0x10014446b0f 2021-01-11 19:44:49.145012 caller_uid=257062, caller_gid=257062{}) currently failed to rdlock, waiting
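For reference, the blacklist entries created by these evictions can be listed and, if necessary, cleared with the standard mon commands (the address is the one from the log above):
ceph osd blacklist ls
ceph osd blacklist rm 192.168.56.121:0/2325067585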
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14