Good call. I just restarted the whole cluster, but the problem still
persists.
I don't think it is a problem with RADOS itself, but with radosgw.
But I still struggle to pin down the issue.
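One thing I can still check: the "failed to distribute cache" errors below come from robust_notify, which uses watch/notify on the RGW control pool's notify objects, so a stale watcher left behind by one of the old daemons could explain the (110) timeouts. A sketch of that check (the pool name eu-central-1.rgw.control is an assumption based on our zone name, and notify.0 through notify.7 assume the default rgw_num_control_oids of 8):

```shell
# Sketch: list watchers on the RGW control pool's notify objects.
# Pool name and object count are assumptions for this setup.
# A watcher whose address no longer hosts a running radosgw would
# explain notify timeouts during cache distribution.
for i in 0 1 2 3 4 5 6 7; do
    echo "== notify.$i =="
    rados -p eu-central-1.rgw.control listwatchers "notify.$i"
done
```

Comparing the listed watcher addresses against the radosgw processes that are actually running should show whether any stale entries survived the daemon renaming.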
On Tue, May 11, 2021 at 10:45, Thomas Schneider <
Thomas.Schneider-q2p(a)ruhr-uni-bochum.de> wrote:
Hey all,
we had slow RGW access when some OSDs were slow due to an OSD bug
unknown to us that made PG access either slow or impossible. (It also
showed itself through slowness of the mgr, but nothing beyond that.)
We restarted all OSDs that held RGW data and the problem was gone.
I have no good way to debug the problem, since it never occurred again
after we restarted the OSDs.
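If it shows up again, a per-OSD latency check might narrow it down faster than restarting everything. A sketch (what counts as an outlier depends on your hardware):

```shell
# Sketch: per-OSD commit/apply latency in milliseconds.
# OSDs whose latencies sit far above their peers are candidates
# for the slow-PG behaviour described above.
ceph osd perf
```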
Kind regards,
Thomas
On May 11, 2021 at 08:47:06 CEST, Boris Behrens <bb(a)kervyn.de> wrote:
Hi Amit,
I just pinged the mons from every system and they are all available.
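For completeness, this is roughly what I did (a sketch; the mon IDs a/b/c are placeholders for whatever the monmap actually lists):

```shell
# Sketch: confirm the monmap entries and reach each mon directly.
ceph mon dump      # shows the mon addresses the cluster uses
ceph ping mon.a    # mon IDs are placeholders; use the IDs
ceph ping mon.b    # reported by "ceph mon dump"
ceph ping mon.c
```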
On Mon, May 10, 2021 at 21:18, Amit Ghadge <
amitg.b14(a)gmail.com> wrote:
> We have seen slowness when one of the mgr services was unreachable;
> it may be different here. You can check the mon entries in the
> monmap/ceph.conf and then verify that all nodes can be pinged
> successfully.
>
>
> -AmitG
>
>
> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens <bb(a)kervyn.de> wrote:
>
>> Hi guys,
>>
>> does anyone have any idea?
>>
>> On Wed, May 5, 2021 at 16:16, Boris Behrens <bb(a)kervyn.de> wrote:
>>
>> > Hi,
>> > for a couple of days now we have been experiencing strange slowness
>> > on some radosgw-admin operations.
>> > What is the best way to debug this?
>> >
>> > For example, creating a user takes over 20s:
>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user --display-name=test-bb-user
>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first you don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute cache for eu-central-1.rgw.users.uid:test-bb-user
>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first you don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute cache for eu-central-1.rgw.users.keys:****
>> > {
>> >     "user_id": "test-bb-user",
>> >     "display_name": "test-bb-user",
>> >     ....
>> > }
>> >
>> > real    0m20.557s
>> > user    0m0.087s
>> > sys     0m0.030s
>> >
>> > First I thought that rados operations might be slow, but adding and
>> > deleting objects in rados is as fast as usual (at least from my
>> > perspective). Uploading to buckets is also fine.
>> >
>> > We changed some things, and I think it might have to do with this:
>> > * We have an HAProxy that distributes via leastconn between the 3
>> > radosgw's (this did not change).
>> > * We had a daemon with the name "eu-central-1" running three times
>> > (on the 3 radosgw's).
>> > * Because this might have led to our data duplication problem, we
>> > have split that up, so the daemons are now named per host
>> > (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3).
>> > * We also added dedicated rgw daemons for garbage collection,
>> > because the current ones were not able to keep up.
>> > * So basically, ceph status went from "rgw: 1 daemon active
>> > (eu-central-1)" to "rgw: 14 daemons active (eu-central-1-s3db1,
>> > eu-central-1-s3db2, eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
>> >
>> >
>> > Cheers
>> > Boris
>> >
>>
>>
>> --
>> The "UTF-8 problems" self-help group will, as an exception, meet in
>> the large hall this time.
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>
--
Thomas Schneider
IT.SERVICES
Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780
Bochum
Phone: +49 234 32 23939
http://www.it-services.rub.de/