Good call. I just restarted the whole cluster, but the problem still
persists.
I don't think it is a problem with RADOS itself, but with radosgw.
But I still struggle to pin down the issue.
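One thing I can still check: the "failed to distribute cache" errors below come from robust_notify, which uses watch/notify on the RGW control pool's notify objects, so a stale watcher left behind by one of the old daemons could explain the (110) timeouts. A sketch of that check (the pool name eu-central-1.rgw.control is an assumption based on our zone name, and notify.0 through notify.7 assume the default rgw_num_control_oids of 8):

```shell
# Sketch: list watchers on the RGW control pool's notify objects.
# Pool name and object count are assumptions for this setup.
# A watcher whose address no longer hosts a running radosgw would
# explain notify timeouts during cache distribution.
for i in 0 1 2 3 4 5 6 7; do
    echo "== notify.$i =="
    rados -p eu-central-1.rgw.control listwatchers "notify.$i"
done
```

Comparing the listed watcher addresses against the radosgw processes that are actually running should show whether any stale entries survived the daemon renaming.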
On Tue, May 11, 2021 at 10:45, Thomas Schneider <
Thomas.Schneider-q2p(a)ruhr-uni-bochum.de> wrote:
Hey all,
we had slow RGW access when some OSDs were slow due to an OSD bug
unknown to us that made PG access either slow or impossible. (It also
showed itself through slowness of the mgr, but nothing beyond that.)
We restarted all OSDs that held RGW data and the problem was gone.
I have no good way to debug the problem, since it never occurred again
after we restarted the OSDs.
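If it shows up again, a per-OSD latency check might narrow it down faster than restarting everything. A sketch (what counts as an outlier depends on your hardware):

```shell
# Sketch: per-OSD commit/apply latency in milliseconds.
# OSDs whose latencies sit far above their peers are candidates
# for the slow-PG behaviour described above.
ceph osd perf
```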
Kind regards,
Thomas
On May 11, 2021 at 08:47:06 CEST, Boris Behrens <bb(a)kervyn.de> wrote:
Hi Amit,
I just pinged the mons from every system and they are all available.
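For completeness, this is roughly what I did (a sketch; the mon IDs a/b/c are placeholders for whatever the monmap actually lists):

```shell
# Sketch: confirm the monmap entries and reach each mon directly.
ceph mon dump      # shows the mon addresses the cluster uses
ceph ping mon.a    # mon IDs are placeholders; use the IDs
ceph ping mon.b    # reported by "ceph mon dump"
ceph ping mon.c
```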
On Mon, May 10, 2021 at 21:18, Amit Ghadge <
amitg.b14(a)gmail.com> wrote:
> We have seen slowness when one of the mgr services was unreachable;
> it may be different here. You can check the mon entries in the
> monmap/ceph.conf and then verify that all nodes can be pinged
> successfully.
>
>
> -AmitG
>
>
> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens <bb(a)kervyn.de> wrote:
>
>> Hi guys,
>>
>> does anyone have any idea?
>>
>> On Wed, May 5, 2021 at 16:16, Boris Behrens <bb(a)kervyn.de> wrote:
>>
>> > Hi,
>> > for a couple of days now we have been experiencing strange slowness
>> > on some radosgw-admin operations.
>> > What is the best way to debug this?
>> >
>> > For example, creating a user takes over 20s:
>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user --display-name=test-bb-user
>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first you don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute cache for eu-central-1.rgw.users.uid:test-bb-user
>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first you don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute cache for eu-central-1.rgw.users.keys:****
>> > {
>> >     "user_id": "test-bb-user",
>> >     "display_name": "test-bb-user",
>> >     ....
>> > }
>> >
>> > real    0m20.557s
>> > user    0m0.087s
>> > sys     0m0.030s
>> >
>> > First I thought that rados operations might be slow, but adding and
>> > deleting objects in rados is as fast as usual (at least from my
>> > perspective). Uploading to buckets is also fine.
>> >
>> > We changed some things, and I think it might have to do with this:
>> > * We have an HAProxy that distributes via leastconn between the 3
>> > radosgw's (this did not change).
>> > * We had a daemon with the name "eu-central-1" running three times
>> > (on the 3 radosgw's).
>> > * Because this might have led to our data duplication problem, we
>> > have split that up, so the daemons are now named per host
>> > (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3).
>> > * We also added dedicated rgw daemons for garbage collection,
>> > because the current ones were not able to keep up.
>> > * So basically, ceph status went from "rgw: 1 daemon active
>> > (eu-central-1)" to "rgw: 14 daemons active (eu-central-1-s3db1,
>> > eu-central-1-s3db2, eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
>> >
>> >
>> > Cheers
>> > Boris
>> >
>>
>>
>> --
>> The "UTF-8 problems" self-help group will, as an exception, meet in
>> the large hall this time.
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>
--
Thomas Schneider
IT.SERVICES
Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780
Bochum
Phone: +49 234 32 23939
http://www.it-services.rub.de/