I hope you are using a single network interface for both the public
and the cluster network?
On Tue, May 11, 2021 at 2:15 PM Thomas Schneider <Thomas.Schneider-q2p(a)ruhr-uni-bochum.de> wrote:
Hey all,
we had slow RGW access when some OSDs were slow due to an OSD bug
(unknown to us) that made PG access either slow or impossible. (It also
showed up as slowness of the mgr, but nothing other than that.)
We restarted all OSDs that held RGW data and the problem was gone.
I have no good way to debug the problem since it never occurred again
after we restarted the OSDs.
Kind regards,
Thomas
On 11 May 2021 08:47:06 CEST, Boris Behrens <bb(a)kervyn.de> wrote:
Hi Amit,
I just pinged the mons from every system and they are all available.
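
A ping only confirms ICMP reachability, not that the monitor ports accept
connections. As a minimal sketch, the Python below tries a TCP connect to
the default Ceph monitor ports (3300 for msgr2, 6789 for msgr1) on each
host. The function names and the host list are made up for illustration
and are not taken from any Ceph tooling; it also assumes a simple
comma- or space-separated host list, not the bracketed v1/v2 address
syntax.

```python
import socket

# Default Ceph monitor ports: 3300 (msgr2) and 6789 (msgr1).
MON_PORTS = (3300, 6789)


def parse_mon_hosts(value):
    """Split a simple comma/space separated host list into host names."""
    return [h for h in value.replace(",", " ").split() if h]


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_mons(hosts, ports=MON_PORTS, probe=port_open):
    """Map each host to a {port: reachable} dict using the given probe."""
    return {host: {port: probe(host, port) for port in ports} for host in hosts}
```

Calling check_mons(parse_mon_hosts("mon1, mon2, mon3")) with your own
monitor addresses (the names here are placeholders) from each RGW host
would show which monitor ports accept connections from that host.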
On Mon, 10 May 2021 at 21:18, Amit Ghadge <amitg.b14(a)gmail.com> wrote:
> We have seen slowness when one of the mgr services was unreachable.
> Your case may be different, but you can check the monmap and the mon
> entry in ceph.conf, and then verify that all nodes respond to ping.
>
> -AmitG
On Tue, 11 May 2021 at 12:12 AM, Boris Behrens <bb(a)kervyn.de> wrote:
> Hi guys,
>
> does anyone have any idea?
>
> On Wed, 5 May 2021 at 16:16, Boris Behrens <bb(a)kervyn.de> wrote:
>>
>> > Hi,
>> > for a couple of days we have been experiencing strange slowness on
>> > some radosgw-admin operations.
>> > What is the best way to debug this?
>> >
>> > For example creating a user takes over 20s.
>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user --display-name=test-bb-user
>> > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first you don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute cache for eu-central-1.rgw.users.uid:test-bb-user
>> > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first you don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute cache for eu-central-1.rgw.users.keys:****
>> > {
>> > "user_id": "test-bb-user",
>> > "display_name": "test-bb-user",
>> > ....
>> > }
>> > real 0m20.557s
>> > user 0m0.087s
>> > sys 0m0.030s
>> >
>> > First I thought that rados operations might be slow, but adding and
>> > deleting objects in rados is as fast as usual (at least from my
>> > perspective).
>> > Also uploading to buckets is fine.
>> >
>> > We changed some things and I think it might have to do with this:
>> > * We have a HAProxy that distributes via leastconn between the 3
>> >   radosgw's (this did not change)
>> > * We had a daemon with the name "eu-central-1" running three times
>> >   (once on each of the 3 radosgw's)
>> > * Because this might have led to our data duplication problem, we
>> >   have split that up, so now the daemons are named per host
>> >   (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3)
>> > * We also added dedicated rgw daemons for garbage collection,
>> >   because the current ones were not able to keep up.
>> > * So basically ceph status went from "rgw: 1 daemon active
>> >   (eu-central-1)" to "rgw: 14 daemons active (eu-central-1-s3db1,
>> >   eu-central-1-s3db2, eu-central-1-s3db3, gc-s3db12, gc-s3db13...)"
>> >
>> >
>> > Cheers
>> > Boris
>> >
>>
>>
>> --
>> The "UTF-8 Problems" self-help group is meeting in the groüen Saal
>> (large hall) this time, as an exception.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
--
Thomas Schneider
IT.SERVICES
Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780 Bochum
Phone: +49 234 32 23939
http://www.it-services.rub.de/