Hi Stefan,
I was not able to reproduce the issue of not reconnecting after slow-down.
My steps are documented here:
Can you please share some of the radosgw logs after the broker is up again
and the reconnect fails?
Regardless, there are several race conditions that occurred with Kafka and
persistent notifications and that also exist for AMQP. I will be fixing
those as part of:
Yuval
On Sun, Jun 11, 2023 at 11:48 AM Yuval Lifshitz <ylifshit(a)redhat.com> wrote:
Hi Stefan,
Thanks for the inputs. Replied inline
On Fri, Jun 9, 2023 at 6:53 PM Stefan Reuter <stefan.reuter(a)reucon.com>
wrote:
Hi Yuval,
Thanks for having a look at bucket notifications and collecting
feedback. I also see potential for improvement in the area of bucket
notifications.
We have observed issues in a setup with RabbitMQ as a broker where the
RADOS queue seems to fill up and clients receive "slow down" replies.
Unfortunately this state did not recover. The only way to overcome
the situation was to remove and recreate the topic and bucket
notification configuration. This happened multiple times on different
Ceph clusters running the latest Quincy.
Will check that. We had a similar issue with the Kafka broker that was
recently fixed.
It would be great to improve the ability to monitor bucket notifications
(e.g. via Prometheus/Grafana) to see the RADOS queues and their
usage/queue depth as well as the health of the process that consumes the
queue and passes the notifications to the broker.
Agreed. We are working on that. See:
https://tracker.ceph.com/issues/52927
> For our use case notifications are very important as they trigger
> downstream processing of the uploaded files. If the notification does
> not happen, the files are not processed and the result is the same as if
> the upload did not happen at all.
>
> Best regards,
>
> Stefan
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io