Thanks for the info. Comments inline:
On Wed, Oct 13, 2021 at 7:21 PM Dave Piper <david.piper(a)microsoft.com> wrote:
We're using pubsub!
We opted for pubsub over bucket notifications as the pull mode fits well
with our requirements.
1) We want to be able to guarantee that our client (the external server)
has received and processed each event. My initial understanding of bucket
notifications was that they weren't stored on ceph at all, and were simply
broadcast and then forgotten.
correct, this was the case in "nautilus" and "octopus"
Actually I see that the docs state the notification will be retried until
Is that guaranteed?
yes, "persistent notifications" were added in "pacific". note that if the
queue for the notifications fills up, the S3 operations triggering them
Will ceph ultimately give up and drop an event?
we will try indefinitely (or until the topic is deleted)
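
For illustration, a persistent topic can be requested at creation time through the SNS-compatible API that RGW exposes. This is a minimal sketch, not a definitive setup: it assumes a boto3-style SNS client pointed at the gateway, and the helper names are hypothetical. The "push-endpoint" and "persistent" topic attributes are the ones described in the bucket notification docs; the endpoint value is a placeholder.

```python
def persistent_topic_attributes(push_endpoint):
    """Build topic attributes asking RGW to queue and retry notifications."""
    return {
        "push-endpoint": push_endpoint,  # where RGW pushes the events
        "persistent": "true",            # store and retry until delivered
    }


def create_persistent_topic(sns_client, name, push_endpoint):
    """Create the topic and return its ARN.

    sns_client is assumed to behave like boto3.client("sns", endpoint_url=...)
    configured with the RGW endpoint and credentials.
    """
    response = sns_client.create_topic(
        Name=name,
        Attributes=persistent_topic_attributes(push_endpoint),
    )
    return response["TopicArn"]
```

The returned ARN is what a bucket notification configuration would then reference.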
Is there a way of seeing how many events have been
unacked / dropped?
for persistent notifications, we currently only have a global counter (for
all topics) indicating the notifications that were successfully sent:
"pubsub_push_ok". this is something we should probably improve on. filed:
feel free to add your thoughts there
2) Being able to pull a list of missed events back, rather than receiving
them one at a time, allows our client to cut down on processing. As an
example, if the same object is updated 10 times, pubsub catchup list will
list 10 events for the same object, and the client can recognise this and
only needs to process the object once and ack all 10 events. The bucket
notification model suggests we will have to process each event in turn.
There are possibly ways we can work around this though (e.g. queue incoming
bucket notifications on the client and process them in batches).
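
The batching workaround described above could look roughly like the sketch below. It is only an illustration: the event fields ("id", "bucket", "key") and the two callbacks are hypothetical names, not the actual pubsub or notification payload format.

```python
def coalesce_events(events):
    """Group event ids by (bucket, key), keeping first-seen object order."""
    grouped = {}
    for event in events:
        obj = (event["bucket"], event["key"])
        grouped.setdefault(obj, []).append(event["id"])
    return grouped


def process_batch(events, process_object, ack_event):
    """Process each object once, then ack every event that referred to it."""
    for (bucket, key), event_ids in coalesce_events(events).items():
        process_object(bucket, key)  # one pass per object, however many events
        for event_id in event_ids:
            ack_event(event_id)      # ack only after processing succeeded
```

With ten updates to the same object in one batch, process_object runs once and ack_event runs ten times. The same function works whether the batch comes from a pubsub pull or from notifications queued on the client.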
our intent is not to deprecate the "pull" functionality. instead, we
to replace the pubsub sync module with a notification queue that external
applications would be able to pull from. the overall idea was presented in
note that this effort is in the early stages, so it is hard to forecast a
time when this would be ready
We've had a number of issues with pubsub and still aren't confident in its
behaviour. Your post suggests it's not well used, which might imply it has
less field hardening than bucket notifications.
correct. there are inherent problems with utilizing the multisite syncing
mechanism for bucket notifications:
* some information on the original transactions is lost (since it is not
needed for syncing) and cannot be sent in the notification payload
* as you observed, there are duplicates... when syncing objects, duplicates
are not really an issue, as the end result is the same. but for
notifications they create a problem
* clients don't scale easily: unless you build a complex mechanism for the
client, there could be only one client processing the notifications
* setup is more complex for pubsub: it requires a separate zone;
non-standard tools (bucket notifications work with boto3 etc.)
If so, it sounds like it might be better for us both if we switched to
using the bucket notifications method instead. It'd be good to get your
thoughts on how we could satisfy the two requirements above.
until we deliver our own notification queue solution, I would recommend using
an external one (kafka or rabbitmq).
these solutions are reliable and persistent.
* you can have explicit "commits" in kafka to preserve the "acking"
semantics that you currently have
* amqp also has explicit consumer "acks" that serve the same purpose
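
As a rough sketch of the explicit-commit pattern mentioned above: the loop commits only after a message has been handled, so anything unprocessed stays uncommitted and can be delivered again. The consumer is assumed to be iterable and to offer a commit() method, as a kafka-python KafkaConsumer created with enable_auto_commit=False does. InMemoryConsumer is a hypothetical stand-in so the example runs without a broker.

```python
def consume_with_explicit_commit(consumer, handle):
    """Handle messages in order, committing only after each one succeeds."""
    handled = 0
    for message in consumer:
        try:
            handle(message)
        except Exception:
            # No commit here: the message stays uncommitted, so it can be
            # redelivered once the client restarts or rejoins.
            break
        consumer.commit()
        handled += 1
    return handled


class InMemoryConsumer:
    """Hypothetical stand-in for a Kafka consumer, for illustration only."""

    def __init__(self, messages):
        self._messages = list(messages)
        self.committed = 0

    def __iter__(self):
        return iter(self._messages)

    def commit(self):
        self.committed += 1
```

Committing after every message is the simplest way to mirror per-event acks; committing once per batch would trade some redelivery on failure for fewer round trips.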
If pubsub is likely to be deprecated, we'll need to start moving fast.
What's the latest thinking on this?
we are not going to deprecate that until we have an alternative solution in
From: Yuval Lifshitz <ylifshit(a)redhat.com>
Sent: 05 November 2020 06:57
To: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] RGW pubsub deprecation
Since Nautilus, we have 2 mechanisms for notifying 3rd parties on changes
in buckets and objects: "bucket notifications" and "pubsub"
In "bucket notifications" (="push mode") the events are sent from the
to an external entity (kafka, rabbitmq etc.), while in "pubsub" (="pull
mode") the events are synced with a special zone, where they are stored
and could be later fetched by an external app.
From communications that I've seen so far, users preferred to use "bucket
notifications" over "pubsub". Since supporting both modes has
overhead, I was considering deprecating "pubsub".
However, before doing that I would like to see what the community has to
So, if you are currently using pubsub, or plan to use it, as "pull mode"
fits your use case better than "push mode", please chime in.
ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
email to ceph-users-leave(a)ceph.io