Two updates on the design (after some discussions):
(1) the "best effort queue" (stretch goal) is probably not needed:
- cls queue performance should be high enough when the queue is placed on a
fast media pool
- the "ack level" settings already allow the existing mechanism to act as
"best effort" and non-blocking for topics that do not need delivery
guarantees
(2) since the cls queue does not allow random access (without a linear
search), retries will have to be implemented based only on the end of the
queue. This means we must assume that acks or nacks arrive in the same
order in which the notifications were sent. This holds only for a specific
endpoint (e.g. a specific kafka broker), which means there will have to be
a separate cls queue instance per endpoint
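To make the per-endpoint constraint concrete, here is a minimal sketch (an
in-memory stand-in, not the actual cls_queue API; the class and endpoint
names are illustrative): acks are matched against the head of a FIFO, so
each endpoint needs its own queue instance.

```python
# Sketch of per-endpoint retry queues (illustrative only; this is an
# in-memory stand-in for a cls_queue instance, not the Ceph API).
from collections import deque

class EndpointQueue:
    """One FIFO per endpoint: acks are assumed to arrive in the same
    order the notifications were sent, so an ack always matches the head."""
    def __init__(self):
        self.pending = deque()

    def push(self, notification):
        self.pending.append(notification)

    def on_ack(self):
        # an ack removes the head; a nack would leave it there for retry
        return self.pending.popleft() if self.pending else None

    def head(self):
        return self.pending[0] if self.pending else None

# a separate queue instance per endpoint, since ordering only holds
# within a single endpoint (e.g. a specific kafka broker)
queues = {"kafka://broker1": EndpointQueue(), "amqp://broker2": EndpointQueue()}
queues["kafka://broker1"].push("n1")
queues["kafka://broker1"].push("n2")
queues["amqp://broker2"].push("n3")

assert queues["kafka://broker1"].head() == "n1"
queues["kafka://broker1"].on_ack()                # ack for the head only
assert queues["kafka://broker1"].head() == "n2"   # next retry candidate
assert queues["amqp://broker2"].head() == "n3"    # independent per endpoint
```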
On Tue, Jan 14, 2020 at 3:47 PM Yuval Lifshitz <ylifshit(a)redhat.com> wrote:
Dear Community,
I would like to share some design ideas around the above topic. Feedback
is welcome!
Current State
- in "pull mode" [1] we have the same guarantees as the multisite syncing
mechanism (guarantees against HW/SW failures). On top of that, if writing
the event to RADOS fails, the failure trickles back as a sync failure,
which means that the master zone will retry syncing to the pubsub zone
- in "push mode" [2] we send the notification from the ops context that
triggered it. As part of the endpoint configuration we also set an "ack
level", indicating whether or not we block the original operation until we
get a reply from the endpoint.
Since the operation response is not sent back to the client until the
endpoint acks, this method guarantees against any failure in the radosgw
(at the cost of added latency on the operation).
This, however, does not guarantee delivery if the endpoint is down or
disconnected. The endpoints we interact with (rabbitmq, kafka) usually
have built-in redundancy mechanisms, but those do not cover the case where
there is a network disconnect between our gateways and these systems.
In some cases we may get a nack from the endpoint, indicating that our
message will never reach it. But we can only log these cases:
- we cannot fail the operation that triggered the notification, because
the notification is sent only after the actual operation (e.g. "put
object") has completed (= no atomicity)
- there is no retry mechanism (in theory, we could add one)
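The limitation above can be sketched as follows (illustrative names only;
`handle_put` and its callables are assumptions for the sketch, not actual
radosgw code): the notification is sent only after the operation has
completed, so a nack arriving afterwards can only be logged.

```python
# Sketch of the current "push mode" limitation: the operation runs first,
# so a delivery failure cannot fail it, only be logged (no atomicity).
logged = []

def handle_put(do_op, notify_endpoint, wait_for_ack):
    result = do_op()                          # e.g. "put object" runs first
    if wait_for_ack:
        if not notify_endpoint():             # blocks until endpoint replies
            logged.append("delivery failed")  # too late to fail the op
    else:
        notify_endpoint()                     # fire and forget
    return result                             # op result returned regardless

# even when the endpoint nacks, the client still sees a successful operation
assert handle_put(lambda: "ok", lambda: False, wait_for_ack=True) == "ok"
assert logged == ["delivery failed"]
```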
Next Phase Requirements
We would like to add a delivery guarantee to "push mode" for endpoint
failures. For that we would use a message queue with the following
features:
- RADOS-backed, so it would survive HW/SW failures
- blocking only on local reads/writes (so it introduces lower latency than
over-the-wire endpoint acks)
- has reserve/commit semantics, so we can "reserve" a slot before the
operation (e.g. "put object") is performed, fail the operation if we
cannot reserve a slot on the queue, commit the notification to the queue
only after the operation succeeds, and unreserve if the operation fails
- a retry mechanism based on the queue, which means that if a notification
was successfully pushed into the queue, we can assume it will (eventually)
be delivered to the endpoint
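A minimal sketch of the reserve/commit semantics listed above, assuming an
in-memory queue with a fixed capacity (in the proposal the reservation
info would live in the cls_queue head object; all names here are
illustrative, not the proposed API):

```python
# Sketch of reserve/commit/unreserve semantics for a bounded queue.
class ReservableQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0      # slots reserved but not yet committed
        self.entries = []      # committed notifications

    def reserve(self):
        """Called before the operation (e.g. "put object"); the operation
        is failed if no slot is available."""
        if self.reserved + len(self.entries) >= self.capacity:
            return False
        self.reserved += 1
        return True

    def commit(self, notification):
        """Called only after the operation succeeded."""
        assert self.reserved > 0
        self.reserved -= 1
        self.entries.append(notification)

    def unreserve(self):
        """Called if the operation failed: release the slot."""
        assert self.reserved > 0
        self.reserved -= 1

q = ReservableQueue(capacity=2)
assert q.reserve()            # before "put object"
q.commit("obj1 created")      # after the op succeeded
assert q.reserve()
q.unreserve()                 # the op failed; release the slot
assert q.reserve()            # one committed entry + one reservation
assert not q.reserve()        # queue full: the triggering op would be failed
```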
Proposed Solution
- use cls_queue [3] (cls_queue is not omap-based, hence no built-in iops
limitations)
- add reserve/commit functionality (probably storing that info in the
queue head)
- dedicated thread(s) should read requests from the queue, send the
notifications to the endpoints, and wait for the replies (if needed)
- this should be done via coroutines
- acked requests are removed from the queue; nacked or timed-out requests
should be retried (at least for a while)
- both mechanisms would coexist, as this would be configurable per topic
- as a stretch goal, we may add a "best effort queue". This would be
similar to the cls_queue solution, but would not address radosgw failures
(as the queue would be in-memory), only endpoint failures/disconnects
- for now, this mechanism won't be supported for pushing events from the
pubsub zone (="pull+push mode"), but it might be added if users find it
useful
Yuval
[1] https://docs.ceph.com/docs/master/radosgw/pubsub-module/
[2] https://docs.ceph.com/docs/master/radosgw/notifications/
[3] https://github.com/ceph/ceph/tree/master/src/cls/queue