Well, the need for reservation is general, arising from the need for
reliable delivery.
You could substitute the fifo abstraction if 128M
is too small; since the fifo spans multiple RADOS objects, it is not
bound by the per-object size limit.
Matt
On Thu, Jan 16, 2020 at 1:23 PM Casey Bodley <cbodley@redhat.com> wrote:
>
>
> On 1/16/20 12:13 PM, Yuval Lifshitz wrote:
> > two updates on the design (after some discussions):
> >
> > (1) "best effort queue" (stretch goal) is probably not needed:
> > - cls queue performance should be high enough when put on a fast media pool
> > - the "ack level" settings allow the existing mechanism to act as
> > "best effort" and non-blocking for topics that do not need delivery
> > guarantees (see the example below)
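> >
> > as an example (parameter names per [2]; shown without the surrounding
> > request syntax, and with a made-up broker address), a topic whose
> > endpoint is configured with
> >
> >     push-endpoint=kafka://broker:9092
> >     kafka-ack-level=none
> >
> > keeps today's non-blocking behavior, while "broker" would wait for the
> > broker's ack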
> >
> > (2) since the cls queue does not allow for random access (without
> > linear search), the retries will have to be implemented based only on
> > the end of the queue. This means that we must assume that the acks or
> > nacks arrive in the same order in which the notifications were sent.
> > This is true only for a specific endpoint (e.g. a specific kafka
> > broker), which means that there will have to be a separate cls queue
> > instance for each endpoint
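> >
> > a minimal sketch of that ordering constraint (hypothetical names, not
> > an actual cls_queue API):
> >
> >     #include <deque>
> >
> >     struct Entry { /* bucket, object key, event, ... */ };
> >     struct Queue { void remove_front(); };  // per-endpoint queue stand-in
> >
> >     Queue q;                        // one instance per endpoint
> >     std::deque<Entry> in_flight;    // sent but not yet acked, oldest first
> >     void resend(const Entry&);      // re-push to the endpoint
> >
> >     // without random access, an ack can only be matched to the oldest
> >     // in-flight entry; this is valid only if the endpoint acks in
> >     // send order, hence the per-endpoint queue instance
> >     void on_ack() {
> >       q.remove_front();             // trim the acked entry from the queue
> >       in_flight.pop_front();
> >     }
> >     void on_nack_or_timeout() {
> >       resend(in_flight.front());    // retry from the oldest entry onward
> >     }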
> >
> >
> > On Tue, Jan 14, 2020 at 3:47 PM Yuval Lifshitz <ylifshit@redhat.com>
> > wrote:
> >
> > Dear Community,
> > I would like to share some design ideas around the above topic.
> > Feedback is welcome!
> >
> > Current State
> >
> > - in "pull mode" [1] we have the same guarantees as the multisite
> > syncing mechanism (guarantee against HW/SW failures). On top of
> > that, if writing the event to RADOS fails, this trickle back as
> > sync failure, which means that the master zone will try to sync
> > the pubsub zone
> >
> > - in "push mode" [2] we send the notification from the ops context
> > that triggered the notification. The original operation is blocked
> > until we get a reply from the endpoint. As part of the
> > configuration for the endpoint, we also configure the "ack level",
> > indicating whether we block until we get a reply from the endpoint
> > or not.
> > Since the operation response is not sent back to the client until
> > the endpoint acks, this method guarantees against any failure in
> > the radosgw (at the cost of adding latency to the operation).
> > This, however, does not guarantee delivery if the endpoint is down
> > or disconnected. The endpoints we interact with (rabbitmq, kafka)
> > usually have built-in redundancy mechanisms, but these do not
> > cover the case of a network disconnect between our
> > gateways and these systems.
> > In some cases we can get a nack from the endpoint, indicating that
> > our message will never reach it. But we can only log
> > these cases:
> > - we cannot fail the operation that triggered us, because we send
> > the notification only after the actual operation (e.g. "put
> > object") was done (=no atomicity)
> > - no retry mechanism (in theory, we can add one)
> >
> > Next Phase Requirements
> >
> > We would like to add delivery guarantees to "push mode" covering
> > endpoint failures. For that we would use a message queue with the
> > following features:
> > - rados backed, so it would survive HW/SW failures
> > - blocking only on local read/writes (so it introduces smaller
> > latency than over-the-wire endpoint acks)
> > - has reserve/commit semantics, so we can "reserve" a slot before
> > the operation (e.g. "put object") is executed, fail the operation
> > if we cannot reserve a slot on the queue, commit the notification
> > to the queue only after the operation succeeds, and unreserve if
> > the operation fails (see the sketch below)
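> >
> > a sketch of how the op path could use such semantics (hypothetical
> > interface, not an existing cls API):
> >
> >     #include <cstdint>
> >     #include <cstddef>
> >
> >     struct Notification { /* bucket, object key, event, ... */ };
> >
> >     // hypothetical queue client with reserve/commit/unreserve calls
> >     struct NotifQueue {
> >       int reserve(size_t size, uint64_t* res_id);   // fails if no room
> >       int commit(uint64_t res_id, const Notification&);
> >       void unreserve(uint64_t res_id);
> >     };
> >
> >     int do_put_object();  // stand-in for the actual operation
> >
> >     int put_object_with_notification(NotifQueue& q,
> >                                      const Notification& n, size_t size) {
> >       uint64_t res_id;
> >       int r = q.reserve(size, &res_id);
> >       if (r < 0) {
> >         return r;               // cannot reserve a slot: fail the op early
> >       }
> >       r = do_put_object();      // the operation itself
> >       if (r < 0) {
> >         q.unreserve(res_id);    // op failed: release the reservation
> >         return r;
> >       }
> >       return q.commit(res_id, n);  // op succeeded: enqueue for delivery
> >     }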
> >
> I guess this reservation piece is only a requirement because of the
> choice of cls_queue, which resides in a single rados object and so
> enforces a bound on the total space used. The maximum size is
> configurable, but can't exceed osd_max_object_size=128M. How many
> notifications could we fit within the 128M limit? I worry that
> clusters at a sufficient scale could fill that pretty quickly if the
> notification endpoint is unavailable or slow, and that would leave
> radosgw unable to satisfy any requests that would generate a notification.
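> (a rough back-of-envelope, assuming a serialized notification on the
> order of 1 KiB - bucket name, object key, event metadata - 128 MiB
> would hold roughly 128K entries, minus queue overhead)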
>
> > - we would have a retry mechanism based on the queue, which means
> > that if a notification was successfully pushed into the queue, we
> > can assume it would (eventually) be successfully delivered to the
> > endpoint
> >
> > Proposed Solution
> >
> > - use the cls_queue [3] (cls_queue is not omap based, hence no
> > built-in iops limitations)
> > - add reserve/commit functionality (probably store that info in
> > the queue head)
> > - dedicated thread(s) should read requests from the queue,
> > send the notifications to the endpoints, and wait for
> > the replies (if needed) - this should be done via coroutines
> > (see the sketch after this list)
> > - acked requests are removed from the queue; nacked or
> > timed-out requests should be retried (at least for a while)
> > - both mechanisms would coexist, as this would be configurable per
> > topic
> > - as a stretch goal, we may add a "best effort queue". This would
> > be similar to the cls_queue solution, but won't address
> > radosgw failures (as the queue would be in-memory), only endpoint
> > failures/disconnects
> > - for now, this mechanism won't be supported for pushing events
> > from the pubsub zone (="pull+push mode"), but might be added if
> > users find it useful
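> >
> > a minimal sketch of the consumer side (hypothetical names; the real
> > implementation would sit on the cls_queue client API, with sends
> > multiplexed via coroutines):
> >
> >     struct Entry { /* notification payload */ };
> >     struct Queue {
> >       int front(Entry*);        // oldest pending entry, <0 if empty
> >       void remove_front();
> >     };
> >     struct Endpoint { int send_and_wait(const Entry&); };
> >
> >     bool stopping = false;
> >     void wait_for_entries();
> >     void backoff_before_retry();
> >
> >     void consume(Queue& q, Endpoint& ep) {
> >       while (!stopping) {
> >         Entry e;
> >         if (q.front(&e) < 0) {        // nothing pending
> >           wait_for_entries();
> >           continue;
> >         }
> >         int r = ep.send_and_wait(e);  // would yield until reply/timeout
> >         if (r == 0) {
> >           q.remove_front();           // acked: remove from the queue
> >         } else {
> >           backoff_before_retry();     // nacked/timed out: retry the front
> >         }
> >       }
> >     }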
> >
> > Yuval
> >
> > [1] https://docs.ceph.com/docs/master/radosgw/pubsub-module/
> > [2] https://docs.ceph.com/docs/master/radosgw/notifications/
> > [3] https://github.com/ceph/ceph/tree/master/src/cls/queue
> >
> >
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
_______________________________________________
Dev mailing list -- dev@ceph.io
To unsubscribe send an email to dev-leave@ceph.io