I think in order to fix this we need some infrastructure in the mons
to make commands truly idempotent. Since all update operations go
through the leader monitor, we could add a unique request id to the
client command (if there isn't already one) and maintain a persistent
set of recently completed commands in paxos, so that a resent or
forwarded duplicate is recognized and not applied a second time.
This kind of approach is probably also a win because a *lot* of random
mon commands have subtle idempotency bugs where they respond
immediately based on uncommitted state that may be lost in an untimely
election or mon restart/failure.
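
To make the idea concrete, here is a rough sketch of the dedup logic in
Python. It is a toy model and not Ceph code: the Leader class, the
(client, req_id) key and the size bound are all made up for
illustration. It also assumes the client reuses the same request id
when it resends a command after a connection failure. In a real mon
the completed-command record would have to be committed in the same
paxos transaction as the state change itself.

```python
from collections import OrderedDict


class Leader:
    """Toy stand-in for the leader mon (hypothetical, not Ceph code)."""

    def __init__(self, max_completed=1024):
        self.profiles = {}              # stands in for committed state
        self.completed = OrderedDict()  # (client, req_id) -> cached reply
        self.max_completed = max_completed

    def handle(self, client, req_id, op, name, value=None):
        key = (client, req_id)
        # A duplicate (resend or late forward) gets the original reply
        # back and is not applied again.
        if key in self.completed:
            return self.completed[key]

        if op == "rm":
            existed = self.profiles.pop(name, None) is not None
            reply = "removed" if existed else "does not exist"
        elif op == "set":
            self.profiles[name] = value
            reply = "set"
        else:
            raise ValueError("unknown op: %s" % op)

        # Record the reply; a real mon would persist this together with
        # the state change. Trim the oldest entries to bound the set.
        self.completed[key] = reply
        while len(self.completed) > self.max_completed:
            self.completed.popitem(last=False)
        return reply


def replay_scenario():
    """Replays the ordering from the tracker against the toy leader."""
    leader = Leader()
    replies = []
    # 'rm' sent to mon.c is delayed; the client resends it via mon.a
    # with the same request id.
    replies.append(leader.handle("client.1", 1, "rm", "myprofile"))
    replies.append(leader.handle("client.1", 2, "set", "myprofile", "k=2 m=1"))
    # The original 'rm' is finally forwarded from mon.c to the leader.
    replies.append(leader.handle("client.1", 1, "rm", "myprofile"))
    return leader, replies


if __name__ == "__main__":
    leader, replies = replay_scenario()
    print(replies, leader.profiles)
```

With the request id check in place the late forwarded 'rm' is answered
from the completed set, so the profile created by 'set' survives.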
s
On Mon, May 10, 2021 at 2:07 PM Brad Hubbard <bhubbard(a)redhat.com> wrote:
>
> The purpose of this email is to trigger a discussion on how we rectify
> the following situation so client commands are executed by the monitor
> only once and in the order they are submitted.
>
> https://tracker.ceph.com/issues/49428 describes a scenario where we
> can end up with commands executed more than once and out of order
> according to the client.
>
> In that tracker a client sends an 'erasure-code-profile rm' to mon.c
> and immediately receives an injected connection failure. The client
> then connects to mon.a and reissues the 'rm' command which returns the
> expected 'does not exist' result. The client code then issues an
> 'erasure-code-profile set' command. Shortly after this the original
> 'rm' command is forwarded to mon.a from mon.c and the 'set' command is
> cancelled because of this and the command, and the test, fails.
>
> From the client side the commands executed look like this.
>
> erasure-code-profile rm
> erasure-code-profile set
>
> From the mon side it looks like this.
>
> erasure-code-profile rm
> erasure-code-profile set
> erasure-code-profile rm
>
> I appreciate any feedback on the best way to tackle this one.
>
> --
> Cheers,
> Brad