RGW Lifecycle Processing and Promote Master Process - ceph-users

14 Aug 2020

Hi,

I've previously discussed some issues I've had with RGW lifecycle processing.
I've discovered that the root cause is the following sequence of events:

  *   I'm running a multisite configuration.
     *   Lifecycle processing is done on the master site each night.
         `radosgw-admin lc list` correctly returns all buckets with an lc config.
  *   I simulate the master site being destroyed from my VM host.
  *   I promote the secondary site to master following the instructions here:
      https://docs.ceph.com/docs/master/radosgw/multisite/
     *   The new master site isn't doing any lifecycle processing.
         `radosgw-admin lc list` returns empty.
  *   I recreate a cluster and pair it with the new master site to get back to having
      multisite redundancy.
     *   Neither site is doing any lifecycle processing.
         `radosgw-admin lc list` returns empty.

So in the process of failover/recovery I have gone from having two paired clusters
performing lifecycle processing, to two paired clusters NOT performing lifecycle
processing.
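
The promotion and the check from the steps above can be sketched as the following dry run. It assumes the promotion commands from the "Failover and Disaster Recovery" section of the multisite docs linked above. The zone name `zone-b` and the systemd unit name are placeholders, not values from this setup, and nothing is executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Hypothetical name of the secondary zone being promoted.
ZONE="${ZONE:-zone-b}"
# Hypothetical systemd unit of the local gateway; depends on the deployment.
RGW_UNIT="${RGW_UNIT:-ceph-radosgw@rgw.gateway-host}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

promote_and_check() {
    # Make the surviving zone the master and default zone.
    run radosgw-admin zone modify --rgw-zone="$ZONE" --master --default
    # Commit the change as a new period.
    run radosgw-admin period update --commit
    # Restart the gateway so it picks up the new period.
    run systemctl restart "$RGW_UNIT"
    # Check which buckets the cluster will run lifecycle processing on.
    run radosgw-admin lc list
}

promote_and_check
```

In the scenario described here, the final `radosgw-admin lc list` is the step that comes back empty on the promoted site.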

Is this behaviour expected? I've found that `radosgw-admin lc reshard fix` will
"remind" the cluster I run it on that it needs to do lifecycle processing.
However, I found no mention in the docs of having to use it here, and for that command
the docs state that it's only relevant on earlier Ceph versions. I'm running Nautilus
14.2.9.
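
A minimal sketch of applying that workaround, assuming the per-bucket form `radosgw-admin lc reshard fix --bucket {bucketname}` shown in the resharding docs. The bucket names are hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Hypothetical bucket names; replace with the buckets that carry a lifecycle config.
BUCKETS="${BUCKETS:-bucket-a bucket-b}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

fix_lc_entries() {
    # Re-register the lifecycle entry of each bucket on the local cluster.
    for bucket in $BUCKETS; do
        run radosgw-admin lc reshard fix --bucket "$bucket"
    done
    # Verify which buckets are now scheduled for lifecycle processing.
    run radosgw-admin lc list
}

fix_lc_entries
```

Afterwards, `radosgw-admin lc list` on that cluster should show the buckets again if their entries were re-registered.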

In addition, if I have two healthy clusters paired in a multisite system, and swap the
master cluster by promoting the non-master, the demoted cluster seems to continue
doing lifecycle processing, while the promoted cluster does not. If I run
`radosgw-admin lc reshard fix` on the promoted cluster, then both clusters seem to
claim they are doing the processing. Is this a happy state to be in?

Does anyone have any experience with this?

Thanks,
Alex