Hi Ceph Users,
I'm struggling with an issue and I'm hoping someone can point me towards a
solution.
We are running Nautilus (14.2.9), deploying Ceph in containers inside VMs. The setup
I'm working with has 3 VMs, but of course our design expects this to be scaled by the
user as appropriate. I have a cluster deployed and it's happily functioning as storage
for our product; the error occurs when I set up a second cluster and pair it with the
first. I'm using ceph-ansible to deploy, and I get the following error about 20 minutes
into running the site-container playbook.
2020-07-09 14:21:10,966 p=2134 u=qs-admin | TASK [ceph-rgw : fetch the realm] ***********************************************************************************************************************************************************************************************
2020-07-09 14:21:10,966 p=2134 u=qs-admin | Thursday 09 July 2020 14:21:10 +0000 (0:00:00.410) 0:16:18.245 *********
2020-07-09 14:21:11,901 p=2134 u=qs-admin | fatal: [10.225.21.213 -> 10.225.21.213]: FAILED! => changed=true
  cmd:
  - docker
  - exec
  - ceph-mon-albamons_sc2
  - radosgw-admin
  - realm
  - pull
  - --url=https://10.225.36.197:7480
  - --access-key=2CQ006Lereqpysbr0l0s
  - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp
  delta: '0:00:00.545895'
  end: '2020-07-09 14:21:11.516539'
  msg: non-zero return code
  rc: 13
  start: '2020-07-09 14:21:10.970644'
  stderr: |-
    request failed: (13) Permission denied
    If the realm has been changed on the master zone, the master zone's gateway may need to be restarted to recognize this user.
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
Re-running the command manually reproduces the error. I understand that a permission
denied error usually indicates that the keys are not valid, as suggested by
https://tracker.ceph.com/issues/36619. However, I've triple-checked that the keys are
correct on the other site. I'm at a loss as to where to look for debugging: I've
turned up logging on both the local and remote sites for the RGW and MON processes, but
neither seems to yield anything related. I've tried restarting everything, as the
error text suggests, from individual processes up to a full reboot of all the VMs. I've
no idea why the keys are being declined, as they are correct (or at least `radosgw-admin
period get` on the primary site thinks so).
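For reference, the two checks described above (comparing the keys on the master site and re-running the pull by hand) could be scripted roughly as below. This is a sketch under stated assumptions, not a verified procedure: the secondary container name and master URL are taken from the log, while the master container name, master zone name and system user uid are hypothetical placeholders. It assumes, per the multisite setup, that the access/secret keys passed to `realm pull` belong to the system user and should also match the master zone's `system_key` as shown by `radosgw-admin zone get`. The `--debug-rgw`/`--debug-ms` options are ordinary Ceph config overrides and may give more client-side detail about the rejected request.

```shell
#!/usr/bin/env bash
# Sketch only: the values marked "hypothetical" are placeholders to be
# replaced with the names from your own deployment.
SECONDARY_MON="ceph-mon-albamons_sc2"     # container name from the log above
MASTER_MON="ceph-mon-MASTER_HOST"         # hypothetical master-site container
MASTER_URL="https://10.225.36.197:7480"   # endpoint from the log above
MASTER_ZONE="MASTER_ZONE_NAME"            # hypothetical master zone name
SYSTEM_UID="SYSTEM_USER_UID"              # hypothetical system user uid

# Run on the master site: print the system user's keys and the zone
# config (whose "system_key" section holds the keys the zone uses).
show_master_keys() {
  docker exec "$MASTER_MON" radosgw-admin user info --uid="$SYSTEM_UID"
  docker exec "$MASTER_MON" radosgw-admin zone get --rgw-zone="$MASTER_ZONE"
}

# Run on the secondary site: repeat the failing pull with verbose RGW
# and messenger logging. Arguments: access key, secret key.
pull_realm_verbose() {
  local access_key="$1" secret_key="$2"
  docker exec "$SECONDARY_MON" radosgw-admin realm pull \
    --url="$MASTER_URL" \
    --access-key="$access_key" \
    --secret="$secret_key" \
    --debug-rgw=20 --debug-ms=1
}
```

Usage: source the script, then call `show_master_keys` on the master site and `pull_realm_verbose "<access-key>" "<secret>"` on the secondary site.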
Thanks for your help,
Alex