Hi -
We keep getting errors like these on specific OSDs running Nautilus (14.2.16):
2021-01-29 06:14:19.174 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:14:49.173 7fbeaab92c00 0 monclient: wait_auth_rotating timed out after 30
2021-01-29 06:14:49.173 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:15:19.173 7fbeaab92c00 0 monclient: wait_auth_rotating timed out after 30
2021-01-29 06:15:19.173 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:15:49.174 7fbeaab92c00 0 monclient: wait_auth_rotating timed out after 30
2021-01-29 06:15:49.174 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service keys; retrying
2021-01-29 06:15:49.174 7fbeaab92c00 -1 osd.8 12568359 init wait_auth_rotating timed out
From googling, it seems this could have a variety of causes. As far as we can tell, time is in sync.
It is particularly perplexing because a single OSD will get this error while all
other OSDs on the same node are fine.
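One rough way to confirm that single-OSD pattern is to tally the error per OSD across the logs gathered from a node. The following is only a minimal Python sketch, assuming the log lines look like the ones pasted above; the function name and the sample data are made up for illustration and are not part of Ceph.

```python
import re
from collections import Counter

# Matches the OSD id in lines such as:
#   ... -1 osd.8 12568359 unable to obtain rotating service keys; retrying
# Only the first part of the message is matched, so lines that were
# wrapped before "keys; retrying" are still counted.
PATTERN = re.compile(r"\b(osd\.\d+)\s+\d+\s+unable to obtain rotating service")


def count_rotating_key_failures(lines):
    """Return a Counter mapping OSD name -> number of failed key fetches."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the log excerpt above (illustration only).
SAMPLE = [
    "2021-01-29 06:14:19.174 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service",
    "2021-01-29 06:14:49.173 7fbeaab92c00 0 monclient: wait_auth_rotating timed out after 30",
    "2021-01-29 06:14:49.173 7fbeaab92c00 -1 osd.8 12568359 unable to obtain rotating service",
]

for osd, n in count_rotating_key_failures(SAMPLE).most_common():
    print(f"{osd}\t{n}")
```

The function accepts any iterable of strings, so an open log file can be passed in place of SAMPLE. It only counts occurrences; it says nothing about why the key fetch fails.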
It looks exactly like this tracker issue:
https://tracker.ceph.com/issues/17170
Stopping the managers and restarting the mons fixes it temporarily.
Regarding this old thread, we do have msgr2 enabled:
https://www.spinics.net/lists/ceph-users/msg60631.html
This blog post seems to point to storage slowness being the root cause in their environment:
http://www.florentflament.com/blog/ceph-monitor-status-switching-due-to-slo…
Any advice for sorting out what is causing this?
Thanks,
Will