On Dec 11, 2023, at 23:28, Eugen Block <eblock@nde.ag> wrote:

Update: apparently, we did it!
We walked through the disaster recovery steps, one of which is to reset the journal. I was under the impression that the documented command 'cephfs-journal-tool [--rank=N] journal reset' would reset all the journals (mdlog and purge_queue), but it seems it doesn't. After Mykola (once again, thank you so much for your input) pointed us towards running the command for the purge_queue specifically, the filesystem came out of read-only mode and was mountable again. The exact command was:

cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset

We didn't have to walk through the recovery with an empty pool, which is nice. I'd suggest adding the "journal inspect" command to the docs for both mdlog and purge_queue, so that it's clear both journals might need a reset.
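
To illustrate what I mean, here is a minimal sketch that prints the "journal inspect" command for each of the two journals. It assumes the filesystem name and rank from our case (cephfs:0); the FS_RANK variable and the journal_inspect_cmds helper are names I made up for this example, and the script only prints the commands, it does not touch the cluster.

```shell
#!/bin/sh
# Assumption: filesystem "cephfs", rank 0, as in the reset command above.
# FS_RANK and journal_inspect_cmds are hypothetical names for this sketch.
FS_RANK="${FS_RANK:-cephfs:0}"

# Print one "journal inspect" command per journal (mdlog and purge_queue).
# Nothing is executed against the cluster here.
journal_inspect_cmds() {
    for journal in mdlog purge_queue; do
        echo "cephfs-journal-tool --rank=$1 --journal=$journal journal inspect"
    done
}

journal_inspect_cmds "$FS_RANK"
```

The printed commands can then be reviewed and run by hand on a node that has cephfs-journal-tool and access to the cluster, so that each journal is checked before deciding which one to reset.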

Thanks again, Mykola!
Eugen