Just a user opinion: maybe add the following to the options?
For option 1:
* Clear instructions on how to remove all traces of the failed installation (if you can
automate it, you can write a manual), or instructions for starting a cleanup script.
* Don't allow another deployment of Cephadm while a failed deployment is still present;
only allow it once everything is cleaned up.
For option 2:
* If an installation failed and was completely removed, don't allow another run
unless the user sets an override (or removes whatever triggers the check for failed
installations). This is to prevent a user from getting stuck in an endless loop of trying
to deploy Cephadm. Inform the user about the last failed deployment, show the available
options for a retry, and offer the option to keep the deployment files to troubleshoot
the issue.
* If the deployment failed (or got interrupted) and the user chose to keep the failed
deployment, provide clear instructions on how to clean it up, just as in Option 1.
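To make the check I mean a bit more concrete, here is a rough Python sketch of how such a
guard could work. The marker file, the function names and the override flag are all made up
for illustration; this is not existing Cephadm code, just one way the idea might look.

```python
import json
import tempfile
from pathlib import Path


class FailedDeploymentError(RuntimeError):
    """Raised when a previous failed deployment blocks a new run."""


def record_failure(marker: Path, reason: str) -> None:
    # Hypothetical: leave a marker file behind when a deployment fails.
    marker.write_text(json.dumps({"reason": reason}))


def check_can_deploy(marker: Path, override: bool = False) -> None:
    # No marker means no known failed deployment, so the run may proceed.
    if not marker.exists():
        return
    # The user explicitly asked to continue despite the earlier failure.
    if override:
        return
    info = json.loads(marker.read_text())
    raise FailedDeploymentError(
        f"Last deployment failed ({info['reason']}). "
        f"Clean up and remove {marker}, or re-run with the override."
    )


if __name__ == "__main__":
    marker = Path(tempfile.mkdtemp()) / "failed-deployment.json"
    check_can_deploy(marker)                 # first run: nothing blocks it
    record_failure(marker, "bootstrap interrupted")
    try:
        check_can_deploy(marker)             # second run: refused
    except FailedDeploymentError as err:
        print(err)
    check_can_deploy(marker, override=True)  # explicit override: allowed
```

Removing the marker file corresponds to "removing the thing which triggers the check", and
the error message is where the retry and cleanup options would be shown to the user.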
With the above additions, I would prefer Option 1, because there's almost always a
reason a deployment fails and I would like to investigate directly why it happened.
Best regards,
Sake