On Mon, May 18, 2020 at 9:56 AM Ken Dreyer <kdreyer(a)redhat.com> wrote:
> Hi folks,
> I was reading
> https://ceph.io/community/automatic-cephfs-recovery-after-blacklisting/
> about the new recover_session=clean feature.
> The end of that blog post says that this setting involves a trade-off:
> "availability is more important than correctness"
> Are there cases where the old behavior is really safer than simply
> returning errors?
Basically: a frozen (hung mount) or dead (restarted box) application
can't have unintended side-effects. If the application is poorly
written to not handle I/O errors or to not fsync, then any undesirable
behavior resulting from that may occur after the mount reconnects.
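As a rough illustration of the application-side handling meant here, a minimal Python sketch follows. It is not taken from the blog post or from any Ceph code; the helper name durable_write is hypothetical, and it assumes that a failure after a lost session surfaces to the application as an OSError from write() or fsync(). The point is only that the application checks both calls instead of assuming the data landed:

```python
import os


def durable_write(path: str, data: bytes) -> bool:
    """Hypothetical helper: write data and fsync it.

    Returns True only if every write and the fsync succeeded,
    False if any step raised an I/O error.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError:
        return False
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without fsync, a write error may never be reported to us.
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        try:
            os.close(fd)
        except OSError:
            pass
```

A caller that gets False back would retry or abort rather than carry on as if the data were durable.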
> It seems like this feature would not make things worse for
> applications. Can we make recover_session=clean the default?
There was a proposal for recover_session=strict which would (IIRC)
basically kill any application that had any file descriptor open with
the backend file system. That would probably be the safest default but
also the most intrusive and (perhaps) surprising. Unfortunately, I
think there were implementation issues that blocked it and we tabled
the idea.
Whether or not recover_session=clean should be the default is
undecided. I think we should wait to hear back from the community
testing it before deciding.
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D