On Fri, Oct 30, 2020 at 2:13 AM Frank Schilder <frans(a)dtu.dk> wrote:
Dear cephers,
I have a somewhat strange situation. I have the health warning:
# ceph health detail
HEALTH_WARN 3 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release
mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30716617
mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30717358
mdsceph-12(mds.0): Client sn009.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30749150
However, these clients are not busy right now. Also, they hold almost nothing; see
snippets from "session ls" below. It is possible that a very I/O-intensive
application was running on these nodes and the release requests got stuck. How do I
resolve this issue? Can I just evict the clients?
The version is Mimic 13.2.8. Note that we run a drop-caches command on these clients
after each job finishes. It's possible that the clients had already dropped the caps
before the MDS request was received or handled.
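
For anyone who wants to cross-check the flagged sessions against "session ls", here is a minimal sketch that pulls the MDS rank, client host, client name and client_id out of the health output quoted above. The function name `late_release_clients` and the regular expression are illustrative assumptions based only on the message format shown in this mail; other Ceph releases may word the warning differently. It only parses text and does not talk to the cluster, and it does not settle whether eviction is appropriate.

```python
import re

# Sample copied from the `ceph health detail` output quoted in this thread,
# including the hard line wraps introduced by the mail client.
HEALTH_DETAIL = """\
HEALTH_WARN 3 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release
mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to
capability release client_id: 30716617
mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to
capability release client_id: 30717358
mdsceph-12(mds.0): Client sn009.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to
capability release client_id: 30749150
"""

# Assumed message layout (taken from the sample above, not from Ceph sources):
#   <daemon>(mds.<rank>): Client <host>:<name> failing to respond to
#   capability release client_id: <id>
PATTERN = re.compile(
    r"\((?P<mds>mds\.\d+)\): Client (?P<host>[^\s:]+):(?P<name>\S+) "
    r"failing to respond to capability release client_id: (?P<id>\d+)"
)


def late_release_clients(text):
    """Return (mds_rank, host, client_name, client_id) for each flagged client."""
    # Collapse all whitespace first so wrapped lines are matched as one record.
    flat = " ".join(text.split())
    return [
        (m["mds"], m["host"], m["name"], int(m["id"]))
        for m in PATTERN.finditer(flat)
    ]


if __name__ == "__main__":
    for mds, host, name, client_id in late_release_clients(HEALTH_DETAIL):
        print(f"{mds}: {host} ({name}) client_id={client_id}")
```

The resulting client_id values can then be matched by hand against the id fields in the "session ls" output for the same MDS.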
Can you share any config changes you've made on the MDS?
Also, Mimic is EOL as you probably know. Please upgrade :)
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D