Dear cephers,
I have a somewhat strange situation. The cluster reports the following health warning:
# ceph health detail
HEALTH_WARN 3 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release
    mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30716617
    mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30717358
    mdsceph-12(mds.0): Client sn009.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30749150
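
For anyone who wants to script against this warning, here is a minimal sketch of pulling the affected client IDs and hostnames out of the text. The regular expression is an assumption based only on the lines quoted above, not on a documented output format, and the helper name late_release_clients is hypothetical.

```python
import re

# Matches one MDS_CLIENT_LATE_RELEASE entry. The field layout is assumed
# from the health output quoted in this message; \s+ tolerates line wraps.
ENTRY = re.compile(
    r"Client\s+(?P<host>[^\s:]+):(?P<entity>\S+)\s+failing to respond to\s+"
    r"capability release\s+client_id:\s*(?P<id>\d+)"
)


def late_release_clients(health_text):
    """Return (client_id, hostname) pairs found in `ceph health detail` text."""
    return [(int(m.group("id")), m.group("host")) for m in ENTRY.finditer(health_text)]


# Two entries copied from the warning above, including the mail line wrap.
sample = """\
mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to
capability release client_id: 30716617
mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to
capability release client_id: 30717358
"""

for client_id, host in late_release_clients(sample):
    print(client_id, host)
```

The resulting IDs are the ones to look up in the MDS session list.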
However, these clients are not busy right now, and they hold almost no caps (44 to 48
each); see the snippets from "session ls" below. It is possible that a very
I/O-intensive application was running on these nodes and the release requests got
stuck. How do I resolve this issue? Can I just evict the clients?
The version is Mimic 13.2.8. Note that we execute a drop-cache command after a job
finishes on these clients. It's possible that the clients had already dropped the caps
before the MDS request was received or handled.
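
To cross-check the flagged clients against the session list, something like the following sketch could be used. It assumes the input is the parsed JSON list produced by the MDS "session ls" command, with the field names seen in the snippets at the end of this message. The function name summarize_sessions and the third sample entry are hypothetical and only there for illustration.

```python
import json


def summarize_sessions(sessions, client_ids):
    """Pick the sessions for the given client IDs and report caps and state."""
    wanted = set(client_ids)
    return [
        {
            "id": s["id"],
            "hostname": s.get("client_metadata", {}).get("hostname", "?"),
            "num_caps": s["num_caps"],
            "state": s["state"],
        }
        for s in sessions
        if s["id"] in wanted
    ]


# Trimmed sample: the first two entries use values from the snippets in this
# message; the third is a made-up session that should be filtered out.
sample = json.loads("""
[
  {"id": 30717358, "num_caps": 44, "state": "open",
   "client_metadata": {"hostname": "sn269.hpc.ait.dtu.dk"}},
  {"id": 30716617, "num_caps": 48, "state": "open",
   "client_metadata": {"hostname": "sn106.hpc.ait.dtu.dk"}},
  {"id": 1, "num_caps": 100000, "state": "open",
   "client_metadata": {"hostname": "hypothetical-node"}}
]
""")

for row in summarize_sessions(sample, [30717358, 30716617]):
    print(row)
```

This only reports what the MDS believes each session holds; it does not say whether the client still has the caps on its side.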
Best regards,
Frank
{
    "id": 30717358,
    "num_leases": 0,
    "num_caps": 44,
    "state": "open",
    "request_load_avg": 0,
    "uptime": 6632206.332307,
    "replay_requests": 0,
    "completed_requests": 0,
    "reconnecting": false,
    "inst": "client.30717358 192.168.57.140:0/3212676185",
    "client_metadata": {
        "features": "00000000000000ff",
        "entity_id": "con-fs2-hpc",
        "hostname": "sn269.hpc.ait.dtu.dk",
        "kernel_version": "3.10.0-957.12.2.el7.x86_64",
        "root": "/hpc/home"
    }
},
--
{
    "id": 30716617,
    "num_leases": 0,
    "num_caps": 48,
    "state": "open",
    "request_load_avg": 1,
    "uptime": 6632206.336307,
    "replay_requests": 0,
    "completed_requests": 1,
    "reconnecting": false,
    "inst": "client.30716617 192.168.56.233:0/2770977433",
    "client_metadata": {
        "features": "00000000000000ff",
        "entity_id": "con-fs2-hpc",
        "hostname": "sn106.hpc.ait.dtu.dk",
        "kernel_version": "3.10.0-957.12.2.el7.x86_64",
        "root": "/hpc/home"
    }
},
--
{
    "id": 30749150,
    "num_leases": 0,
    "num_caps": 44,
    "state": "open",
    "request_load_avg": 0,
    "uptime": 6632206.338307,
    "replay_requests": 0,
    "completed_requests": 0,
    "reconnecting": false,
    "inst": "client.30749150 192.168.56.136:0/2578719015",
    "client_metadata": {
        "features": "00000000000000ff",
        "entity_id": "con-fs2-hpc",
        "hostname": "sn009.hpc.ait.dtu.dk",
        "kernel_version": "3.10.0-957.12.2.el7.x86_64",
        "root": "/hpc/home"
    }
},
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14