total ceph outage again, need help - ceph-users

20 May 2020

Dear cephers,

I'm facing a major Ceph outage again. The mon/mgr hosts are suffering from a
packet storm of Ceph traffic between the CephFS clients and the mons. I have no
idea why this is happening.

The main problem is that I can't get through to the cluster. Admin commands hang
forever:

[root@gnosis ~]# ceph osd set nodown

However, "ceph status" returns and shows me that I need to do something:

[root@gnosis ~]# ceph status
  cluster:
    id:     ---
    health: HEALTH_WARN
            2 MDSs report slow metadata IOs
            1 MDSs report slow requests
            8 osds down

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active, starting), standbys: ceph-02, ceph-03
    mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 208 up, 216 in; 153 remapped pgs

  data:
    pools:   10 pools, 2545 pgs
    objects: 86.71 M objects, 218 TiB
    usage:   277 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2542 active+clean
             3    active+clean+scrubbing+deep

  io:
    client:   152 MiB/s rd, 72 MiB/s wr, 854 op/s rd, 796 op/s wr

Is there any way to get admin commands to the mons with higher priority?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14