Thanks! Oddly, all the dashboard checks you suggested come back normal,
yet the dashboard is still broken.
Even before applying your dashboard instructions, this is what I
already had:
root@noc3:~# ceph dashboard get-prometheus-api-host
http://noc3.1.quietfountain.com:9095
root@noc3:~# netstat -6nlp | grep 9095
tcp6       0      0 :::9095      :::*       LISTEN      80963/prometheus
root@noc3:~#
To verify the setting is actually being used, I set it to something
random; the browser pointed at the dashboard then reported it could not
connect. That error went away when I restored the value above, but the
Cluster Utilization graphs remain empty, showing only the numbers 1 and
0.5 on each.
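
One further check I can run, to rule out Prometheus itself, is to query
its HTTP API directly for a ceph metric (ceph_osd_up is just one metric
the mgr prometheus module exports, picked as a sanity check; hostname
as above):

curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=ceph_osd_up'

If that returns data while the dashboard graphs stay empty, the problem
would seem to be on the dashboard side rather than in Prometheus.
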
Regarding the used storage, note the overall usage is 43.6 of 111 TiB,
which seems a long way from the warning and danger trigger points of
85% and 95%. The default values are in use, and all the OSDs are
between 37% and 42% usage. What am I missing?
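
For reference, here is how I would confirm the ratios in effect (per
your `ceph osd dump | grep ratio` suggestion); with the defaults still
in place I'd expect something like:

ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
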
Thanks!
On 3/12/24 02:07, Nizamudeen A wrote:
> Hi,
>
> The warning and danger indicators in the capacity chart correspond to the
> nearfull and full ratios set on the cluster, and
> the default values for them are 85% and 95% respectively. You can do a
> `ceph osd dump | grep ratio` and see those.
>
> When this was introduced, there was a blog post
> <https://ceph.io/en/news/blog/2023/landing-page/#capacity-card> explaining
> how this is mapped in the chart. When your used storage
> crosses that 85% mark, the chart is colored yellow to alert
> the user, and when it crosses 95% (or the full ratio) the
> chart is colored red. That doesn't mean the
> cluster is in bad shape; it's a visual indicator to tell you
> you are running out of storage.
>
> Regarding the Cluster Utilization chart, it gets metrics directly from
> Prometheus so that it can show time-series
> data in the UI rather than only the metrics at the current point in time
> (which is what was used before). So if you have Prometheus configured for
> the dashboard and its URL is provided in the dashboard settings with `ceph
> dashboard set-prometheus-api-host <url-of-prometheus>`,
> then you should be able to see the metrics.
>
> In case you want to read more about the new page, you can check here:
> <https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
>
> Regards,
> Nizam
>
>
>
> On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin <hgcoin(a)gmail.com> wrote:
>
> Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
> capacity is 'warned', and 95% is 'in danger'. There is no hint given
> as to the nature of the danger or reason for the warning. Though
> apparently with merely 5% of my ceph world 'normal', the cluster
> reports 'ok'. Which, you know, seems contradictory. I've used just
> under 40% of capacity.
>
> Further down the dashboard, all the subsections of 'Cluster Utilization'
> are '1' and '0.5' with nothing whatever in the graphics area.
>
> Previous versions of ceph presented a normal dashboard.
>
> It's just a little half rack, 5 hosts, a few physical drives each, been
> running ceph for a couple years now. Orchestrator is cephadm. It's
> just about as 'plain vanilla' as it gets. I've had to mute one alert,
> because cephadm refresh aborts when it finds drives on any host that
> have nothing to do with ceph and don't have a blkid 'TYPE' key.
> Seems unrelated to a totally messed-up dashboard. (The tracker for that
> is here: https://tracker.ceph.com/issues/63502 ).
>
> Any idea what the steps are to get useful stuff back on the dashboard?
> Any idea where I can learn what my 85% warning and 95% danger are
> 'about'? (You'd think 'danger' (the volcano is blowing up now!) would
> be worse than 'warning' (the volcano might blow up soon), so how can
> warning + danger > 100%, or if they're not additive, how can warning
> be less than danger?)
>
> Here's a bit of detail:
>
> root@noc1:~# ceph -s
>   cluster:
>     id:     4067126d-01cb-40af-824a-881c130140f8
>     health: HEALTH_OK
>             (muted: CEPHADM_REFRESH_FAILED)
>
>   services:
>     mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
>     mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
>          noc3.sybsfb, noc1.jtteqg
>     mds: 1/1 daemons up, 3 standby
>     osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   16 pools, 1809 pgs
>     objects: 12.29M objects, 17 TiB
>     usage:   44 TiB used, 67 TiB / 111 TiB avail
>     pgs:     1793 active+clean
>              9    active+clean+scrubbing
>              7    active+clean+scrubbing+deep
>
>   io:
>     client: 5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>