Thanks! Oddly, all the dashboard checks you suggested come back normal,
yet the dashboard is still broken.
Even before applying your dashboard instructions, this is what I
already had:
root@noc3:~# ceph dashboard get-prometheus-api-host
http://noc3.1.quietfountain.com:9095
root@noc3:~# netstat -6nlp | grep 9095
tcp6       0      0 :::9095      :::*       LISTEN      80963/prometheus
root@noc3:~#
To verify the setting is actually being used, I set it to something
random; the browser pointed at the dashboard then reported it could not
connect. That error went away when I restored the value above, but the
Cluster Utilization graphs remain empty, showing only the numbers 1 and
0.5 on each.
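
One further check I can run, to rule out Prometheus itself, is to query
its HTTP API directly for a ceph metric (ceph_osd_up is just one metric
the mgr prometheus module exports, picked as a sanity check; hostname
as above):

curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=ceph_osd_up'

If that returns data while the dashboard graphs stay empty, the problem
would seem to be on the dashboard side rather than in Prometheus.
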
Regarding the used storage, note the overall usage is 43.6 of 111 TiB,
which seems a long way from the warning and danger trigger points of
85% and 95%. The default values are in use, and all the OSDs are
between 37% and 42% usage. What am I missing?
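
For reference, here is how I would confirm the ratios in effect (per
your `ceph osd dump | grep ratio` suggestion); with the defaults still
in place I'd expect something like:

ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
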
Thanks!
On 3/12/24 02:07, Nizamudeen A wrote:
> Hi,
>
> The warning and danger indicators in the capacity chart correspond to the
> nearfull and full ratios set on the cluster, and
> the default values for them are 85% and 95% respectively. You can do a
> `ceph osd dump | grep ratio` and see those.
>
> When this was introduced, there was a blog post
> <https://ceph.io/en/news/blog/2023/landing-page/#capacity-card> explaining
> how this is mapped in the chart. When your used storage
> crosses that 85% mark, the chart is colored yellow to alert
> the user, and when it crosses 95% (or the full ratio) the
> chart is colored red. That doesn't mean the
> cluster is in bad shape; it's a visual indicator to tell you
> you are running out of storage.
>
> Regarding the Cluster Utilization chart, it gets metrics directly from
> Prometheus so that it can show time-series
> data in the UI rather than only the metrics at the current point in time
> (which is what was used before). So if you have Prometheus configured for
> the dashboard and its URL is provided in the dashboard settings with `ceph
> dashboard set-prometheus-api-host <url-of-prometheus>`,
> then you should be able to see the metrics.
>
> In case you want to read more about the new page, you can check here:
> <https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
>
> Regards,
> Nizam
>
>
>
> On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin <hgcoin(a)gmail.com> wrote:
>
> Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
> capacity is 'warned', and 95% is 'in danger'. There is no hint given
> as to the nature of the danger or reason for the warning. Though
> apparently with merely 5% of my ceph world 'normal', the cluster
> reports 'ok'. Which, you know, seems contradictory. I've used just
> under 40% of capacity.
>
> Further down the dashboard, all the subsections of 'Cluster Utilization'
> are '1' and '0.5' with nothing whatever in the graphics area.
>
> Previous versions of ceph presented a normal dashboard.
>
> It's just a little half rack, 5 hosts, a few physical drives each, been
> running ceph for a couple years now. Orchestrator is cephadm. It's
> just about as 'plain vanilla' as it gets. I've had to mute one alert,
> because cephadm refresh aborts when it finds drives on any host that
> have nothing to do with ceph and don't have a blkid 'TYPE' key.
> Seems unrelated to a totally messed-up dashboard. (The tracker for that
> is here: https://tracker.ceph.com/issues/63502 ).
>
> Any idea what the steps are to get useful stuff back on the dashboard?
> Any idea where I can learn what my 85% warning and 95% danger are
> 'about'? (You'd think 'danger' (the volcano is blowing up now!) would
> be worse than 'warning' (the volcano might blow up soon), so how can
> warning + danger > 100%, or if they're not additive, how can warning
> be less than danger?)
>
> Here's a bit of detail:
>
> root@noc1:~# ceph -s
>   cluster:
>     id:     4067126d-01cb-40af-824a-881c130140f8
>     health: HEALTH_OK
>             (muted: CEPHADM_REFRESH_FAILED)
>
>   services:
>     mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
>     mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
>          noc3.sybsfb, noc1.jtteqg
>     mds: 1/1 daemons up, 3 standby
>     osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   16 pools, 1809 pgs
>     objects: 12.29M objects, 17 TiB
>     usage:   44 TiB used, 67 TiB / 111 TiB avail
>     pgs:     1793 active+clean
>              9    active+clean+scrubbing
>              7    active+clean+scrubbing+deep
>
>   io:
>     client: 5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>