Hi Eugen,
Yes, you are right.
After upgrading from v18.2.0 to v18.2.1, it is necessary to create the
ceph-exporter service manually and deploy it to all hosts.
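A minimal sketch of that step, assuming a cephadm-managed cluster and a shell with the admin keyring. `ceph orch apply ceph-exporter` is the command Eugen mentions further down in this thread. The explicit `--placement='*'` (a host pattern matching every host) is my assumption for "deploy to all hosts", and the function name `deploy_ceph_exporter` is hypothetical.

```shell
# Hypothetical helper: deploy ceph-exporter on every cephadm-managed host
# and show the resulting service.
deploy_ceph_exporter() {
    # '*' is a host pattern matching all hosts known to the orchestrator
    # (assumption: one exporter per host is wanted, as described above).
    ceph orch apply ceph-exporter --placement='*' || return 1
    # List the service with its running/expected daemon count.
    ceph orch ls ceph-exporter
}
```

Once the daemons are up, `curl -s http://localhost:9926/metrics | grep -c '^ceph_osd_op{'` on a host should report a non-zero count, matching Eugen's output on port 9926.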
The dashboard is fine as well.
Thanks for the help.
Martin
On 26/01/2024 00:17, Eugen Block wrote:
> Ah, there they are (different port):
>
> reef01:~ # curl http://localhost:9926/metrics | grep ceph_osd_op | head
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  124k  100  124k    0     0   111M      0 --:--:-- --:--:-- --:--:--  121M
> # HELP ceph_osd_op Client operations
> # TYPE ceph_osd_op counter
> ceph_osd_op{ceph_daemon="osd.1"} 25
> ceph_osd_op{ceph_daemon="osd.4"} 543
> ceph_osd_op{ceph_daemon="osd.5"} 12192
> # HELP ceph_osd_op_delayed_degraded Count of ops delayed due to target object being degraded
> # TYPE ceph_osd_op_delayed_degraded counter
> ceph_osd_op_delayed_degraded{ceph_daemon="osd.1"} 0
> ceph_osd_op_delayed_degraded{ceph_daemon="osd.4"} 0
> ceph_osd_op_delayed_degraded{ceph_daemon="osd.5"} 0
>
> I can't check the dashboard right now; I will definitely do that
> tomorrow.
> Good night!
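The finding above, that the OSD counters moved from the mgr prometheus endpoint (port 9283) to ceph-exporter (port 9926), can be checked per host with a small sketch. The function names are hypothetical, and it assumes curl is installed and both endpoints are reachable on localhost as in the outputs in this thread.

```shell
# Hypothetical helper: count ceph_osd_op samples (one per OSD) in Prometheus
# text read from stdin. Only lines starting with "ceph_osd_op{" match, so the
# "# HELP"/"# TYPE" comments and longer names such as
# ceph_osd_op_delayed_degraded are not counted. grep -c prints 0 if none match.
count_osd_op_samples() {
    grep -c '^ceph_osd_op{'
}

# Hypothetical helper: compare the mgr prometheus module (9283) with the
# local ceph-exporter (9926). curl -s suppresses the progress meter.
compare_metric_endpoints() {
    for port in 9283 9926; do
        n=$(curl -s "http://localhost:${port}/metrics" | count_osd_op_samples)
        echo "port ${port}: ${n} ceph_osd_op samples"
    done
}
```

On a Reef 18.2.1 host with default settings and ceph-exporter deployed, one would expect 0 on port 9283 and one sample per local OSD on port 9926.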
>
> Quoting Eugen Block <eblock(a)nde.ag>:
>
>> Yeah, it's mentioned in the upgrade docs [2]:
>>
>>> Monitoring & Alerting
>>> Ceph-exporter: Now the performance metrics for Ceph daemons
>>> are exported by ceph-exporter, which deploys on each daemon rather
>>> than using prometheus exporter. This will reduce performance
>>> bottlenecks.
>>
>>
>> [2] https://docs.ceph.com/en/latest/releases/reef/#major-changes-from-quincy
>>
>> Quoting Eugen Block <eblock(a)nde.ag>:
>>
>>> Hi,
>>>
>>> I got those metrics back after setting:
>>>
>>> reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false
>>>
>>> reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>> 100  324k  100  324k    0     0  72.5M      0 --:--:-- --:--:-- --:--:-- 79.1M
>>> # HELP ceph_osd_op Client operations
>>> # TYPE ceph_osd_op counter
>>> ceph_osd_op{ceph_daemon="osd.0"} 139650.0
>>> ceph_osd_op{ceph_daemon="osd.11"} 9711090.0
>>> ceph_osd_op{ceph_daemon="osd.2"} 3864.0
>>> ceph_osd_op{ceph_daemon="osd.1"} 25.0
>>> ceph_osd_op{ceph_daemon="osd.4"} 543.0
>>> ceph_osd_op{ceph_daemon="osd.5"} 12192.0
>>> ceph_osd_op{ceph_daemon="osd.3"} 3661521.0
>>> ceph_osd_op{ceph_daemon="osd.6"} 2030.0
>>>
>>>
>>> I found the option in the docs [1]. The same section exists in the
>>> Quincy docs as well, but there's no such option in my Quincy cluster;
>>> maybe that's why it still exports those performance counters there:
>>>
>>> quincy-1:~ # ceph config get mgr mgr/prometheus/exclude_perf_counters
>>> Error ENOENT: unrecognized key 'mgr/prometheus/exclude_perf_counters'
>>>
>>> Anyway, this should bring back the metrics the "legacy" way (I
>>> guess). Apparently, the ceph-exporter daemon is now required on your
>>> hosts to collect those metrics.
>>> After adding the ceph-exporter service (ceph orch apply
>>> ceph-exporter) and setting mgr/prometheus/exclude_perf_counters back
>>> to "true", I see that the "ceph_osd_op" metrics are defined but have
>>> no values yet. Apparently I'm still missing something; I'll check
>>> tomorrow. But this could/should be in the upgrade docs IMO.
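The two paths described above (mgr exports the perf counters when mgr/prometheus/exclude_perf_counters is false; ceph-exporter must provide them when it is true; older clusters such as the Quincy one above lack the option and still export them from the mgr) can be summarized in a sketch. The function name is hypothetical, and it assumes `ceph config get` prints the boolean as `true` or `false` and fails with a non-zero status for an unknown key, as in the ENOENT output above.

```shell
# Hypothetical helper: report where OSD perf counters are expected to come
# from, based on mgr/prometheus/exclude_perf_counters.
perf_counter_source() {
    # Unknown key (e.g. the Quincy cluster above): treat as the mgr path.
    value=$(ceph config get mgr mgr/prometheus/exclude_perf_counters 2>/dev/null) || {
        echo "mgr"; return 0; }
    case "$value" in
        true)  echo "ceph-exporter" ;;  # mgr skips them; deploy ceph-exporter
        false) echo "mgr" ;;            # "legacy" path via the prometheus module
        *)     echo "unknown: $value"; return 1 ;;
    esac
}
```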
>>>
>>> Regards,
>>> Eugen
>>>
>>> [1] https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-cou…
>>>
>>> Quoting Martin <ceph(a)firma.azet.sk>:
>>>
>>>> Hi,
>>>>
>>>> I can confirm that this happens to me as well.
>>>> After upgrading from 18.2.0 to 18.2.1, OSD metrics
>>>> like ceph_osd_op_* are missing from ceph-mgr.
>>>>
>>>> The Grafana dashboard also doesn't display all graphs correctly.
>>>>
>>>> ceph-dashboard/Ceph - Cluster: Capacity used, Cluster I/O, OSD
>>>> Capacity Utilization, PGs per OSD, ...
>>>>
>>>> curl http://localhost:9283/metrics | grep -i ceph_osd_op
>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>> 100 38317  100 38317    0     0   9.8M      0 --:--:-- --:--:-- --:--:-- 12.1M
>>>>
>>>> Before upgrading to Reef 18.2.1, I could get all the metrics.
>>>>
>>>> Martin
>>>>
>>>> On 18/01/2024 12:32, Jose Vicente wrote:
>>>>> Hi,
>>>>> After upgrading from Quincy to Reef, the ceph-mgr daemon no longer
>>>>> exposes some OSD throughput metrics, such as ceph_osd_op_*:
>>>>> curl http://localhost:9283/metrics | grep -i ceph_osd_op
>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>> 100  295k  100  295k    0     0   144M      0 --:--:-- --:--:-- --:--:--  144M
>>>>> However, I can get other metrics, such as:
>>>>> # curl http://localhost:9283/metrics | grep -i ceph_osd_apply
>>>>> # HELP ceph_osd_apply_latency_ms OSD stat apply_latency_ms
>>>>> # TYPE ceph_osd_apply_latency_ms gauge
>>>>> ceph_osd_apply_latency_ms{ceph_daemon="osd.275"} 152.0
>>>>> ceph_osd_apply_latency_ms{ceph_daemon="osd.274"} 102.0
>>>>> ...
>>>>> Before upgrading to Reef (from Quincy), I could get all the
>>>>> metrics. The prometheus MGR module is enabled.
>>>>> Rocky Linux release 8.8 (Green Obsidian)
>>>>> ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
>>>>> # netstat -nap | grep 9283
>>>>> tcp        0      0 127.0.0.1:53834         127.0.0.1:9283          ESTABLISHED 3561/prometheus
>>>>> tcp6       0      0 :::9283                 :::*                    LISTEN      804985/ceph-mgr
>>>>> Thanks,
>>>>> Jose C.
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io