I dislike replying to my own post, but I found the issue:
According to the changelog for 14.2.5, the zabbix key
ceph.num_pg_wait_backfill has been renamed to ceph.num_pg_backfill_wait.
Until the key is updated in the zabbix_template.yml, the Zabbix server
rejects the new key that the 14.2.5 manager sends.
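A quick way to see whether a local copy of the template still carries the old key is to grep for it. This is only a rough sketch: check_template is a hypothetical helper name, and the template path is whatever your copy is called (no default location is assumed). The template on the Zabbix server itself still has to be updated or re-imported.

```shell
#!/bin/sh
# Report whether a Zabbix template file still references the pre-14.2.5 key.
# The template path is supplied by the caller; no install location is assumed.
OLD_KEY='ceph.num_pg_wait_backfill'
NEW_KEY='ceph.num_pg_backfill_wait'

# check_template FILE
# Returns 1 (and says so) if FILE still contains the old key, 0 otherwise.
check_template() {
    template="$1"
    # -F: match the key as a fixed string, so the dots are not regex wildcards
    if grep -qF "$OLD_KEY" "$template"; then
        echo "stale: $template still uses $OLD_KEY (rename to $NEW_KEY)"
        return 1
    fi
    echo "ok: $template does not reference $OLD_KEY"
    return 0
}
```

For example: check_template ./zabbix_template.yml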
Before the change:
# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_backfill_wait -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 0; failed: 1; total: 1; seconds spent: 0.000033"
sent: 1; skipped: 0; total: 1

# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_wait_backfill -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000059"
sent: 1; skipped: 0; total: 1
After the key update:
# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_backfill_wait -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000053"
sent: 1; skipped: 0; total: 1

# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_wait_backfill -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 0; failed: 1; total: 1; seconds spent: 0.000032"
sent: 1; skipped: 0; total: 1
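
The manual checks above can be wrapped in a small script to test any key. This is a minimal sketch, not something from the Ceph module: zbx_response_ok and probe_key are made-up helper names, ZBX_SERVER and ZBX_HOST are assumed environment variables that default to the hosts in this thread, and the check relies only on the "failed: 0;" text seen in the responses above. Like the manual test, it sends a real value of 0 for the key.

```shell
#!/bin/sh
# Return 0 if a zabbix_sender response (passed as $1) reports no failed items.
zbx_response_ok() {
    case "$1" in
        *"failed: 0;"*) return 0 ;;
        *) return 1 ;;
    esac
}

# probe_key KEY
# Send a dummy value of 0 for KEY and report whether the server accepted it.
# ZBX_SERVER / ZBX_HOST are assumptions; they default to the hosts used above.
probe_key() {
    key="$1"
    resp=$(/usr/bin/zabbix_sender \
        -z "${ZBX_SERVER:-controller03.mgmt.cloud}" \
        -s "${ZBX_HOST:-controller02.mgmt.cloud}" \
        -p 10051 -k "$key" -o 0 2>&1)
    if zbx_response_ok "$resp"; then
        echo "$key: accepted"
    else
        echo "$key: rejected"
    fi
}
```

For example: probe_key ceph.num_pg_backfill_wait; probe_key ceph.num_pg_wait_backfill
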
Gary.
On 2019-12-11 10:54 a.m., Gary Molenkamp wrote:
After updating/restarting the manager to v14.2.5 we are no longer able
to send data to our zabbix servers.
Ceph reports a non-zero exit status from zabbix_sender, but I have not
been able to identify the cause of the non-zero exit.
# ceph health detail
HEALTH_WARN Failed to send data to Zabbix
MGR_ZABBIX_SEND_FAILED Failed to send data to Zabbix
/usr/bin/zabbix_sender exited non-zero:
Setting "debug mgr = 20" yields no additional information that I could
see regarding the above issue.
The zabbix configuration in ceph has not changed since the v14.2.5 update,
and it was working under v14.2.4:
# ceph zabbix config-show
{"zabbix_port": 10051, "zabbix_host": "controller03.mgmt.cloud", "identifier": "controller02.mgmt.cloud", "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}
And I can force a send without error:
# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.total_used_bytes -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000062"
sent: 1; skipped: 0; total: 1
# echo $?
0
Any pointers/assistance would be appreciated.
Thanks
Gary
--
Gary Molenkamp Computer Science/Science Technology Services
Systems Administrator University of Western Ontario
molenkam(a)uwo.ca
http://www.csd.uwo.ca
(519) 661-2111 x86882 (519) 661-3566