Hi,
You can upgrade the Grafana version individually by setting the config
option for the Grafana container image, like:

  ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5

and then redeploying the Grafana container, either via the dashboard or
cephadm.
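For example, via cephadm:

  ceph orch redeploy grafana

(a plain restart keeps running the old image; a redeploy is what picks up
the new one).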
Regards,
Nizam
On Fri, Jun 23, 2023 at 12:05 AM Adiga, Anantha <anantha.adiga(a)intel.com>
wrote:
Hi Eugen,
Thank you so much for the details. Here is the update (comments in-line,
marked with >):
Regards,
Anantha
-----Original Message-----
From: Eugen Block <eblock(a)nde.ag>
Sent: Monday, June 19, 2023 5:27 AM
To: ceph-users(a)ceph.io
Subject: [ceph-users] Re: Grafana service fails to start due to bad
directory name after Quincy upgrade
Hi,
so grafana is starting successfully now? What did you change?
> I stopped and removed the Grafana image and started it from the "Ceph
Dashboard" service. The version is still 6.7.4. I also had to change the
following. I do not have a way to make this permanent; if the service is
redeployed I will lose the changes.
I did not save the file that cephadm generated; that was one reason why
the Grafana service would not start. I had to replace it with the one
below to resolve the issue.
[users]
default_theme = light
[auth.anonymous]
enabled = true
org_name = 'Main Org.'
org_role = 'Viewer'
[server]
domain = 'bootstrap.storage.lab'
protocol = https
cert_file = /etc/grafana/certs/cert_file
cert_key = /etc/grafana/certs/cert_key
http_port = 3000
http_addr =
[snapshots]
external_enabled = false
[security]
disable_initial_admin_creation = false
cookie_secure = true
cookie_samesite = none
allow_embedding = true
admin_password = paswd-value
admin_user = user-name
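(A note in case it helps with making this permanent: the cephadm
monitoring docs describe persisting a custom grafana.ini across redeploys
by storing it as a template, e.g.:

  ceph config-key set mgr/cephadm/services/grafana/grafana.ini -i grafana.ini
  ceph orch reconfig grafana

This is untested on this cluster, so treat it as a sketch.)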
Also, this was the other change:

# This file is generated by cephadm.
apiVersion: 1   <-- This was the line added to
/var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/etc/grafana/provisioning/datasources/ceph-dashboard.yml
>
Regarding the container images, yes, there are defaults in cephadm which
can be overridden with ceph config. Can you share this output?

ceph config dump | grep container_image
>
Here it is:

root@fl31ca104ja0201:/# ceph config dump | grep container_image
global                                             basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
mgr                                                advanced  mgr/cephadm/container_image_alertmanager   docker.io/prom/alertmanager:v0.16.2  *
mgr                                                advanced  mgr/cephadm/container_image_base           quay.io/ceph/daemon
mgr                                                advanced  mgr/cephadm/container_image_grafana        docker.io/grafana/grafana:6.7.4  *
mgr                                                advanced  mgr/cephadm/container_image_node_exporter  docker.io/prom/node-exporter:v0.17.0  *
mgr                                                advanced  mgr/cephadm/container_image_prometheus     docker.io/prom/prometheus:v2.7.2  *
client.rgw.default.default.fl31ca104ja0201.ninovs  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
client.rgw.default.default.fl31ca104ja0202.yhjkmb  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
client.rgw.default.default.fl31ca104ja0203.fqnriq  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
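(As a possible follow-up, those per-image overrides could be pointed at
the versions the Quincy release notes list; the exact image tags below
are an assumption and may need adjusting:

  ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5
  ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.33.4
  ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
  ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.3.1

followed by a ceph orch redeploy of each affected service.)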
>
I tend to always use a specific image as described here [2]. I also
haven't deployed grafana via the dashboard yet, so I can't really comment
on that or on the warnings you report.
> OK. The need for that is: in Quincy, when you enable Loki and Promtail
to view the daemon logs, the Ceph dashboard pulls in the Grafana
dashboard. I will let you know once that issue is resolved.
Regards,
Eugen
[2]
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-i…
> Thank you. I am following the document now.
Zitat von "Adiga, Anantha" <anantha.adiga(a)intel.com>:
Hi Eugen,
Thank you for your response, here is the update.
The upgrade to Quincy was done following the cephadm orch upgrade
procedure: ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
The upgrade completed without errors. After the upgrade, upon creating
the Grafana service from the Ceph dashboard, it deployed Grafana 6.7.4.
The version is hardcoded in the code; should it not be 8.3.5, as listed
below in the Quincy documentation? See below.
[Grafana service started from Ceph dashboard]
Quincy documentation states:
https://docs.ceph.com/en/latest/releases/quincy/
…documentation snippet:
Monitoring and alerting:
43 new alerts have been added (totalling 68) improving observability
of events affecting: cluster health, monitors, storage devices, PGs
and CephFS.
Alerts can now be sent externally as SNMP traps via the new SNMP
gateway service (the MIB is provided).
Improved integrated full/nearfull event notifications.
Grafana Dashboards now use grafonnet format (though they’re still
available in JSON format).
Stack update: images for monitoring containers have been updated.
Grafana 8.3.5, Prometheus 2.33.4, Alertmanager 0.23.0 and Node
Exporter 1.3.1. This reduced exposure to several Grafana
vulnerabilities (CVE-2021-43798, CVE-2021-39226, CVE-2021-43798,
CVE-2020-29510, CVE-2020-29511).
………………….
I notice that the versions of the rest of the monitoring stack that the
Ceph dashboard deploys are also older than what is documented:
Prometheus 2.7.2, Alertmanager 0.16.2 and Node Exporter 0.17.0.
In addition, the Grafana 6.7.4 service reports a few warnings,
highlighted below:
root@fl31ca104ja0201:/home/general# systemctl status ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104ja0201.service
● ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104ja0201.service - Ceph grafana.fl31ca104ja0201 for d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
     Loaded: loaded (/etc/systemd/system/ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-06-13 03:37:58 UTC; 11h ago
   Main PID: 391896 (bash)
      Tasks: 53 (limit: 618607)
     Memory: 17.9M
     CGroup: /system.slice/system-ceph\x2dd0a3b6e0\x2dd2c3\x2d11ed\x2dbe05\x2da7a3a1d7a87e.slice/ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104j
             ├─391896 /bin/bash /var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/unit.run
             └─391969 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --init --name ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-grafana-fl>
-- Logs begin at Sun 2023-06-11 20:41:51 UTC, end at Tue 2023-06-13 15:35:12 UTC. --
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="alter user_auth.auth_id to length 190"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth access token to user_auth"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth refresh token to user_auth"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth token type to user_auth"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth expiry to user_auth"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add index to user_id column in user_auth"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create server_lock table"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add index server_lock.operation_uid"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create user auth token table"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index user_auth_token.auth_token"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index user_auth_token.prev_auth_token"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create cache_data table"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index cache_data.cache_key"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Created default organization" logger=sqlstore
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing HTTPServer" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing BackendPluginManager" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing PluginManager" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Starting plugin search" logger=plugins
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing HooksService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing OSSLicensingService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing InternalMetricsService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing RemoteCache" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing RenderingService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing AlertEngine" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing QuotaService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing ServerLockService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing UserAuthTokenService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing DatasourceCacheService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing LoginService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing SearchService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing TracingService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing UsageStatsService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing CleanUpService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing NotificationService" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing provisioningServiceImpl" logger=server
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=warn msg="[Deprecated] the datasource provisioning config is outdated. please upgrade" logger=provisioning.datasources filename=/etc/grafana/provisioning/datasources/ceph-dashboard.yml
This warning is due to the missing "apiVersion: 1" first-line entry in
/etc/grafana/provisioning/datasources/ceph-dashboard.yml as created by
cephadm. If the file is modified to include the apiVersion line and the
Grafana service is restarted, the warning goes away. Is this a known
issue?
Here is the content of the ceph-dashboard.yml produced by cephadm
deleteDatasources:
  - name: 'Dashboard1'
    orgId: 1
  - name: 'Loki'
    orgId: 2

datasources:
  - name: 'Dashboard1'
    type: 'prometheus'
    access: 'proxy'
    orgId: 1
    url: 'http://fl31ca104ja0201.xxx.xxx.com:9095'
    basicAuth: false
    isDefault: true
    editable: false
  - name: 'Loki'
    type: 'loki'
    access: 'proxy'
    orgId: 2
    url: ''
    basicAuth: false
    isDefault: true
    editable: false
--------------------------------------------------------------
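(For reference, the fixed file differs only in carrying the version
header as its first non-comment line:

# This file is generated by cephadm.
apiVersion: 1
deleteDatasources:
...

after which restarting the daemon, e.g.

  ceph orch daemon restart grafana.fl31ca104ja0201

makes the deprecation warning disappear.)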
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Dashboard1
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Loki
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Backend rendering via phantomJS" logger=rendering renderer=phantomJS
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=warn msg="phantomJS is deprecated and will be removed in a future release. You should consider migrating from phantomJS to grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/&quo… logger=rendering renderer=phantomJS
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing Stream Manager"
Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=https subUrl= socket=
I also had to change a few other things to keep all the services
running. The last issue that I have not been able to resolve yet is
that the Ceph dashboard gives this error even though Grafana is running
on the same server. However, the Grafana dashboard cannot be accessed
without tunnelling.
[embedded screenshot of the dashboard error: image002.png]
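(One thing that may be worth checking for that embedding error is the URL
the dashboard uses to reach Grafana:

  ceph dashboard get-grafana-api-url

and, if it points at an unreachable address, resetting it with, e.g.:

  ceph dashboard set-grafana-api-url https://fl31ca104ja0201:3000

where the host and port here are illustrative.)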
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io