Hi,
this topic is documented with many links to other documents, which
unfortunately only confused me. In our 6-node Ceph cluster (Pacific),
the Dashboard tells me that I should "provide the URL to the API of
Prometheus' Alertmanager". We only use Grafana and Prometheus, which
are deployed by cephadm. We did not configure anything unusual, such as
our own containers; we use just the standard cephadm installation.
The configuration the documentation says it "should look like"
(https://docs.ceph.com/en/pacific/mgr/dashboard/#enabling-prometheus-alerting)
seems to exist in the Docker container "prom/alertmanager:v0.20.0" in
the file /etc/alertmanager/alertmanager.yml:
[…]
- name: 'ceph-dashboard'
  webhook_configs:
  - url: 'https://ceph01:8443/api/prometheus_receiver'
  - url: 'https://10.149.12.22:8443/api/prometheus_receiver'
[…]
(10.149.12.22 is the IP address for ceph01)
Nevertheless, I still get the message above from the Dashboard.
My question: what do I have to write in which file, or which commands
do I have to run, so that I can access the alerts via the dashboard?
Of course, this should survive reboots and updates.
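The linked docs page also mentions a dashboard command for this; is that
all that is needed? Just a guess, assuming Alertmanager runs on ceph01 on
its default port 9093:
# ceph dashboard set-alertmanager-api-host 'http://ceph01:9093'
# ceph dashboard get-alertmanager-api-host
(The second command only verifies the stored value. As far as I understand,
the setting is kept in the mon config-key store, so it should survive
reboots.)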
Thanks.
Erich
Hi there,
I'm developing a custom ceph-mgr module and am having trouble deploying it on a cluster deployed with cephadm.
With a cluster deployed with ceph-deploy, I can just put my code under /usr/share/ceph/mgr/ and load the module. This works fine.
I think I found 2 options to do this with cephadm:
1. build a custom container image: https://docs.ceph.com/en/octopus/cephadm/install/#deploying-custom-containe…
2. use the --shared_ceph_folder flag during cephadm bootstrap: 'Development mode. Several folders in containers are volumes mapped to different sub-folders in the ceph source folder'
The shared-folder method is only meant for development, so it is not an option in a production environment.
Building a custom container image should be possible, but I don't think I want to go there.
Are there more options?
It would be nice if it were possible to deploy the managers with a custom service specification that, for example, mounts a folder from the host system to /usr/share/ceph/mgr/<module> in the container.
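To make the idea concrete, something like this is what I have in mind. To
be clear, the extra_mounts key below is invented purely to illustrate the
idea; it is not an existing cephadm option:
service_type: mgr
placement:
  count: 2
# hypothetical key, not implemented in cephadm today:
extra_mounts:
- '/opt/ceph-mgr-modules/mymodule:/usr/share/ceph/mgr/mymodule:ro'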
Thanks!
Rob Haverkamp
We have a Ceph Octopus cluster running 15.2.6, and it's indicating a near-full
OSD, which I can see is not weighted equally with the rest of the OSDs. I
tried the usual "ceph osd reweight osd.0 0.95" to force it down a
little, but unlike on the Nautilus clusters, I see no data movement when
issuing the command. If I run "ceph osd tree", it shows the reweight
setting, but no data movement appears to be occurring.
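For reference, here is how I've been checking. My understanding is that
Octopus enables the balancer module (upmap mode) by default, which might
interact with manual reweights, so I include that too:
# ceph osd df tree | grep osd.0
(the REWEIGHT column shows 0.95)
# ceph balancer status
# ceph -s
(no backfill or recovery activity shown)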
Is there some new thing in Octopus I am missing? I looked through the
release notes for 15.2.7, 15.2.8 and 15.2.9 and didn't see any fixes that
jumped out as resolving a bug related to this. The Octopus cluster was
deployed using ceph-ansible and upgraded to 15.2.6. I plan to upgrade to
15.2.9 in the coming month.
Any thoughts?
Regards,
-Brent
Existing Clusters:
Test: Octopus 15.2.5 (all virtual on NVMe)
US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways,
2 iscsi gateways
UK Production(SSD): Octopus 15.2.6 with 5 osd servers, 3 mons, 4 gateways
I hope someone can help out. I cannot run 'rbd info' on any image.
# rbd ls openstack-volumes
volume-628efc47-fc57-4630-8661-a13210a4e02c
volume-e4fe1e24-fb26-4abc-a458-f936a4e75715
volume-1ce1439d-767b-4b1d-8217-51464a11c5cc
volume-0a01d7e3-2c8f-4fab-9f9f-d84bbc7fa3c7
volume-a4aeb848-7283-4cd0-b5e6-ac2fc7f06dac
# rbd info openstack-volumes/volume-a4aeb848-7283-4cd0-b5e6-ac2fc7f06dac
rbd: error opening image volume-a4aeb848-7283-4cd0-b5e6-ac2fc7f06dac:
(2) No such file or directory
We're running Nautilus 14.2.16 on Ubuntu Bionic.
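In case it helps, this is how I tried to check whether the image's header
objects still exist. This is based on my (possibly incomplete) understanding
of the RBD layout, where each format-2 image has an rbd_header.<id> object
and an entry in the rbd_directory object:
# rados -p openstack-volumes ls | grep rbd_header
# rados -p openstack-volumes listomapvals rbd_directory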
Marcel
Hello,
A while back I asked about the trouble I was having with Ceph-Ansible when
I kept existing OSD nodes in my inventory file while managing my Nautilus
cluster. At the time it was suggested that, once the OSDs have been
configured, they should be excluded from the inventory file.
However, when processing certain configuration changes, Ceph-Ansible updates
ceph.conf on all cluster nodes and clients in the inventory file.
Is there an alternative way to keep OSD nodes in the inventory file without
listing them as OSD nodes, so that they still get those other updates, but
Ceph-Ansible doesn't try to do any of the ceph-volume steps that seem to be
failing after the OSDs are configured?
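To make it concrete, here is the sort of thing I'm imagining; the host names
are placeholders, and I have no idea whether listing OSD hosts under
[clients] is actually safe:
[mons]
cephmon1
[osds]
# OSD hosts removed after initial deployment, as suggested
[clients]
# untested idea: kept here only so the host still receives ceph.conf updates
cephosd1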
Or maybe I just have something odd in my inventory file. I'd be glad to
share it, either on this list or offline.
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hi All,
Does anybody use a Windows file server with Ceph storage? I finally got the gateways working. We have a Ceph cluster with 3 nodes, and we can attach it to Windows via ceph-iscsi. I'd like to use it with two Windows 2019 servers in a failover cluster. I can connect to the storage from both sides. But when I check the MPIO device details, all nodes are connected and active; I have no "stand by" node. I'm not sure whether that is correct or a problem. I set up the details from the Ceph documentation: TimeOutValue = 65; LinkDownTime = 25; SRBTimeoutDelta = 15.
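If it helps anyone answer: my understanding is that whether a path shows as
standby depends on the MPIO load-balance policy, which I checked and set
with mpclaim (policy 1 = Fail Over Only); the exact flags are from memory,
so please correct me if they're wrong:
mpclaim -s -d
(lists MPIO disks and the load-balance policy per disk)
mpclaim -l -m 1
(sets the default load-balance policy to Fail Over Only)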
When I try to validate the failover cluster configuration, I get an error:
"Failure issuing call to Persistent Reservation REGISTER AND IGNORE EXISTING
on Test Disk 0 from node FS102.trafficom.hu when the disk has no existing
registration. It is expected to succeed. The device is not ready."
Did anybody see this error?
jansz0
I am attempting to upgrade a Ceph cluster that was deployed with
Octopus 15.2.8 and upgraded to 15.2.10 successfully. I'm now attempting to
upgrade to 16.2.0 Pacific, and it is not going very well.
I am using cephadm. It looks to have upgraded the managers and then stopped,
without moving on to the monitors or anything else. I've attempted stopping
the upgrade and restarting it with debug on, and I'm not seeing anything
that says why it is not progressing any further.
I've also tried rebooting machines and failing the managers over, with
no success. I'm currently thinking it's stuck attempting to upgrade a
manager that does not exist.
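For reference, these are the commands I've been using to check where it's
stuck; none of their output points at a cause:
# ceph orch upgrade status
# ceph orch ps --daemon-type mgr
# ceph log last cephadm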
It's a test cluster of 16 nodes, a bit of a proof of concept, so if I've got
something terribly wrong I'm happy to look at redeploying. (It's running on
top of CentOS 7, but I'm fast heading towards using something else; apart
from anything else, it's not really a production-ready system yet.)
I'm just not sure where the cephadm upgrade has got stuck on the way to
16.2.0.
Thanks in advance
Peter