Hi,
we're having a peculiar issue that we discovered during HA/DR testing in our Ceph cluster.
Basic info about the cluster:
Version: Quincy (17.2.6)
5 nodes configured as a stretch cluster (2 DCs, plus one arbiter node which is also the admin node for the cluster)
On every node besides the admin node we have OSD and MON services. We have 3 MGR instances in the cluster.
The specific thing we wanted to test is multiple CephFS filesystems, each with multiple MDS daemons (with HA in mind).
We deployed an MDS on every node and increased max_mds to 2 for every CephFS; the other two MDSes are in standby-replay mode (they are automatically configured during CephFS creation to follow a specific CephFS - join_fscid).
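For reference, the per-filesystem settings boil down to something like this (filesystem/daemon names here are placeholders; as far as I understand, join_fscid corresponds to the mds_join_fs option under the hood):

ceph fs set cephfs1 max_mds 2                       # two active ranks per filesystem
ceph fs set cephfs1 allow_standby_replay true       # standbys follow the active ranks in replay
ceph config set mds.cephfs1-a mds_join_fs cephfs1   # pin an MDS daemon to this filesystem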
We ran multiple tests, and when we have only one CephFS it behaves as expected (two MDSes are in the up:active state, and clients can connect and interact with the CephFS as if nothing had happened).
When we test with multiple CephFS filesystems (two, for example) and shut down two nodes, one MDS gets stuck in the up:active laggy state. When this happens, the affected CephFS is unusable: clients hang, and it stays stuck like that until we power the other DC back on. This happens even when no clients are connected to that specific CephFS.
We can provide additional logs and do any tests necessary. We checked the usual culprits and our nodes don't show any excessive CPU or memory usage.
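For reference, we have been watching the state with the usual commands:

ceph fs status        # per-filesystem MDS ranks and states
ceph mds stat         # compact MDS map summary
ceph health detail    # any MDS-related health warnings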
We would appreciate any help.
Hi everyone,
I'd just like to know your opinion about the reliability of erasure
coding.
Of course I understand that if we want the «best of the best of the best»
;-) I can choose the replica method.
I have heard in many places that «replica» is more reliable, «replica» is more
efficient, etc...
Well, I have been using RAID for 25 years (5, 6, LVM, raidz1, raidz2, etc.)
and I have never lost data, except once, when a firmware bug in some xxxxx card
crashed the raid volume.
Now, 25 years later, a lot of people recommend using replicas, so if I buy X TB
I'm only going to get X/3 TB usable (vs. raidz2, where I only lose 2 disks out of 9-12
disks).
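To put numbers on it: with replica 3 I get X/3 usable, while an erasure-coded pool shaped like my raidz2 setups (k=9, m=2) would give X*9/11, i.e. about 82% usable. If I read the docs correctly, creating such a pool would look like this (profile and pool names made up):

ceph osd erasure-code-profile set raidz2like k=9 m=2 crush-failure-domain=host
ceph osd pool create ecpool erasure raidz2like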
So my question is: does anyone use erasure coding at large scale for critical
data (at the same reliability level as raidz1/raid5 or raidz2/raid6)?
Regards
--
Albert SHIH 🦫 🐸
Observatoire de Paris
France
Heure locale/Local time:
Thu. 23 Nov. 2023 14:51:28 CET
Hi everyone.
Still me with my newbie questions... sorry.
I'm using cephadm to deploy my Ceph cluster, but when I search the
documentation on «docs.ceph.com», I see in some places, like
https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/
instructions to change something in /etc/ceph/ceph.conf.
How is that taken into account by cephadm? I see the docker containers
have an overlay for /etc/ceph/ceph.conf.
Should I modify ceph.conf directly (vi/emacs)?
Should I modify ceph.conf directly and restart something?
Should I use cephadm shell and never touch ceph.conf manually?
And what about the future?
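For example, for a setting from that page such as mon_max_pg_per_osd, is the centralized config database the way to go nowadays? Something like:

ceph config set global mon_max_pg_per_osd 300   # instead of editing ceph.conf
ceph config get mon mon_max_pg_per_osd          # verify the value
ceph config dump                                # see everything set this way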
Regards.
--
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
Thu. 23 Nov. 2023 15:21:47 CET
Hi everyone,
For the purpose of deploying a medium-size Ceph cluster (300 OSDs), we have 6
bare-metal servers for the OSDs, and 5 bare-metal servers for the services
(MDS, MON, etc.)
Each of those 5 bare-metal servers has 48 cores and 256 GB of RAM.
What would be the smartest way to use those 5 servers? I see two ways:
First:
Server 1: MDS, MON, grafana, prometheus, webui
Server 2: MON
Server 3: MON
Server 4: MDS
Server 5: MDS
so 3 MDS, 3 MON, and we can lose 2 servers.
Second:
KVM on each server
Server 1: 3 VMs: one for grafana & co., one MDS, one MON
other servers: 1 MDS, 1 MON each
in total: 5 MDS, 5 MON, and we can lose 4 servers.
So on paper the second seems smarter, but it's also more complex,
so my question is: «is it worth the complexity to have 5 MDS/MON for 300
OSDs?»
Important: the main goal of this Ceph cluster is not to get the maximum
I/O speed. I wouldn't say speed is not a factor, but it's not the main
point.
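(For what it's worth, if I read the docs correctly, the first layout would be expressed with cephadm placement specs roughly like this; "mycephfs" is a placeholder filesystem name:)

ceph orch apply mon --placement="server1 server2 server3"
ceph orch apply mds mycephfs --placement="3 server1 server4 server5"
ceph orch apply grafana --placement="server1"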
Regards.
--
Albert SHIH 🦫 🐸
Observatoire de Paris
France
Heure locale/Local time:
Fri. 17 Nov. 2023 10:49:27 CET
Hi,
In an IPv6-only deployment, the ceph-exporter daemons are not listening on
IPv6 address(es). This can be fixed by editing the unit.run file of the
ceph-exporter and changing "--addrs=0.0.0.0" to "--addrs=::".
Is this configurable, so that cephadm deploys ceph-exporter with the proper
unit.run arguments?
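As a workaround that would survive redeploys, would something like extra_entrypoint_args in the service spec work (assuming the running cephadm version supports that field)?

cat <<'EOF' > ceph-exporter.yaml
service_type: ceph-exporter
placement:
  host_pattern: '*'
extra_entrypoint_args:
  - "--addrs=::"   # instead of the default --addrs=0.0.0.0
EOF
ceph orch apply -i ceph-exporter.yaml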
Gr. Stefan
... who really thinks the Ceph test lab should have an IPv6-only test
environment to catch these things
Greetings group!
We recently reloaded a cluster from scratch using cephadm and Reef. The
cluster came up with no issues. We then decided to upgrade two existing cephadm
clusters that were on Quincy. Those two clusters came up just fine, but
there is an issue with the Grafana graphs on both clusters (which were
working before the upgrade): they are now blank. There is a Prometheus
alert (PrometheusJobMissing) firing, which states the
following:
The prometheus job that scrapes from Ceph is no longer defined, this will
effectively mean you'll have no metrics or alerts for the cluster. Please
review the job definitions in the prometheus.yml file of the prometheus
instance.
summary: The scrape job for Ceph is missing from Prometheus
When I look at the prometheus.yml file on the performance monitoring node,
this is what is there (I replaced the IP with x.x.x.x):
global:
  scrape_interval: 10s
  evaluation_interval: 10s
rule_files:
  - /etc/prometheus/alerting/*
alerting:
  alertmanagers:
    - scheme: http
      http_sd_configs:
        - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=alertmanager
scrape_configs:
  - job_name: 'ceph'
    honor_labels: true
    http_sd_configs:
      - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=mgr-prometheus
  - job_name: 'node'
    http_sd_configs:
      - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=node-exporter
  - job_name: 'ceph-exporter'
    honor_labels: true
    http_sd_configs:
      - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=ceph-exporter
When I run "netstat -ntlp" on the active mgr node, I see port 8765
being used by docker. However, when I try to access the URLs listed in
prometheus.yml from the Chrome browser, the page times out.
If I do the same against the active manager of the cluster that was
installed from scratch (and not upgraded), the URLs for that cluster
return output (different for each URL).
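For the record, the same check from the command line looks like this (x.x.x.x as above); on the upgraded clusters it just times out, while the fresh cluster answers:

curl -sv 'http://x.x.x.x:8765/sd/prometheus/sd-config?service=mgr-prometheus'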
So it appears to me that the service discovery function is not working after
upgrades from Quincy. Also, the ceph-exporter service was not installed on
the cluster during the upgrade process; I manually added the service when I
noticed it was missing (when comparing the from-scratch cluster to
the upgraded cluster).
Not sure if this will help or is even related, but I saw it in the cephadm
log:
2023-11-15T04:22:30.789998+0000 mgr.CEPH-MON-01.mlmups (mgr.144601) 753 : cephadm 4 host CEPH-MON-02 `cephadm gather-facts` failed: Cannot decode JSON:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1425, in _run_cephadm_json
    return json.loads(''.join(out))
  File "/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Is there any way to fix the service discovery?
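In case it's relevant, these are the things I was considering trying next (assuming they are safe on a production cluster):

ceph mgr fail                  # fail over to a standby mgr, restarting the cephadm SD endpoint
ceph orch redeploy prometheus  # regenerate prometheus.yml and redeploy the container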
Thanks!
-Brent
Existing Clusters:
Test: Reef 18.2.0 (all virtual on NVMe)
US Production (HDD): Reef 18.2.0 with 11 osd servers, 3 mons, 4 gateways, 2
iscsi gateways
UK Production (HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
US Production (SSD): Reef 18.2.0 cephadm with 6 osd servers, 5 mons, 4
gateways
UK Production (SSD): Reef 18.2.0 cephadm with 7 osd servers, 5 mons, 4
gateways
Hello Ceph users,
Together with ShapeBlue and Adyen, 42on is organizing a CloudStack and Ceph Day; this time in Amsterdam, The Netherlands. We are planning this for February 8, 2024.
We want to create a technical event that shares updates on both technologies, as well as 'use cases', stories, challenges and perhaps some crazy ideas or configurations. Let’s share information and make Ceph even bigger and better!
I am still looking for some speakers who would love to share something about their Ceph infrastructure or configuration. As small as it might seem to you, any ideas are welcome and if you are in doubt about the subject, please message or call me to discuss. I would love to hear about your ideas.
RSVP and more information can be found here: https://www.eventbrite.nl/e/cloudstack-and-ceph-day-netherlands-2024-ticket…
We hope to see you in Amsterdam once more.
Sincerely,
Michiel Manten
Hello all,
today I got a new certificate for our internal domain, based on secp384r1
(replacing our previous RSA/4096 one). After inserting the CRT and key, I got
both "...updated" messages.
After checking the dashboard, I got an empty page and this error:
health: HEALTH_ERR
Module 'dashboard' has failed: key type unsupported
So we tried to go back to the original state by removing the CRT and key, but
without success. The new certificate seems to be stuck in the config:
[root@cephxxxx ~]# ceph config-key get mgr/dashboard/crt
-----BEGIN CERTIFICATE-----
MIIFqTCCBJGgAwIBAgIMB5tjLSz264Ic8zeHMA0GCSqGSIb3DQEBCwUAMEwxCzAJ
[...]
ItzkEzq4SZ6V1Jhuf4bFlOMBVAKgAwZ90gXlguoiFFQu5+NIqNljZ8Jz7d0jhH43
e3zhm5sn21+eIqRbiQ==
-----END CERTIFICATE-----
[root@cephxxxx ~]# ceph config-key get mgr/dashboard/key
Error ENOENT:
We tried to generate a self-signed cert, but no luck. It looks like the manager
stays in an intermediate state. The only way to get the dashboard back is
to disable SSL and use plain HTTP.
Can somebody explain this behaviour? Maybe secp384r1 elliptic curves
aren't supported? How can we clean up the SSL configuration?
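This is what we plan to try next, in case it's the right way to reset things (please correct me if not):

ceph config-key rm mgr/dashboard/crt      # drop the stuck certificate
ceph config-key rm mgr/dashboard/key
ceph dashboard create-self-signed-cert    # regenerate a known-good self-signed pair
ceph mgr module disable dashboard         # restart the module so it picks up the new cert
ceph mgr module enable dashboard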
Thanks,
Christoph Ackermann
PS: we checked some information like
https://tracker.ceph.com/issues/57924#change-227744 and others, but no
luck...