November 2023 - ceph-users

[MDS] mds stuck in laggy state, CephFS unusable

by kvesligaj＠cs.hr

Hi, we're having a peculiar issue which we found out during HA/DR testing in our Ceph cluster. Basic info about cluster: Version: Quincy (17.2.6) 5 nodes configured in stretch cluster (2 DCs with one arbiter node which is also admin node for the cluster) On every node beside the admin node we have OSD and MON services. We have 3 MGR instances in cluster. Specific thing that we wanted to test is multiple CephFS with each having multiple MDS (with HA in mind). We deployed MDS on every node, increased max_mds to 2 for every CephFS and other two MDS-es are in standby-replay mode (they are automatically configured during CephFS creation to follow specific CephFS - join_fscid). We did multiple tests and when we have only one CephFS it behaves as expected (two MDS are in up:active state and clients can connect and interact with CephFS as if nothing has happened). When we test with multiple CephFS (two for example) and we shutdown two nodes one of MDS is stuck in up:active laggy state and when this happens the CephFS for which this happens is unusable, client hangs and it is stuck like that until we power on other DC. This happens even when there are no clients connected to this specific CephFS. We can provide additional logs and do any tests necessary. We checked the usual culprits and our nodes don't show any excessive CPU or memory usage. We would appreciate any help.

5 months, 3 weeks

1
0
0 0

CephFS - MDS removed from map - filesystem keeps to be stopped

by Denis Polom

Hi running Ceph Pacific 16.2.13. we had full CephFS filesystem and after adding new HW we tried to start it but our MDS daemons are pushed to be standby and are removed from MDS map. Filesystem was broken, so we repaired it with: # ceph fs fail cephfs # cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary # cephfs-journal-tool --rank=cephfs:0 journal reset then I started ceph-mds service and marked rank as repaired mds after some time has switched to standby. Log is bellow. I would appreciate any help to resolve this situation. Thank you. from log: 2023-11-22T14:11:49.212+0100 7f5dc155e700 1 mds.0.9604 handle_mds_map i am now mds.0.9604 2023-11-22T14:11:49.212+0100 7f5dc155e700 1 mds.0.9604 handle_mds_map state change up:rejoin --> up:active 2023-11-22T14:11:49.212+0100 7f5dc155e700 1 mds.0.9604 recovery_done -- successful recovery! 2023-11-22T14:11:49.212+0100 7f5dc155e700 1 mds.0.9604 active_start 2023-11-22T14:11:49.216+0100 7f5dc155e700 1 mds.0.9604 cluster recovered. 2023-11-22T14:11:49.216+0100 7f5dc3d63700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.8.127:0/2123529386 conn(0x55a60627a800 0x55a606e5b000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0). handle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc4564700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.6.88:0/1899426587 conn(0x55a60627ac00 0x55a6070d0000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h andle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc4564700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.4.216:0/2058542052 conn(0x55a6070c9800 0x55a6070d1800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0). handle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc3d63700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.4.220:0/1549374180 conn(0x55a60708d000 0x55a6070d0800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0). handle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc4d65700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.8.180:0/270666178 conn(0x55a60703a000 0x55a6070cf800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h andle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc4d65700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.8.178:0/3673271488 conn(0x55a6070c9400 0x55a6070d1000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0). handle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc4d65700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.4.167:0/2667964940 conn(0x55a6070c9c00 0x55a607112000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0). handle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc3d63700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.6.70:0/3181830075 conn(0x55a607116000 0x55a607112800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h andle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc4564700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.6.72:0/3744737352 conn(0x55a60627a800 0x55a606e5b000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h andle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.216+0100 7f5dc3d63700 0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.244.18.140:0/1607447464 conn(0x55a60627ac00 0x55a6070d0000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0) .handle_connect_message_2 accept peer reset, then tried to connect to us, replacing 2023-11-22T14:11:49.220+0100 7f5dc155e700 1 mds.mds1 Updating MDS map to version 9608 from mon.1 2023-11-22T14:11:49.220+0100 7f5dc155e700 1 mds.0.9604 handle_mds_map i am now mds.0.9604 2023-11-22T14:11:49.220+0100 7f5dc155e700 1 mds.0.9604 handle_mds_map state change up:active --> up:stopping 2023-11-22T14:11:52.412+0100 7f5dc3562700 1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...) 2023-11-22T14:11:57.412+0100 7f5dc3562700 1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...) 2023-11-22T14:12:02.416+0100 7f5dc3562700 1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...) 2023-11-22T14:12:07.420+0100 7f5dc3562700 1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...) 2023-11-22T14:12:12.420+0100 7f5dc3562700 1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...) 2023-11-22T14:12:13.552+0100 7f5dc155e700 1 mds.mds1 Updating MDS map to version 9609 from mon.1 2023-11-22T14:12:13.552+0100 7f5dc155e700 1 mds.mds1 Map removed me [mds.mds1{0:5320528} state up:stopping seq 67 addr [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] compat {c=[1],r=[1],i=[7ff]}] from cluster; respawnin g! See cluster/monitor logs for details. 2023-11-22T14:12:13.552+0100 7f5dc155e700 1 mds.mds1 respawn!

5 months, 4 weeks

2
1
0 0

CLT Meeting minutes 2023-11-23

by Nizamudeen A

Hello, etherpad history lost need a way to recover from DB or find another way to back things up discuss the quincy/dashboard-v3 backports? was tabled from 11/1 https://github.com/ceph/ceph/pull/54252 agreement is to not backport breaking features to stable branches. 18.2.1 LRC upgrade affected by https://tracker.ceph.com/issues/62570#note-4 https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1700575544548809 Figure out the reproducer and add tests https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… is the trigger on the kernel client side? Clients with that patch should work with the server-side code that broke older ones Suggestion to introduce a matrix of pre-built "older" kernels into the fs suite gibba vs LRC upgrades LRC shouldn't be updated often, it should act more like a production cluster RCs for reef, quincy and pacific for next week when there is more time to discuss Regards, -- Nizamudeen A Software Engineer Red Hat <https://www.redhat.com/> <https://www.redhat.com/>

5 months, 4 weeks

3
2
0 0

Erasure vs replica

by Albert Shih

Hi everyone, I just like to know what's your opinion about the reliability of erasure coding. Of course I can understand if we want the «best of the best of the best» ;-) I can choose the replica method. I heard in many location “replica” are more reliable, “replica” are more efficient etc... Yes...well since 25 years I'm using raid (5, 6, lvm, raidz1, raidz2, etc.) I never loose data only once when a firmware bug in some xxxxx card crash the raid volume. Now 25 years later lot of people recommend to use replica so if I buy XTo I'm only going to have X/3 To (vs raidz2 where I loose 2 disks over 9-12 disks). So my question are : Anyone use in large scale erasure coding for critical (same level as raidz1/raid5 ou raidz2/raid6) ? Regards -- Albert SHIH 🦫 🐸 Observatoire de Paris France Heure locale/Local time: jeu. 23 nov. 2023 14:51:28 CET

5 months, 4 weeks

4
3
0 0

cephadm vs ceph.conf

by Albert Shih

Hi everyone. Still me with my newbie question....sorry. I'm using cephadm to deploy my ceph cluster, but when I search in the documentation «docs.ceph.com» I see in some place like https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/ to change something in the /etc/ceph/ceph.conf. How that is taking account by cephadm ? I see in the docker container they have a overlay for /etc/ceph/ceph.conf. Should I modify the ceph.conf (vi/emacs) directly ? Should I modify the ceph.conf (vi/emacs) directly and restart something ? Should I use some cephadm shell and don't manually touche ceph.conf ? And what about the future ? Regards. -- Albert SHIH 🦫 🐸 France Heure locale/Local time: jeu. 23 nov. 2023 15:21:47 CET

5 months, 4 weeks

4
6
0 0

How to use hardware

by Albert Shih

Hi everyone, In the purpose to deploy a medium size of ceph cluster (300 OSD) we have 6 bare-metal server for the OSD, and 5 bare-metal server for the service (MDS, Mon, etc.) Those 5 bare-metal server have each 48 cores and 256 Gb. What would be the smartest way to use those 5 server, I see two way : first : Server 1 : MDS,MON, grafana, prometheus, webui Server 2: MON Server 3: MON Server 4 : MDS Server 5 : MDS so 3 MDS, 3 MON. and we can loose 2 servers. Second KVM on each server Server 1 : 3 VM : One for grafana & CIe, and 1 MDS, 2 MON other server : 1 MDS, 1 MON in total : 5 MDS, 5 MON and we can loose 4 servers. So on paper it's seem the second are smarter, but it's also more complex, so my question are «is it worth the complexity to have 5 MDS/MON for 300 OSD». Important : The main goal of this ceph cluster are not to get the maximum I/O speed, I would not say the speed is not a factor, but it's not the main point. Regards. -- Albert SHIH 🦫 🐸 Observatoire de Paris France Heure locale/Local time: ven. 17 nov. 2023 10:49:27 CET

5 months, 4 weeks

5
8
0 0

ceph-exporter binds to IPv4 only

by Stefan Kooman

Hi, In a IPv6 only deployment the ceph-exporter daemons are not listening on IPv6 address(es). This can be fixed by editing the unit.run file of the ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::". Is this configurable? So that cephadm deploys ceph-exporter with proper unit.run arguments? Gr. Stefan ... who really thinks the Ceph test lab should have an IPv6 only test environment to catch these things

5 months, 4 weeks

1
1
0 0

Service Discovery issue in Reef 18.2.0 release ( upgrading )

by Brent Kennedy

Greetings group! We recently reloaded a cluster from scratch using cephadm and reef. The cluster came up, no issues. We then decided to upgrade two existing cephadm clusters that were on quincy. Those two clusters came up just fine but there is an issue with the Grafana graphs on both clusters ( which were working before the upgrade ). They are now blank. There is an error in the Prometheus alerts (PrometheusJobMissing) that is alerting and it states the following: The prometheus job that scrapes from Ceph is no longer defined, this will effectively mean you'll have no metrics or alerts for the cluster. Please review the job definitions in the prometheus.yml file of the prometheus instance. summary: The scrape job for Ceph is missing from Prometheus When I look at the Prometheus.yml file on the performance monitoring node, this is what is there( I replaced ip with x.x.x.x ): global: scrape_interval: 10s evaluation_interval: 10s rule_files: - /etc/prometheus/alerting/* alerting: alertmanagers: - scheme: http http_sd_configs: - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=alertmanager scrape_configs: - job_name: 'ceph' honor_labels: true http_sd_configs: - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=mgr-prometheus - job_name: 'node' http_sd_configs: - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=node-exporter - job_name: 'ceph-exporter' honor_labels: true http_sd_configs: - url: http://x.x.x.x:8765/sd/prometheus/sd-config?service=ceph-exporter When I open a run "netstat -ntlp" on the active mgr node, I see the 8765 port being used by docker. However, when I try to use the chrome browser to access the URLs listed in the Prometheus.yml file, the page times out. However, if I do this with the active manager on the cluster that was installed from scratch ( and not upgraded ), the URL for that cluster returns output( different for each URL ). So it appears to me that the service discovery function is not working for upgrades from quincy. Also, the ceph-exporter service was not installed on the cluster during the upgrade process. I manually added the service when I noticed that it was not there ( when comparing the from scratch cluster to the upgraded cluster ). Not sure if this will help or is even related, but I saw it in the cephadm log: 2023-11-15T04:22:30.789998+0000 mgr. CEPH-MON-01.mlmups (mgr.144601) 753 : cephadm 4 host CEPH-MON-02 `cephadm gather-facts` failed: Cannot decode JSON: Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/serve.py", line 1425, in _run_cephadm_json return json.loads(''.join(out)) File "/lib64/python3.6/json/__init__.py", line 354, in loads return _default_decoder.decode(s) File "/lib64/python3.6/json/decoder.py", line 339, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/lib64/python3.6/json/decoder.py", line 357, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) Is there any way to fix the service discovery? Thanks! -Brent Existing Clusters: Test: Reef 18.2.0 ( all virtual on nvme ) US Production(HDD): Reef 18.2.0 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways UK Production(HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways US Production(SSD): Reef 18.2.0 Cephadm with 6 osd servers, 5 mons, 4 gateways UK Production(SSD): Reef 18.2.0 cephadm with 7 osd servers, 5 mons, 4 gateways

5 months, 4 weeks

2
2
0 0

CloudStack and Ceph Day 2024

by 42on - Michiel Manten

Hello Ceph users, Together with ShapeBlue and Adyen, 42on is organizing a CloudStack and Ceph Day; this time in Amsterdam, The Netherlands. We are planning this for February 8 | 2024. We want to create a technical event that shares updates on both technologies, as well as 'use cases', stories, challenges and perhaps some crazy ideas or configurations. Let’s share information and make Ceph even bigger and better! I am still looking for some speakers who would love to share something about their Ceph infrastructure or configuration. As small as it might seem to you, any ideas are welcome and if you are in doubt about the subject, please message or call me to discuss. I would love to hear about your ideas. To RSVP and more information can be found here: https://www.eventbrite.nl/e/cloudstack-and-ceph-day-netherlands-2024-ticket… We hope to see you in Amsterdam once more. Sincerely, Michiel Manten

5 months, 4 weeks

1
0
0 0

No SSL Dashboard working after installing mgr crt|key with RSA/4096 secp384r1

by Ackermann, Christoph

Hello all, today i got a new certificate for our internal domain based on RSA/4096 secp384r1. After inserting CRT and Key i got both "...updated" messages. After checking the dashboard i got an empty page and this error: health: HEALTH_ERR Module 'dashboard' has failed: key type unsupported So we tried to go back to the original state by removing CRT anf KEY but without success. The new key seems to stuck into the config [root@cephxxxx ~]# ceph config-key get mgr/dashboard/crt -----BEGIN CERTIFICATE----- MIIFqTCCBJGgAwIBAgIMB5tjLSz264Ic8zeHMA0GCSqGSIb3DQEBCwUAMEwxCzAJ [...] ItzkEzq4SZ6V1Jhuf4bFlOMBVAKgAwZ90gXlguoiFFQu5+NIqNljZ8Jz7d0jhH43 e3zhm5sn21+eIqRbiQ== -----END CERTIFICATE----- [root@cephxxxx ~]# ceph config-key get mgr/dashboard/key *Error ENOENT: * We tried to generate a self signed Cert but no luck. It looks like manger stays in an intermediate state. The only way to get back the dashboard is to disable SSL and use plain http. Can somebody explain this behaviour? Maybe secp384r1 elliptic curves aren't supported? How can we clean up SSL configuration? Thanks, Christoph Ackermann Ps we checked some Information like https://tracker.ceph.com/issues/57924#change-227744 and others but no luck...

5 months, 4 weeks

2
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users November 2023