March 2021 - ceph-users - lists.ceph.io

by Gert Wieberdink

Trying to configure Zabbix module in Octopus 15.2.3. CentOS 8.1 environment. Installed zabbix40-agent for CentOS 8.1 (from epel repository). This will also install zabbix_sender. After enabling the Zabbix module in Ceph, I configured my Zabbix host and Zabbix identifier. # ceph zabbix config-set zabbix_host <zabbix-fqdn> # ceph zabbix config-set zabbix_identifier <ident> # ceph zabbix config-show Error EINVAL: Traceback (most recent call last): File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command return self.handle_command(inbuf, cmd) File "/usr/share/ceph/mgr/zabbix/module.py", line 407, in handle_command return 0, json.dumps(self.config, index=4, sort_keys=True), '' File "/lib64/python3.6/json/__init__.py", line 238, in dumps **kw).encode(obj) TypeError: __init__() got an unexpected keyword argument 'index' # ceph -v ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable) # ceph health detail HEALTH_OK Anyone found a solution? rgds, -gw

2 years, 11 months

5
4
0 0

OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

by Frank Schilder

Hi all, this is a follow-up on "reboot breaks OSDs converted from ceph-disk to ceph-volume simple". I converted a number of ceph-disk OSDs to ceph-volume using "simple scan" and "simple activate". Somewhere along the way, the OSDs meta-data gets rigged and the prominent symptom is that the symlink block is changes from a part-uuid target to an unstable device name target like: before conversion: block -> /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7 after conversion: block -> /dev/sdj2 This is a huge problem as the "after conversion" device names are unstable. I have now a cluster that I cannot reboot servers on due to this problem. OSDs randomly re-assigned devices will refuse to start with: 2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248 Please help me with getting out of this mess. Thanks and best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

2 years, 11 months

2
7
0 0

OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag

by Jonas Jelten

Hi! After upgrading MONs and MGRs successfully, the first OSD host I upgraded on Ubuntu Bionic from 14.2.16 to 15.2.10 shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot. RocksDB complains "Corruption: unknown WriteBatch tag". The initial crash/corruption occured when the automatic fsck was ran, and when it committed the changes for a lot of "zombie spanning blobs". Tracker issue with logs: https://tracker.ceph.com/issues/50017 Anyone else encountered this error? I've "suspended" the upgrade for now :) -- Jonas

2 years, 11 months

5
12
0 0

Ceph osd Reweight command in octopus

by Brent Kennedy

We have a ceph octopus cluster running 15.2.6, its indicating a near full osd which I can see is not weighted equally with the rest of the osds. I tried to do the usual "ceph osd reweight osd.0 0.95" to force it down a little bit, but unlike the nautilus clusters, I see no data movement when issuing the command. If I run a ceph osd tree, it shows the reweight setting, but no data movement appears to be occurring. Is there some new thing in ocotopus I am missing? I looked through the release notes for .7, .8 and .9 and didn't see any fixes that jumped out as resolving a bug related to this. The Octopus cluster was deployed using ceph-ansible and upgraded to 15.2.6. I plan to upgrade to 15.2.9 in the coming month. Any thoughts? Regards, -Brent Existing Clusters: Test: Ocotpus 15.2.5 ( all virtual on nvme ) US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways, 2 iscsi gateways UK Production(SSD): Octopus 15.2.6 with 5 osd servers, 3 mons, 4 gateways

3 years

2
2
0 0

upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

by Simon Oosthoek

Hi I'm in a bit of a panic :-( Recently we started attempting to configure a radosgw to our ceph cluster, which was until now only doing cephfs (and rbd wss working as well). We were messing about with ceph-ansible, as this was how we originally installed the cluster. Anyway, it installed nautilus 14.2.18 on the radosgw and I though it would be good to pull up the rest of the cluster to that level as well using our tried and tested ceph upgrade script (it basically does an update of all ceph nodes one by one and checks whether ceph is ok again before doing the next) After the 3rd mon/mgr was done, all pg's were unavailable :-( obviously, the script is not continuing, but ceph is also broken now... The message deceptively is: HEALTH_WARN Reduced data availability: 5568 pgs inactive That's all PGs! I tried as a desperate measure to upgrade one ceph OSD node, but that broke as well, the osd service on that node gets an interrupt from the kernel.... the versions are now like: 20:29 [root@cephmon1 ~]# ceph versions { "mon": { "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3 }, "mgr": { "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3 }, "osd": { "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156 }, "mds": { "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2 }, "overall": { "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158, "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6 } } 12 OSDs are down # ceph -s cluster: id: b489547c-ba50-4745-a914-23eb78e0e5dc health: HEALTH_WARN Reduced data availability: 5568 pgs inactive services: mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m) mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2 mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs data: pools: 12 pools, 5568 pgs objects: 0 objects, 0 B usage: 0 B used, 0 B / 0 B avail pgs: 100.000% pgs unknown 5568 unknown progress: Rebalancing after osd.103 marked in [..............................]

3 years

5
11
0 0

Ceph CFP Coordination for 2021

by Mike Perez

Hi everyone, I cleaned up the CFP coordination etherpad with some events coming up. Please add other events you think the community should be considering proposing content on Ceph or adjacent projects like Rook. KubeCon NA CFP, for example, is ending April 11. Take a look: https://pad.ceph.com/p/cfp-coordination I have also added this to our wiki for discovery. https://tracker.ceph.com/projects/ceph/wiki/Community -- Mike Perez

3 years

1
1
0 0

cephadm and ha service for rgw

by Seba chanel

Hi everyone, I try to configure HA service for rgw with cephadm. I have 2 rgw on cnrgw1 et cnrgw2 for the same pool. i use a virtual IP address 192.168.0.15 cnrgwha and the config from https://docs.ceph.com/en/latest/cephadm/rgw/#high-availability-service-for-… # from root@cnrgw1 [root@cnrgw1 ~]# cat /etc/sysctl.conf net.ipv6.conf.all.disable_ipv6 = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_nonlocal_bind = 1 [root@cnrgw1 ~]# sysctl -p net.ipv6.conf.all.disable_ipv6 = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_nonlocal_bind = 1 #same from cnrgw2 #generate cert [vagrant@cn1 ~]# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ./rgwha.key -out ./rgwha.crt Generating a RSA private key .............+++++ ........................................................+++++ writing new private key to './rgwha.key' ----- You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [XX]:fr State or Province Name (full name) []:est Locality Name (eg, city) [Default City]:sbg Organization Name (eg, company) [Default Company Ltd]:cephlab.org Organizational Unit Name (eg, section) []: Common Name (eg, your name or your server's hostname) []:cnrgwha Email Address []:root@localhost # write the YAML rgwha.yaml service_type: ha-rgw service_id: haproxy_for_rgw placement: hosts: - cnrgw1 - cnrgw2 spec: virtual_ip_interface: eth1 virtual_ip_address: 192.168.0.15/24 frontend_port: 8080 ha_proxy_port: 1967 ha_proxy_stats_enabled: true ha_proxy_stats_user: admin ha_proxy_stats_password: true ha_proxy_enable_prometheus_exporter: true ha_proxy_monitor_uri: /haproxy_health keepalived_user: admin keepalived_password: admin ha_proxy_frontend_ssl_certificate: [ "-----BEGIN CERTIFICATE-----", "MIICSzCCAfWgAwIBAgIUWKC9e+5tnIAjddECXOGc144p8E0wDQYJKoZIhvcNAQEL", "BQAwejELMAkGA1UEBhMCZnIxDDAKBgNVBAgMA2VzdDEMMAoGA1UEBwwDc2JnMRAw", "DgYDVQQKDAdjZXBobGFiMQwwCgYDVQQLDANvcmcxEDAOBgNVBAMMB2Nucmd3aGEx", "HTAbBgkqhkiG9w0BCQEWDnJvb3RAbG9jYWxob3N0MB4XDTIxMDMwOTE0MjI0N1oX", "DTIyMDMwOTE0MjI0N1owejELMAkGA1UEBhMCZnIxDDAKBgNVBAgMA2VzdDEMMAoG", "A1UEBwwDc2JnMRAwDgYDVQQKDAdjZXBobGFiMQwwCgYDVQQLDANvcmcxEDAOBgNV", "BAMMB2Nucmd3aGExHTAbBgkqhkiG9w0BCQEWDnJvb3RAbG9jYWxob3N0MFwwDQYJ", "KoZIhvcNAQEBBQADSwAwSAJBAMqji/AKBr6DbuHKOTWyIBWbeYkyZ7Jn7fqfZceE", "p7G321t1TvAjD7sa64FRT6n4x8CtzKPGXXpRr28o8oR1h70CAwEAAaNTMFEwHQYD", "VR0OBBYEFIQim5ZxojFny+srzQJIs1N8wLmYMB8GA1UdIwQYMBaAFIQim5ZxojFn", "y+srzQJIs1N8wLmYMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADQQCE", "eCwMQFNYtw+4I1QzTV13ewawuPkPdrhiNzcs0mgt93+quE0zBIeOY2jnFmlo6H/h", "syYGvwgcAh9VW9qo5fsk", "-----END CERTIFICATE-----", "-----BEGIN PRIVATE KEY-----", "MIIBVQIBADANBgkqhkiG9w0BAQEFAASCAT8wggE7AgEAAkEAyqOL8AoGvoNu4co5", "NbIgFZt5iTJnsmft+p9lx4SnsbfbW3VO8CMPuxrrgVFPqfjHwK3Mo8ZdelGvbyjy", "hHWHvQIDAQABAkB0kt2AO+RhWS9CyZlb4JtAku66FLs/ETcAxQ5CV3g5beq8/wRs", "x3xZhIsjdr7OZZ+BEoJYn+0upywoctXmwM8BAiEA+KG26RADqJfAdoRn640UrT9E", "pfF3drDrQg0WrKAf3N0CIQDQpOZa0pV2GL28u2NaU85uJCDeKDWhTnvFEqlLu/S4", "YQIhAPY+0/WIUtdLVOcMxA/bLrtXihoASR1Yo+hLJkXaYTRRAiB3Rh1txD6vEXu+", "Hb2xUIGNE1g6x+/ItA4rXfysD9nZYQIhAKYn3IdG55JwiwSKv8gVAEdX8xiUfEjY", "pnvk3p52VHHI", "-----END PRIVATE KEY-----" ] ha_proxy_frontend_ssl_port: 8090 ha_proxy_ssl_dh_param: 1024 ha_proxy_ssl_ciphers: ECDH+AESGCM:!MD5 ha_proxy_ssl_options: no-sslv3 haproxy_container_image: haproxy:2.4-dev3-alpine keepalived_container_image: arcts/keepalived:1.2.2 # apply the new config [ceph: root@cn1 ~]# ceph orch apply -i rgwha.yaml Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 'virtual_ip_interface' Do you have any leads why it doesn't work? [ceph: root@cn1 /]# ceph versions { "mon": { "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 5 }, "mgr": { "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 2 }, "osd": { "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 8 }, "mds": {}, "rgw": { "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 2 }, "overall": { "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 17 } }

3 years

2
2
0 0

understanding orchestration and cephadm

by Gary Molenkamp

Good morning all, I'm experimenting with ceph orchestration and cephadm after using ceph-deploy for several years, and I have a hopefully simple question. I've converted a basic nautilus cluster over to cephadm+orchestration and I tried adding, then removing a monitor. However, when I removed the host using 'ceph orch host rm', it removed two mons. I may have missed something in the adoption/upgrade that has left the cluster in a bad state. Any advice/pointers/clarification would be of assistance. Details: A nautilus cluster with two mons (I know this is not correct for quorum), a mgr, and a handful of osds. I went though the adoption process and enabled the ceph orch backend. [root@osdev-ctrl2 ~]# ceph orch ps NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID mgr.osdev-ctrl2 osdev-ctrl2 running (18h) 50s ago 18h 15.2.10 docker.io/ceph/ceph:v15.2.10 5b724076c58f e73c19b51a09 mon.osdev-ctrl2 osdev-ctrl2 running (18h) 50s ago 18h 15.2.10 docker.io/ceph/ceph:v15.2.10 5b724076c58f a6bfc27221f0 mon.osdev-net1 osdev-net1 running (18h) 50s ago 18h 15.2.10 docker.io/ceph/ceph:v15.2.10 5b724076c58f f66e2bef3d44 osd.0 osdev-stor1 running (17h) 50s ago 17h 15.2.10 docker.io/ceph/ceph:v15.2.10 5b724076c58f ac59dbdc267c ... [root@osdev-ctrl2 ~]# ceph orch status Backend: cephadm Available: True [root@osdev-ctrl2 ~]# ceph orch host ls HOST ADDR LABELS STATUS osdev-ctrl2 osdev-ctrl2 mon mgr osdev-net1 osdev-net1 mon osdev-stor1 osdev-stor1 osd [root@osdev-ctrl2 ~]# ceph orch ls NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID mgr 1/1 9m ago 20h label:mgr docker.io/ceph/ceph:v15.2.10 5b724076c58f mon 2/2 9m ago 20h label:mon docker.io/ceph/ceph:v15.2.10 5b724076c58f I then added a new mon host: [root@osdev-ctrl2 ~]# ceph orch host add osdev-ctrl3 mon It did not spawn a mon container on osdev-ctrl3 until I defined the public network in the config: [root@osdev-ctrl2 ~]# ceph config set global public_network 10.10.10.0/24 At this point all is good with three running mon as expected. Now I wanted to delete the mon using [root@osdev-ctrl2 ~]# ceph orch host rm osdev-ctrl3 This had the effect of: 1. removing the osdev-ctrl3 mon from 'ceph orch ls' and 'ceph orch ps' 2. the mon on osdev-ctrl3 is still running, and is part of 'ceph -s' but reported as not managed by cephadm 3. (Big issue) the mon running on osdev-net1 was completely destroyed. Any ideas what is going on? Sorry for the long post, but I tried to be as clear as possible. -- Gary Molenkamp Computer Science/Science Technology Services Systems Administrator University of Western Ontario molenkam(a)uwo.ca http://www.csd.uwo.ca (519) 661-2111 x86882 (519) 661-3566

3 years

3
3
0 0

First 6 nodes cluster with Octopus

by mabi

Hello, I am planning to setup a small Ceph cluster for testing purpose with 6 Ubuntu nodes and have a few questions mostly regarding planning of the infra. 1) Based on the documentation the OS requirements mentions Ubuntu 18.04 LTS, is it ok to use Ubuntu 20.04 instead or should I stick with 18.04? 2) The documentation recommends using Cephadm for new deployments, so I will use that but I read that with Cephadm everything is running in containers, so is this the new way to go? Or is Ceph in containers kind of still experimental? 3) As I will be needing cephfs I will also need MDS servers so with a total of 6 nodes I am planning the following layout: Node 1: MGR+MON+MDS Node 2: MGR+MON+MDS Node 3: MGR+MON+MDS Node 4: OSD Node 5: OSD Node 6: OSD Does this make sense? I am mostly interested in stability and HA with this setup. 4) Is there any special kind of demand in terms of disks on the MGR+MON+MDS nodes? Or can I use have my OS disks on these nodes? As far as I understand the MDS will create a metadata pool on the OSDs. Thanks for the hints. Best, Mabi

3 years

4
7
0 0

Ceph User Survey Working Group - Next Steps

by Mike Perez

Hi everyone, We are approaching the April 2nd deadline in two weeks, so we should start proposing the next meeting to plan the survey results. Anybody in the community is welcome to join the Ceph Working Groups. Please add your name to: https://ceph.io/user-survey/ I have started a doodle: https://doodle.com/poll/y3t2ttdt8a3egz4v?utm_source=poll&utm_medium=link Please help promote the User Survey: https://twitter.com/Ceph/status/1369589099716349956 -- Mike Perez (thingee)

3 years

2
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2021