Hi!
After upgrading MONs and MGRs successfully, the first OSD host I upgraded on Ubuntu Bionic from 14.2.16 to 15.2.10
shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot.
RocksDB complains "Corruption: unknown WriteBatch tag".
The initial crash/corruption occurred when the automatic fsck was run and it committed the changes for a large number of "zombie spanning blobs".
Tracker issue with logs: https://tracker.ceph.com/issues/50017
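(For anyone considering the same upgrade: a possible way to defer that automatic repair is to disable the quick-fix fsck before the upgraded OSDs first start. This is only a hedged suggestion, not a confirmed workaround for the corruption itself.)
# hedged sketch: skip the automatic quick-fix fsck when upgraded OSDs mount
ceph config set osd bluestore_fsck_quick_fix_on_mount false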
Anyone else encountered this error? I've "suspended" the upgrade for now :)
-- Jonas
We have a Ceph Octopus cluster running 15.2.6, and it's indicating a near-full
OSD which I can see is not weighted equally with the rest of the OSDs. I
tried to do the usual "ceph osd reweight osd.0 0.95" to force it down a
little bit, but unlike the nautilus clusters, I see no data movement when
issuing the command. If I run a ceph osd tree, it shows the reweight
setting, but no data movement appears to be occurring.
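(Not part of the original report, but possibly relevant: on new Octopus clusters the balancer module is enabled in upmap mode by default, and an active balancer can mask or counteract manual reweights. A hedged set of checks, not from the thread:)
ceph balancer status   # is the balancer on, and in which mode?
ceph osd df tree       # confirm the REWEIGHT column and the actual utilisation per OSD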
Is there some new thing in Octopus I am missing? I looked through the
release notes for .7, .8 and .9 and didn't see any fixes that jumped out as
resolving a bug related to this. The Octopus cluster was deployed using
ceph-ansible and upgraded to 15.2.6. I plan to upgrade to 15.2.9 in the
coming month.
Any thoughts?
Regards,
-Brent
Existing Clusters:
Test: Octopus 15.2.5 ( all virtual on nvme )
US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways,
2 iscsi gateways
UK Production(SSD): Octopus 15.2.6 with 5 osd servers, 3 mons, 4 gateways
Hi
I'm in a bit of a panic :-(
Recently we started attempting to add a radosgw to our ceph
cluster, which until now was only doing cephfs (and rbd was working as
well). We were messing about with ceph-ansible, as this was how we
originally installed the cluster. Anyway, it installed nautilus 14.2.18
on the radosgw host, and I thought it would be good to pull the rest of the
cluster up to that level as well using our tried and tested ceph upgrade
script (it basically updates all ceph nodes one by one and
checks whether ceph is OK again before doing the next).
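(For context, and not the author's actual script: the rolling-upgrade loop described above would typically look something like this; hosts.txt and the apt commands are assumptions.)
# hedged sketch of a rolling upgrade loop; hosts.txt and the package commands are placeholders
for host in $(cat hosts.txt); do
    ssh "$host" 'apt-get update && apt-get install -y ceph && systemctl restart ceph.target'
    until ceph health | grep -q HEALTH_OK; do sleep 30; done   # wait for ceph to settle before the next node
done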
After the 3rd mon/mgr was done, all PGs were unavailable :-(
Obviously the script is not continuing, but ceph is also broken now...
The deceptively mild message is: HEALTH_WARN Reduced data availability: 5568
pgs inactive
That's all PGs!
As a desperate measure I tried to upgrade one ceph OSD node, but that
broke as well: the osd service on that node gets an interrupt from the
kernel...
the versions are now like:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}
12 OSDs are down
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            Reduced data availability: 5568 pgs inactive

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
    mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

  data:
    pools:   12 pools, 5568 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5568 unknown

  progress:
    Rebalancing after osd.103 marked in
      [..............................]
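(Not from the original mail: 100% "unknown" PGs usually means the active mgr has no PG statistics rather than that data is gone, so a hedged first step is to restart the active mgr and re-check the release flags, for example:)
# hedged diagnostics, assuming systemd-managed daemons on cephmon1
systemctl restart ceph-mgr@cephmon1        # restart the active mgr, then watch 'ceph -s' again
ceph osd dump | grep require_osd_release   # confirm the release flag was not changed mid-upgrade
ceph mon dump | grep min_mon_release       # same idea for the mon release floor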
Hi everyone,
I cleaned up the CFP coordination etherpad with some events coming up.
Please add other events where you think the community should consider
proposing content on Ceph or adjacent projects like Rook.
The KubeCon NA CFP, for example, closes April 11. Take a look:
https://pad.ceph.com/p/cfp-coordination
I have also added this to our wiki for discovery.
https://tracker.ceph.com/projects/ceph/wiki/Community
--
Mike Perez
Hi everyone,
I am trying to configure an HA service for RGW with cephadm. I have 2 RGWs, on cnrgw1
and cnrgw2, for the same pool.
I use a virtual IP address 192.168.0.15 (cnrgwha) and the config from
https://docs.ceph.com/en/latest/cephadm/rgw/#high-availability-service-for-…
# from root@cnrgw1
[root@cnrgw1 ~]# cat /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
[root@cnrgw1 ~]# sysctl -p
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
#same from cnrgw2
#generate cert
[vagrant@cn1 ~]# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ./rgwha.key -out ./rgwha.crt
Generating a RSA private key
.............+++++
........................................................+++++
writing new private key to './rgwha.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:fr
State or Province Name (full name) []:est
Locality Name (eg, city) [Default City]:sbg
Organization Name (eg, company) [Default Company Ltd]:cephlab.org
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:cnrgwha
Email Address []:root@localhost
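(A hypothetical helper, not from the original mail: the certificate and key have to end up as a list of quoted lines in the YAML below, and something like this can produce that from the files generated above; drop the trailing comma on the last line.)
# hedged sketch: quote each line of the cert and key for the YAML list
cat rgwha.crt rgwha.key | sed 's/^/"/; s/$/",/'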
# write the YAML rgwha.yaml
service_type: ha-rgw
service_id: haproxy_for_rgw
placement:
  hosts:
    - cnrgw1
    - cnrgw2
spec:
  virtual_ip_interface: eth1
  virtual_ip_address: 192.168.0.15/24
  frontend_port: 8080
  ha_proxy_port: 1967
  ha_proxy_stats_enabled: true
  ha_proxy_stats_user: admin
  ha_proxy_stats_password: true
  ha_proxy_enable_prometheus_exporter: true
  ha_proxy_monitor_uri: /haproxy_health
  keepalived_user: admin
  keepalived_password: admin
  ha_proxy_frontend_ssl_certificate:
    [
      "-----BEGIN CERTIFICATE-----",
      "MIICSzCCAfWgAwIBAgIUWKC9e+5tnIAjddECXOGc144p8E0wDQYJKoZIhvcNAQEL",
      "BQAwejELMAkGA1UEBhMCZnIxDDAKBgNVBAgMA2VzdDEMMAoGA1UEBwwDc2JnMRAw",
      "DgYDVQQKDAdjZXBobGFiMQwwCgYDVQQLDANvcmcxEDAOBgNVBAMMB2Nucmd3aGEx",
      "HTAbBgkqhkiG9w0BCQEWDnJvb3RAbG9jYWxob3N0MB4XDTIxMDMwOTE0MjI0N1oX",
      "DTIyMDMwOTE0MjI0N1owejELMAkGA1UEBhMCZnIxDDAKBgNVBAgMA2VzdDEMMAoG",
      "A1UEBwwDc2JnMRAwDgYDVQQKDAdjZXBobGFiMQwwCgYDVQQLDANvcmcxEDAOBgNV",
      "BAMMB2Nucmd3aGExHTAbBgkqhkiG9w0BCQEWDnJvb3RAbG9jYWxob3N0MFwwDQYJ",
      "KoZIhvcNAQEBBQADSwAwSAJBAMqji/AKBr6DbuHKOTWyIBWbeYkyZ7Jn7fqfZceE",
      "p7G321t1TvAjD7sa64FRT6n4x8CtzKPGXXpRr28o8oR1h70CAwEAAaNTMFEwHQYD",
      "VR0OBBYEFIQim5ZxojFny+srzQJIs1N8wLmYMB8GA1UdIwQYMBaAFIQim5ZxojFn",
      "y+srzQJIs1N8wLmYMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADQQCE",
      "eCwMQFNYtw+4I1QzTV13ewawuPkPdrhiNzcs0mgt93+quE0zBIeOY2jnFmlo6H/h",
      "syYGvwgcAh9VW9qo5fsk",
      "-----END CERTIFICATE-----",
      "-----BEGIN PRIVATE KEY-----",
      "MIIBVQIBADANBgkqhkiG9w0BAQEFAASCAT8wggE7AgEAAkEAyqOL8AoGvoNu4co5",
      "NbIgFZt5iTJnsmft+p9lx4SnsbfbW3VO8CMPuxrrgVFPqfjHwK3Mo8ZdelGvbyjy",
      "hHWHvQIDAQABAkB0kt2AO+RhWS9CyZlb4JtAku66FLs/ETcAxQ5CV3g5beq8/wRs",
      "x3xZhIsjdr7OZZ+BEoJYn+0upywoctXmwM8BAiEA+KG26RADqJfAdoRn640UrT9E",
      "pfF3drDrQg0WrKAf3N0CIQDQpOZa0pV2GL28u2NaU85uJCDeKDWhTnvFEqlLu/S4",
      "YQIhAPY+0/WIUtdLVOcMxA/bLrtXihoASR1Yo+hLJkXaYTRRAiB3Rh1txD6vEXu+",
      "Hb2xUIGNE1g6x+/ItA4rXfysD9nZYQIhAKYn3IdG55JwiwSKv8gVAEdX8xiUfEjY",
      "pnvk3p52VHHI",
      "-----END PRIVATE KEY-----"
    ]
  ha_proxy_frontend_ssl_port: 8090
  ha_proxy_ssl_dh_param: 1024
  ha_proxy_ssl_ciphers: ECDH+AESGCM:!MD5
  ha_proxy_ssl_options: no-sslv3
  haproxy_container_image: haproxy:2.4-dev3-alpine
  keepalived_container_image: arcts/keepalived:1.2.2
# apply the new config
[ceph: root@cn1 ~]# ceph orch apply -i rgwha.yaml
Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument
'virtual_ip_interface'
Do you have any leads on why it doesn't work?
[ceph: root@cn1 /]# ceph versions
{
    "mon": {
        "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 5
    },
    "mgr": {
        "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 2
    },
    "osd": {
        "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 8
    },
    "mds": {},
    "rgw": {
        "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 2
    },
    "overall": {
        "ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)": 17
    }
}
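(Not from the original message: the EINVAL suggests the fields are being handed to the generic ServiceSpec rather than an ha-rgw-specific spec, which tends to point at either YAML indentation or a mgr/cephadm build that does not yet know the ha-rgw service type. A hedged way to sanity-check:)
# hedged checks, not a confirmed fix
ceph orch apply -i rgwha.yaml --dry-run   # preview how the spec is interpreted, if this release supports --dry-run
ceph orch ls --export                     # dump the specs the orchestrator currently knows about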
Good morning all,
I'm experimenting with ceph orchestration and cephadm after using
ceph-deploy for several years, and I have a hopefully simple question.
I've converted a basic nautilus cluster over to cephadm+orchestration
and I tried adding, then removing a monitor. However, when I removed the
host using 'ceph orch host rm', it removed two mons. I may have missed
something in the adoption/upgrade that has left the cluster in a bad
state. Any advice/pointers/clarification would be appreciated.
Details:
A nautilus cluster with two mons (I know this is not correct for
quorum), a mgr, and a handful of osds. I went through the adoption
process and enabled the ceph orch backend.
[root@osdev-ctrl2 ~]# ceph orch ps
NAME             HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                    IMAGE ID      CONTAINER ID
mgr.osdev-ctrl2  osdev-ctrl2  running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  e73c19b51a09
mon.osdev-ctrl2  osdev-ctrl2  running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  a6bfc27221f0
mon.osdev-net1   osdev-net1   running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  f66e2bef3d44
osd.0            osdev-stor1  running (17h)  50s ago    17h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  ac59dbdc267c
...
[root@osdev-ctrl2 ~]# ceph orch status
Backend: cephadm
Available: True
[root@osdev-ctrl2 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
osdev-ctrl2 osdev-ctrl2 mon mgr
osdev-net1 osdev-net1 mon
osdev-stor1 osdev-stor1 osd
[root@osdev-ctrl2 ~]# ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                    IMAGE ID
mgr   1/1      9m ago     20h  label:mgr  docker.io/ceph/ceph:v15.2.10  5b724076c58f
mon   2/2      9m ago     20h  label:mon  docker.io/ceph/ceph:v15.2.10  5b724076c58f
I then added a new mon host:
[root@osdev-ctrl2 ~]# ceph orch host add osdev-ctrl3 mon
It did not spawn a mon container on osdev-ctrl3 until I defined the
public network in the config:
[root@osdev-ctrl2 ~]# ceph config set global public_network 10.10.10.0/24
At this point all is good with three running mon as expected. Now I
wanted to delete the mon using
[root@osdev-ctrl2 ~]# ceph orch host rm osdev-ctrl3
This had the effect of:
1. removing the osdev-ctrl3 mon from 'ceph orch ls' and 'ceph orch ps'
2. the mon on osdev-ctrl3 is still running, and is part of
'ceph -s' but reported as not managed by cephadm
3. (Big issue) the mon running on osdev-net1 was completely destroyed.
Any ideas what is going on? Sorry for the long post, but I tried to be
as clear as possible.
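(Not an answer from the thread, but for comparison: with label-based placement as shown in 'ceph orch ls' above, the sequence I would expect is to shrink the mon placement first and only then remove the host; a hedged sketch:)
# hedged sketch: take the daemon out of the placement before touching the host
ceph orch host label rm osdev-ctrl3 mon      # stop scheduling a mon on that host
ceph orch daemon rm mon.osdev-ctrl3 --force  # remove the stray daemon if it lingers
ceph orch host rm osdev-ctrl3                # finally remove the host itself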
--
Gary Molenkamp Computer Science/Science Technology Services
Systems Administrator University of Western Ontario
molenkam(a)uwo.ca http://www.csd.uwo.ca
(519) 661-2111 x86882 (519) 661-3566
Hello,
I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu nodes and have a few questions, mostly regarding planning of the infra.
1) Based on the documentation, the OS requirements mention Ubuntu 18.04 LTS. Is it OK to use Ubuntu 20.04 instead, or should I stick with 18.04?
2) The documentation recommends using Cephadm for new deployments, so I will use that, but I read that with Cephadm everything runs in containers. Is this the new way to go, or is Ceph in containers still somewhat experimental?
3) As I will be needing CephFS, I will also need MDS servers, so with a total of 6 nodes I am planning the following layout:
Node 1: MGR+MON+MDS
Node 2: MGR+MON+MDS
Node 3: MGR+MON+MDS
Node 4: OSD
Node 5: OSD
Node 6: OSD
Does this make sense? I am mostly interested in stability and HA with this setup.
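(Not part of the original question, just an illustration: with cephadm a layout like the one above is usually expressed through host labels and placement specs; a hedged sketch, where node1..node6 and the label name are placeholders:)
# hedged sketch; node1..node6 and the 'mon' label are placeholders
for h in node1 node2 node3; do ceph orch host label add $h mon; done
ceph orch apply mon --placement="label:mon"
ceph orch apply mgr --placement="label:mon"
ceph orch apply mds cephfs --placement="label:mon"   # one MDS per labelled host
ceph orch apply osd --all-available-devices          # OSDs on every free disk cephadm can see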
4) Is there any special requirement in terms of disks on the MGR+MON+MDS nodes? Or can I just use my OS disks on these nodes? As far as I understand, the MDS will create a metadata pool on the OSDs.
Thanks for the hints.
Best,
Mabi
Hello everyone,
I have a small ceph cluster consisting of 4 Ubuntu 20.04 OSD servers, mainly serving RBD images to a CloudStack KVM cluster. The ceph version is 15.2.9. The network is set up so that all storage traffic runs over InfiniBand QDR links (IPoIB). We have a management network for our ceph servers and KVM hosts over ethernet (192.168.1.1/24) and the IPoIB storage network 192.168.2.1/24. We are in the process of updating our cluster with new hardware and plan to scrap the InfiniBand connectivity altogether and replace it with 10Gbit ethernet. We are also going to replace the KVM host servers. We are hoping to have minimal or preferably no downtime in this process.
I was wondering if we could run the ceph services (mon, osd, radosgw) concurrently over two networks after we've added the 10G ethernet. While the upgrades and migration are taking place, we need to have ceph running over the current IPoIB 192.168.2.1/24 as well as the 10G 192.168.3.1/24. Could you please help me with this?
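(Not from the original mail, just a hedged pointer: public_network accepts a comma-separated list of subnets, so one possible transition path is to advertise both networks while the migration runs, provided the daemons actually have addresses on both; a sketch:)
# hedged sketch: accept clients on either subnet during the migration
ceph config set global public_network "192.168.2.0/24,192.168.3.0/24"
# mon addresses are fixed in the monmap, so moving a mon to the new subnet still
# means removing and re-adding it (or editing the monmap), one mon at a time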
Cheers
Andrei