Hi all,
I have a Ceph cluster with a standard setup:
- public network: MONs and OSDs connected to the same aggregation switch,
with ports in the same access vlan
- private network: OSDs connected to a second switch via a second ethernet
interface, in another access vlan
I need to change the public vlan on the first switch and the private
vlan on the second switch.
Although it should be a trivial operation (just changing the vlan of a
range of ports in a single command), it means that all the OSDs and MONs
will be unable to communicate with each other for a few seconds (first on
the public network, then on the private network). Do you know if this very
short period of downtime will mess up the cluster somehow? Is there a
best practice on how to do this safely?
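For what it's worth, I was assuming I would set the usual maintenance flags around the change so the cluster does not react to the brief outage (this is just my guess at the procedure):

$ ceph osd set noout    # don't mark OSDs out while they are unreachable
$ ceph osd set nodown   # don't mark OSDs down during the blip
# ... change the vlans on both switches ...
$ ceph osd unset nodown
$ ceph osd unset noout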
Thank you,
Adrian.
Hi
I had a cluster on v13 (Mimic) and converted it to Octopus (15.2.3) using cephadm. The dashboard shows everything as v15.
What do I need to do with the Ceph rpms that are still installed on the hosts, given that they are all version 13?
Do I remove them and install the version 15 Ceph rpms?
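For reference, this is how I am comparing what the containers run against what the host packages provide (generic commands, nothing cluster-specific):

$ ceph versions            # versions reported by the running containerized daemons
$ rpm -qa | grep -i ceph   # versions of the rpms still installed on the host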
Regards
Andy
Hello,
I think I misunderstood the internal/public network concepts in the docs: https://docs.ceph.com/docs/master/rados/configuration/network-config-ref/.
Now there are two questions:
- Is it somehow possible to bind the MON daemon to 0.0.0.0?
I tried it by manually adding the address in /var/lib/ceph/{UUID}/mon.node01/config:
[mon.node01]
public bind addr = 0.0.0.0
But that does not work; in netstat I can see that the mon still binds to its internal ip. Is this expected behaviour?
If I set this value to the public ip, the other nodes cannot communicate with it, which leads to the next question:
- What's the right way to correct the problem with the orchestrator?
The correct way to configure the ips would be to set every mon, mds and so on to the public ip and just let the osds stay on their internal ip (as described here: https://docs.ceph.com/docs/master/rados/configuration/network-config-ref/).
Do I have to remove every daemon and redeploy them with "ceph orch daemon rm" / "ceph orch apply"?
Or do I have to go to every node and manually apply the settings in the daemon config file?
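For reference, I assume the orchestrator route would look roughly like the sketch below, but the hostnames and network are hypothetical and I have not tested this:

$ ceph config set mon public_network 192.0.2.0/24            # hypothetical public network
$ ceph orch daemon rm mon.node01 --force                     # remove the misconfigured mon
$ ceph orch apply mon --placement="node01,node02,node03"     # let cephadm redeploy the mons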
Thanks in advance,
Simon
Hi all,
I found these messages today:
2020-06-04 17:07:57.471 7fa0aa16e700 -1 log_channel(cluster) log [ERR] : Error -2 reading object 14:e4c5ebb6:::1000203c59b.00000002:head
2020-06-04 17:08:04.236 7fa0aa16e700 -1 log_channel(cluster) log [ERR] : Error -2 reading object 14:e4c9a1a1:::1000203ad7f.00000000:head
in one of our OSD logs (error -2 is ENOENT, i.e. the object could not be found). The disk is healthy according to smartctl. Should I worry about this?
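For context, I assume the object could be checked directly with something like the following, where the data pool name is a placeholder:

$ rados -p <cephfs-data-pool> stat 1000203c59b.00000002   # a -2/ENOENT here would confirm the object is gone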
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
After having to revert to ceph-fuse when upgrading to Nautilus, I am now
also seeing that the nfs-ganesha mount stalls/breaks every day. Probably
caused by:
1 clients failing to respond to capability release
2 clients failing to respond to cache pressure
1 MDSs report slow requests
How can I fix this?
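For reference, I assume the offending client could be identified, and if necessary evicted, with something along these lines (the client id is a placeholder):

$ ceph tell mds.0 session ls                    # list client sessions and held capabilities
$ ceph tell mds.0 client evict id=<client-id>   # forcibly drop the stuck session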
Hi,
I have 15628 misplaced objects that are currently backfilling as follows:
1. pgid:14.3ce1 from:osd.1321 to:osd.3313
2. pgid:14.4dd9 from:osd.1693 to:osd.2980
3. pgid:14.680b from:osd.362 to:osd.3313
These are remnant backfills from a pg-upmap/rebalance campaign after we
added two new racks' worth of osds to our cluster.
Our mon db is bloated, so I want to trim it before starting the next
pg-upmap/rebalance campaign.
So, my question is:
Is there any way I can speed up the backfill process on these individual
osds?
Or any hints on how to trace why they are so slow?
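For reference, the only knobs I am aware of are the backfill/recovery throttles, e.g. something like this (the values are just examples):

$ ceph tell 'osd.*' injectargs '--osd_max_backfills 4'        # default is 1
$ ceph tell 'osd.*' injectargs '--osd_recovery_max_active 8'  # concurrent recovery ops per OSD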
Regards
Hi,
I have 15.2.1 installed on all machines. On the primary machine I executed the ceph upgrade command:
$ ceph orch upgrade start --ceph-version 15.2.2
When I check ceph -s I see this:
progress:
Upgrade to docker.io/ceph/ceph:v15.2.2 (30m)
[=...........................] (remaining: 8h)
It says 8 hours remaining, but it has already been running for 3 hours and no upgrade has been processed; it seems stuck at this point.
Is there any way to find out why it is stuck?
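For reference, these are the commands I was planning to check with, though I am not sure they are the right ones:

$ ceph orch upgrade status   # target image and current state of the upgrade
$ ceph -W cephadm            # follow the cephadm module log live
$ ceph log last cephadm      # recent cephadm log entries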
Thanks,
Gencer.