Hello,
I have a Ceph cluster with 5 nodes (1 HDD per node). I want to add 5 more drives (HDDs) to expand the cluster. What is the best strategy for this?
I will add one drive to each node, but is it better to add one drive, wait for the data to rebalance onto the new OSD, and only then add the next one? Or should I add all 5 drives without waiting and let Ceph rebalance the data onto all the new OSDs at once?
Thank you.
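For reference, the second approach (add all five drives, pause data movement while the OSDs are created, then let Ceph rebalance in one pass) is often sketched like this; the flag and option names are standard Ceph CLI, but treat it as an untested sketch for your own cluster:

  # pause data movement while the new OSDs are being created
  ceph osd set norebalance
  ceph osd set nobackfill

  # ... create the five new OSDs on their hosts (ceph-volume, ceph orch, ...) ...

  # optionally throttle backfill so client I/O is not starved
  # (older releases set osd_max_backfills in ceph.conf instead)
  ceph config set osd osd_max_backfills 1

  # let the cluster rebalance onto all the new OSDs in a single pass
  ceph osd unset nobackfill
  ceph osd unset norebalance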
We also run with Dell VLT switches (40 GB)
everything is active/active, so multiple paths as Andrew describes in
his config
Our config allows us to:
bring down one of the switches for upgrades
bring down an iSCSI gateway for patching
all the while at least one path is up and servicing I/O
Thanks Joe
>>> Andrew Walker-Brown <andrew_jbrown(a)hotmail.com> 6/15/2021 10:26 AM
>>>
With an unstable link/port you could see the issues you describe. Ping
doesn’t have the packet rate for you to necessarily have a packet in
transit at exactly the same time as the port fails temporarily. Iperf,
on the other hand, could certainly show the issue: the higher packet rate
makes it more likely that packets are in flight at the moment of a link
failure, which combined with packet loss/retries gives poor throughput.
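For example, something like this keeps a steady stream of packets in flight while you fail a link (assumes iperf3 and a test target at 10.0.0.2, both placeholders):

  # 60-second, 8-stream test with per-second throughput reporting
  iperf3 -c 10.0.0.2 -t 60 -P 8 -i 1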
Depending on what you want to happen, there are a number of tuning
options both on the switches and Linux. If you want the LAG to be down
if any link fails, then you should be able to configure this on the switches
and/or Linux (minimum number of links = 2 if you have 2 links in the
lag).
You can also tune the link monitoring, how frequently the links are
checked (e.g. miimon) etc. Bringing this value down from the default of
100ms may allow you to detect a link failure more quickly. But you then
run into the chance of detecting a transient failure that wouldn’t have
caused any issues....and the LAG becoming more unstable.
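As a sketch of the kind of bond settings meant above (values are illustrative only, not a recommendation; adjust min_links and miimon to your own setup):

  # iproute2 example: LACP bond that is reported down unless 2 member links
  # are up, with fast LACP negotiation and 50 ms link monitoring
  ip link add bond0 type bond mode 802.3ad miimon 50 lacp_rate fast min_links 2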
Flapping/unstable links are the worst kind of situation. Ideally you’d
pick that up quickly from monitoring/alerts and either fix immediately
or take the link down until you can fix it.
I run 2x10G from my hosts into separate switches (Dell S series – VLT
between switches). Pulling a single interface has no impact on Ceph,
any packet loss is tiny and we’re not exceeding 10G bandwidth per host.
If you’re running 1G links and the LAG is already busy, a link failure
could be causing slow writes to the host, just down to
congestion...which then starts to impact the wider cluster based on how
Ceph works.
Just caveating the above with - I’m relatively new to Ceph myself....
Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
Windows 10
From: huxiaoyu(a)horebdata.cn
Sent: 15 June 2021 17:52
To: Serkan Çoban <cobanserkan(a)gmail.com>
Cc: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] Re: Issues with Ceph network redundancy using L2
MC-LAG
When I pull out the cable, the bond keeps working properly.
Does it mean that the port is somehow flapping? Ping still works,
but the iperf test yields very low results.
huxiaoyu(a)horebdata.cn
From: Serkan Çoban
Date: 2021-06-15 18:47
To: huxiaoyu(a)horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] Issues with Ceph network redundancy using L2
MC-LAG
Do you observe the same behaviour when you pull a cable?
Maybe a flapping port might cause this kind of behaviour; other than
that you shouldn't see any network disconnects.
Are you sure about the LACP configuration? What is the output of 'cat
/proc/net/bonding/bond0'?
On Tue, Jun 15, 2021 at 7:19 PM huxiaoyu(a)horebdata.cn
<huxiaoyu(a)horebdata.cn> wrote:
>
> Dear Cephers,
>
> I encountered the following networking issue several times, and I wonder whether there is a solution for networking HA.
>
> We built Ceph using an L2 multi-chassis link aggregation group (MC-LAG) to provide switch redundancy. On each host, we use 802.3ad (LACP) mode for NIC redundancy. However, we have observed several times that when a single network port fails, either the cable or the SFP+ optical module, the Ceph cluster is badly affected, although in theory it should be able to tolerate this.
>
> Did I miss something important here? And how can we really achieve networking HA in a Ceph cluster?
>
> best regards,
>
> Samuel
>
>
>
>
> huxiaoyu(a)horebdata.cn
You cannot do much if the link is flapping or the cable is bad.
Maybe you can write some rules to shut the port down on the switch if
the error packet ratio goes up.
I also remember there are some configuration options on the switch side for link flapping.
On Wed, Jun 16, 2021 at 10:57 AM huxiaoyu(a)horebdata.cn
<huxiaoyu(a)horebdata.cn> wrote:
>
> Is it true that MC-LAG and 802.3ad, by default, work in active-active mode?
>
> What else should I take care of to ensure fault tolerance when one path is bad?
>
> best regards,
>
> samuel
>
>
>
> huxiaoyu(a)horebdata.cn
>
> From: Joe Comeau
> Date: 2021-06-15 23:44
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Fwd: Re: Issues with Ceph network redundancy using L2 MC-LAG
On Mon, May 3, 2021 at 12:27 PM Magnus Harlander <magnus(a)harlan.de> wrote:
>
> Am 03.05.21 um 12:25 schrieb Ilya Dryomov:
>
> ceph osd setmaxosd 10
>
> Bingo! Mount works again.
>
> Veeeery strange things are going on here (-:
>
> Thanx a lot for now!! If I can help to track it down, please let me know.
Good to know it helped! I'll think about this some more and probably
plan to patch the kernel client to be less stringent and not choke on
this sort of misconfiguration.
Thanks,
Ilya
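For anyone else who hits this, the check and the fix from this thread look roughly like:

  # show the current max_osd value in the osdmap
  ceph osd getmaxosd
  # or: ceph osd dump | grep max_osd

  # raise it back to (at least) the real number of OSDs, 10 in this case
  ceph osd setmaxosd 10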
Hey folks,
I'm working through some basic ops drills, and noticed what I think is an
inconsistency in the Cephadm Docs. Some Googling appears to show this is a
known thing, but I didn't find a clear direction on cooking up a solution
yet.
On a cluster with 5 mons, 2 were abruptly removed when their host OS
decided to do scheduled maintenance without asking first. Those hosts only
had mons running on them (and mds/crash/node exporter), so I still have 3
mon quorum and the cluster is happy.
It's not clear to me how I add these hosts back in as mons though. In the
troubleshooting docs it describes bringing all mons down, then extracting a
monmap. I tried this through various iterations of bringing all down,
bringing one back up and entering the container; bringing all down and
trying to use ceph-mon from cephadm shell and so on. I either got rocksdb
lock issues presumably because a mon node was running, or an error that the
path to the mon data didn't exist, presumably for the opposite reason.
Is there guidance on the container-friendly way to perform the monmap
maintenance?
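For concreteness, this is roughly the shape of what I was attempting, following the troubleshooting docs (host and daemon names are from my cluster below; the exact flags are from memory, so they may not be quite right):

  # stop the mon, then enter its container environment with the mon data mounted
  cephadm unit --name mon.kida stop --fsid yadda-yadda-yadda
  cephadm shell --name mon.kida

  # inside that shell: extract and inspect the monmap from the stopped mon
  ceph-mon -i kida --extract-monmap /tmp/monmap
  monmaptool --print /tmp/monmap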
I did think that because I still have quorum, I could simply do ceph orch
apply mon label:mon instead, but I am nervous this might upset my remaining
mons. Looking at the ceph orch ls output I see:
root@kida:/# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager 1/1 7m ago 2h count:1
crash 5/5 9m ago 2h *
grafana 1/1 7m ago 2h count:1
mds.media 3/3 9m ago 2h thebends;okcomputer;amnesiac
mgr 2/2 9m ago 2h count:2
mon 3/5 9m ago 2h label:mon
node-exporter 5/5 9m ago 2h *
osd.all-available-devices 5/10 9m ago 2h *
prometheus 1/1 7m ago 2h count:1
root@kida:/#
So is it expecting 2 more mons, or has it autoscaled down cleverly?
Looking at ceph orch ps I see:
root@kida:/# ceph orch ps
NAME  HOST  PORTS  STATUS  REFRESHED  AGE  VERSION  IMAGE ID  CONTAINER ID
alertmanager.kida  kida  *:9093,9094  running (2h)  8m ago  2h  0.20.0  0881eb8f169f  89c604455194
crash.amnesiac  amnesiac  running (11h)  8m ago  11h  16.2.4  8d91d370c2b8  bff086c930db
crash.kida  kida  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  b0ac059be109
crash.kingoflimbs  kingoflimbs  running (13h)  8m ago  13h  16.2.4  8d91d370c2b8  b0955309a8b9
crash.okcomputer  okcomputer  running (2h)  10m ago  2h  16.2.4  8d91d370c2b8  a75cf65ef235
crash.thebends  thebends  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  befe9c1015f3
grafana.kida  kida  *:3000  running (2h)  8m ago  2h  6.7.4  ae5c36c3d3cd  f85747138299
mds.media.amnesiac.uujwlk  amnesiac  running (11h)  8m ago  2h  16.2.4  8d91d370c2b8  512a2fcc0f97
mds.media.okcomputer.nednib  okcomputer  running (2h)  10m ago  2h  16.2.4  8d91d370c2b8  10c6244a9308
mds.media.thebends.pqsfeb  thebends  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  c1b75831a973
mgr.kida.kchysa  kida  *:9283  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  602acc0d8df3
mgr.okcomputer.rjtrqw  okcomputer  *:8443,9283  running (2h)  10m ago  2h  16.2.4  8d91d370c2b8  605a8a25a604
mon.amnesiac  amnesiac  stopped  8m ago  2h  <unknown>  <unknown>  <unknown>
mon.kida  kida  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  a441563a978d
mon.kingoflimbs  kingoflimbs  stopped  8m ago  2h  <unknown>  <unknown>  <unknown>
mon.okcomputer  okcomputer  running (2h)  10m ago  2h  16.2.4  8d91d370c2b8  c4297efafe27
mon.thebends  thebends  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  e2394d5f152b
node-exporter.amnesiac  amnesiac  *:9100  running (11h)  8m ago  2h  0.18.1  e5a616e4b9cf  da3c69057c4f
node-exporter.kida  kida  *:9100  running (2h)  8m ago  2h  0.18.1  e5a616e4b9cf  5c9219a29257
node-exporter.kingoflimbs  kingoflimbs  *:9100  running (13h)  8m ago  2h  0.18.1  e5a616e4b9cf  c2236491fb6e
node-exporter.okcomputer  okcomputer  *:9100  running (2h)  10m ago  2h  0.18.1  e5a616e4b9cf  2e53a82eed32
node-exporter.thebends  thebends  *:9100  running (2h)  8m ago  2h  0.18.1  e5a616e4b9cf  def6bdd359d6
osd.0  kida  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  c1419a29ddd8
osd.1  kida  running (85m)  8m ago  2h  16.2.4  8d91d370c2b8  dcb172c628ec
osd.2  thebends  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  4826e3da8d14
osd.3  okcomputer  running (2h)  10m ago  2h  16.2.4  8d91d370c2b8  5424d437c270
osd.4  thebends  running (2h)  8m ago  2h  16.2.4  8d91d370c2b8  47e682c3727d
prometheus.kida  kida  *:9095  running (2h)  8m ago  2h  2.18.1  de242295e225  4c8e7fdd89a8
root@kida:/#
So those mon containers are still there, stopped. ceph orch daemon restart
mon.amnesiac gives notice that a restart is scheduled on that mon. The
container status updates to running in ceph orch ps, but the version, image ID
and container ID stay <unknown>, and I don't see that mon unit in any status
output or log. cephadm unit --name mon.amnesiac restart --fsid
yadda-yadda-yadda errors with "daemon not found"; it seems the cephadm
CLI command is scoped to the daemons running on the same host it's being
executed on, rather than cluster-wide like ceph orch.
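One thing I'm tempted to try (but haven't yet, in case it upsets the surviving mons) is simply asking the orchestrator to redeploy the missing daemons, roughly:

  # force-remove the stale stopped daemons and let the label:mon placement
  # re-create them (commands from the cephadm docs, untested here)
  ceph orch daemon rm mon.amnesiac --force
  ceph orch daemon rm mon.kingoflimbs --force
  # or try an in-place redeploy first
  ceph orch daemon redeploy mon.amnesiac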
Any clues offered to further investigation are welcomed.
Best regards
Phil
Hey folks,
I have deployed a 3 node dev cluster using cephadm. Deployment went
smoothly and all seems well.
If I try to mount a CephFS from a client node, however, 2 of the 3 mons crash.
I've begun picking through the logs to see what I can see, but so far
other than seeing the crash in the log itself, it's unclear what the cause
of the crash is.
Here's a log. <https://termbin.com/isaz>. You can see where the crash is
occurring around the line that begins with "Jun 08 18:56:04 okcomputer
podman[790987]:"
I would welcome any advice on either what the cause may be, or how I can
advance the analysis of what's wrong.
Best regards
Phil
Good day.
I'm writing some code for parsing output data for monitoring purposes.
The data is that of "ceph status -f json", "ceph df -f json", "ceph osd
perf -f json" and "ceph osd pool stats -f json".
I also need to support all major Ceph releases, from Jewel through
Pacific.
What I've stumbled upon is that:
- keys in JSON output are not present if there's no appropriate data.
For example the key ['pgmap', 'read_bytes_sec'] will not be present in
"ceph status" output if there's no read activity in the cluster;
- some keys changed between versions. For example ['health']['status'] key
is not present in Jewel, but is available in all the following versions;
vice-versa, key ['osdmap', 'osdmap'] is not present in Pacific, but is in
all the previous versions.
So I need to get a list of all possible keys for all Ceph releases. Any
ideas how this can be achieved? My only thought at the moment is to build a
"failing" cluster with all the possible states and get reference data out of
it. Not only is this tedious work, since it requires each possible cluster
version, but it is also error-prone.
Is there any publicly available JSON schema for output?
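To illustrate the sort of optional-key handling I mean, on the shell side with jq it is roughly this (the '//' operator supplies a default when a key is absent):

  # read throughput: 0 when the cluster is idle and the key is missing
  ceph status -f json | jq '.pgmap.read_bytes_sec // 0'

  # health: fall back to the pre-Luminous field name on Jewel
  ceph status -f json | jq '.health.status // .health.overall_status'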
Hello
How can I use ceph orch apply to deploy single site rgw
daemons with custom frontend configuration?
Basically, I have three servers in a DNS round-robin, each
running a 15.2.12 rgw daemon with this configuration:
rgw_frontends = civetweb num_threads=5000 port=443s
ssl_certificate=/etc/ceph/rgw.crt
I would like to deploy 16.2.4 rgw daemons, but I don't know
how to configure them. When I used "ceph orch apply rgw
<name> <placement>", it created a new entry in the monitor
configuration database instead of using the existing
rgw_frontends entry.
I am guessing that I need to name the config db entry
correctly, but I don't know what name to use. Currently I have
$ ceph config get client rgw_frontends
civetweb num_threads=5000 port=443s
ssl_certificate=/etc/ceph/rgw.crt
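My guess is that with 16.2.4 the intended route is a service spec rather than a raw rgw_frontends entry (civetweb is gone in Pacific, beast is the only frontend), something like the file below, but I have not verified the field names:

  # rgw.yaml (spec field names taken from the cephadm RGW docs, please verify)
  service_type: rgw
  service_id: myrgw                  # placeholder name
  placement:
    count: 3
  spec:
    rgw_frontend_port: 443
    ssl: true
    rgw_frontend_ssl_certificate: |
      ...contents of /etc/ceph/rgw.crt...

  ceph orch apply -i rgw.yaml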
Can anybody help?
Thanks,
Vlad
I'm trying to update a Ceph Octopus install to add an iSCSI gateway, using ceph-ansible, and gwcli won't run for me.
The ansible run went well, but when I try to actually use gwcli, I get
(blahblah)
ImportError: No module named rados
which isn't too surprising, since "python-rados" is not installed.
HOWEVER.
The ceph repos installed by ceph-ansible (5.0, the octopus release) are
http://download.ceph.com/ceph-iscsi/3/rpm/el7/noarch
which provides
python3-rados
This supposedly "obsoletes" python-rados. Except it doesn't, because python-rados is for Python 2.
But even if it didn't, there is no Python 2 rados module provided in the ceph-iscsi repo or the main ceph repo,
and gwcli is still Python 2.
So, what am I supposed to do now?
It seems like I need a Python 2 version of the rados module, but the ceph repos don't provide one, so they basically ship a broken gwcli?
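For what it's worth, this is roughly how I confirmed the mismatch:

  # gwcli's shebang shows which interpreter it expects
  head -1 /usr/bin/gwcli

  # only the python3 binding is installed, there is no python2 one
  rpm -q python-rados python3-rados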
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com