June 2021 - ceph-users - lists.ceph.io

by Jan-Philipp Litza

Hi everyone, recently I'm noticing that starting OSDs for the first time takes ages (like, more than an hour) before they are even picked up by the monitors as "up" and start backfilling. I'm not entirely sure if this is a new phenomenon or if it always was that way. Either way, I'd like to understand why. When I execute `ceph daemon osd.X status`, it says "state: preboot" and I can see the "newest_map" increase slowly. Apparently, a new OSD doesn't fetch the latest OSD map and gets to work, but instead fetches hundreds of thousands of OSD maps from the mon, burning CPU while parsing them. I wasn't able to find any good documentation on the OSDMap, in particular why its historical versions need to be kept and why the OSD seemingly needs so many of them. Can anybody point me in the right direction? Or is something wrong with my cluster? Best regards, Jan-Philipp Litza

2 years, 10 months

3
6
0 0

Octopus support

by Shafiq Momin

Hi all, I see octopus is having limited Suport on Centos 7 I have prod cluster with 1.2 PTB data with nautilus 14.2 Can we upgrade on Centos 7 from nautilus to octopus or we foresee issue We have erasure coded pool Please guide on recommended approach and document if any Will yum upgrade will upgrade nautilus to octopus Regards -- Regards Shafiq 9029056566/9223316304

2 years, 10 months

2
1
0 0

Fwd: In "ceph health detail", what's the diff between MDS_SLOW_METADATA_IO and MDS_SLOW_REQUEST?

by opengers

---------- Forwarded message --------- 发件人： opengers <zijian1012(a)gmail.com> Date: 2021年6月22日周二上午11:12 Subject: Re: [ceph-users] In "ceph health detail", what's the diff between MDS_SLOW_METADATA_IO and MDS_SLOW_REQUEST? To: Patrick Donnelly <pdonnell(a)redhat.com> Thanks for the answer, I still have some confusion when I see the explanation of "MDS_SLOW_REQUEST" from the document , as follows ------ MDS_SLOW_REQUEST Message “N slow requests are blocked” Description One or more client requests have not been completed promptly, indicating that the MDS is either running very slowly, or that the RADOS cluster is not acknowledging journal writes promptly, or that there is a bug. Use the ops admin socket command to list outstanding metadata operations. This message appears if any client requests have taken longer than mds_op_complaint_time (default 30s). FROM: https://docs.ceph.com/en/latest/cephfs/health-messages/ ------ "or that the RADOS cluster is not acknowledging journal writes promptly", from this sentence, it seems that "MDS_SLOW_REQUEST" also contains OSD operations by the MDS? Patrick Donnelly <pdonnell(a)redhat.com> 于2021年6月22日周二上午3:23写道： > Hello, > > On Mon, Jun 21, 2021 at 8:54 AM opengers <zijian1012(a)gmail.com> wrote: > > > > *$ *ceph health detail > > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow > > requests MDS_SLOW_METADATA_IO > > 1 MDSs report slow metadata IOs > > mds.fs-01(mds.0): 3 slow metadata IOs are blocked > 30 secs, > oldest > > blocked for 51123 secs MDS_SLOW_REQUEST 1 MDSs report slow requests > > MDS_SLOW_REQUEST: RPCs from the client to the MDS are "slow", i.e. not > complete in less than 30 seconds. > MDS_SLOW_METADATA_IO: OSD operations by the MDS are not yet complete > after 30 seconds. > > -- > Patrick Donnelly, Ph.D. > He / Him / His > Principal Software Engineer > Red Hat Sunnyvale, CA > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D > >

2 years, 10 months

1
0
0 0

mark_unfound_lost delete not deleting unfound objects

by Brian Andrus

I am wondering if anyone has experience with the mark_unfound_lost delete command seemingly not doing what it is supposed to, or if perhaps I have unreasonable expectations about its function. We have a EC pool making up a rgw data pool, and we have had a data loss scenario. I've attempted to manually recover shards from all listed peers in might_have_unfound, and had some success, but after extensive searching, I believe the time has come to let go of the data we are still missing in hopes of getting the cluster back to healthy and restoring service functionality. When I run "ceph pg 21.258e mark_unfound_lost delete", the command runs for some time, until a few minutes in the primary OSD drops out of the cluster but is still running. The logs would suggest this is because it is doing some intensive iterative operations and is unresponsive to other OSDs. Given we have tens of thousands of objects being marked lost, it would make sense this might take some time... but in the meantime, the OSD is marked out, another OSD takes its place, and the number of unfound objects for the PG increases over the next few hours back to the original amount. It seems so far, the primary OSD has not come back in every time I've tried this operation. My initial reaction was to restart the OSD when it dropped from the cluster (and its PG went DOWN state) in an attempt to keep the RGW functioning, but I realize that could have been counterproductive once I observed the logs of the primary iterating over objects. Yet even leaving the OSD to complete the iterative process, it doesn't seem to rejoin cluster without an intervention in the form of daemon restart. I'm wondering if anyone has experience deleting unfound objects at this scale, and if it is an asynchronous operation that eventually completes, or if we are encountering some unexpected behavior that warrants a bug report? I am also wondering if ceph-objectstore-tool might be employed to work on all shards of the PG at once and just start them back up together, minus the unfound objects? I haven't seen much useful documented use of the "fix-lost" operation, so I have hesitated to try it without a full understanding of what it does. Thank you to anyone who might be able to provide some information. -- Brian Andrus | Cloud Systems Engineer | DreamHost brian.andrus(a)DreamHost.com | www.dreamhost.com

2 years, 10 months

1
0
0 0

In "ceph health detail", what's the diff between MDS_SLOW_METADATA_IO and MDS_SLOW_REQUEST?

by opengers

*$ *ceph health detail HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs mds.fs-01(mds.0): 3 slow metadata IOs are blocked > 30 secs, oldest blocked for 51123 secs MDS_SLOW_REQUEST 1 MDSs report slow requests mds.fs-01(mds.0): 5 slow requests are blocked > 30 secs Do these two refer to cephfs metadata IO?

2 years, 10 months

2
1
0 0

Re: Why you might want packages not containers for Ceph deployments

by Marc

> -----Original Message----- > Sent: Sunday, 20 June 2021 21:34 > To: ceph-users(a)ceph.io > Subject: *****SPAM***** [ceph-users] Re: Why you might want packages not > containers for Ceph deployments > > > > 3. Why is in this cephadm still being talked about systemd? Your > orchestrator should handle restarts,namespaces and failed tasks not? > There should be no need to have a systemd dependency, at least I have > not seen any container images relying on this. > > Podman uses systemd to manage containers so that it is daemonless, > contrast with Docker where one has to maintain a separate daemon and use > docker-specific tools to mange containers. If you assert that Podman > should not exist, please take that up with the Podman folks. If your OC uses systemd that means your OC is dependent on systemd and ceph not. Nobody here is discussing OC specifics. > > 4. Ok found the container images[2] (I think). Sorry but this has > ‘nothing’ to do with container thinking. I expected to find container > images for osd, msd, rgw separately and smaller. This looks more like an > OS deployment. > > Bundling all the daemons together into one container is *genius*. Much > simpler to build and maintain, one artifact vs a bunch. I wouldn’t be > surprised if there are memory usage efficiencies too. 😃 what a non-sense. If building container images is a problem, do not even get involved with containers. > > 7. If you are not setting cpu and memory limits on your cephadm > containers, then again there is an argument why even use containers. > > This seems like a non-sequitor. As many have written, CPU and memory > limits aren’t the reason for containerizing Ceph daemons. If there are > other container applications where doing so makes sense, that’s fine for > those applications. Indeed, so now we have concluded, cephadm does not use container functionality. > I suspect that artificial CPU limiting of Ceph daemons would have a > negative impact on latency, peering/flapping, and slow requests. Ceph > is a distributed system, not a massively parallel one. OSDs already > have a memory target that can be managed natively, vs an external > mechanism that arbitrarily cuts them off at the knees when they need it > the most. That approach would be addressing the symptoms, not the root > cause. Having been through a multi-day outage that was substantially > worsened by the OOMkiller (*), I personally want nothing to do with > blind external mechanisms deciding that they know better than Ceph > daemons whether or not they should be running. If your availability and > performance needs favor rigidly defined areas of doubt and uncertainty, > that’s your own lookout. Agreed, no real need for a using containers. > > * A release that didn’t have OSD memory target setting yet. Having that > would have helped dramatically. > > > 8. I still see lots of comments on the mailing list about accessing > logs. I have all my containers log to a remote syslog server, if you > still have your ceph daemons that can not do this (correctly). What > point is it even going to containers. > > With all possible respect, that’s another non-sequitor, or at the very > least, an assumption that your needs are everyone’s needs. Centralized > logging makes sense in some contexts. But not always, and not to > everyone. Back around the Hammer or Jewel releases there was a bug > where logging to syslog resulted in daemon crashes. I haven’t tried it > with newer releases, but assume that’s long been fixed. > > I’m not a an [r]syslog[ng] expert by far, but I suspect that central- > only logging might not deal well with situations like an OSD spewing > entries when the debug subsystem level is elevated. Moreover, many of > the issues one sees with Ceph clusters are network related. So when > there’s a network problem, I want to rely on the network to be able to > see logs? I’ve seen syslog drop entries under load, something I > personally wouldn’t want for Ceph daemons. There are of course many > strategies between the extremes. So your arguing is, if it does not work in <5% let's not use it? > > 9. I am updating my small cluster something like this: > > I’m guessing you’ve never updated between major releases. That process > tends to have additional steps and nuances, which is one of the > compelling arguments in favor of orchestration: when it’s done well, > most operators don’t need to rev their own homebrew orchestration to set > the right flags at the right time, etc. But one of the great things > about OSS is that you have the flexibility to roll you own if you so > choose. > > > I am never going to run a ‘ceph orch upgrade start –ceph-version > 16.2.0’. I want to see if everything is ok after each command I issue. I > want to see if scrubbing stopped, I want to see if osd have correctly > accepted the new config. > > So you want to do all the things that an orchestrated rolling upgrade > does for you. Check. > > > I have a small cluster so I do not see this procedure as a waste of > time. If I look at your telemetry data[3]. I see 600 clusters with 35k > osd’s, that is an average of 60 osd per cluster. So these are quite > small clusters, I would think these admins have a similar point of view > as I have. > > Careful with those inferences. We are not operating here on some website cpanel project. Due to the size of these storage solutions one should expect lots of third party data is stored there. So I would argue you should not want to have 'I only know Kubernetes commands' sysadmins operating ceph. > > * Operators who submit telemetry may not be a representative sample > * Sites may have many more than one cluster, If one has a 20 OSD lab > cluster and a 1000 OSD production cluster, perspectives and processes > are going to be different than someone with a single 60 OSD cluster. > * Average != median Whatever, prove me wrong in that not the vast majority of clusters are small. Logics just dictates this. > > I am rather getting the impression you need to have an easy deployment > tool for ceph than you want to really utilize containers. First there > was this ceph-deploy and ceph-ansible which I luckily skipped both > > That’s more than a little harsh. A lot of people get a lot of value out > of those tools. > > The ceph daemons seem to be not prepared for container use, ceph > containers can’t use cpu/memory limits > > They don’t make julienne fries either. What of it? So the argument for cephadm to move to containers is non-sense. If there would be a true container aspiration, you would have to want to apply at least a few of the suggestions. > > And last but not least you totally bypass that the (ceph) admin should > choose the OC platform and not you, because he probably has more than > just ceph nodes. > > Nobody’s stopping you from rolling your own containers, using > traditional packages, or heck even deploying with tarballs. That’s the > beauty of OSS. Let’s leave Orange County out of it though. I agree, there should be no strong relationship between ceph and any OC. I tend to think that Kubernetes or other OC's should be responsible for offering a ceph implementation. > > So my question to you: What problem is it actually that your cephadm > dev team is trying to solve? That is not clear to me. > > Asked and answered, sir Could you write it to me like I am six year old? I do not get from your text what problem the cephadm team is trying to solve.

2 years, 10 months

1
0
0 0

how to set rgw parameters in Pacific

by Adrian Nicolae

Hi, I have some doubts regarding the best way to change some rgw parameters in Pacific. Let's say I want to change rgw_max_put_size and some rgw_gc default values like rgw_gc_max_concurrent_io. What is the recommended way to do it : - via 'ceph config set global' or - via 'ceph config set client.rgw' ? Thanks.

2 years, 10 months

2
1
0 0

Ceph Managers dieing?

by Peter Childs

Lets try to stop this message turning into a mass moaning session about Ceph and try and get this newbie able to use it. I've got a Ceph Octopus cluster, its relatively new and deployed using cephadm. It was working fine, but now the managers start up run for about 30 seconds and then die, until systemctl gives up and I have to reset-fail them to get them to try again, when they fail. How do I work out why and get them working again? I've got 21 nodes and was looking to take it up to 32 over the next few weeks, but that is going to be difficult if the managers are not working. I did try Pacific and I'm happy to upgrade but that failed to deploy more than 6 osd's and I gave up and went back to Octopus. I'm about to give up on Ceph because it looks like its really really "fragile" and debugging what's going wrong is really difficult. I guess I could give up on cephadm and go with a different provisioning method but I'm not sure where to start on that. Thanks in advance. Peter.

2 years, 10 months

4
4
0 0

Pulling Ceph Data Into Grafana

by Alcatraz

Hello All, I recently installed Ceph (v 16.2.4 pacific stable). I know Ceph creates and exposes two Prometheus instances (from what I've witnessed). To that end, I installed Grafana in a docker container, and am attempting to pull metrics from Ceph (Cluster Health, OSD information, etc), but I'm running into issues. If I setup a Prometheus data source to $ServerIP:9100 (Ceph internal metrics, CPU/RAM usage, etc), no issue arises. It automatically parses the URL to http://$ServerIP:9100/api/v1/query, which is expected. If I go to that URL and look at the /metrics endpoint, I can see the raw metrics. If I setup a Prometheus data source to $ServerIP:9283 (Ceph Cluster Health, OSD info, etc), Grafana produces an error in the Web UI stating "Unknown error during query transaction. Please check JS console logs.". Now, I decided to enable debug logging in Grafana, and checked the logs during setup of both the data source into Ceph internal metrics, and Ceph Cluster Health metrics. The only difference is that it shoots out a 301 status, which doesn't make sense, because the URLs are the same, as are the /metrics endpoints. Log Excerpts are below: *URL pointing to Ceph Node Metrics:* t=2021-06-17T16:09:08+0000 lvl=dbug msg="Received command to update data source" logger=datasources url=http://$ServerIP:9100/ t=2021-06-17T16:09:08+0000 lvl=dbug msg="Applying default URL parsing for this data source type" logger=datasource type=prometheus url=http://$ServerIP:9100/ t=2021-06-17T16:09:08+0000 lvl=dbug msg="Querying for data source via SQL store" logger=datasources id=1 orgId=1 t=2021-06-17T16:09:08+0000 lvl=dbug msg="Applying default URL parsing for this data source type" logger=datasource type=prometheus url=http://$ServerIP:9100/ t=2021-06-17T16:09:08+0000 lvl=info msg=Requesting logger=data-proxy-log url=http://172.16.168.3:9100/api/v1/query *URL Pointing to Ceph Cluster Metrics:* sg="Received command to update data source" logger=datasources url=http://$ServerIP:9283 t=2021-06-17T16:06:32+0000 lvl=dbug msg="Applying default URL parsing for this data source type" logger=datasource type=prometheus url=http://$ServerIP:9283 t=2021-06-17T16:06:33+0000 lvl=dbug msg="Querying for data source via SQL store" logger=datasources id=1 orgId=1 t=2021-06-17T16:06:33+0000 lvl=dbug msg="Applying default URL parsing for this data source type" logger=datasource type=prometheus url=http://$ServerIP:9283 t=2021-06-17T16:06:33+0000 lvl=info msg=Requesting logger=data-proxy-log url=http://$ServerIP:9283/api/v1/query t=2021-06-17T16:06:33+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=POST path=/api/datasources/proxy/1/api/v1/query status=301 remote_addr=$LANIP time_ms=3 size=131 referer=http://$ServerIP:3001/datasources/edit/yh9QZAg7k I've ran out of ideas to sort this, because I'd really like to get Grafana to show Ceph cluster metrics. Thanks, Preston

2 years, 10 months

1
0
0 0

Re: radosgw - Etags suffixed with #x0e

by André Cruz

Hello Ingo. Did the problem actually went away after you upgraded everything to Nautilus? I’m seeing the same issue in a Luminous cluster where a Nautilus node was introduced (with the intent of upgrading the whole cluster to Nautilus). When the problem happened we had: Mons, Mgr - Nautilus OSDs, rgw - Most on Luminous, 1 on Nautilus Afterwards the Nautilus RGW was disabled, but still we left the Nautilus OSDs, and the problem has never happened again. There’s also this issue which seems related, but which implies that it can also happened on an all Nautilus cluster: https://tracker.ceph.com/issues/47451 <https://tracker.ceph.com/issues/47451> Best regards, André

2 years, 10 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2021