Hi, I have a Nautilus cluster, version 14.2.6, and I have noticed that
when some OSDs go down the cluster doesn't start recovering. I have
checked that the noout flag is unset.
What could be the reason for this behavior?
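A few generic commands are useful for narrowing this down (illustrative, not specific to this cluster). Note that recovery only begins once the down OSDs are actually marked out, which by default happens after mon_osd_down_out_interval (600 seconds), and that flags other than noout, such as norecover and nobackfill, also block recovery:

# Show cluster-wide flags beyond noout
ceph osd dump | grep flags
# Explain the current health state, including degraded or stuck PGs
ceph health detail
# Check whether the down OSDs are still marked "in"
ceph osd tree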
--
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas(a)csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************
Hello, I have 6 hosts with 12 SSD disks each, for a total of 72 OSDs.
I am using Ceph Octopus in its latest version; the deployment was done
with cephadm and containers, following the docs. We are having some
performance problems with the cluster. I mounted it on a Proxmox
cluster, and on Windows VMs the disks go to 100% utilization from
something as simple as opening a browser; when I switch to another
storage backend, NFS for example, everything goes back to normal. I now
have the Ceph cluster mounted with only 1 VM on it, and we still have
the problem of slowness and slow ops. The network speed between the
hosts in the cluster is 25 Gb, tested with iperf, and between Ceph and
Proxmox it is 25 Gb per host. Has anyone run into this before?
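One hedged way to narrow this down is to benchmark the cluster directly, taking the Proxmox/VM layer out of the picture, and then look at the slow ops themselves; the pool name and OSD id below are placeholders:

# Raw write benchmark against a test pool, bypassing the VM stack
rados bench -p testpool 30 write
# See which OSDs are currently reporting slow ops
ceph health detail
# Inspect recent operations on a suspect OSD (run on that OSD's host)
ceph daemon osd.0 dump_historic_ops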
Many thanks
Hello,
I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started copying data to my CephFS (kernel mount), but both OSDs on that specific node crashed.
To this topic I have the following questions:
1) How can I find out why the two OSDs crashed? Because everything runs in Podman containers, I don't know where the logs are to find out why this happened. From the OS itself everything looks OK; there was no out-of-memory error.
2) I would have assumed the two OSD containers would restart on their own, but that does not seem to be the case. How can I manually restart these 2 OSD containers on that node? I believe this should be a "ceph orch" command?
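For reference, on a cephadm/Podman deployment the daemon logs normally go to journald, crashes are recorded by the crash module, and restarts go through the orchestrator; a sketch, with osd.5 as a placeholder daemon name:

# List recorded daemon crashes; "ceph crash info <id>" shows the backtrace
ceph crash ls
# View a daemon's log via cephadm (run on the host where the daemon lives)
cephadm logs --name osd.5
# Restart the daemon through the orchestrator
ceph orch daemon restart osd.5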
The health of the cluster right now is:
CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded (33.333%), 65 pgs degraded, 65 pgs undersized
Thank you for your hints.
Best regards,
Mabi
Hello Ceph community,
I'm trying to upgrade a Pacific (v16.2.0) cluster to the latest version,
but the upgrade process seems to be stuck. The mgr log (debug level)
does not show any significant message regarding the upgrade, other than
when it is started/paused/resumed/stopped.
2021-05-06T14:29:59.294725+0000 mgr.hostc.riclju (mgr.3935983) 35645 :
cephadm [INF] Upgrade: Started with target docker.io/ceph/ceph:v16.2.2
2021-05-06T14:49:55.710023+0000 mgr.hostc.riclju (mgr.3935983) 36285 :
cephadm [INF] Paused
2021-05-06T14:50:24.444742+0000 mgr.hostc.riclju (mgr.3935983) 36302 :
cephadm [INF] Resumed
2021-05-06T14:51:36.888269+0000 mgr.hostc.riclju (mgr.3935983) 36349 :
cephadm [INF] Upgrade: Paused upgrade to docker.io/ceph/ceph:v16.2.2
2021-05-06T14:51:50.411779+0000 mgr.hostc.riclju (mgr.3935983) 36357 :
cephadm [INF] Upgrade: Resumed upgrade to docker.io/ceph/ceph:v16.2.2
2021-05-06T14:52:01.660682+0000 mgr.hostc.riclju (mgr.3935983) 36365 :
cephadm [INF] Upgrade: Stopped
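For anyone hitting a similar stall, a sketch of the usual steps to get more visibility out of cephadm:

# Ask the upgrade state machine what it thinks it is doing
ceph orch upgrade status
# Raise cephadm's logging and follow its messages live
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug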
It may be worth mentioning that last week I had trouble trying to deploy
RGWs. It was not possible to deploy the RGWs using this command:
ceph orch apply rgw orbyta --realm=realma --zone=zonea --placement="2"
So the following commands were used instead:
ceph orch daemon add rgw zonea --placement hostb
ceph orch daemon add rgw zonea --placement hosta
Even after those commands were issued, the orchestrator would still not
deploy the RGWs until the current MGR failed over to another standby
MGR. After that, the RGWs were deployed.
Another problem I have is with the orchestrator's refresh behavior. The
last time the daemons listed in ceph orch ps were refreshed is the last
time an MGR failover happened, and issuing ceph orch ps --refresh does
not seem to update the output.
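Since both the RGW deployment and the inventory refresh only made progress on a mgr failover, one workaround (not a root-cause fix) is to force such a failover so a standby rebuilds the cephadm module state; the mgr name below is taken from the log excerpt above:

# Fail the active mgr so a standby takes over
ceph mgr fail hostc.riclju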
It looks like all those symptoms are related somehow, but I don't know
how to dig further into the internals of the orchestrator to get more
information.
I would greatly appreciate it if you could point me in the right direction.
Thank you, kind regards.
--
AltaVoz <https://www.altavoz.net/>
Fernando Cid
Operations Engineer
www.altavoz.net
Viña del Mar: 2 Poniente 355, of. 53 | +56 32 276 8060
Santiago: Antonio Bellet 292, of. 701 | +56 2 2585 4264
I manage a historical cluster of several Ceph nodes, each with 128 GB RAM and 36 OSDs of 8 TB each.
The cluster is just for archival purposes, and performance is not so important.
The cluster was running fine for a long time on Ceph Luminous.
Last week I upgraded it to Debian 10 and Ceph Nautilus.
Now I can see that the memory usage of each OSD slowly grows to 4 GB, and once the system has
no memory left it OOM-kills processes.
I have already configured osd_memory_target = 1073741824.
This helps for some hours, but then memory usage grows from 1 GB to 4 GB per OSD again.
Any ideas what I can do to further limit OSD memory usage?
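One arithmetic check worth spelling out: with 36 OSDs per node and the Nautilus default osd_memory_target of 4 GiB, the OSDs alone aim for 36 x 4 GiB = 144 GiB, which exceeds the 128 GB installed, so OOM kills are expected unless the lower target actually reaches every daemon. A sketch for setting and verifying it (osd.0 is a placeholder; osd_memory_target is a best-effort target, not a hard limit, so some overshoot is normal):

# Set the target cluster-wide in the mon config database
ceph config set osd osd_memory_target 1073741824
# Verify what a running OSD actually uses (run on the OSD's host)
ceph daemon osd.0 config show | grep osd_memory_target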
It would be good to keep this hardware running for some more time without upgrading the RAM in all the OSD machines.
Any ideas?
Thanks
Christoph
Hello Ceph,
Can you set the minimum SSL/TLS version, such as TLS 1.2?
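Assuming this is about RGW with the beast frontend (the message does not say which daemon), the frontend's ssl_options setting can disable older protocol versions; a sketch with placeholder port and certificate path:

# Allow only TLS 1.2+ by disabling everything older (port and path are placeholders)
ceph config set client.rgw rgw_frontends "beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem ssl_options=no_sslv2:no_sslv3:no_tlsv1:no_tlsv1_1"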
Glen
This e-mail is intended solely for the benefit of the addressee(s) and any other named recipient. It is confidential and may contain legally privileged or confidential information. If you are not the recipient, any use, distribution, disclosure or copying of this e-mail is prohibited. The confidentiality and legal privilege attached to this communication is not waived or lost by reason of the mistaken transmission or delivery to you. If you have received this e-mail in error, please notify us immediately.
Hello Anthony,
It was introduced in Octopus 15.2.10.
See: https://docs.ceph.com/en/latest/releases/octopus/
Do you know how you would set it in Pacific? :)
I guess there shouldn't be much difference...
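Presumably the annex option takes the same comma-separated key=value list as bluestore_rocksdb_options, so a sketch might look like this (the RocksDB values are illustrative placeholders, not tuning advice):

# Append extra RocksDB options on top of bluestore_rocksdb_options
ceph config set osd bluestore_rocksdb_options_annex "max_background_jobs=4,compaction_readahead_size=2097152"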
Thank you
Mehmet
On April 28, 2021 at 19:21:19 CEST, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
>I think that’s new with Pacific.
>
>> On Apr 28, 2021, at 1:26 AM, ceph(a)elchaka.de wrote:
>>
>>
>>
>> Hello,
>>
>> I have an Octopus cluster and want to change some values - but I
>> cannot find any documentation on how to set multiple values with
>>
>> bluestore_rocksdb_options_annex
>>
>> Could someone give me some examples?
>> I would like to do this with ceph config set ...
>>
>> Thanks in advance
>> Mehmet
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Hello all,
I just wanted to let you know that DigitalOcean has open-sourced a
tool we've developed called pgremapper.
Originally inspired by CERN's upmap exception table manipulation
scripts, pgremapper is a CLI written in Go which exposes a number of
upmap-based algorithms for backfill-related use cases: canceling
backfill (like CERN's upmap-remapped.py, but with some extra tricks up
its sleeve), draining PGs off of an OSD, undoing upmaps in a
controlled and concurrent manner, and more.
If you're interested, please read the details in the repo's README:
https://github.com/digitalocean/pgremapper
Josh
https://io500.org/cfs
Stabilization Period: 05 - 14 May 2021 AoE
Submission Deadline: 11 June 2021 AoE
The IO500 is now accepting and encouraging submissions for the upcoming
8th IO500 list. Once again, we are also accepting submissions to the 10
Node Challenge to encourage the submission of small scale results. The
new ranked lists will be announced via live-stream at a virtual session.
We hope to see many new results.
What's New
Starting with ISC'21, the IO500 now follows a two-staged approach.
First, there will be a two-week stabilization period during which we
encourage the community to verify that the benchmark runs properly.
During this period the benchmark will be updated based upon feedback
from the community. The final benchmark will then be released on Monday,
May 17th. We expect that runs compliant with the rules made during the
stabilization period are valid as the final submission unless a
significant defect is found.
We are now creating a more detailed schema to describe the hardware and
software of the system under test, and we provide a first set of tools
to ease capturing this information for inclusion with the submission.
Further details will be released on the submission page.
Background
The benchmark suite is designed to be easy to run and the community has
multiple active support channels to help with any questions. Please note
that submissions of all sizes are welcome; the site has customizable
sorting, so it is possible to submit on a small system and still get a
very good per-client score, for example. Additionally, the list is about
much more than just the raw rank; all submissions help the community by
collecting and publishing a wider corpus of data. More details below.
Following the success of the Top500 in collecting and analyzing
historical trends in supercomputer technology and evolution, the IO500
was created in 2017, published its first list at SC17, and has grown
exponentially since then. The need for such an initiative has long been
known within High-Performance Computing; however, defining appropriate
benchmarks had long been challenging. Despite this challenge, the
community, after long and spirited discussion, finally reached consensus
on a suite of benchmarks and a metric for resolving the scores into a
single ranking.
The multi-fold goals of the benchmark suite are as follows:
- Maximizing simplicity in running the benchmark suite
- Encouraging optimization and documentation of tuning parameters for performance
- Allowing submitters to highlight their "hero run" performance numbers
- Forcing submitters to simultaneously report performance for challenging IO patterns
Specifically, the benchmark suite includes a hero-run of both IOR and
mdtest configured however possible to maximize performance and establish
an upper-bound for performance. It also includes an IOR and mdtest run
with highly prescribed parameters in an attempt to determine a
lower-bound. Finally, it includes a namespace search as this has been
determined to be a highly sought-after feature in HPC storage systems
that has historically not been well-measured. Submitters are encouraged
to share their tuning insights for publication.
The goals of the community are also multi-fold:
- Gather historical data for the sake of analysis and to aid predictions of storage futures
- Collect tuning information to share valuable performance optimizations across the community
- Encourage vendors and designers to optimize for workloads beyond "hero runs"
- Establish bounded expectations for users, procurers, and administrators
10 Node I/O Challenge
The 10 Node Challenge is conducted using the regular IO500 benchmark,
however, with the rule that exactly 10 client nodes must be used to run
the benchmark. You may use any shared storage with, e.g., any number of
servers. When submitting to the IO500 list, you can opt in to
"Participate in the 10 compute node challenge only", in which case we
will not include the results in the ranked list. Other 10-node submissions
will be included in the full list and in the ranked list. We will
announce the result in a separate derived list and in the full list but
not on the ranked IO500 list at io500.org.
Birds-of-a-Feather
Once again, we encourage you to submit to join our community, and to
attend our virtual BoF "The IO500 and the Virtual Institute of I/O" at
ISC 2021, (time to be announced), where we will announce the new IO500
and 10 node challenge lists. The current list includes results from
BeeGFS, CephFS, DAOS, DataWarp, GekkoFS, GFarm, IME, Lustre, MadFS,
Qumulo, Spectrum Scale, Vast, WekaIO, and YRCloudFile. We hope that the
upcoming list grows even more.
--
The IO500 Committee