Hi,
In my multisite setup, one big bucket was deleted, but it seems it hasn't been cleaned up on one of the secondary sites.
Is it safe to delete the 11 shard objects from the index pool that hold the omap entries for that bucket's files?
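(Concretely, I mean objects like these; the pool name and bucket marker are placeholders for our setup:)
rados -p default.rgw.buckets.index ls | grep <bucket-id>    # the 11 shard objects, named .dir.<bucket-id>.<shard>
rados -p default.rgw.buckets.index rm .dir.<bucket-id>.0    # remove one shard object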
Also a quick question: is it a problem if we use it like this?
- Create a bucket, which means it is created in all DCs.
- Don't sync any bucket data; users upload different files in different DCs.
When a bucket deletion happens, could this usage pattern cause issues, given that the bucket holds different files on each site?
If yes, how can we prevent it?
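(A sketch of what I would check on the secondary first; the bucket name is a placeholder:)
radosgw-admin sync status
radosgw-admin bucket sync status --bucket=<bucket>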
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
Hi Michael,
On 08.06.21 11:38, Ml Ml wrote:
> Now I was asked if I could also build a cheap 200-500 TB cluster
> storage, which should also scale, just for data storage such as
> NextCloud/OwnCloud.
With similar requirements (a server primarily for Samba and NextCloud,
some RBD use, very limited budget), I am using HDDs for data and SSDs for
the system and CephFS metadata.
Note that I am running NextCloud on CephFS storage. If you want to go
with RGW/S3 as a storage backend instead, the following might not apply
to your use case.
My nodes (bought end of 2020) are:
- 2U chassis with 12 3.5" SATA slots
- Intel Xeon Silver 4208
- 128 GB RAM
- 2 x 480 GB Samsung PM883 SSD
-> 50 GB in MD-RAID1 for system
-> 430 GB OSD (one per SSD)
- initially 6 x 14 TB Enterprise HDD
- 4 x 10 GBase-T (active/passive bonded, dedicated backend network)
Each node with this configuration cost about 4k EUR net at the end of
2020. Due to increasing prices for storage, it will be a bit more
expensive now. I am running five nodes now and have added a few more
disks (ranging 8-14 TB), nearly filling up the nodes.
My experience so far:
- I had to throttle scrubbing (see below for details)
- For purely NextCloud and Samba use, performance is sufficient for a few
hundred concurrent users with a handful of power users
- Migration of the mail server to this cluster was a disaster due to
limited IOPS; I had to add some more SSDs and place the mail server in an
SSD-only pool.
- The MDS needs a lot of memory for larger CephFS installs; I will probably
move it to a dedicated server next year. 128 GB per node works, but I
would not recommend any less.
- Rebalancing takes an eternity (2-3 weeks), so make sure that your PG
counts are okay from the start
- I have all but given up on snapshots with CephFS due to severe
performance degradation with the kernel client during backups
My scrubbing config looks like this:
osd_backfill_scan_max 16
osd_backfill_scan_min 4
osd_deep_scrub_interval 2592000.000000
osd_deep_scrub_randomize_ratio 0.030000
osd_recovery_max_active_hdd 1
osd_recovery_max_active_ssd 5
osd_recovery_sleep_hdd 0.050000
osd_scrub_begin_hour 18
osd_scrub_end_hour 7
osd_scrub_chunk_max 1
osd_scrub_chunk_min 1
osd_scrub_max_interval 2419200.000000
osd_scrub_min_interval 172800.000000
osd_scrub_sleep 0.100000
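(For reference, these can be applied at runtime through the config database; a minimal sketch using values from the list above:)
ceph config set osd osd_scrub_sleep 0.1
ceph config set osd osd_deep_scrub_interval 2592000
ceph config get osd osd_scrub_sleep    # verify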
My data is in a replicated pool with n=3, without compression. You might
also consider EC, in which case you will want to aim for more nodes.
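(If you go the EC route, a minimal sketch of profile and pool creation; the names are made up, and k=4/m=2 needs at least six hosts with a host failure domain:)
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create cephfs_data_ec 128 erasure ec42
ceph osd pool set cephfs_data_ec allow_ec_overwrites true   # needed for CephFS/RBD on EC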
Cheers
Sebastian
Hello List,
I used to build 3-node clusters with spinning rust and later with
(enterprise) SSDs.
All I did was buy a 19" server with 10/12 slots, plug in the disks,
and I was done.
The requirements were just 10-15 TB of disk usage (30-45 TB raw).
Now I was asked if I could also build a cheap 200-500 TB storage
cluster, which should also scale, just for data storage such as
NextCloud/OwnCloud.
Buying 3x 24-slot servers with 8 TB enterprise SSDs ends up at about 3x
45k EUR = 135k EUR, where the SSDs are 90% of the price
(about 1,700 EUR per 8 TB SSD).
How do the "big boys" do this? Just throw money at it?
Would a mix of SSD for OSD metadata plus spinning rust do the job?
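(Such a mix is usually built by putting the BlueStore DB/WAL on flash next to an HDD data device; a minimal sketch with hypothetical device names:)
ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1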
My experience so far is that each time I had a crash/problem, it was
always such a pain to wait for the spinning rust.
Do you have any experience/hints on this?
Maybe combine 3x 10 TB HDDs into one 30 TB RAID0/striped disk, which
would speed up performance but have a bigger impact when a
disk dies.
My requirements are more or less low IO traffic but loads of disk space usage.
Any hints/ideas/links are welcome.
Cheers,
Michael
ceph: 14.2.x
kernel: 4.15
In CephFS, due to the need for cache consistency, when one client is
doing buffered IO, another client reading or writing the
same file will hang.
It seems that lazy IO can solve this problem: lazy IO allows multiple clients
to do buffered IO at the same time (relaxed consistency). But I am not sure
how to enable lazy IO under a kernel mount; in testing, the
"client_force_lazyio" parameter does not work.
My final requirement is to use lazy IO so that multiple clients can read
and write the same file in buffered IO mode.
Can someone explain how to enable lazy IO under kcephfs? Thanks.
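(For reference, my understanding is that client_force_lazyio applies only to ceph-fuse/libcephfs, not to the kernel client; a minimal sketch of the FUSE route, with the mount point as a placeholder:)
# /etc/ceph/ceph.conf on the client
[client]
    client_force_lazyio = true
# then mount with:
ceph-fuse /mnt/cephfs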
In an attempt to troubleshoot why only 2/5 mon services were running, I believe I’ve broken something:
[ceph: root@cn01 /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                      1/1      81s ago    9d   count:1
crash                             6/6      7m ago     9d   *
mds.testfs                        2/2      81s ago    9d   cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
grafana                           1/1      80s ago    9d   count:1
mgr                               2/2      81s ago    9d   count:2
mon                               2/5      81s ago    9d   count:5
node-exporter                     6/6      7m ago     9d   *
osd.all-available-devices         20/26    7m ago     9d   *
osd.unmanaged                     7/7      7m ago     -    <unmanaged>
prometheus                        2/2      80s ago    9d   count:2
I tried to stop and start the mon service, but now the cluster is pretty much unresponsive, I’m assuming because I stopped the mons:
[ceph: root@cn01 /]# ceph orch stop mon
Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
[ceph: root@cn01 /]# ceph orch start mon
^CCluster connection aborted
Now, even after a reboot of the cluster, it’s unresponsive. How do I get the mons started again?
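(For reference: since "ceph orch" itself needs a live cluster, one way back in is to start the mon containers directly through their systemd units on each host; a minimal sketch, where <fsid> is the cluster fsid reported by cephadm:)
cephadm ls | grep mon                          # on the mon host: find the daemon name and fsid
systemctl start ceph-<fsid>@mon.cn01.service   # systemd unit naming used by cephadm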
I’m going through Ceph and breaking things left and right, so I apologize for all the questions. I learn best from breaking things and figuring out how to resolve the issues.
Thank you
-jeremy
I’m seeing this in my health status:
  progress:
    Global Recovery Event (13h)
      [............................] (remaining: 5w)
I’m not sure how this was initiated, but this is a cluster with almost zero objects. Is there a way to halt this process? Why would it estimate 5 weeks to recover a cluster with almost no data?
[ceph: root@cn01 /]# ceph -s -w
  cluster:
    id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum cn02,cn05 (age 13h)
    mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: cn02.arszct
    mds: 1/1 daemons up, 1 standby
    osd: 27 osds: 27 up (since 13h), 27 in (since 16h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 22.09k objects, 86 GiB
    usage:   261 GiB used, 98 TiB / 98 TiB avail
    pgs:     65 active+clean

  progress:
    Global Recovery Event (13h)
      [............................] (remaining: 5w)
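(For anyone searching later: stale progress events can be cleared from the mgr's progress module; a minimal sketch, assuming your release already has the "progress clear" command:)
ceph progress clear    # drop all current progress events
ceph mgr fail          # alternatively, fail over to a standby mgr to rebuild the progress state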
Thanks
-jeremy
Hello,
Nautilus 14.2.16
I had an OSD go bad about 10 days ago. Apparently, as it was going down,
some MDS ops got hung up waiting for it to come back. I was out of town
for a couple of days and found the OSD 'down and out' when I checked in.
(Also, oddly, the cluster did not appear to initiate recovery right away;
it took until I rebooted the OSD node.)
As of right now, the damaged OSD is 'safe-to-destroy' but the slow ops are
still hanging around. Earlier today I quiesced the clients that were
accessing the CephFS, then unmounted and re-mounted it. However, this did
not clear the lingering ops.
When I had the node offline, I verified that the HDD and NVMe associated
with the OSD actually seem to be healthy, so I plan to zap and re-deploy
using the same hardware. I would also like to upgrade to 14.2.20 (the latest
Ceph for Debian 10), but I'm hesitant to do any of this until I get rid of
these 29 slow ops.
Can anybody suggest a path forward?
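(For what it's worth, a minimal sketch of how the lingering ops can be inspected first; the daemon ID is a placeholder:)
ceph health detail                       # shows which daemons are reporting slow ops
ceph daemon osd.<id> dump_ops_in_flight  # run on the daemon's host, via the admin socket
Restarting the daemon that reports the slow ops is often what finally clears a stale counter, though that is a judgment call.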
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hi,
Is there a way to connect a pool that I created in my Nautilus Ceph setup to Proxmox, or do I need a totally separate Ceph install?
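(For reference, Proxmox can consume an external RBD pool directly, so no separate Ceph install is needed; a minimal sketch with made-up monitor addresses and pool name, with the client keyring placed at /etc/pve/priv/ceph/<storage-id>.keyring on the Proxmox side:)
pvesm add rbd ceph-external --monhost "10.0.0.1 10.0.0.2 10.0.0.3" \
    --pool mypool --content images --username admin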
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
Do you use RBD images in containers residing on OSD nodes? Does this cause any problems? I used to have kernel-mounted CephFS on an OSD node; after a specific Luminous release, this was giving me problems.
> -----Original Message-----
> From: Eneko Lacunza <elacunza(a)binovo.es>
> Sent: Friday, 4 June 2021 15:49
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: Why you might want packages
> not containers for Ceph deployments
>
> Hi,
>
> We operate a few Ceph hyperconverged clusters with Proxmox, which
> provides a custom Ceph package repository. They do great work, and
> deployment is a breeze.
>
> So, even though we currently rely on Proxmox packages/distribution and
> not upstream, we have a number of other projects deployed with
> containers, and we even distribute some of our own development as deb and
> container packages, so I will comment with our view:
>
> On 2/6/21 at 23:26, Oliver Freyermuth wrote:
> [...]
> >
> > If I operate services in containers built by developers, of course
> > this ensures the setup works, and dependencies are well tested, and
> > even upgrades work well — but it also means that,
> > at the end of the day, if I run 50 services in 50 different containers
> > from 50 different upstreams, I'll have up to 50 different versions of
> > OpenSSL floating around my production servers.
> > If a security issue is found in any of the packages used in all the
> > container images, I now need to trust the security teams of all the 50
> > developer groups building these containers
> > (and most FOSS projects won't have the resources, understandably...),
> > instead of the one security team of the distro I use. And then, I also
> > have to re-pull all these containers, after finding out that a
> > security fix has become available.
> > Or I need to build all these containers myself, and effectively take
> > over the complete job, and have my own security team.
> >
> > This may scale somewhat well, if you have a team of 50 people, and
> > every person takes care of one service. Containers are often your
> > friend in this case[1],
> > since they allow you to isolate the different responsibilities along with
> > the service.
> >
> > But this is rarely the case outside of industry, and especially not in
> > academia.
> > So the approach we chose for us is to have one common OS everywhere,
> > and automate all of our deployment and configuration management with
> > Puppet.
> > Of course, that puts us in one of the many corners out there, but it
> > scales extremely well to all services we operate,
> > and I can still trust the distro maintainers to keep the base OS safe
> > on all our servers, automate reboots etc.
> >
> > For Ceph, we've actually seen questions about security issues already
> > on the list[0] (never answered AFAICT).
>
> These are the two main issues I find with containers really:
>
> - Keeping hosts up to date is more complex (apt-get update + apt-get
> dist-upgrade, plus some kind of docker pull + docker
> restart/docker-compose up ...). Much of the time the second part is not
> standard (I just deployed a Harbor service; the upgrade is quite simple, but I
> have to know how to do it as it's specific; maintenance would be much
> easier if it were packaged in Debian). I won't say it's more difficult,
> but it will be more diverse and complex.
>
> - Container image quality and security support quality will vary
> from upstream to upstream. You have to research each of them to know
> where they stand. A distro (especially a good one like Debian, Ubuntu,
> RHEL or SUSE) has known, quality security support for its repositories.
> They will even fix issues not fixed by upstream (or backport fixes to the
> distro's version...). This is more an upstream vs. distro issue, really.
>
> About debugging issues reported with Ceph containers, I think those are
> things waiting for a fix: why are logs written inside the container image (or to an
> ephemeral volume; I don't really know how that is done right now)
> instead of to an external named volume or a locally mapped dir in /var/log/ceph?
>
> All that said, I think it makes sense for an upstream project like
> Ceph to distribute container images, as that is the most generic way to
> distribute (you can deploy on any system/distro supporting container
> images) and it eases development. But distributing only container images
> could make more users depend on third-party distributions (global or
> distro-specific), which would delay feedback/bug reports to upstream.
>
> Cheers and thanks for the great work!
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
Hello. I have an erasure-coded pool and I didn't turn on compression at the beginning.
Now I'm writing a new type of very small data, and overhead is becoming an issue.
I'm thinking of turning on compression for the pool, but in most
filesystems that only affects new data. What is the behavior in
Ceph?
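(For reference, a minimal sketch of turning compression on for an existing pool; the pool name is a placeholder:)
ceph osd pool set mypool compression_algorithm snappy
ceph osd pool set mypool compression_mode aggressive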