Hi,
In my multisite setup, one big bucket was deleted, but it seems it hasn't been cleaned up on one of the secondary sites.
Is it safe to delete the 11 shard objects from the index pool that hold the omap entries for that bucket's files?
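(Concretely, I mean objects like these; the pool name and bucket marker are placeholders for our setup:)
rados -p default.rgw.buckets.index ls | grep <bucket-id>    # the 11 shard objects, named .dir.<bucket-id>.<shard>
rados -p default.rgw.buckets.index rm .dir.<bucket-id>.0    # remove one shard object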
Also a quick question: is it a problem if we use it like this?
- Create a bucket, which means it is created in all DCs.
- Don't sync any bucket data; users upload different files in different DCs.
When a bucket deletion happens, could this usage pattern cause issues, given that the bucket holds different files on each site?
If yes, how can we prevent it?
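(A sketch of what I would check on the secondary first; the bucket name is a placeholder:)
radosgw-admin sync status
radosgw-admin bucket sync status --bucket=<bucket>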
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
Hi Michael,
On 08.06.21 11:38, Ml Ml wrote:
> Now I was asked if I could also build a cheap 200-500 TB cluster
> storage, which should also scale, just for data storage such as
> NextCloud/OwnCloud.
With similar requirements (a server primarily for Samba and NextCloud,
some RBD use, very limited budget), I am using HDDs for data and SSDs for
the system and CephFS metadata.
Note that I am running NextCloud on CephFS storage. If you want to go
with RGW/S3 as a storage backend instead, the following might not apply
to your use case.
My nodes (bought end of 2020) are:
- 2U chassis with 12 3.5" SATA slots
- Intel Xeon Silver 4208
- 128 GB RAM
- 2 x 480 GB Samsung PM883 SSD
-> 50 GB in MD-RAID1 for system
-> 430 GB OSD (one per SSD)
- initially 6 x 14 TB Enterprise HDD
- 4 x 10 GBase-T (active/passive bonded, dedicated backend network)
Each node with this configuration cost about 4k EUR net at the end of
2020. Due to increasing prices for storage, it will be a bit more
expensive now. I am running five nodes now and have added a few more
disks (ranging 8-14 TB), nearly filling up the nodes.
My experience so far:
- I had to throttle scrubbing (see below for details)
- For purely NextCloud and Samba use, performance is sufficient for a few
hundred concurrent users with a handful of power users
- Migration of the mail server to this cluster was a disaster due to
limited IOPS; I had to add some more SSDs and place the mail server in an
SSD-only pool.
- The MDS needs a lot of memory for larger CephFS installs; I will probably
move it to a dedicated server next year. 128 GB per node works, but I
would not recommend any less.
- Rebalancing takes an eternity (2-3 weeks), so make sure that your PG
counts are okay from the start
- I have all but given up on snapshots with CephFS due to severe
performance degradation with the kernel client during backups
My scrubbing config looks like this:
osd_backfill_scan_max 16
osd_backfill_scan_min 4
osd_deep_scrub_interval 2592000.000000
osd_deep_scrub_randomize_ratio 0.030000
osd_recovery_max_active_hdd 1
osd_recovery_max_active_ssd 5
osd_recovery_sleep_hdd 0.050000
osd_scrub_begin_hour 18
osd_scrub_end_hour 7
osd_scrub_chunk_max 1
osd_scrub_chunk_min 1
osd_scrub_max_interval 2419200.000000
osd_scrub_min_interval 172800.000000
osd_scrub_sleep 0.100000
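(For reference, these can be applied at runtime through the config database; a minimal sketch using values from the list above:)
ceph config set osd osd_scrub_sleep 0.1
ceph config set osd osd_deep_scrub_interval 2592000
ceph config get osd osd_scrub_sleep    # verify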
My data is in a replicated pool with n=3, without compression. You might
also consider EC, in which case you will want to aim for more nodes.
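(If you go the EC route, a minimal sketch of profile and pool creation; the names are made up, and k=4/m=2 needs at least six hosts with a host failure domain:)
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create cephfs_data_ec 128 erasure ec42
ceph osd pool set cephfs_data_ec allow_ec_overwrites true   # needed for CephFS/RBD on EC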
Cheers
Sebastian
Hello List,
I used to build 3-node clusters with spinning rust and later with
(enterprise) SSDs.
All I did was buy a 19" server with 10/12 slots, plug in the disks,
and I was done.
The requirements were just 10-15 TB of disk usage (30-45 TB raw).
Now I was asked if I could also build a cheap 200-500 TB storage
cluster, which should also scale, just for data storage such as
NextCloud/OwnCloud.
Buying 3x 24-slot servers with 8 TB enterprise SSDs ends up at about 3x
45k EUR = 135k EUR, where the SSDs are 90% of the price
(about 1,700 EUR per 8 TB SSD).
How do the "big boys" do this? Just throw money at it?
Would a mix of SSD for OSD metadata plus spinning rust do the job?
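(Such a mix is usually built by putting the BlueStore DB/WAL on flash next to an HDD data device; a minimal sketch with hypothetical device names:)
ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1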
My experience so far is that each time I had a crash/problem, it was
always such a pain to wait for the spinning rust.
Do you have any experience/hints on this?
Maybe combine 3x 10 TB HDDs into one 30 TB RAID0/striped disk, which
would speed up performance but have a bigger impact when a
disk dies.
My requirements are more or less low IO traffic but loads of disk space usage.
Any hints/ideas/links are welcome.
Cheers,
Michael
ceph: 14.2.x
kernel: 4.15
In CephFS, due to the need for cache consistency, when one client is
doing buffered IO, another client reading or writing the
same file will hang.
It seems that lazy IO can solve this problem: lazy IO allows multiple clients
to do buffered IO at the same time (relaxed consistency). But I am not sure
how to enable lazy IO under a kernel mount; in testing, the
"client_force_lazyio" parameter does not work.
My final requirement is to use lazy IO so that multiple clients can read
and write the same file in buffered IO mode.
Can someone explain how to enable lazy IO under kcephfs? Thanks.
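(For reference, my understanding is that client_force_lazyio applies only to ceph-fuse/libcephfs, not to the kernel client; a minimal sketch of the FUSE route, with the mount point as a placeholder:)
# /etc/ceph/ceph.conf on the client
[client]
    client_force_lazyio = true
# then mount with:
ceph-fuse /mnt/cephfs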
In an attempt to troubleshoot why only 2/5 mon services were running, I believe I’ve broken something:
[ceph: root@cn01 /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                      1/1      81s ago    9d   count:1
crash                             6/6      7m ago     9d   *
mds.testfs                        2/2      81s ago    9d   cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
grafana                           1/1      80s ago    9d   count:1
mgr                               2/2      81s ago    9d   count:2
mon                               2/5      81s ago    9d   count:5
node-exporter                     6/6      7m ago     9d   *
osd.all-available-devices         20/26    7m ago     9d   *
osd.unmanaged                     7/7      7m ago     -    <unmanaged>
prometheus                        2/2      80s ago    9d   count:2
I tried to stop and start the mon service, but now the cluster is pretty much unresponsive, I’m assuming because I stopped the mons:
[ceph: root@cn01 /]# ceph orch stop mon
Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
[ceph: root@cn01 /]# ceph orch start mon
^CCluster connection aborted
Now, even after a reboot of the cluster, it’s unresponsive. How do I get the mons started again?
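(For reference: since "ceph orch" itself needs a live cluster, one way back in is to start the mon containers directly through their systemd units on each host; a minimal sketch, where <fsid> is the cluster fsid reported by cephadm:)
cephadm ls | grep mon                          # on the mon host: find the daemon name and fsid
systemctl start ceph-<fsid>@mon.cn01.service   # systemd unit naming used by cephadm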
I’m going through Ceph and breaking things left and right, so I apologize for all the questions. I learn best from breaking things and figuring out how to resolve the issues.
Thank you
-jeremy
I’m seeing this in my health status:
  progress:
    Global Recovery Event (13h)
      [............................] (remaining: 5w)
I’m not sure how this was initiated, but this is a cluster with almost zero objects. Is there a way to halt this process? Why would it estimate 5 weeks to recover a cluster with almost no data?
[ceph: root@cn01 /]# ceph -s -w
  cluster:
    id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum cn02,cn05 (age 13h)
    mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: cn02.arszct
    mds: 1/1 daemons up, 1 standby
    osd: 27 osds: 27 up (since 13h), 27 in (since 16h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 22.09k objects, 86 GiB
    usage:   261 GiB used, 98 TiB / 98 TiB avail
    pgs:     65 active+clean

  progress:
    Global Recovery Event (13h)
      [............................] (remaining: 5w)
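(For anyone searching later: stale progress events can be cleared from the mgr's progress module; a minimal sketch, assuming your release already has the "progress clear" command:)
ceph progress clear    # drop all current progress events
ceph mgr fail          # alternatively, fail over to a standby mgr to rebuild the progress state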
Thanks
-jeremy
Hello,
Nautilus 14.2.16
I had an OSD go bad about 10 days ago. Apparently, as it was going down,
some MDS ops got hung up waiting for it to come back. I was out of town
for a couple of days and found the OSD 'down and out' when I checked in.
(Also, oddly, the cluster did not appear to initiate recovery right away;
it took until I rebooted the OSD node.)
As of right now, the damaged OSD is 'safe-to-destroy' but the slow ops are
still hanging around. Earlier today I quiesced the clients that were
accessing the CephFS, then unmounted and re-mounted it. However, this did
not clear the lingering ops.
When I had the node offline, I verified that the HDD and NVMe associated
with the OSD actually seem to be healthy, so I plan to zap and re-deploy
using the same hardware. I would also like to upgrade to 14.2.20 (the latest
Ceph for Debian 10), but I'm hesitant to do any of this until I get rid of
these 29 slow ops.
Can anybody suggest a path forward?
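(For what it's worth, a minimal sketch of how the lingering ops can be inspected first; the daemon ID is a placeholder:)
ceph health detail                       # shows which daemons are reporting slow ops
ceph daemon osd.<id> dump_ops_in_flight  # run on the daemon's host, via the admin socket
Restarting the daemon that reports the slow ops is often what finally clears a stale counter, though that is a judgment call.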
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hi,
Is there a way to connect a pool that I created in my Nautilus Ceph setup to Proxmox, or do I need a totally separate Ceph install?
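(For reference, Proxmox can consume an external RBD pool directly, so no separate Ceph install is needed; a minimal sketch with made-up monitor addresses and pool name, with the client keyring placed at /etc/pve/priv/ceph/<storage-id>.keyring on the Proxmox side:)
pvesm add rbd ceph-external --monhost "10.0.0.1 10.0.0.2 10.0.0.3" \
    --pool mypool --content images --username admin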
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
Do you use RBD images in containers residing on OSD nodes? Does this cause any problems? I used to have kernel-mounted CephFS on an OSD node; after a specific Luminous release, this was giving me problems.
> -----Original Message-----
> From: Eneko Lacunza <elacunza(a)binovo.es>
> Sent: Friday, 4 June 2021 15:49
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: Why you might want packages
> not containers for Ceph deployments
>
> Hi,
>
> We operate a few Ceph hyperconverged clusters with Proxmox, which
> provides a custom Ceph package repository. They do great work, and
> deployment is a breeze.
>
> So, even though we currently rely on Proxmox packages/distribution and
> not upstream, we have a number of other projects deployed with
> containers, and we even distribute some of our own development as deb and
> container packages, so I will comment with our view:
>
> On 2/6/21 at 23:26, Oliver Freyermuth wrote:
> [...]
> >
> > If I operate services in containers built by developers, of course
> > this ensures the setup works, and dependencies are well tested, and
> > even upgrades work well — but it also means that,
> > at the end of the day, if I run 50 services in 50 different containers
> > from 50 different upstreams, I'll have up to 50 different versions of
> > OpenSSL floating around my production servers.
> > If a security issue is found in any of the packages used in all the
> > container images, I now need to trust the security teams of all the 50
> > developer groups building these containers
> > (and most FOSS projects won't have the resources, understandably...),
> > instead of the one security team of the distro I use. And then, I also
> > have to re-pull all these containers, after finding out that a
> > security fix has become available.
> > Or I need to build all these containers myself, and effectively take
> > over the complete job, and have my own security team.
> >
> > This may scale somewhat well, if you have a team of 50 people, and
> > every person takes care of one service. Containers are often your
> > friend in this case[1],
> > since they allow you to isolate the different responsibilities along with
> > the service.
> >
> > But this is rarely the case outside of industry, and especially not in
> > academia.
> > So the approach we chose for us is to have one common OS everywhere,
> > and automate all of our deployment and configuration management with
> > Puppet.
> > Of course, that puts us in one of the many corners out there, but it
> > scales extremely well to all services we operate,
> > and I can still trust the distro maintainers to keep the base OS safe
> > on all our servers, automate reboots etc.
> >
> > For Ceph, we've actually seen questions about security issues already
> > on the list[0] (never answered AFAICT).
>
> These are the two main issues I find with containers really:
>
> - Keeping hosts up to date is more complex (apt-get update + apt-get
> dist-upgrade, plus some kind of docker pull + docker
> restart/docker-compose up ...). Much of the time the second part is not
> standard (I just deployed a Harbor service; the upgrade is quite simple, but I
> have to know how to do it as it's specific; maintenance would be much
> easier if it were packaged in Debian). I won't say it's more difficult,
> but it will be more diverse and complex.
>
> - Container image quality and security support quality will vary
> from upstream to upstream. You have to research each of them to know
> where they stand. A distro (especially a good one like Debian, Ubuntu,
> RHEL or SUSE) has known, quality security support for its repositories.
> They will even fix issues not fixed by upstream (or backport fixes to the
> distro's version...). This is more an upstream vs. distro issue, really.
>
> About debugging issues reported with Ceph containers, I think those are
> things waiting for a fix: why are logs written inside the container image (or to an
> ephemeral volume; I don't really know how that is done right now)
> instead of to an external named volume or a locally mapped dir in /var/log/ceph?
>
> All that said, I think it makes sense for an upstream project like
> Ceph to distribute container images, as that is the most generic way to
> distribute (you can deploy on any system/distro supporting container
> images) and it eases development. But distributing only container images
> could make more users depend on third-party distributions (global or
> distro-specific), which would delay feedback/bug reports to upstream.
>
> Cheers and thanks for the great work!
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
Hello. I have an erasure-coded pool and I didn't turn on compression at the beginning.
Now I'm writing a new type of very small data, and overhead is becoming an issue.
I'm thinking of turning on compression for the pool, but in most
filesystems that only affects new data. What is the behavior in
Ceph?
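(For reference, a minimal sketch of turning compression on for an existing pool; the pool name is a placeholder:)
ceph osd pool set mypool compression_algorithm snappy
ceph osd pool set mypool compression_mode aggressive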