The question was posed, "What if we want to back up our RGW data to
tape?" Is anyone doing this? Any suggestions? We could probably just
catch any PUT requests and queue them to be written to tape. Our
dataset is so large that traditional backup schemes (e.g. GFS rotation)
don't seem feasible, so we'd probably take a single copy (or two copies
on different tapes at the same time) when the object is created.
Bonus points for being near-line.
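If catching PUTs at the gateway turns out to be hard, our naive fallback
would be a scheduled mirror onto an LTFS-mounted tape; a rough sketch, where
the tool choice, bucket and mount point are purely illustrative:

  # crontab entry: every 30 minutes, copy objects not yet on tape
  */30 * * * *  s3cmd sync s3://mybucket/ /mnt/ltfs/tape0/mybucket/

The catch is that sync has to list everything on each run, which is exactly
what doesn't scale at our size, hence the interest in catching PUTs instead.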
Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hey all,
Yesterday our cluster went into HEALTH_WARN due to 1 large omap
object in the .usage pool (I've posted about this in the past). Last
time we resolved the issue by trimming the usage log below the alert
threshold, but this time the alert won't clear even after trimming
and, this time around, disabling the usage log entirely.
ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.usage'
Search the cluster log for 'Large omap object found' for more details.
I've bounced ceph-mon, ceph-mgr and radosgw, and even issued an osd
scrub on the two OSDs that hold PGs for the .usage pool, but the alert
won't clear.
It's been over 24 hours since I trimmed the usage log.
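One more thing I've been meaning to try, since I gather the large-omap
count is only recalculated during a deep scrub (a plain scrub doesn't read
omap); a sketch, with the date and PG ID as placeholders:

  radosgw-admin usage trim --end-date=2019-09-01   # how I trimmed the log
  ceph pg ls-by-pool .usage                        # find the PGs backing .usage
  ceph pg deep-scrub 7.0                           # deep scrub recounts omap keys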
Any suggestions?
Jared Baker
Cloud Architect, OICR
Hi,
We are looking for a way to set a timeout on requests to the rados
gateway: if a request takes too long, just kill it.
1. Is there a command that can set the timeout?
2. This parameter looks interesting. Could you explain what "open
threads" means here?
rgw op thread timeout
Description: The timeout in seconds for open threads.
Type: Integer
Default: 600
(from https://docs.ceph.com/docs/nautilus/radosgw/config-ref/)
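In case it helps, this is how I would set it, assuming it does what I hope;
the section name is illustrative for our gateway, and whether the timeout
actually kills an in-flight request is exactly my question:

  [client.rgw.gateway1]
  rgw op thread timeout = 120   # seconds; default 600 per the docs above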
Thanks,
Hanyu
On Tue, Sep 10, 2019 at 1:11 PM Frank Schilder <frans(a)dtu.dk> wrote:
> Hi Robert,
>
> I have metadata on SSD (3x rep) and data on 8+2 EC on spinning disks, so
> the speed difference is orders of magnitude. Our usage is quite metadata
> heavy, so this suits us well, in particular since EC pools give high
> throughput with large IO sizes.
>
> As long as one uses fio with direct=1 (probably also if using sync=1
> and/or fsync=1), everything is fine and behaves as you describe. IOPs
> fluctuate but adjust to media speed. No problems at all.
>
> As mentioned in my last update (I cut it out below), the destructive fio
> command runs with direct=0 and neither sync=1 nor fsync=1. This test just
> writes as fast as it can (to buffers) without waiting for acks. I would
> expect that a ceph client would translate that to synced or direct IO,
> which would be fine.
>
> But it doesn't. Instead, it pushes the IO to the cluster just as fast. I
> have seen 40k write ops on the EC pool (on 100+ HDDs), which can handle
> maybe 1k write ops in total. The queues were growing constantly at an
> incredible rate (several hundred ops per second). I hope that with the
> change to cut_off=high heartbeats will no longer get lost, but such a
> workload will still destabilize our ceph cluster quite dramatically.
>
Changing the cut_off to high is not what keeps heartbeats from getting lost
(heartbeats have a priority far above the high mark). What cut_off = high
does is put replication ops into the main queue instead of the strict
priority queue, so that an OSD doesn't get DDoSed by its peers to the point
where it can never service its own clients.
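For reference, the knob in question; on recent releases it can be set
centrally, and as far as I remember the OSDs need a restart for it to take
effect:

  ceph config set osd osd_op_queue_cut_off high
  ceph config get osd osd_op_queue_cut_off    # verify the stored value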
When I did my fio testing it was on Firefly/Hammer and on RBD, so I can't
speak specifically to newer versions or CephFS. We haven't had time to set
up our test cluster, so I can't run benchmarks at the moment.
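To spell out the difference we're talking about as fio invocations (job
names and sizes purely illustrative):

  # the destructive pattern: buffered writes, no syncs, never waits for acks
  fio --name=flood --rw=randwrite --bs=4k --size=8G --direct=0
  # the well-behaved pattern: direct IO, IOPs settle to media speed
  fio --name=direct --rw=randwrite --bs=4k --size=8G --direct=1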
> My problem is not so much that such an IO pattern could occur in
> reasonable software, but
> - that someone might try it just for fun, and that
> - our 500+ clients might occasionally produce such a workload in
> aggregate.
>
> I find it somewhat alarming that a storage system that promises data
> integrity and reliability can be taken down with a publicly available
> benchmark tool in a matter of a few dozen seconds by ordinary users.
> Potentially with damaging effects. I guess something similar could be
> achieved with a modified rogue client.
>
> I would expect that a storage cluster should have basic self-defence
> mechanisms that prevent this kind of overload or DOS attack by throttling
> clients with crazy IO requests. Are there any settings that can be enabled
> to prevent this from happening?
>
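The closest thing I know of to client-side throttling is the objecter's
in-flight limits, which cap how much IO a single client instance can have
outstanding; the values below are the defaults as I remember them, so treat
them as illustrative:

  [client]
  objecter_inflight_ops = 1024             # max outstanding ops per client
  objecter_inflight_op_bytes = 104857600   # max outstanding bytes per client

Lowering them reins in one rogue client, but they don't address the
aggregate load from 500+ well-behaved clients.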
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hello,
We recently upgraded from Luminous to Nautilus, and since the upgrade we have been seeing sporadic "lock-up" behavior on the RGW side.
From the logs, it seems to coincide with the RGW realm reloader: the reloader tries to pause the frontends, the last in-flight request can take a minute or two in the worst case, and for that period RGW is completely locked up, unable to accept new requests.
1. Is this expected behavior?
2. Is there a way to disable the realm reloader, or to reduce its frequency? We are not using the multi-site feature and we never change our realm.
Thanks!
Are CephFS snapshots production-ready in Nautilus?
When I take a snapshot, what happens if somebody changes the content of that directory before the snapshot finishes?
Will there be any conflict, or anything else to watch out for, when I take a snapshot of a busy directory (probably 20 clients reading/writing in that directory)?
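For context, the way I'd be taking the snapshot is the usual .snap mkdir;
the mount point and snapshot name are illustrative:

  mkdir /mnt/cephfs/shared/.snap/snap-1   # creating a snapshot is a single mkdir
  ls /mnt/cephfs/shared/.snap/            # existing snapshots appear here
  rmdir /mnt/cephfs/shared/.snap/snap-1   # removing the snapshot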
Hello,
We currently have an existing 20-node Ceph cluster (17 OSD nodes and 3
mon nodes). When it was originally configured, much of the OS install
was done manually and the cluster was deployed mainly using ceph-deploy.
We are going to be replacing 9 of the first-generation nodes with 6
newer, denser nodes in the near future.
This time around I would like to see about automating the process as
much as possible (both OS and Ceph installs).
I was wondering if anyone had suggestions about the best tool to use
for this: we are not setting up a cluster from scratch, so whatever
tool we choose has to work within the context of an already
in-production cluster.
Some of these tools seem best suited to setting up a cluster for the
very first time; we are not in that position, and I want to make sure
I am using something flexible enough for our environment going
forward.
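For what it's worth, I assume whatever tool we pick will ultimately just be
driving something like this on each new node (device name illustrative):

  ceph-volume lvm create --data /dev/sdb   # prepare and activate one OSD per data disk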
Thanks in advance,
Shain
--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smiley(a)npr.org | 202.513.3649