Hi,
I am using a Ceph Nautilus cluster with the configuration below.
3 nodes (Ubuntu 18.04), each with 12 OSDs; the MDS, MON and MGR daemons run
shared on the same nodes.
The client is mounted through the ceph kernel client.
I was trying to emulate a node failure while a write and a read were going on
against a replica 2 pool.
I was expecting reads and writes to continue after a small pause due to the node
failure, but IO halts and never resumes until the failed node is back up.
I remember testing the same scenario before in Ceph Mimic, where IO continued
after a small pause.
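In case it helps, this is how I plan to double-check the pool settings, since I
suspect min_size is what decides whether IO continues with one replica missing
(just a sketch; "replica2pool" is a placeholder for my pool name):
  ceph osd pool get replica2pool size        # expecting 2
  ceph osd pool get replica2pool min_size    # if this is also 2, IO blocks while one replica is down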
regards
Amudhan P
Hi,
I have the following Ceph Mimic setup:
- a bunch of old servers with 3-4 SATA drives each (74 OSDs in total)
- index/leveldb is stored on each OSD (so no SSD drives, just SATA)
- the current usage is:
GLOBAL:
    SIZE        AVAIL       RAW USED     %RAW USED
    542 TiB     105 TiB     437 TiB      80.67
POOLS:
    NAME                         ID     USED        %USED     MAX AVAIL     OBJECTS
    .rgw.root                    1      1.1 KiB     0         26 TiB        4
    default.rgw.control          2      0 B         0         26 TiB        8
    default.rgw.meta             3      20 MiB      0         26 TiB        75357
    default.rgw.log              4      0 B         0         26 TiB        4271
    default.rgw.buckets.data     5      290 TiB     85.05     51 TiB        78067284
    default.rgw.buckets.non-ec   6      0 B         0         26 TiB        0
    default.rgw.buckets.index    7      0 B         0         26 TiB        603008
- rgw_override_bucket_index_max_shards = 16. Clients are accessing RGW
via Swift, not S3.
- the replication scheme is EC 4+2.
We are using this Ceph cluster as secondary storage for another
storage infrastructure (which is more expensive), offloading
cold data to it (big files with a low number of downloads/reads from our
customers). This way we can lower the TCO. So most of the files are big
(a few GB at least).
So far Ceph is doing well, considering that I don't have big
expectations from the current hardware. I'm a bit worried, however, that we
have 78 M objects with max_shards=16 and will probably reach 100 M in
the next few months. Do I need to increase the max shards to ensure the
stability of the cluster? I read that storing more than 1 M objects
in a single bucket can lead to OSDs flapping or having IO timeouts
during deep-scrub, or even to OSD failures due to leveldb
compacting all the time if we have a large number of DELETEs.
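For reference, the commands I would use to check the shard sizes and, if needed,
reshard a bucket are roughly these (just a sketch; the bucket name and shard
count are placeholders, not values I have tested):
  radosgw-admin bucket limit check                    # warns about over-sized index shards per bucket
  radosgw-admin bucket stats --bucket=<bucket-name>   # current object count for the bucket
  radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=64   # manual reshard to more shards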
Any advice would be appreciated.
Thank you,
Adrian Nicolae
Hi
We have some clusters which are RBD-only. Each time someone uses
radosgw-admin by mistake on those clusters, the rgw pools are auto-created.
Is there a way to disable that? I mean this part of the documentation:
"When radosgw first tries to operate on a zone pool that does not exist, it
will create that pool with the default values from osd pool default pg num
and osd pool default pgp num"
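I haven't found a config option that prevents the auto-creation itself, so for now
the best I can do is clean up afterwards, roughly like this (a sketch, assuming a
release where 'ceph config set' exists and that the rgw pools really are unused):
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool rm .rgw.root .rgw.root --yes-i-really-really-mean-it
  ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
  ceph config set mon mon_allow_pool_delete false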
Thanks,
Kate
Hi,
I have a 4-node cluster with 13 x 15 TB 7.2k-rpm OSDs per node and around 300 TB of data inside. I'm having issues with scrubs/deep scrubs not being completed in time; any tips on handling these operations with disks this large?
osd pool default size = 2
osd deep scrub interval = 2592000
osd scrub begin hour = 23
osd scrub end hour = 5
osd scrub sleep = 0.1
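For completeness, these are the additional knobs I'm thinking about (only a sketch
of candidate values, nothing I have validated yet); with the 23:00-05:00 window the
OSDs only get six hours a day to scrub, so widening it is probably the first step:
osd scrub begin hour = 0
osd scrub end hour = 24
# default osd max scrubs is 1; allowing 2 runs more scrubs in parallel per OSD
osd max scrubs = 2
# default load threshold is 0.5; raising it stops scrubs being skipped under moderate load
osd scrub load threshold = 5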
Cheers,
Kamil
On Mon, 25 May 2020 at 10:03, Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
> I am interested. I am always setting the MTU to 9000. To be honest I cannot
> imagine there is no optimization, since you have fewer interrupt requests
> and you are able to move x times as much data per frame. Every time something
> is written about optimizing, the first thing mentioned is changing to MTU
> 9000, because it is a quick and easy win.
>
>
This sort of assumes you are not using interrupt-coalescing network cards,
because if you do, you can get something like hundreds of packets in one
single IRQ*, already checksummed and stripped, and in recent cards
(10-25-40GE) even delivered into the CPU's L3 cache by the time you get the
interrupt, so whether they were 1500 or 9000 bytes on the wire doesn't matter
much by then.
Even in the bad old days when software handled all the packet processing,
many things (like mbuf allocations) were optimized for 1500, so 9k packets
just became a multiple of 1500-byte chunks taken from a pool
of network buffers anyhow.
I'm not trying to shoot down the 9k-vs-1500 idea, but doing a benchmark
will give you a lot more facts than airing things that are easy to imagine
but really don't have a huge impact, because hw manufacturers worked
around things like this a long time ago. If your tests say you win x%, then
use it by all means. I'm just not convinced that 10/25/40G networks are so
full that the frame overhead really matters as a percentage of the
packet size, and the cards offload most of the work of stripping the overhead
out, so the computer won't notice it was ever there.
*) SysKonnect cards had this around 2003, just to get a feeling for what
"modern ethernet cards" means in this context.
--
May the most significant bit of your life be positive.
Hi all,
I have a Nautilus cluster mostly used for RBD (openstack) and CephFS.
I have been using the rbd perf command from time to time, but it doesn't
work anymore. I have tried several images in different pools, but
there's no output at all except for:
client:~ $ rbd perf image iostat --format json
volumes-ssd/volume-358cd6c5-6fb0-424f-93d9-990ea1963472
rbd: waiting for initial image stats
It never updates, no matter how long I wait. It stopped working while
we were using version 14.2.3, last Friday we updated to 14.2.9 but it
still doesn't work.
The only relevant mgr log output I'm seeing in debug mode (debug_mgr
5/5) is this:
---snip---
2020-05-25 10:53:07.072 7fedd5f59700 4 mgr.server _handle_command decoded 4
2020-05-25 10:53:07.072 7fedd5f59700 4 mgr.server _handle_command
prefix=rbd perf image stats
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] :
from='client.710971242 v1:192.168.103.13:0/693257394'
entity='client.admin' cmd=[
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : {
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] :
"prefix": "rbd perf image stats",
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] :
"pool_spec":
"volumes-ssd/volume-358cd6c5-6fb0-424f-93d9-990ea1963472",
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] :
"sort_by": "write_ops",
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] :
"format": "json"
2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] :
}"]: dispatch
2020-05-25 10:53:07.072 7fedd675a700 4 mgr.server reply reply success
2020-05-25 10:53:07.104 7fedd5f59700 4 mgr.server handle_report from
0x555e60f31200 osd,33
2020-05-25 10:53:07.172 7fedd5f59700 4 mgr.server handle_report from
0x555e5c6c6d80 osd,15
2020-05-25 10:53:07.224 7fedf10c1700 4 mgr send_beacon active
---snip---
What I'm also wondering about is that the "format": "json" in the dispatched
command doesn't change even if I run with --format plain or xml.
Does anyone experience the same? The missing output also applies to
rbd perf image iotop.
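Since the log above shows the mgr dispatching the "rbd perf image stats" command,
the next thing I will probably try is failing over the active mgr and re-running the
command (a sketch; the daemon name is a placeholder):
client:~ $ ceph mgr module ls              # rbd_support should appear under always_on_modules
client:~ $ ceph mgr fail <active-mgr-name>
client:~ $ rbd perf image iostat volumes-ssd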
Any hints are appreciated.
Regards,
Eugen
Hi Manuel,
rgw_gc_obj_min_wait -- yes, this is how you control how long rgw waits
before removing the stripes of deleted objects
the following are more about gc performance and the proportion of available iops:
rgw_gc_processor_max_time -- controls how long gc runs once scheduled;
a large value might be 3600
rgw_gc_processor_period -- sets the gc cycle; smaller is more frequent
If you want to make gc more aggressive while it is running, set the
following (they can be raised further); these values roughly double the defaults:
rgw_gc_max_concurrent_io = 20
rgw_gc_max_trim_chunk = 32
If you want to increase gc's fraction of total rgw I/O, increase these
(mostly concurrent_io).
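For Manuel's 10 minute target, a rough ceph.conf sketch could look like this (the
rgw instance name is a placeholder and the values are only illustrative):
[client.rgw.<instance>]
rgw_gc_obj_min_wait = 600
rgw_gc_processor_period = 600
rgw_gc_processor_max_time = 600
rgw_gc_max_concurrent_io = 20
rgw_gc_max_trim_chunk = 32
You can also watch and drive the queue manually with 'radosgw-admin gc list
--include-all' and 'radosgw-admin gc process'.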
regards,
Matt
On Sun, May 24, 2020 at 4:02 PM EDH - Manuel Rios
<mriosfer(a)easydatahost.com> wrote:
>
> Hi,
>
> I'm looking for any experience optimizing the garbage collector with the following configs:
>
> global advanced rgw_gc_obj_min_wait
> global advanced rgw_gc_processor_max_time
> global advanced rgw_gc_processor_period
>
> By default gc expires objects within 2 hours; we're looking to set the expiry to 10 minutes, as our S3 cluster gets heavy uploads and deletes.
>
> Are those params usable? For us it doesn't make sense to keep deleted objects in the gc queue for 2 hours.
>
> Regards
> Manuel
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309