May 2020 - ceph-users - lists.ceph.io

by 陈旭

Hi guys, I deploy an efk cluster and use ceph as block storage in kubernetes, but RBD write iops sometimes becomes zero and last for a few minutes. I want to check logs about RBD so I add some config to ceph.conf and restart ceph. Here is my ceph.conf: [global] fsid = 53f4e1d5-32ce-4e9c-bf36-f6b54b009962 mon_initial_members = db-16-4-hzxs, db-16-5-hzxs, db-16-6-hzxs mon_host = 10.25.16.4,10.25.16.5,10.25.16.6 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx osd pool default size = 3 [client] debug rbd = 20 debug rbd mirror = 20 debug rbd replay = 20 log file = /var/log/ceph/client_rbd.log I can not get any logs in /var/log/ceph/client_rbd.log. I also try to execute 'ceph daemon osd.* config set debug_rbd 20’ and there is also no related logs in ceph-osd.log. How can I get useful logs about this question or How can I analyze this problem? Look forward to your reply. Thanks ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// 声明：此邮件可能包含依图公司保密或特权信息，并且仅应发送至有权接收该邮件的收件人。如果您无权收取该邮件，您应当立即删除该邮件并通知发件人，您并被禁止传播、分发或复制此邮件以及附件。对于此邮件可能携带的病毒引起的任何损害，本公司不承担任何责任。此外，本公司不保证已正确和完整地传输此信息，也不接受任何延迟收件的赔偿责任。 ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// Notice: This email may contain confidential or privileged information of Yitu and was sent solely to the intended recipients. If you are unauthorized to receive this email, you should delete the email and contact the sender immediately. Any unauthorized disclosing, distribution, or copying of this email and attachment thereto is prohibited. Yitu does not accept any liability for any loss caused by possibly viruses in this email. E-mail transmission cannot be guaranteed to be secure or error-free and Yitu is not responsible for any delayed transmission.

3 years, 10 months

2
1
0 0

Help! ceph-mon is blocked after shutting down and ip address changed

by occj＠qq.com

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable) os :CentOS Linux release 7.7.1908 (Core) single node ceph cluster with 1 mon,1mgr,1 mds,1rgw and 12osds , but only cephfs is used. ceph -s is blocked after shutting down the machine (192.168.0.104), then ip address changed to 192.168.1.6 I created the monmap with monmap tool and update the ceph.conf , hosts file and then start ceph-mon. and the ceph-mon log: ... 2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s 2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s 2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s 2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s 2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s 2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s 2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s 2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s 2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s ... I changed IP back to 192.168.0.104 yeasterday, but all the same. # cat /etc/ceph/ceph.conf [client.libvirt] admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor [client.rgw.ceph-node1.rgw0] host = ceph-node1 keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-node1.rgw0/keyring log file = /var/log/ceph/ceph-rgw-ceph-node1.rgw0.log rgw frontends = beast endpoint=192.168.1.6:8080 rgw thread pool size = 512 # Please do not change this file directly since it is managed by Ansible and will be overwritten [global] cluster network = 192.168.1.0/24 fsid = e384e8e6-94d5-4812-bfbb-d1b0468bdef5 mon host = [v2:192.168.1.6:3300,v1:192.168.1.6:6789] mon initial members = ceph-node1 osd crush chooseleaf type = 0 osd pool default crush rule = -1 public network = 192.168.1.0/24 [osd] osd memory target = 7870655146

3 years, 10 months

2
1
0 0

Radosgw PubSub Traffic

by Dustin Guerrero

Hey all, We’ve been running some benchmarks against Ceph which we deployed using the Rook operator in Kubernetes. Everything seemed to scale linearly until a point where I see a single OSD receiving much higher CPU load than the other OSDs (nearly 100% saturation). After some investigation we noticed a ton of pubsub traffic in the strace coming from the RGW pods like so: [pid 22561] sendmsg(77, {msg_name(0)=NULL, msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"..., 73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314}, {"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …> I’ve checked other OSDs and only a single OSD receives these messages. I suspect its creating a bottleneck. Does anyone have an idea on why these are being generated or how to stop them? The pubsub sync module doesn’t appear to be enabled, and our benchmark is doing simple gets/puts/deletes. We’re running Ceph 14.2.5 nautilus Thank you!

3 years, 10 months

2
2
0 0

Cache pools at or near target size but no evict happen

by icy chan

Hi, I had configured a cache tier with max object counts 500k. But no evict happens when the object counts hit the configured maximum. Anyone experienced this issue? What should I do? $ ceph health detail HEALTH_WARN 1 cache pools at or near target size CACHE_POOL_NEAR_FULL 1 cache pools at or near target size cache pool 'cached-hdd-cache' with 887.11k objects at/near target max 500k objects $ ceph df | grep -e "POOL\|cached-hdd" POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL cached-hdd 24 1.4 TiB 1.52M 1.4 TiB 0.60 78 TiB cached-hdd-cache 25 842 GiB 887.14k 842 GiB 15.97 1.4 TiB $ ceph osd pool get cached-hdd-cache all size: 3 min_size: 1 pg_num: 128 pgp_num: 128 crush_rule: nvme-repl-rule hashpspool: true nodelete: false nopgchange: false nosizechange: false write_fadvise_dontneed: false noscrub: false nodeep-scrub: false hit_set_type: bloom hit_set_period: 1200 hit_set_count: 4 hit_set_fpp: 0.05 use_gmt_hitset: 1 target_max_objects: 500000 target_max_bytes: 1099511627776 cache_target_dirty_ratio: 0 cache_target_dirty_high_ratio: 0.7 cache_target_full_ratio: 0.9 cache_min_flush_age: 0 cache_min_evict_age: 0 min_read_recency_for_promote: 1 min_write_recency_for_promote: 1 fast_read: 0 hit_set_grade_decay_rate: 20 hit_set_search_last_n: 1 pg_autoscale_mode: warn Regs, Icy

3 years, 10 months

2
4
0 0

cephfs - modifying the ceph.file.layout of existing files

by Andrej Filipcic

Hi, I have two directories, cache_fast and cache_slow, and I would like to move the least used files from fast to slow, aka, user side tiering. cache_fast is pinned to fast_data ssd pool, while cache_slow to hdd cephfs_data pool. $ getfattr -n ceph.dir.layout /ceph/grid/cache_fast getfattr: Removing leading '/' from absolute path names # file: ceph/grid/cache_fast ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=fast_data" $ getfattr -n ceph.dir.layout /ceph/grid/cache_slow getfattr: Removing leading '/' from absolute path names # file: ceph/grid/cache_slow ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs_data" "mv" from cache_fast dir to cache_slow dir only renames the file in mds, but does not involve migration to a different pool and changing the file layout. The only option I see at this point is to "cp" the file to a new dir and removing it from the old one, but this would involve client side operations and can be very slow. Is there any better way, that would work ceph server side? Best regards, Andrej -- _____________________________________________________________ prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si Department of Experimental High Energy Physics - F9 Jozef Stefan Institute, Jamova 39, P.o.Box 3000 SI-1001 Ljubljana, Slovenia Tel.: +386-1-477-3674 Fax: +386-1-425-7074 -------------------------------------------------------------

3 years, 10 months

2
3
0 0

CEPH daemons crashed continously

by Mosharaf Hossain

I have build CEPH Cluster using "ceph version 14.2.8 nautilus". We observe continuous crash of "MON & MGR" service and healing automatically and the following error has been generated for MON service. Though the problem heals automatically but we need to know the reason of the problem. "utsname_hostname": "mon1", "assert_msg": "/build/ceph-14.2.8/src/common/ceph_time.h: In function 'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)' thread 7fbee0e3c700 time 2020-05-29 05:51:40.385068\n/build/ceph-14.2.8/src/common/ceph_time.h: 485: FAILED ceph_assert(z >= signedspan::zero())\n", "crash_id": "2020-05-28_23:51:40.391577Z_9fcf058c-d03a-41a2-91bb-293f0a7763bd", "assert_func": "ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)", "ceph_version": "14.2.8" } -- Regards, *Mosharaf Hossain* Product Development IT Division Bangladesh Export Import IT Division (BEXIMCO)

3 years, 10 months

1
0
0 0

Re: RGW orphans search

by Andrei Mikhailovsky

Hi Manuel, Thanks for the tip. Do you know if the latest code has this bug fixed? I was planning to upgrade to the latest major. Cheers ----- Original Message ----- > From: "EDH" <mriosfer(a)easydatahost.com> > To: "Andrei Mikhailovsky" <andrei(a)arhont.com>, "ceph-users" <ceph-users(a)ceph.io> > Sent: Saturday, 30 May, 2020 14:45:44 > Subject: RE: RGW orphans search > Hi Andrei, > > Orphans find code is not running. Will be deprecated in next reléase maybe > 14.2.10 > > Check: https://docs.ceph.com/docs/master/radosgw/orphans/ > > Stop progress is bugged. > > You got the same issue than us, multiparts are not being clean due a sharding > bugs. > > Or fast solution for recover 100TB , s3cmd sync to a other bucket and them > delete the old bucket. > > Not transparent at all but Works. > > Other recomendation: disable Dynamic shard and put a fixed shard number at your > config. > > Regards > Manuel > > > -----Mensaje original----- > De: Andrei Mikhailovsky <andrei(a)arhont.com> > Enviado el: sábado, 30 de mayo de 2020 13:12 > Para: ceph-users <ceph-users(a)ceph.io> > Asunto: [ceph-users] RGW orphans search > > Hello, > > I am trying to clean up some wasted space (about 1/3 of used space in the rados > pool is currently unaccounted for including the replication level). I've > started the search command 20 days ago ( radosgw-admin orphans find > --pool=.rgw.buckets --job-id=ophans_clean1 --yes-i-really-mean-it ) and it's > still showing me the same thing: > > [ > { > "orphan_search_state": { > "info": { > "orphan_search_info": { > "job_name": "ophans_clean1", > "pool": ".rgw.buckets", > "num_shards": 64, > "start_time": "2020-05-10 21:39:28.913405Z" > } > }, > "stage": { > "orphan_search_stage": { > "search_stage": "iterate_bucket_index", > "shard": 0, > "marker": "" > } > } > } > } > ] > > > The output of the command keeps showing this (hundreds of thousands of lines): > > storing 1 entries at orphan.scan.ophans_clean1.linked.60 > > The total size of the pool is around 30TB and the buckets usage is just under > 10TB. The replica is 2. The activity on the cluster has spiked up since I've > started the command (currently seeing between 10-20K iops compared to a typical > 2-5k iops). > > Has anyone experienced this behaviour? It seems like the command should have > finished by now with only 30TB of used up space. I am running 13.2.10-1xenial > version of ceph. > > Cheers > > Andrei > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to > ceph-users-leave(a)ceph.io

3 years, 10 months

1
0
0 0

Re: [ceph-users]: Ceph Nautius not working after setting MTU 9000

by Dave Hall

Hello. A few days ago I offered to share the notes I've compiled on network tuning. Right now it's a Google Doc: https://docs.google.com/document/d/1nB5fzIeSgQF0ti_WN-tXhXAlDh8_f8XF9GhU7J1… I've set it up to allow comments and I'd be glad for questions and feedback. If Google Docs not an acceptable format I'll try to put it up somewhere as HTML or Wiki. Disclosure: some sections were copied verbatim from other sources. Regarding the current discussion about iperf, the likely bottleneck is buffering. There is a per-NIC output queue set with 'ip link' and a per CPU core input queue set with 'sysctl'. Both should be set to some multiple of the frame size based on calculations related to link speed and latency. Jumping from 1500 to 9000 could negatively impact performance because one buffer or the other might be 1500 bytes short of a low multiple of 9000. It would be interesting to see the iperf tests repeated with corresponding buffer sizing. I will perform this experiment as soon as I complete some day-job tasks. -Dave Dave Hall Binghamton University kdhall(a)binghamton.edu 607-760-2328 (Cell) 607-777-4641 (Office) On 5/27/2020 6:51 AM, EDH - Manuel Rios wrote: > Anyone can share their table with other MTU values? > > Also interested into Switch CPU load > > KR, > Manuel > > -----Mensaje original----- > De: Marc Roos <M.Roos(a)f1-outsourcing.eu> > Enviado el: miércoles, 27 de mayo de 2020 12:01 > Para: chris.palmer <chris.palmer(a)pobox.com>; paul.emmerich <paul.emmerich(a)croit.io> > CC: amudhan83 <amudhan83(a)gmail.com>; anthony.datri <anthony.datri(a)gmail.com>; ceph-users <ceph-users(a)ceph.io>; doustar <doustar(a)rayanexon.ir>; kdhall <kdhall(a)binghamton.edu>; sstkadu <sstkadu(a)gmail.com> > Asunto: [ceph-users] Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000 > > > Interesting table. I have this on a production cluster 10gbit at a > datacenter (obviously doing not that much). > > > [@]# iperf3 -c 10.0.0.13 -P 1 -M 9000 > Connecting to host 10.0.0.13, port 5201 > [ 4] local 10.0.0.14 port 52788 connected to 10.0.0.13 port 5201 > [ ID] Interval Transfer Bandwidth Retr Cwnd > [ 4] 0.00-1.00 sec 1.14 GBytes 9.77 Gbits/sec 0 690 KBytes > [ 4] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.08 MBytes > [ 4] 2.00-3.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes > [ 4] 3.00-4.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes > [ 4] 4.00-5.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes > [ 4] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.21 MBytes > [ 4] 6.00-7.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes > [ 4] 7.00-8.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.21 MBytes > [ 4] 8.00-9.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes > [ 4] 9.00-10.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes > - - - - - - - - - - - - - - - - - - - - - - - - - > [ ID] Interval Transfer Bandwidth Retr > [ 4] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec 0 > sender > [ 4] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec > receiver > > > -----Original Message----- > Subject: Re: [ceph-users] Re: [External Email] Re: Ceph Nautius not > working after setting MTU 9000 > > To elaborate on some aspects that have been mentioned already and add > some others:: > > > * Test using iperf3. > > * Don't try to use jumbos on networks where you don't have complete > control over every host. This usually includes the main ceph network. > It's just too much grief. You can consider using it for limited-access > networks (e.g. ceph cluster network, hypervisor migration network, etc) > where you know every switch & host is tuned correctly. (This works even > when those nets share a vlan trunk with non-jumbo vlans - just set the > max value on the trunk itself, and individual values on each vlan.) > > * If you are pinging make sure it doesn't fragment otherwise you > will get misleading results: e.g. ping -M do -s 9000 x.x.x.x > * Do not assume that 9000 is the best value. It depends on your > NICs, your switch, kernel/device parameters, etc. Try different values > (using iperf3). As an example the results below are using a small cheap > Mikrotek 10G switch and HPE 10G NICs. It highlights how in this > configuration 9000 is worse than 1500, but that 5139 is optimal yet 5140 > is worst. The same pattern (obviously with different values) was > apparent when multiple tests were run concurrently. Always test your own > network in a controlled manner. And of course if you introduce anything > different later on, test again. With enterprise-grade kit this might not > be so common, but always test if you fiddle. > > > MTU Gbps (actual data transfer values using iperf3) - one particular > configuration only > > 9600 8.91 (max value) > 9000 8.91 > 8000 8.91 > 7000 8.91 > 6000 8.91 > 5500 8.17 > 5200 7.71 > 5150 7.64 > 5140 7.62 > 5139 9.81 (optimal) > 5138 9.81 > 5137 9.81 > 5135 9.81 > 5130 9.81 > 5120 9.81 > 5100 9.81 > 5000 9.81 > 4000 9.76 > 3000 9.68 > 2000 9.28 > 1500 9.37 (default) > > > Whether any of this will make a tangible difference for ceph is moot. I > just spend a little time getting the network stack correct as above, > then leave it. That way I know I am probably getting some benefit, and > not doing any harm. If you blindly change things you may well do harm > that can manifest itself in all sorts of ways outside of Ceph. Getting > some test results for this using Ceph will be easy; getting MEANINGFUL > results that way will be hard. > > > Chris > > > On 27/05/2020 09:25, Marc Roos wrote: > > > > > I would not call a ceph page, a random tuning tip. At least I hope > they > are not. NVMe-only with 100Gbit is not really a standard setup. I > assume > with such setup you have the luxury to not notice many > optimizations. > > What I mostly read is that changing to mtu 9000 will allow you to > better > saturate the 10Gbit adapter, and I expect this to show on a low end > busy > cluster. Don't you have any test results of such a setup? > > > > > -----Original Message----- > > Subject: Re: [ceph-users] Re: [External Email] Re: Ceph Nautius not > > working after setting MTU 9000 > > Don't optimize stuff without benchmarking *before and after*, don't > > apply random tuning tipps from the Internet without benchmarking > them. > > My experience with Jumbo frames: 3% performance. On a NVMe-only > setup > with 100 Gbit/s network. > > Paul > > > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at > https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > On Tue, May 26, 2020 at 7:02 PM Marc Roos > <M.Roos(a)f1-outsourcing.eu> <mailto:M.Roos@f1-outsourcing.eu> > wrote: > > > > > Look what I have found!!! :) > https://ceph.com/geen-categorie/ceph-loves-jumbo-frames/ > > > > -----Original Message----- > From: Anthony D'Atri [mailto:anthony.datri@gmail.com] > Sent: maandag 25 mei 2020 22:12 > To: Marc Roos > Cc: kdhall; martin.verges; sstkadu; amudhan83; ceph-users; > doustar > Subject: Re: [ceph-users] Re: [External Email] Re: Ceph > Nautius not > > working after setting MTU 9000 > > Quick and easy depends on your network infrastructure. > Sometimes > it is > difficult or impossible to retrofit a live cluster without > disruption. > > > > On May 25, 2020, at 1:03 AM, Marc Roos > <M.Roos(a)f1-outsourcing.eu> <mailto:M.Roos@f1-outsourcing.eu> > > wrote: > > > > > > I am interested. I am always setting mtu to 9000. To be > honest I > > cannot imagine there is no optimization since you have less > interrupt > > requests, and you are able x times as much data. Every time > there > > > something written about optimizing the first thing mention > is > changing > > > to the mtu 9000. Because it is quick and easy win. > > > > > > > > > > -----Original Message----- > > From: Dave Hall [mailto:kdhall@binghamton.edu] > > Sent: maandag 25 mei 2020 5:11 > > To: Martin Verges; Suresh Rama > > Cc: Amudhan P; Khodayar Doustar; ceph-users > > Subject: [ceph-users] Re: [External Email] Re: Ceph Nautius > not > > working after setting MTU 9000 > > > > All, > > > > Regarding Martin's observations about Jumbo Frames.... > > > > I have recently been gathering some notes from various > internet > > sources regarding Linux network performance, and Linux > performance in > > general, to be applied to a Ceph cluster I manage but also > to the > rest > > > of the Linux server farm I'm responsible for. > > > > In short, enabling Jumbo Frames without also tuning a number > of > other > > kernel and NIC attributes will not provide the performance > increases > > we'd like to see. I have not yet had a chance to go through > the > rest > > of the testing I'd like to do, but I can confirm (via > iperf3) > that > > only enabling Jumbo Frames didn't make a significant > difference. > > > > Some of the other attributes I'm referring to are incoming > and > > outgoing buffer sizes at the NIC, IP, and TCP levels, > interrupt > > coalescing, NIC offload functions that should or shouldn't > be > turned > > on, packet queuing disciplines (tc), the best choice of TCP > slow-start > > > algorithms, and other TCP features and attributes. > > > > The most off-beat item I saw was something about adding > IPTABLES > rules > > > to bypass CONNTRACK table lookups. > > > > In order to do anything meaningful to assess the effect of > all of > > > these settings I'd like to figure out how to set them all > via > Ansible > > - so more to learn before I can give opinions. > > > > --> If anybody has added this type of configuration to Ceph > > Ansible, > > I'd be glad for some pointers. > > > > I have started to compile a document containing my notes. > It's > rough, > > > but I'd be glad to share if anybody is interested. > > > > -Dave > > > > Dave Hall > > Binghamton University > > > >> On 5/24/2020 12:29 PM, Martin Verges wrote: > >> > >> Just save yourself the trouble. You won't have any real > benefit > from > > MTU > >> 9000. It has some smallish, but it is not worth the effort, > > problems, > > and > >> loss of reliability for most environments. > >> Try it yourself and do some benchmarks, especially with > your > regular > >> workload on the cluster (not the maximum peak performance), > then > drop > > the > >> MTU to default ;). > >> > >> Please if anyone has other real world benchmarks showing > huge > > differences > >> in regular Ceph clusters, please feel free to post it here. > >> > >> -- > >> Martin Verges > >> Managing director > >> > >> Mobile: +49 174 9335695 > >> E-Mail: martin.verges(a)croit.io > >> Chat: https://t.me/MartinVerges > >> > >> croit GmbH, Freseniusstr. 31h, 81247 Munich > >> CEO: Martin Verges - VAT-ID: DE310638492 Com. register: > Amtsgericht > >> Munich HRB 231263 > >> > >> Web: https://croit.io > >> YouTube: https://goo.gl/PGE1Bx > >> > >> > >>> Am So., 24. Mai 2020 um 15:54 Uhr schrieb Suresh Rama > >> <sstkadu(a)gmail.com> <mailto:sstkadu@gmail.com> : > >> > >>> Ping with 9000 MTU won't get response as I said and it > should > be > > 8972. Glad > >>> it is working but you should know what happened to avoid > this > issue > > later. > >>> > >>>> On Sun, May 24, 2020, 3:04 AM Amudhan P > <amudhan83(a)gmail.com> <mailto:amudhan83@gmail.com> > wrote: > >>> > >>>> No, ping with MTU size 9000 didn't work. > >>>> > >>>> On Sun, May 24, 2020 at 12:26 PM Khodayar Doustar > > <doustar(a)rayanexon.ir> <mailto:doustar@rayanexon.ir> > >>>> wrote: > >>>> > >>>>> Does your ping work or not? > >>>>> > >>>>> > >>>>> On Sun, May 24, 2020 at 6:53 AM Amudhan P > <amudhan83(a)gmail.com> <mailto:amudhan83@gmail.com> > > wrote: > >>>>> > >>>>>> Yes, I have set setting on the switch side also. > >>>>>> > >>>>>> On Sat 23 May, 2020, 6:47 PM Khodayar Doustar, > > <doustar(a)rayanexon.ir> <mailto:doustar@rayanexon.ir> > >>>>>> wrote: > >>>>>> > >>>>>>> Problem should be with network. When you change MTU it > > should be > >>>> changed > >>>>>>> all over the network, any single hup on your network > should > > >>>>>>> speak > > and > >>>>>>> accept 9000 MTU packets. you can check it on your > hosts > with > >>> "ifconfig" > >>>>>>> command and there is also equivalent commands for > other > >>>> network/security > >>>>>>> devices. > >>>>>>> > >>>>>>> If you have just one node which it not correctly > configured > for > > MTU > >>>> 9000 > >>>>>>> it wouldn't work. > >>>>>>> > >>>>>>> On Sat, May 23, 2020 at 2:30 PM sinan(a)turka.nl > <sinan(a)turka.nl> <mailto:sinan@turka.nl> > >>> wrote: > >>>>>>>> Can the servers/nodes ping eachother using large > packet > sizes? > >>>>>>>> I > >>> guess > >>>>>>>> not. > >>>>>>>> > >>>>>>>> Sinan Polat > >>>>>>>> > >>>>>>>>> Op 23 mei 2020 om 14:21 heeft Amudhan P > <amudhan83(a)gmail.com> <mailto:amudhan83@gmail.com> > > het > >>>>>>>> volgende geschreven: > >>>>>>>>> In OSD logs "heartbeat_check: no reply from OSD" > >>>>>>>>> > >>>>>>>>>> On Sat, May 23, 2020 at 5:44 PM Amudhan P > > <amudhan83(a)gmail.com> <mailto:amudhan83@gmail.com> > >>>>>>>> wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> I have set Network switch with MTU size 9000 and > also in > my > >>> netplan > >>>>>>>>>> configuration. > >>>>>>>>>> > >>>>>>>>>> What else needs to be checked? > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander > < > >>> wido(a)42on.com > >>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> On 5/23/20 12:02 PM, Amudhan P wrote: > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> I am using ceph Nautilus in Ubuntu 18.04 working > fine > wit > > MTU > >>>> size > >>>>>>>> 1500 > >>>>>>>>>>>> (default) recently i tried to update MTU size to > 9000. > >>>>>>>>>>>> After setting Jumbo frame running ceph -s is > timing > out. > >>>>>>>>>>> Ceph can run just fine with an MTU of 9000. But > there > is > >>> probably > >>>>>>>>>>> something else wrong on the network which is > causing > this. > >>>>>>>>>>> > >>>>>>>>>>> Check the Jumbo Frames settings on all the > switches as > well > > to > >>>> make > >>>>>>>> sure > >>>>>>>>>>> they forward all the packets. > >>>>>>>>>>> > >>>>>>>>>>> This is definitely not a Ceph issue. > >>>>>>>>>>> > >>>>>>>>>>> Wido > >>>>>>>>>>> > >>>>>>>>>>>> regards > >>>>>>>>>>>> Amudhan P > >>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io To > >>>>>>>>>>>> unsubscribe send an email to > ceph-users-leave(a)ceph.io > >>>>>>>>>>>> > >>>>>>>>>>> _______________________________________________ > >>>>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io To > unsubscribe > > >>>>>>>>>>> send an email to ceph-users-leave(a)ceph.io > >>>>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io To > unsubscribe > >>>>>>>>> send an email to ceph-users-leave(a)ceph.io > >>>>>>>> _______________________________________________ > >>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io To > unsubscribe > >>>>>>>> send an email to ceph-users-leave(a)ceph.io > >>>>>>>> > >>>> _______________________________________________ > >>>> ceph-users mailing list -- ceph-users(a)ceph.io To > unsubscribe > send > >>>> an email to ceph-users-leave(a)ceph.io > >>>> > >>> _______________________________________________ > >>> ceph-users mailing list -- ceph-users(a)ceph.io To > unsubscribe > send an > > >>> email to ceph-users-leave(a)ceph.io > >>> > >> _______________________________________________ > >> ceph-users mailing list -- ceph-users(a)ceph.io To > unsubscribe > send an > >> email to ceph-users-leave(a)ceph.io > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe > send > an > > email to ceph-users-leave(a)ceph.io > > > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe > send > an > > email to ceph-users-leave(a)ceph.io > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

3 years, 10 months

3
4
0 0

RGW orphans search

by Andrei Mikhailovsky

Hello, I am trying to clean up some wasted space (about 1/3 of used space in the rados pool is currently unaccounted for including the replication level). I've started the search command 20 days ago ( radosgw-admin orphans find --pool=.rgw.buckets --job-id=ophans_clean1 --yes-i-really-mean-it ) and it's still showing me the same thing: [ { "orphan_search_state": { "info": { "orphan_search_info": { "job_name": "ophans_clean1", "pool": ".rgw.buckets", "num_shards": 64, "start_time": "2020-05-10 21:39:28.913405Z" } }, "stage": { "orphan_search_stage": { "search_stage": "iterate_bucket_index", "shard": 0, "marker": "" } } } } ] The output of the command keeps showing this (hundreds of thousands of lines): storing 1 entries at orphan.scan.ophans_clean1.linked.60 The total size of the pool is around 30TB and the buckets usage is just under 10TB. The replica is 2. The activity on the cluster has spiked up since I've started the command (currently seeing between 10-20K iops compared to a typical 2-5k iops). Has anyone experienced this behaviour? It seems like the command should have finished by now with only 30TB of used up space. I am running 13.2.10-1xenial version of ceph. Cheers Andrei

3 years, 11 months

2
1
0 0

warn if acting set violates failure domain

by Dan van der Ster

Hi, (In nautilus) I noticed that the acting set for a pg in acting+remapped+backfilling (or backfill_wait) can violate the failure domain rule. We have a 3x pool replicated across racks. Following some host outage I noticed several PGs like: 75.200 7570 0 7570 0 31583745536 0 0 3058 active+remapped+backfill_wait 55m 3208334'13480873 3208334:23611310 [596,502,717]p596 [596,502,424]p596 2020-05-28 05:44:42.604307 2020-05-28 05:44:42.604307 Checking those up and acting sets, I have: OK: PG 75.200 has no failure domain problem in up set [596, 502, 717] with racks ['BA09', 'BA10', 'BA12'] WARN: PG 75.200 has a failure domain problem in acting set [596, 502, 424] with racks ['BA09', 'BA10', 'BA09'] If BA09 were to go down, the pg would drop below min_size. (The full pg query output is at https://pastebin.com/rrEQMnyC) Is this https://tracker.ceph.com/issues/3360 ? # ceph osd dump | grep 75.200 pg_temp 75.200 [596,502,424] Shouldn't we mark a PG degraded or similar to raise HEALTH_WARN if the failure domain is violated? Cheers, Dan

3 years, 11 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users May 2020