- ceph-users - lists.ceph.io

by Amudhan P

Hi, I am using ceph Nautilus cluster with below configuration. 3 node's (Ubuntu 18.04) each has 12 OSD's, and mds, mon and mgr are running in shared mode. The client mounted through ceph kernel client. I was trying to emulate a node failure when a write and read were going on (replica2) pool. I was expecting read and write continue after a small pause due to a Node failure but it halts and never resumes until the failed node is up. I remember I tested the same scenario before in ceph mimic where it continued IO after a small pause. regards Amudhan P

3 years, 11 months

3
8
0 0

May Ceph Science User Group Virtual Meeting

by Kevin Hrpcek

Hello, We will be having a Ceph science/research/big cluster call on Wednesday May 27th. If anyone wants to discuss something specific they can add it to the pad linked below. If you have questions or comments you can contact me. This is an informal open call of community members mostly from hpc/htc/research environments where we discuss whatever is on our minds regarding ceph. Updates, outages, features, maintenance, etc...there is no set presenter but I do attempt to keep the conversation lively. https://pad.ceph.com/p/Ceph_Science_User_Group_20200527 We try to keep it to an hour or less. Ceph calendar event details: May 27, 2020 14:00 UTC 4pm Central European 9am Central US (I think I got timezones right this time) Description:Main pad for discussions: https://pad.ceph.com/p/Ceph_Science_User_Group_Index <https://www.google.com/url?q=https://pad.ceph.com/p/Ceph_Science_User_Group…> Meetings will be recorded and posted to the Ceph Youtube channel. To join the meeting on a computer or mobile phone: https://bluejeans.com/908675367?src=calendarLink <https://www.google.com/url?q=https://bluejeans.com/908675367?src%3Dcalendar…> To join from a Red Hat Deskphone or Softphone, dial: 84336. Connecting directly from a room system? 1.) Dial: 199.48.152.152 or bjn.vc <https://www.google.com/url?q=http://bjn.vc&sa=D&ust=1579363980705000&usg=AO…> 2.) Enter Meeting ID: 908675367 Just want to dial in on your phone? 1.) Dial one of the following numbers: 408-915-6466 (US) See all numbers: https://www.redhat.com/en/conference-numbers <https://www.google.com/url?q=https://www.redhat.com/en/conference-numbers&s…> 2.) Enter Meeting ID: 908675367 3.) Press # Want to test your video connection? https://bluejeans.com/111 <https://www.google.com/url?q=https://bluejeans.com/111&sa=D&ust=15793639807…> Kevin -- Kevin Hrpcek NASA VIIRS Atmosphere SIPS Space Science & Engineering Center University of Wisconsin-Madison

3 years, 11 months

1
0
0 0

RGW resharding

by Adrian Nicolae

Hi, I have the following Ceph Mimic setup : - a bunch of old servers with 3-4 SATA drives each (74 OSDs in total) - index/leveldb is stored on each OSD (so no SSD drives, just SATA) - the current usage is : GLOBAL: SIZE AVAIL RAW USED %RAW USED 542 TiB 105 TiB 437 TiB 80.67 POOLS: NAME ID USED %USED MAX AVAIL OBJECTS .rgw.root 1 1.1 KiB 0 26 TiB 4 default.rgw.control 2 0 B 0 26 TiB 8 default.rgw.meta 3 20 MiB 0 26 TiB 75357 default.rgw.log 4 0 B 0 26 TiB 4271 default.rgw.buckets.data 5 290 TiB 85.05 51 TiB 78067284 default.rgw.buckets.non-ec 6 0 B 0 26 TiB 0 default.rgw.buckets.index 7 0 B 0 26 TiB 603008 - rgw_override_bucket_index_max_shards = 16. Clients are accessing RGW via Swift, not S3. - the replication schema is EC 4+2. We are using this Ceph cluster as a secondary storage for another storage infrastructure (which is more expensive) and we are offloading cold data (big files with a low number of downloads/reads from our customer). This way we can lower the TCO . So most of the files are big ( a few GB at least). So far Ceph is doing well considering that I don't have big expectations from current hardware. I'm a bit worried however that we have 78 M objects with max_shards=16 and we will probably reach 100M in the next few months. Do I need a increase the max shards to ensure the stability of the cluster ? I read that storing more than 1 M of objects in a single bucket can lead to OSD's flapping or having io timeouts during deep-scrub or even to have ODS's failures due to the leveldb compacting all the time if we have a large number of DELETEs. Any advice would be appreciated. Thank you, Adrian Nicolae

3 years, 11 months

2
4
0 0

Disable auto-creation of RGW pools

by Katarzyna Myrek

Hi We have some clusters which are rbd only. Each time someone uses radosgw-admin by mistake on those clusters, rgw pools are auto created. Is there a way to disable that? I mean the part: "When radosgw first tries to operate on a zone pool that does not exist, it will create that pool with the default values from osd pool default pg num and osd pool default pgp num" Thanks, Kate

3 years, 11 months

1
0
0 0

Issue adding mon after upgrade to 15.2.2

by Erwin Bogaard

In our lab-setup, I'm simulating the future migration of centos 7 + ceph 14.2.x to cepntos 8 + ceph 15.2.x. At the moment, I upgraded one of the nodes, which is a combined mon+mgr+mds+osd, to el8 + 15.2.2. The other node (also a combined one) is still on el7 + 14.2.9. The osd was detected and re-added easily, as were the mgr and mds. The mon won't add. I use the following (manual) method: 1. ceph auth get mon. -o /tmp/kr (exports correctly) 2. ceph mon getmap -o /tmp/mm (exports correctly) 3. sudo ceph-mon -i ontw-ceph01 --mkfs --monmap /tmp/mm --keyring /tmp/kr (creates file system + rocksdb correctly in "/var/lib/ceph/mon/ceph-ontw-ceph01") 4. systemctl start ceph-mon(a)ontw-ceph01.service ceph-mon starts up successfully, but won't enter the cluster. In the other (14.2.9) ceph-mon log, I see loads of this message (multiple per : -------------- 2020-05-25 13:07:38.075 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.075 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.076 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.076 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.077 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints etc... -------------- The 3300 and 6789 ports are open and reacheable on both nodes and they can reach and connect to each other. 192.168.100.60 = ontw-ceph01, upgraded 15.2.2-node 192.168.100.61 = ontw-ceph02, active 14.2.9-node When I start ceph-mon in debug mode @ ceph01, I see the following happening: -------------- 2020-05-25T13:07:35.031+0200 7f712e1d46c0 0 starting mon.ontw-ceph01 rank -1 at public addrs [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] at bind addrs [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] mon_data /var/lib/ceph/mon/ceph-ontw-ceph01 fsid f3c3f099-2940-4074-a7fe-1aea6259f67b 2020-05-25T13:07:35.033+0200 7f712e1d46c0 1 mon.ontw-ceph01@-1(???) e1 preinit fsid f3c3f099-2940-4074-a7fe-1aea6259f67b 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 check_fsid cluster_uuid contains 'f3c3f099-2940-4074-a7fe-1aea6259f67b' 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?)} 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 calc_quorum_requirements required_features 2449958197560098820 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 required_features 2449958197560098820 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 has_ever_joined = 0 2020-05-25T13:07:35.033+0200 7f712e1d46c0 1 mon.ontw-ceph01@-1(???) e1 initial_members ontw-ceph02,ontw-ceph01, filtering seed monmap 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 monmap is e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 extra probe peers 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 sync_last_committed_floor 0 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 init_paxos 2020-05-25T13:07:35.033+0200 7f712e1d46c0 5 mon.ontw-ceph01(a)-1(???).mds e0 Unable to load 'last_metadata' 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).health init 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).config init 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 refresh_from_paxos 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 refresh_from_paxos no cluster_fingerprint 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).log v0 update_from_paxos 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).log v0 update_from_paxos version 0 summary v 0 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).auth v0 update_from_paxos 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).config load_config got 0 keys 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).mgrstat 0 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).mgrstat check_subs 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).health update_from_paxos 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 loading initial keyring to bootstrap authentication for mkfs 2020-05-25T13:07:35.035+0200 7f712e1d46c0 2 mon.ontw-ceph01@-1(???) e1 init 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).mgr e0 prime_mgr_client 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 bootstrap 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 sync_reset_requester 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 unregister_cluster_logger - not registered 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 monmap e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 _reset 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(probing).auth v0 _set_mon_num_rank num 0 rank 0 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 timecheck_finish 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 scrub_event_cancel 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 scrub_reset 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 reset_probe_timeout 0x563148b1eb60 after 2 seconds 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 probing other monitors 2020-05-25T13:07:35.035+0200 7f712e1d46c0 0 -- [v2: 192.168.100.60:3300/0,v1:192.168.100.60:6789/0] send_to message mon_probe(probe f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph01 new mon_release octopus) v7 with empty dest 2020-05-25T13:07:35.036+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:35.238+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:35.640+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:36.442+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:37.035+0200 7f711a308700 4 mon.ontw-ceph01@-1(probing) e1 probe_timeout 0x563148b1eb60 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 bootstrap 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 sync_reset_requester 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 unregister_cluster_logger - not registered 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 monmap e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 _reset 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01(a)-1(probing).auth v0 _set_mon_num_rank num 0 rank 0 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 timecheck_finish 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 scrub_event_cancel 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 scrub_reset 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 reset_probe_timeout 0x563148b1f7c0 after 2 seconds 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 probing other monitors 2020-05-25T13:07:37.035+0200 7f711a308700 0 -- [v2: 192.168.100.60:3300/0,v1:192.168.100.60:6789/0] send_to message mon_probe(probe f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph01 new mon_release octopus) v7 with empty dest 2020-05-25T13:07:38.045+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 _ms_dispatch new session 0x563149893680 MonSession(mon.0 [v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0] is open , features 0x3f01cfb8ffacffff (luminous)) features 0x3f01cfb8ffacffff 2020-05-25T13:07:38.047+0200 7f7117b03700 5 mon.ontw-ceph01@-1(probing) e1 _ms_dispatch setting monitor caps on this connection 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 handle_probe mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 handle_probe_reply mon.0 v2:192.168.100.61:3300/0 mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 monmap is e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 got newer/committed monmap epoch 3, mine was 1 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 bootstrap 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 sync_reset_requester 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 unregister_cluster_logger - not registered 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 cancel_probe_timeout 0x563148b1f7c0 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 monmap e3: 1 mons at {ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 _reset 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01(a)-1(probing).auth v0 _set_mon_num_rank num 0 rank 0 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 timecheck_finish 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 scrub_event_cancel 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 scrub_reset 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 reset_probe_timeout 0x563148b1f7c0 after 2 seconds 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 probing other monitors 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 handle_probe mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 handle_probe_reply mon.0 v2:192.168.100.61:3300/0 mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 monmap is e3: 1 mons at {ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 got newer/committed monmap epoch 3, mine was 3 -------------- I'm a bit lost at what's (not) happening. Hope you can help!

3 years, 11 months

1
0
0 0

Handling scrubbing/deep scrubbing

by Kamil Szczygieł

Hi, I've 4 node cluster with 13x15TB 7.2k OSDs each and around 300TB data inside. I'm having issues with deep scrub/scrub not being done in time, any tips to handle these operations with large disks like this? osd pool default size = 2 osd deep scrub interval = 2592000 osd scrub begin hour = 23 osd scrub end hour = 5 osd scrub sleep = 0.1 Cheers, Kamil

3 years, 11 months

2
1
0 0

Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000

by Janne Johansson

Den mån 25 maj 2020 kl 10:03 skrev Marc Roos <M.Roos(a)f1-outsourcing.eu>: > > I am interested. I am always setting mtu to 9000. To be honest I cannot > imagine there is no optimization since you have less interrupt requests, > and you are able x times as much data. Every time there something > written about optimizing the first thing mention is changing to the mtu > 9000. Because it is quick and easy win. > > This sort of assumes you are not using interrupt coalescing network cards, because if you do, you can get something like hundreds of packets in one single IRQ*, already checksummed and stripped and in recent cards (10-25-40GE) even delivered into the cpu L3 cache by the time you get the int, so if they were 1500 or 9000 on the wire doesn't matter much by then. Even in the bad old days of software handling of all parts packet-related, many things (like mbuf allocations) were optimized for 1500, so 9k packets became just a multiple of a number of 1500 bytes chunks taken from a pool of network buffers anyhow. I'm not trying to shoot down the 9k-vs-1500 idea, but doing a benchmark will give you lots more facts than airing things that are easy to imagine but really doesn't have a huge impact because hw manufacturers worked around things like this a long time ago. If your tests say you win x%, then use it by all means. I'm just not thinking that 10/25/40G networks are so filled that the frame overheads really matter as a matter of % of the packet sizes and the cards offload most of the work to strip the overhead out, so the computer won't notice it was ever there. *) SysKonnect cards had this around 2003, just to get a feeling for what "modern ethernet cards" means in this context. -- May the most significant bit of your life be positive.

3 years, 11 months

1
0
0 0

No output from rbd perf image iotop/iostat

by Eugen Block

Hi all, I have a Nautilus cluster mostly used for RBD (openstack) and CephFS. I have been using rbd perf command from time to time but it doesn't work anymore. I have tried several images in different pools but there's no output at all except for client:~ $ rbd perf image iostat --format json volumes-ssd/volume-358cd6c5-6fb0-424f-93d9-990ea1963472 rbd: waiting for initial image stats It never updates, no matter how long I wait. It stopped working while we were using version 14.2.3, last Friday we updated to 14.2.9 but it still doesn't work. The only relevant mgr log output I'm seeing in debug mode (debug_mgr 5/5) is this: ---snip--- 2020-05-25 10:53:07.072 7fedd5f59700 4 mgr.server _handle_command decoded 4 2020-05-25 10:53:07.072 7fedd5f59700 4 mgr.server _handle_command prefix=rbd perf image stats 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : from='client.710971242 v1:192.168.103.13:0/693257394' entity='client.admin' cmd=[ 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : { 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "prefix": "rbd perf image stats", 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "pool_spec": "volumes-ssd/volume-358cd6c5-6fb0-424f-93d9-990ea1963472", 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "sort_by": "write_ops", 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "format": "json" 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : }"]: dispatch 2020-05-25 10:53:07.072 7fedd675a700 4 mgr.server reply reply success 2020-05-25 10:53:07.104 7fedd5f59700 4 mgr.server handle_report from 0x555e60f31200 osd,33 2020-05-25 10:53:07.172 7fedd5f59700 4 mgr.server handle_report from 0x555e5c6c6d80 osd,15 2020-05-25 10:53:07.224 7fedf10c1700 4 mgr send_beacon active ---snip--- What I'm also wondering about is that the "format": "json" doesn't change even if I choose to run --format plain or xml. Does anyone experience the same? The missing output also applies to rbd perf image iotop. Any hints are appreciated. Regards, Eugen

3 years, 11 months

1
0
1 0

remove secondary zone from multisite

by Zhenshi Zhou

Hi all, I'm gonna make my secondary zone offline. How to remove the secondary zone from a mutisite?

3 years, 11 months

1
1
0 0

Re: RGW Garbage Collector

by Matt Benjamin

Hi Manuel, rgw_gc_obj_min_wait -- yes, this is how you control how long rgw waits before removing the stripes of deleted objects the following are more gc performance and proportion of available iops: rgw_gc_processor_max_time -- controls how long gc runs once scheduled; a large value might be 3600 rgw_gc_processor_period -- sets the gc cycle; smaller is more frequent If you want to make gc more aggressive when it is running, set the following (can be increased), which more than doubles the : rgw_gc_max_concurrent_io = 20 rgw_gc_max_trim_chunk = 32 If you want to increase gc fraction of total rgw i/o, increase these (mostly, concurrent_io). regards, Matt On Sun, May 24, 2020 at 4:02 PM EDH - Manuel Rios <mriosfer(a)easydatahost.com> wrote: > > Hi, > > Im looking for any experience optimizing garbage collector with the next configs: > > global advanced rgw_gc_obj_min_wait > global advanced rgw_gc_processor_max_time > global advanced rgw_gc_processor_period > > By default gc expire objects within 2 hours, we're looking to define expire in 10 minutes as our S3 cluster got heavy uploads and deletes. > > Are those params usable? For us doesn't have sense store delete objects 2 hours in a gc. > > Regards > Manuel > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > -- Matt Benjamin Red Hat, Inc. 315 West Huron Street, Suite 140A Ann Arbor, Michigan 48103 http://www.redhat.com/en/technologies/storage tel. 734-821-5101 fax. 734-769-8938 cel. 734-216-5309

3 years, 11 months

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users