May 2020 - ceph-users - lists.ceph.io

Issue adding mon after upgrade to 15.2.2

by Erwin Bogaard

In our lab-setup, I'm simulating the future migration of centos 7 + ceph 14.2.x to cepntos 8 + ceph 15.2.x. At the moment, I upgraded one of the nodes, which is a combined mon+mgr+mds+osd, to el8 + 15.2.2. The other node (also a combined one) is still on el7 + 14.2.9. The osd was detected and re-added easily, as were the mgr and mds. The mon won't add. I use the following (manual) method: 1. ceph auth get mon. -o /tmp/kr (exports correctly) 2. ceph mon getmap -o /tmp/mm (exports correctly) 3. sudo ceph-mon -i ontw-ceph01 --mkfs --monmap /tmp/mm --keyring /tmp/kr (creates file system + rocksdb correctly in "/var/lib/ceph/mon/ceph-ontw-ceph01") 4. systemctl start ceph-mon(a)ontw-ceph01.service ceph-mon starts up successfully, but won't enter the cluster. In the other (14.2.9) ceph-mon log, I see loads of this message (multiple per : -------------- 2020-05-25 13:07:38.075 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.075 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.076 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.076 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints 2020-05-25 13:07:38.077 7fb38366d700 1 mon.ontw-ceph02@0(leader) e3 adding peer [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] to list of hints etc... -------------- The 3300 and 6789 ports are open and reacheable on both nodes and they can reach and connect to each other. 192.168.100.60 = ontw-ceph01, upgraded 15.2.2-node 192.168.100.61 = ontw-ceph02, active 14.2.9-node When I start ceph-mon in debug mode @ ceph01, I see the following happening: -------------- 2020-05-25T13:07:35.031+0200 7f712e1d46c0 0 starting mon.ontw-ceph01 rank -1 at public addrs [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] at bind addrs [v2:192.168.100.60:3300/0,v1:192.168.100.60:6789/0] mon_data /var/lib/ceph/mon/ceph-ontw-ceph01 fsid f3c3f099-2940-4074-a7fe-1aea6259f67b 2020-05-25T13:07:35.033+0200 7f712e1d46c0 1 mon.ontw-ceph01@-1(???) e1 preinit fsid f3c3f099-2940-4074-a7fe-1aea6259f67b 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 check_fsid cluster_uuid contains 'f3c3f099-2940-4074-a7fe-1aea6259f67b' 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?)} 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 calc_quorum_requirements required_features 2449958197560098820 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 required_features 2449958197560098820 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 has_ever_joined = 0 2020-05-25T13:07:35.033+0200 7f712e1d46c0 1 mon.ontw-ceph01@-1(???) e1 initial_members ontw-ceph02,ontw-ceph01, filtering seed monmap 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 monmap is e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 extra probe peers 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 sync_last_committed_floor 0 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 init_paxos 2020-05-25T13:07:35.033+0200 7f712e1d46c0 5 mon.ontw-ceph01(a)-1(???).mds e0 Unable to load 'last_metadata' 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).health init 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).config init 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 refresh_from_paxos 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 refresh_from_paxos no cluster_fingerprint 2020-05-25T13:07:35.033+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).log v0 update_from_paxos 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).log v0 update_from_paxos version 0 summary v 0 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).auth v0 update_from_paxos 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).config load_config got 0 keys 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).mgrstat 0 2020-05-25T13:07:35.034+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).mgrstat check_subs 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).health update_from_paxos 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(???) e1 loading initial keyring to bootstrap authentication for mkfs 2020-05-25T13:07:35.035+0200 7f712e1d46c0 2 mon.ontw-ceph01@-1(???) e1 init 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(???).mgr e0 prime_mgr_client 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 bootstrap 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 sync_reset_requester 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 unregister_cluster_logger - not registered 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 monmap e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 _reset 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01(a)-1(probing).auth v0 _set_mon_num_rank num 0 rank 0 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 timecheck_finish 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 scrub_event_cancel 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 scrub_reset 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 reset_probe_timeout 0x563148b1eb60 after 2 seconds 2020-05-25T13:07:35.035+0200 7f712e1d46c0 10 mon.ontw-ceph01@-1(probing) e1 probing other monitors 2020-05-25T13:07:35.035+0200 7f712e1d46c0 0 -- [v2: 192.168.100.60:3300/0,v1:192.168.100.60:6789/0] send_to message mon_probe(probe f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph01 new mon_release octopus) v7 with empty dest 2020-05-25T13:07:35.036+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:35.238+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:35.640+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:36.442+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:37.035+0200 7f711a308700 4 mon.ontw-ceph01@-1(probing) e1 probe_timeout 0x563148b1eb60 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 bootstrap 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 sync_reset_requester 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 unregister_cluster_logger - not registered 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 monmap e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 _reset 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01(a)-1(probing).auth v0 _set_mon_num_rank num 0 rank 0 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 timecheck_finish 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 scrub_event_cancel 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 scrub_reset 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 reset_probe_timeout 0x563148b1f7c0 after 2 seconds 2020-05-25T13:07:37.035+0200 7f711a308700 10 mon.ontw-ceph01@-1(probing) e1 probing other monitors 2020-05-25T13:07:37.035+0200 7f711a308700 0 -- [v2: 192.168.100.60:3300/0,v1:192.168.100.60:6789/0] send_to message mon_probe(probe f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph01 new mon_release octopus) v7 with empty dest 2020-05-25T13:07:38.045+0200 7f711bb0b700 10 mon.ontw-ceph01@-1(probing) e1 get_authorizer for mon 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 _ms_dispatch new session 0x563149893680 MonSession(mon.0 [v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0] is open , features 0x3f01cfb8ffacffff (luminous)) features 0x3f01cfb8ffacffff 2020-05-25T13:07:38.047+0200 7f7117b03700 5 mon.ontw-ceph01@-1(probing) e1 _ms_dispatch setting monitor caps on this connection 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 handle_probe mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 handle_probe_reply mon.0 v2:192.168.100.61:3300/0 mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 monmap is e1: 2 mons at {ontw-ceph01=,ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e1 got newer/committed monmap epoch 3, mine was 1 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 bootstrap 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 sync_reset_requester 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 unregister_cluster_logger - not registered 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 cancel_probe_timeout 0x563148b1f7c0 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 monmap e3: 1 mons at {ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 _reset 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01(a)-1(probing).auth v0 _set_mon_num_rank num 0 rank 0 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 timecheck_finish 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 scrub_event_cancel 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 scrub_reset 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 cancel_probe_timeout (none scheduled) 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 reset_probe_timeout 0x563148b1f7c0 after 2 seconds 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 probing other monitors 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 handle_probe mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 handle_probe_reply mon.0 v2:192.168.100.61:3300/0 mon_probe(reply f3c3f099-2940-4074-a7fe-1aea6259f67b name ontw-ceph02 quorum 0 paxos( fc 304966 lc 305658 ) mon_release nautilus) v7 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 monmap is e3: 1 mons at {ontw-ceph02=[v2: 192.168.100.61:3300/0,v1:192.168.100.61:6789/0]} 2020-05-25T13:07:38.047+0200 7f7117b03700 10 mon.ontw-ceph01@-1(probing) e3 got newer/committed monmap epoch 3, mine was 3 -------------- I'm a bit lost at what's (not) happening. Hope you can help!

3 years, 11 months

1
0
0 0

Handling scrubbing/deep scrubbing

by Kamil Szczygieł

Hi, I've 4 node cluster with 13x15TB 7.2k OSDs each and around 300TB data inside. I'm having issues with deep scrub/scrub not being done in time, any tips to handle these operations with large disks like this? osd pool default size = 2 osd deep scrub interval = 2592000 osd scrub begin hour = 23 osd scrub end hour = 5 osd scrub sleep = 0.1 Cheers, Kamil

3 years, 11 months

2
1
0 0

Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000

by Janne Johansson

Den mån 25 maj 2020 kl 10:03 skrev Marc Roos <M.Roos(a)f1-outsourcing.eu>: > > I am interested. I am always setting mtu to 9000. To be honest I cannot > imagine there is no optimization since you have less interrupt requests, > and you are able x times as much data. Every time there something > written about optimizing the first thing mention is changing to the mtu > 9000. Because it is quick and easy win. > > This sort of assumes you are not using interrupt coalescing network cards, because if you do, you can get something like hundreds of packets in one single IRQ*, already checksummed and stripped and in recent cards (10-25-40GE) even delivered into the cpu L3 cache by the time you get the int, so if they were 1500 or 9000 on the wire doesn't matter much by then. Even in the bad old days of software handling of all parts packet-related, many things (like mbuf allocations) were optimized for 1500, so 9k packets became just a multiple of a number of 1500 bytes chunks taken from a pool of network buffers anyhow. I'm not trying to shoot down the 9k-vs-1500 idea, but doing a benchmark will give you lots more facts than airing things that are easy to imagine but really doesn't have a huge impact because hw manufacturers worked around things like this a long time ago. If your tests say you win x%, then use it by all means. I'm just not thinking that 10/25/40G networks are so filled that the frame overheads really matter as a matter of % of the packet sizes and the cards offload most of the work to strip the overhead out, so the computer won't notice it was ever there. *) SysKonnect cards had this around 2003, just to get a feeling for what "modern ethernet cards" means in this context. -- May the most significant bit of your life be positive.

3 years, 11 months

1
0
0 0

No output from rbd perf image iotop/iostat

by Eugen Block

Hi all, I have a Nautilus cluster mostly used for RBD (openstack) and CephFS. I have been using rbd perf command from time to time but it doesn't work anymore. I have tried several images in different pools but there's no output at all except for client:~ $ rbd perf image iostat --format json volumes-ssd/volume-358cd6c5-6fb0-424f-93d9-990ea1963472 rbd: waiting for initial image stats It never updates, no matter how long I wait. It stopped working while we were using version 14.2.3, last Friday we updated to 14.2.9 but it still doesn't work. The only relevant mgr log output I'm seeing in debug mode (debug_mgr 5/5) is this: ---snip--- 2020-05-25 10:53:07.072 7fedd5f59700 4 mgr.server _handle_command decoded 4 2020-05-25 10:53:07.072 7fedd5f59700 4 mgr.server _handle_command prefix=rbd perf image stats 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : from='client.710971242 v1:192.168.103.13:0/693257394' entity='client.admin' cmd=[ 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : { 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "prefix": "rbd perf image stats", 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "pool_spec": "volumes-ssd/volume-358cd6c5-6fb0-424f-93d9-990ea1963472", 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "sort_by": "write_ops", 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : "format": "json" 2020-05-25 10:53:07.072 7fedd5f59700 0 log_channel(audit) log [DBG] : }"]: dispatch 2020-05-25 10:53:07.072 7fedd675a700 4 mgr.server reply reply success 2020-05-25 10:53:07.104 7fedd5f59700 4 mgr.server handle_report from 0x555e60f31200 osd,33 2020-05-25 10:53:07.172 7fedd5f59700 4 mgr.server handle_report from 0x555e5c6c6d80 osd,15 2020-05-25 10:53:07.224 7fedf10c1700 4 mgr send_beacon active ---snip--- What I'm also wondering about is that the "format": "json" doesn't change even if I choose to run --format plain or xml. Does anyone experience the same? The missing output also applies to rbd perf image iotop. Any hints are appreciated. Regards, Eugen

3 years, 11 months

1
0
1 0

remove secondary zone from multisite

by Zhenshi Zhou

Hi all, I'm gonna make my secondary zone offline. How to remove the secondary zone from a mutisite?

3 years, 11 months

1
1
0 0

Re: RGW Garbage Collector

by Matt Benjamin

Hi Manuel, rgw_gc_obj_min_wait -- yes, this is how you control how long rgw waits before removing the stripes of deleted objects the following are more gc performance and proportion of available iops: rgw_gc_processor_max_time -- controls how long gc runs once scheduled; a large value might be 3600 rgw_gc_processor_period -- sets the gc cycle; smaller is more frequent If you want to make gc more aggressive when it is running, set the following (can be increased), which more than doubles the : rgw_gc_max_concurrent_io = 20 rgw_gc_max_trim_chunk = 32 If you want to increase gc fraction of total rgw i/o, increase these (mostly, concurrent_io). regards, Matt On Sun, May 24, 2020 at 4:02 PM EDH - Manuel Rios <mriosfer(a)easydatahost.com> wrote: > > Hi, > > Im looking for any experience optimizing garbage collector with the next configs: > > global advanced rgw_gc_obj_min_wait > global advanced rgw_gc_processor_max_time > global advanced rgw_gc_processor_period > > By default gc expire objects within 2 hours, we're looking to define expire in 10 minutes as our S3 cluster got heavy uploads and deletes. > > Are those params usable? For us doesn't have sense store delete objects 2 hours in a gc. > > Regards > Manuel > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > -- Matt Benjamin Red Hat, Inc. 315 West Huron Street, Suite 140A Ann Arbor, Michigan 48103 http://www.redhat.com/en/technologies/storage tel. 734-821-5101 fax. 734-769-8938 cel. 734-216-5309

3 years, 11 months

2
1
0 0

RGW Garbage Collector

by EDH - Manuel Rios

Hi, Im looking for any experience optimizing garbage collector with the next configs: global advanced rgw_gc_obj_min_wait global advanced rgw_gc_processor_max_time global advanced rgw_gc_processor_period By default gc expire objects within 2 hours, we're looking to define expire in 10 minutes as our S3 cluster got heavy uploads and deletes. Are those params usable? For us doesn't have sense store delete objects 2 hours in a gc. Regards Manuel

3 years, 11 months

1
0
0 0

RGW REST API failed request with status code 403

by apely agamakou

Hi, Since my upgrade from 15.2.1 to 15.2.2 i've got this error message at the "Object Gateway" section of the dashboard. RGW REST API failed request with status code 403 (b'{"Code":"InvalidAccessKeyId","RequestId":"tx000000000000000000017-005ecac06c' b'-e349-eu-west-1","HostId":"e349-eu-west-1-default"}') I did try to change my secret-key and access-key without success. I made a tcpdump, i didn't see any special thing like json escape character etc .. Somebody had the same issue ?? Regards.

3 years, 11 months

1
0
0 0

Re: question on ceph node count

by tim taler

yep, my fault I meant replication = 3 .... > > but aren't PGs checksummed so from the remaining PG (given the > > checksum would be right) two new copies could be created? > > Assuming again 3R on 5 nodes, failure domain of host, if 2 nodes go down, there will be 1/3 copies available. Normally a 3R pool has min_size set to 2. > > You can set min_size to 1 temporarily, then those PGs will become active and copies will be created to restore redundancy, but if that remaining OSD is damaged, if there’s a DIMM flake, a cosmic ray, if the wrong OSD crashes or restarts at the wrong time, you can find yourself without the most recent copy of data and be unable to recover. It’s Russian Roulette. I see, but wouldn't ceph try to recreate redundancy by it's own (unless I'm explicitly tell it not to do so)? And if the I/O and load on the cluster isn't too high disk speed good net connectivity good it would recover fairly quickly into healthy redundancy state? Anyhow, I'm not planing on crashing two nodes ;-) I just wanted to get a feeling of how much more secure/robust a setup with five nodes compared to four nodes is.

3 years, 11 months

1
0
0 0

PGS INCONSISTENT - read_error - replace disk or pg repair then replace disk

by Peter Lewis

Hello, I came across a section of the documentation that I don't quite understand. In the section about inconsistent PGs it says if one of the shards listed in `rados list-inconsistent-obj` has a read_error the disk is probably bad. Quote from documentation: https://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/… `If read_error is listed in the errors attribute of a shard, the inconsistency is likely due to disk errors. You might want to check your disk used by that OSD.` I determined that the disk is bad by looking at the output of smartctl. I would think that replacing the disk by removing the OSD from the cluster and allowing the cluster to recover would fix this inconsistency error without having to run `ceph pg repair`. Can I just replace the OSD and the inconsistency will be resolved by the recovery? Or would it be better to run `ceph pg repair` and then replace the OSD associated with that bad disk? Thanks!

3 years, 11 months

3
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users May 2020