Hi Dominic,
This cluster is running 14.2.8 (nautilus).
There are 172 OSDs divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replicas of data.
There are 3968 PGs (the cluster is not yet fully in use; the number of
PGs is expected to grow).
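In case it helps with the analysis: the before/after PG mappings can be reproduced offline from saved osdmaps, roughly like this (the pool id is a placeholder, and this is a sketch rather than the exact commands I used):

  ceph osd getmap -o osdmap.before
  # mark the OSD out, or set its crush weight to 0, then wait a moment
  ceph osd getmap -o osdmap.after
  osdmaptool osdmap.before --test-map-pgs-dump --pool <pool-id> > before.txt
  osdmaptool osdmap.after  --test-map-pgs-dump --pool <pool-id> > after.txt
  diff before.txt after.txt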
Marcel
> Marcel;
>
> Short answer: yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos(a)PerformAir.com
> www.PerformAir.com
>
>
> -----Original Message-----
> From: Marcel Kuiper [mailto:ceph@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an OSD out versus setting its crush weight to 0.
> I compared to which OSDs the PGs were sent. The crush map has 3 rooms. This
> is what happened.
>
> On 'ceph osd out 111' (first room; this node has OSDs 108 - 116) PGs were
> sent to the following OSDs:
>
> NR PGs  OSD
> 2 1
> 1 4
> 1 5
> 1 6
> 1 7
> 2 8
> 1 31
> 1 34
> 1 35
> 1 56
> 2 57
> 1 58
> 1 61
> 1 83
> 1 84
> 1 88
> 1 99
> 1 100
> 2 107
> 1 114
> 2 117
> 1 118
> 1 119
> 1 121
>
> All PGs were sent to OSDs on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected.
>
> Then I marked the OSD in again and waited until everything had stabilized. Next
> I set the crush weight to 0: 'ceph osd crush reweight osd.111 0'. I thought this
> also lowers the crush weight of the node, so there would be even less chance
> that PGs end up on an OSD of the same node. However, the result was:
>
> NR PGs  OSD
> 1 61
> 1 83
> 1 86
> 3 108
> 4 109
> 5 110
> 2 112
> 5 113
> 7 114
> 5 115
> 2 116
>
> Except for 3 PGs, all other PGs ended up on an OSD belonging to the same
> node :-O. Is this expected behaviour? Can someone explain? This is on
> Nautilus 14.2.8.
>
> Thanks
>
> Marcel
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
Hello,
I have a Ceph cluster, version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable).
4 nodes - each node has 11 HDDs, 1 SSD, and a 10Gbit network.
The cluster was empty, a fresh install. We filled the cluster with data (small blocks) using RGW.
The cluster is now only used for testing, so no client was using it during the admin operations mentioned below.
After a while (7 TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to spread the data better. To speed up
this operation, I set
ceph config set mgr target_max_misplaced_ratio 1
so that the whole cluster rebalances as quickly as it can.
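For completeness, the pg_num change itself was roughly the following (the pool name here is only a placeholder for our RGW data pool):

  ceph osd pool set default.rgw.buckets.data pg_num 256
  # on nautilus, pgp_num follows pg_num gradually, throttled by target_max_misplaced_ratio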
I have 3 issues/questions below:
1)
I noticed that the manual increase from 128 to 256 caused approx. 6 OSDs to restart, logging
heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150
After a while the OSDs were back, so I continued with my tests.
My question: was increasing the number of PGs with the maximum target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it
this way? I had no problem with this increase before, but the cluster configuration was slightly different and it was a Luminous version.
2)
The rebuild was still slow, so I increased the number of backfills
ceph tell osd.* injectargs "--osd-max-backfills 10"
and reduced the recovery sleep time
ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"
and after a few hours I noticed that some of my OSDs were restarted during recovery. In the log I can see
...
2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.780 7fe1da154700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
2020-03-21 06:41:36.888 7fe1e7769700  1 osd.7 3574 start_waiting_for_healthy
I observed the network usage graphs and network utilization was low during recovery (the 10Gbit link was not saturated).
So do lots of IOPS on an OSD also cause heartbeat operations to time out? I thought that the OSD uses separate threads and that HDD timeouts do not influence
heartbeats to other OSDs and the MON. It looks like that is not true.
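If it matters: the 15 and 150 in those messages match the defaults osd_op_thread_timeout = 15 and osd_op_thread_suicide_timeout = 150, which I have not changed. Raising them would presumably be something like the following, but that is only a guess on my part, not something I have tested:

  ceph tell osd.* injectargs '--osd-op-thread-timeout 60 --osd-op-thread-suicide-timeout 600'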
3)
After the OSD was wrongly marked down I can see that the cluster has degraded objects. There were no degraded objects before that.
Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized
Does this mean that the OSD disconnection caused degraded data? How is that possible, when no OSD was lost? The data should still be on that OSD, and after
peering everything should be OK. With Luminous I had no such problem; after the OSD came back up, degraded objects were recovered/found within a few seconds and
the cluster was healthy within seconds.
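For reference, the commands I would use to list the affected PGs, in case that output helps:

  ceph health detail
  ceph pg dump_stuck degraded undersized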
Thank you very much for any additional info. I can perform any additional tests you recommend, because the cluster is only used for testing now.
With regards
Jan Pekar
--
============
Ing. Jan Pekař
jan.pekar(a)imatic.cz
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
============
>> I'm a happy user since 2014 and I have never lost any data. When I remember
>> how painful the firmware upgrades of EMC, NetApp, HP storage were, and the
>> time spent recovering lost data ..... Ceph is just amazing!
Interesting, I have always wondered how Ceph compares to proprietary
solutions. I am getting the impression that closed-source environments
will not survive in the long run. If you see how e.g. CERN is handling
this 'bug of the year', it just shows the value of a large support base
and having access to detailed info like this lz4 patch.
Hello!
I've run into a bit of an issue with one of our radosgw production clusters.
The setup is two radosgw nodes behind haproxy load balancing, which in turn are connected to the Ceph cluster. Everything is running 14.2.2, so Nautilus. It's tied to an OpenStack cluster, so Keystone is the authentication backend (shouldn't really matter though).
Today both rgw backends crashed. Checking the logs, it seems to be related to dynamic resharding of a bucket, causing Lock errors:
Logs snippet: https://pastebin.com/uBCnhinF
Checking http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021368.html (old), I performed a manual reshard of the affected bucket with success (radosgw-admin bucket reshard --bucket="XXX/YYY" --num-shards=256).
Checking the metadata for the bucket, it now correctly shows 256 shards, up from 128.
HOWEVER, the dynamic resharding still kept happening and bringing down the backends. I suspect it is because of the old reshard op hanging around when checking `reshard list`: https://pastebin.com/dPChwBCT
As the resharding seems to have been successful when run manually, I now want to remove that reshard op, but can't; I get this https://pastebin.com/071kfAsa error when trying.
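For reference, the removal I attempted was along the lines of:

  radosgw-admin reshard cancel --bucket="XXX/YYY"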
Right now I had to resort to setting rgw_dynamic_resharding = false in ceph.conf to stop the problem from occurring.
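That is, roughly the following on the rgw nodes (the section name below is just a placeholder for our actual rgw sections):

  [client.rgw.<instance>]
  rgw_dynamic_resharding = false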
Ideas?
Cheers
Erik
Hello Community,
I would like to ask for help in explaining a situation.
There is a RADOS Gateway with an EC pool profile of k=6, m=4. So it should use
roughly 1.4 - 2.0 times more raw space than the stored data, if I'm correct.
'rados df' shows me:
116 TiB used and WR 26 TiB
Can you explain this? The used space is about 4.5 * WR. Why?
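Just to show my reasoning (this is my expectation from the EC profile, not a statement about what Ceph actually reports):

  expected overhead  = (k + m) / k = (6 + 4) / 6 ≈ 1.67
  expected raw usage ≈ 26 TiB * 1.67 ≈ 43 TiB
  observed raw usage = 116 TiB ≈ 4.5 * 26 TiB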
Regards
Mateusz Skała
Hi
I'm using the Ceph Dashboard in Firefox on a MacBook, and lately it has been hanging with "A web page is loading slowly - stop it or wait". Some pages load, but some show that warning and stop loading.
I'm on "Firefox Extended Support release 68.10.0esr (64-bit)"
Anyone else seen this issue?
Hi,
On a previously installed machine I get :
# rpm -qi ceph-selinux-14.2.10-0.el7.x86_64 |grep Build
Build Date : Thu 25 Jun 2020 08:08:52 PM CEST
Build Host : braggi01.front.sepia.ceph.com
# rpm -q --requires ceph-selinux-14.2.10-0.el7.x86_64 |grep selinux
libselinux-utils
selinux-policy-base >= 3.13.1-252.el7_7.6
On a VM set up the same way, but now failing the ceph install, this is what I get with the exact same repos (which I sync from the ceph ones):
# repoquery -q --requires ceph-selinux-14.2.10-0.el7.x86_64 |grep selinux
libselinux-utils
selinux-policy-base >= 3.13.1-266.el7_8.1
# After OS update and then ceph install:
# rpm -qi ceph-selinux-14.2.10-0.el7.x86_64 |grep Build
Build Date : Thu 09 Jul 2020 08:09:46 PM CEST
Build Host : braggi03.front.sepia.ceph.com
You'll notice the requirements changed for selinux-policy-base from 3.13.1-252.el7_7.6 to 3.13.1-266.el7_8.1
And the build info changed too.
But the RPM package version and release did not change and I don't get it.
I would have assumed an RPM change would at least imply a minor release bump?
Are the RPMs being rebuilt with the same version/release but against different OS versions (and frequently)?
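For what it's worth, a quick way to spot such silent rebuilds could be to compare something like this on both machines (just a sketch):

  rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE} buildhost=%{BUILDHOST} buildtime=%{BUILDTIME}\n' ceph-selinux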
Thanks && Regards
I just want to thank the Ceph community, and the Ceph developers for such a wonderful product.
We had a power outage on Saturday, and both Ceph clusters went offline, along with all of our other servers.
Bringing Ceph back to full functionality was an absolute breeze, no problems, no hiccups, no nothing. Just start the servers, and watch everything sort itself out.
Again: Thank you!
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
Aha, thanks very much for pointing that out, Anthony!
Just a summary of the screenshot pasted in my previous email. Based on my understanding, "ceph daemon osd.x ops" or "ceph daemon osd.x dump_ops_in_flight" shows the ops currently being processed by osd.x. I also noticed that there is another ceph CLI command that shows the number of ops for particular OSDs, i.e. "ceph osd status {<bucket>}", and its number matches neither the daemon one nor "ceph daemon osd.x dump_historic_ops". I haven't looked into the code yet and am just trying to get quick help from the community: how is the ops number from "ceph osd status {<bucket>}" calculated? Is it a statistical value?
Thanks!
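For what it's worth, the raw outputs one could compare side by side (assuming jq is available) are roughly:

  ceph daemon osd.636 dump_ops_in_flight
  ceph daemon osd.636 perf dump | jq '.osd | {op_r, op_w, op_wip}'
  ceph osd status osd021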
------------------ Original ------------------
From: "Anthony D'Atri" <anthony.datri(a)gmail.com>gt;;
Date: Tue, Jul 21, 2020 02:01 AM
To: "rainning"<tweetypie(a)qq.com>gt;;
Subject: Re: [ceph-users] "ceph daemon osd.x ops" shows different number from "ceph osd status <bucket>"
Your messages are very hard to read.
On Jul 20, 2020, at 12:58 AM, rainning <tweetypie(a)qq.com> wrote:
"ceph daemon osd.x ops" shows ops currently in flight, the number is different from "ceph osd status <bucket&gt;". Is the number of ops in "ceph osd status <bucket&gt;" also the ops currently in flight?
root@osd021:~# ceph daemon osd.636 ops
{
    "ops": [],
    "num_ops": 0
}
root@osd021:~# ceph osd status osd021
+-----+--------+-------+-------+--------+---------+--------+---------+-----------+
| id  | host   | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+-----+--------+-------+-------+--------+---------+--------+---------+-----------+
|  37 | osd021 |  542G | 1134G |     22 |    225k |     12 |    274k | exists,up |
|  91 | osd021 |  567G | 1109G |     38 |    281k |     12 |   1006k | exists,up |
| 147 | osd021 |  551G | 1125G |     32 |    307k |      4 |    247k | exists,up |
| 201 | osd021 |  549G | 1127G |     82 |   1505k |      5 |   42.4k | exists,up |
| 264 | osd021 | 1503G | 2222G |      3 |   55.2k |      4 |   12.2k | exists,up |
| 317 | osd021 | 1537G | 2188G |     13 |    138k |      1 |    6190 | exists,up |
| 375 | osd021 | 1369G | 2356G |      1 |    834k |      5 |   67.4k | exists,up |
| 430 | osd021 | 1424G | 2301G |      1 |    8192 |      6 |   1661k | exists,up |
| 480 | osd021 | 1435G | 2289G |      3 |   16.6k |      2 |   42.4k | exists,up |
| 535 | osd021 | 1494G | 2231G |      2 |   12.0k |      1 |   27.8k | exists,up |
| 584 | osd021 | 1484G | 2241G |      1 |    5529 |      1 |   12.7k | exists,up |
| 636 | osd021 | 1394G | 2331G |      2 |    850k |    406 |   78.1M | exists,up |
+-----+--------+-------+-------+--------+---------+--------+---------+-----------+