Hi Dominic,
This cluster is running 14.2.8 (nautilus).
There are 172 OSDs divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replicas of data.
There are 3968 PGs (the cluster is not yet fully in use; the number of
PGs is expected to grow).
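In case it helps with the analysis: the before/after PG mappings can be reproduced offline from saved osdmaps, roughly like this (the pool id is a placeholder, and this is a sketch rather than the exact commands I used):

  ceph osd getmap -o osdmap.before
  # mark the OSD out, or set its crush weight to 0, then wait a moment
  ceph osd getmap -o osdmap.after
  osdmaptool osdmap.before --test-map-pgs-dump --pool <pool-id> > before.txt
  osdmaptool osdmap.after  --test-map-pgs-dump --pool <pool-id> > after.txt
  diff before.txt after.txt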
Marcel
> Marcel;
>
> Short answer: yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos(a)PerformAir.com
> www.PerformAir.com
>
>
> -----Original Message-----
> From: Marcel Kuiper [mailto:ceph@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an OSD out versus setting its crush weight to 0.
> I compared to which OSDs the PGs were sent. The crush map has 3 rooms. This
> is what happened.
>
> On 'ceph osd out 111' (first room; this node has OSDs 108 - 116) PGs were
> sent to the following OSDs:
>
> NR PGs  OSD
> 2 1
> 1 4
> 1 5
> 1 6
> 1 7
> 2 8
> 1 31
> 1 34
> 1 35
> 1 56
> 2 57
> 1 58
> 1 61
> 1 83
> 1 84
> 1 88
> 1 99
> 1 100
> 2 107
> 1 114
> 2 117
> 1 118
> 1 119
> 1 121
>
> All PGs were sent to OSDs on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected.
>
> Then I marked the OSD in again and waited until everything had stabilized. Next
> I set the crush weight to 0: 'ceph osd crush reweight osd.111 0'. I thought this
> also lowers the crush weight of the node, so there would be even less chance
> that PGs end up on an OSD of the same node. However, the result was:
>
> NR PGs  OSD
> 1 61
> 1 83
> 1 86
> 3 108
> 4 109
> 5 110
> 2 112
> 5 113
> 7 114
> 5 115
> 2 116
>
> Except for 3 PGs, all other PGs ended up on an OSD belonging to the same
> node :-O. Is this expected behaviour? Can someone explain? This is on
> Nautilus 14.2.8.
>
> Thanks
>
> Marcel
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
Hello,
I have a Ceph cluster, version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable).
4 nodes - each node has 11 HDDs, 1 SSD, and a 10Gbit network.
The cluster was empty, a fresh install. We filled the cluster with data (small blocks) using RGW.
The cluster is now only used for testing, so no client was using it during the admin operations mentioned below.
After a while (7 TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to spread the data better. To speed up
this operation, I set
ceph config set mgr target_max_misplaced_ratio 1
so that the whole cluster rebalances as quickly as it can.
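For completeness, the pg_num change itself was roughly the following (the pool name here is only a placeholder for our RGW data pool):

  ceph osd pool set default.rgw.buckets.data pg_num 256
  # on nautilus, pgp_num follows pg_num gradually, throttled by target_max_misplaced_ratio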
I have 3 issues/questions below:
1)
I noticed that the manual increase from 128 to 256 caused approx. 6 OSDs to restart, logging
heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150
After a while the OSDs were back, so I continued with my tests.
My question: was increasing the number of PGs with the maximum target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it
this way? I had no problem with this increase before, but the cluster configuration was slightly different and it was a Luminous version.
2)
The rebuild was still slow, so I increased the number of backfills
ceph tell osd.* injectargs "--osd-max-backfills 10"
and reduced the recovery sleep time
ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"
and after a few hours I noticed that some of my OSDs were restarted during recovery. In the log I can see
...
2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.780 7fe1da154700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
2020-03-21 06:41:36.888 7fe1e7769700  1 osd.7 3574 start_waiting_for_healthy
I observed the network usage graphs and network utilization was low during recovery (the 10Gbit link was not saturated).
So do lots of IOPS on an OSD also cause heartbeat operations to time out? I thought that the OSD uses separate threads and that HDD timeouts do not influence
heartbeats to other OSDs and the MON. It looks like that is not true.
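If it matters: the 15 and 150 in those messages match the defaults osd_op_thread_timeout = 15 and osd_op_thread_suicide_timeout = 150, which I have not changed. Raising them would presumably be something like the following, but that is only a guess on my part, not something I have tested:

  ceph tell osd.* injectargs '--osd-op-thread-timeout 60 --osd-op-thread-suicide-timeout 600'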
3)
After the OSD was wrongly marked down I can see that the cluster has degraded objects. There were no degraded objects before that.
Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized
Does this mean that the OSD disconnection caused degraded data? How is that possible, when no OSD was lost? The data should still be on that OSD, and after
peering everything should be OK. With Luminous I had no such problem; after the OSD came back up, degraded objects were recovered/found within a few seconds and
the cluster was healthy within seconds.
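For reference, the commands I would use to list the affected PGs, in case that output helps:

  ceph health detail
  ceph pg dump_stuck degraded undersized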
Thank you very much for any additional info. I can perform any additional tests you recommend, because the cluster is only used for testing now.
With regards
Jan Pekar
--
============
Ing. Jan Pekař
jan.pekar(a)imatic.cz
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
============
>> I'm a happy user since 2014 and I have never lost any data. When I remember
>> how painful the firmware upgrades of EMC, NetApp, HP storage were, and the
>> time spent recovering lost data ..... Ceph is just amazing!
Interesting, I have always wondered how Ceph compares to proprietary
solutions. I am getting the impression that closed-source environments
will not survive in the long run. If you see how e.g. CERN is handling
this 'bug of the year', it just shows the value of a large support base
and having access to detailed info like this lz4 patch.
Hello!
I've run into a bit of an issue with one of our radosgw production clusters.
The setup is two radosgw nodes behind haproxy load balancing, which in turn are connected to the Ceph cluster. Everything is running 14.2.2, so Nautilus. It's tied to an OpenStack cluster, so Keystone is the authentication backend (shouldn't really matter though).
Today both rgw backends crashed. Checking the logs, it seems to be related to dynamic resharding of a bucket, causing Lock errors:
Logs snippet: https://pastebin.com/uBCnhinF
Checking http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021368.html (old), I performed a manual reshard of the affected bucket with success (radosgw-admin bucket reshard --bucket="XXX/YYY" --num-shards=256).
Checking the metadata for the bucket, it now correctly shows 256 shards, up from 128.
HOWEVER, the dynamic resharding still kept happening and bringing down the backends. I suspect it is because of the old reshard op hanging around when checking `reshard list`: https://pastebin.com/dPChwBCT
As the resharding seems to have been successful when run manually, I now want to remove that reshard op, but can't; I get this https://pastebin.com/071kfAsa error when trying.
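For reference, the removal I attempted was along the lines of:

  radosgw-admin reshard cancel --bucket="XXX/YYY"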
Right now I had to resort to setting rgw_dynamic_resharding = false in ceph.conf to stop the problem from occurring.
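That is, roughly the following on the rgw nodes (the section name below is just a placeholder for our actual rgw sections):

  [client.rgw.<instance>]
  rgw_dynamic_resharding = false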
Ideas?
Cheers
Erik
Hello Community,
I would like to ask for help in explaining a situation.
There is a RADOS Gateway with an EC pool profile of k=6, m=4. So it should use
roughly 1.4 - 2.0 times more raw space than the stored data, if I'm correct.
'rados df' shows me:
116 TiB used and WR 26 TiB
Can you explain this? The used space is about 4.5 * WR. Why?
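Just to show my reasoning (this is my expectation from the EC profile, not a statement about what Ceph actually reports):

  expected overhead  = (k + m) / k = (6 + 4) / 6 ≈ 1.67
  expected raw usage ≈ 26 TiB * 1.67 ≈ 43 TiB
  observed raw usage = 116 TiB ≈ 4.5 * 26 TiB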
Regards
Mateusz Skała
Hi
I'm using the Ceph Dashboard in Firefox on a MacBook, and lately it has been hanging with "A web page is loading slowly - stop it or wait". Some pages load, but some show that warning and stop loading.
I'm on "Firefox Extended Support release 68.10.0esr (64-bit)"
Anyone else seen this issue?
Hi,
On a previously installed machine I get :
# rpm -qi ceph-selinux-14.2.10-0.el7.x86_64 |grep Build
Build Date : Thu 25 Jun 2020 08:08:52 PM CEST
Build Host : braggi01.front.sepia.ceph.com
# rpm -q --requires ceph-selinux-14.2.10-0.el7.x86_64 |grep selinux
libselinux-utils
selinux-policy-base >= 3.13.1-252.el7_7.6
On a VM set up the same way, but now failing the ceph install, this is what I get with the exact same repos (which I sync from the ceph ones):
# repoquery -q --requires ceph-selinux-14.2.10-0.el7.x86_64 |grep selinux
libselinux-utils
selinux-policy-base >= 3.13.1-266.el7_8.1
# After OS update and then ceph install:
# rpm -qi ceph-selinux-14.2.10-0.el7.x86_64 |grep Build
Build Date : Thu 09 Jul 2020 08:09:46 PM CEST
Build Host : braggi03.front.sepia.ceph.com
You'll notice the requirements changed for selinux-policy-base from 3.13.1-252.el7_7.6 to 3.13.1-266.el7_8.1
And the build info changed too.
But the RPM package version and release did not change and I don't get it.
I would have assumed an RPM change would at least imply a minor release bump?
Are the RPMs being rebuilt with the same version/release but against different OS versions (and frequently)?
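For what it's worth, a quick way to spot such silent rebuilds could be to compare something like this on both machines (just a sketch):

  rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE} buildhost=%{BUILDHOST} buildtime=%{BUILDTIME}\n' ceph-selinux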
Thanks && Regards
I just want to thank the Ceph community, and the Ceph developers for such a wonderful product.
We had a power outage on Saturday, and both Ceph clusters went offline, along with all of our other servers.
Bringing Ceph back to full functionality was an absolute breeze, no problems, no hiccups, no nothing. Just start the servers, and watch everything sort itself out.
Again: Thank you!
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
Aha, thanks very much for pointing that out, Anthony!
Just a summary of the screenshot pasted in my previous email. Based on my understanding, "ceph daemon osd.x ops" or "ceph daemon osd.x dump_ops_in_flight" shows the ops currently being processed by osd.x. I also noticed that there is another ceph CLI command that shows the number of ops for particular OSDs, i.e. "ceph osd status {<bucket>}", and its number matches neither the daemon one nor "ceph daemon osd.x dump_historic_ops". I haven't looked into the code yet and am just trying to get quick help from the community: how is the ops number from "ceph osd status {<bucket>}" calculated? Is it a statistical value?
Thanks!
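For what it's worth, the raw outputs one could compare side by side (assuming jq is available) are roughly:

  ceph daemon osd.636 dump_ops_in_flight
  ceph daemon osd.636 perf dump | jq '.osd | {op_r, op_w, op_wip}'
  ceph osd status osd021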
------------------ Original ------------------
From: "Anthony D'Atri" <anthony.datri(a)gmail.com>gt;;
Date: Tue, Jul 21, 2020 02:01 AM
To: "rainning"<tweetypie(a)qq.com>gt;;
Subject: Re: [ceph-users] "ceph daemon osd.x ops" shows different number from "ceph osd status <bucket>"
Your messages are very hard to read.
On Jul 20, 2020, at 12:58 AM, rainning <tweetypie(a)qq.com> wrote:
"ceph daemon osd.x ops" shows ops currently in flight, the number is different from "ceph osd status <bucket&gt;". Is the number of ops in "ceph osd status <bucket&gt;" also the ops currently in flight?
root@osd021:~# ceph daemon osd.636 ops
{
    "ops": [],
    "num_ops": 0
}
root@osd021:~# ceph osd status osd021
+-----+--------+-------+-------+--------+---------+--------+---------+-----------+
| id  | host   | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+-----+--------+-------+-------+--------+---------+--------+---------+-----------+
|  37 | osd021 |  542G | 1134G |     22 |    225k |     12 |    274k | exists,up |
|  91 | osd021 |  567G | 1109G |     38 |    281k |     12 |   1006k | exists,up |
| 147 | osd021 |  551G | 1125G |     32 |    307k |      4 |    247k | exists,up |
| 201 | osd021 |  549G | 1127G |     82 |   1505k |      5 |   42.4k | exists,up |
| 264 | osd021 | 1503G | 2222G |      3 |   55.2k |      4 |   12.2k | exists,up |
| 317 | osd021 | 1537G | 2188G |     13 |    138k |      1 |    6190 | exists,up |
| 375 | osd021 | 1369G | 2356G |      1 |    834k |      5 |   67.4k | exists,up |
| 430 | osd021 | 1424G | 2301G |      1 |    8192 |      6 |   1661k | exists,up |
| 480 | osd021 | 1435G | 2289G |      3 |   16.6k |      2 |   42.4k | exists,up |
| 535 | osd021 | 1494G | 2231G |      2 |   12.0k |      1 |   27.8k | exists,up |
| 584 | osd021 | 1484G | 2241G |      1 |    5529 |      1 |   12.7k | exists,up |
| 636 | osd021 | 1394G | 2331G |      2 |    850k |    406 |   78.1M | exists,up |
+-----+--------+-------+-------+--------+---------+--------+---------+-----------+