September 2019 - ceph-users

Slow peering caused by "wait for new map"

by Bryan Stillwell

Our test cluster is seeing a problem where peering is going incredibly slow shortly after upgrading it to Nautilus (14.2.2) from Luminous (12.2.12). From what I can tell it seems to be caused by "wait for new map" taking a long time. When looking at dump_historic_slow_ops on pretty much any OSD I see stuff like this: # ceph daemon osd.112 dump_historic_slow_ops [...snip...] { "description": "osd_pg_create(e180614 287.4b:177739 287.75:177739 287.1c3:177739 287.1cf:177739 287.1e1:177739 287.2dd:177739 287.2fc:177739 287.342:177739 287.382:177739)", "initiated_at": "2019-09-03 15:12:41.366514", "age": 4800.8847047119998, "duration": 4780.0579745630002, "type_data": { "flag_point": "started", "events": [ { "time": "2019-09-03 15:12:41.366514", "event": "initiated" }, { "time": "2019-09-03 15:12:41.366514", "event": "header_read" }, { "time": "2019-09-03 15:12:41.366501", "event": "throttled" }, { "time": "2019-09-03 15:12:41.366547", "event": "all_read" }, { "time": "2019-09-03 15:39:03.379456", "event": "dispatched" }, { "time": "2019-09-03 15:39:03.379477", "event": "wait for new map" }, { "time": "2019-09-03 15:39:03.522376", "event": "wait for new map" }, { "time": "2019-09-03 15:53:55.912499", "event": "wait for new map" }, { "time": "2019-09-03 15:59:37.909063", "event": "wait for new map" }, { "time": "2019-09-03 16:00:43.356023", "event": "wait for new map" }, { "time": "2019-09-03 16:20:50.575498", "event": "wait for new map" }, { "time": "2019-09-03 16:31:48.689415", "event": "started" }, { "time": "2019-09-03 16:32:21.424489", "event": "done" } ] } It always seems to be in osd_pg_create() with multiple "wait for new map" messages before it finally does something. What could be causing it so long to get the OSD map? The mons don't appear to be overloaded in any way. Thanks, Bryan

4 years, 7 months

2
5
0 0

v14.2.3 Nautilus released

by Abhishek Lekshmanan

This is the third bug fix release of Ceph Nautilus release series. This release fixes a security issue. We recommend all Nautilus users upgrade to this release. For upgrading from older releases of ceph, general guidelines for upgrade to nautilus must be followed Notable Changes --------------- * CVE-2019-10222 - Fixed a denial of service vulnerability where an unauthenticated client of Ceph Object Gateway could trigger a crash from an uncaught exception * Nautilus-based librbd clients can now open images on Jewel clusters. * The RGW `num_rados_handles` has been removed. If you were using a value of `num_rados_handles` greater than 1, multiply your current `objecter_inflight_ops` and `objecter_inflight_op_bytes` parameters by the old `num_rados_handles` to get the same throttle behavior. * The secure mode of Messenger v2 protocol is no longer experimental with this release. This mode is now the preferred mode of connection for monitors. * "osd_deep_scrub_large_omap_object_key_threshold" has been lowered to detect an object with large number of omap keys more easily. For a detailed changelog please refer to the official release notes entry at the ceph blog: https://ceph.io/releases/v14-2-3-nautilus-released/ Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-14.2.3.tar.gz * For packages, see http://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: 0f776cf838a1ae3130b2b73dc26be9c95c6ccc39 -- Abhishek Lekshmanan SUSE Software Solutions Germany GmbH

4 years, 7 months

3
2
0 0

Bucket policies with OpenStack integration and limiting access

by shubjero

Good day, We have a Ceph cluster and make use of object-storage and integrate with OpenStack. Each OpenStack project/tenant is given a radosgw user which allows all keystone users of that project to access the object-storage as that single radosgw user. The radosgw user is the project id of the OpenStack project/tenant. Sometimes we have use cases where we want to access the object-storage outside of the swift-api and use tools like the aws-cli or homebrew java applications to access the object storage. For this use case what we do is generate S3 access/secret key for the specific radosgw user and they have full access to the object storage for that OpenStack project/tenant. What we want to know is if it is possible to provide granular access to containers within a single OpenStack project using S3 access keys or S3 sub-users? I know that the Swift API has ACL's that can limit by keystone user but we are exploring the possibility of doing this using S3 and S3 bucket policies so that the tools our team are developing (open source) are more transferrable to AWS S3 and Rados GW. Thanks all, Jared Baker Cloud Architect, OICR

4 years, 7 months

1
0
0 0

Strange hardware behavior

by Fyodor Ustinov

Hi! I understand that this question is not quite for this mailing list, but nonetheless, experts who may be encountered this have gathered here. I have 24 servers, and on each, after six months of work, the following began to happen: [root@S-26-5-1-2 cph]# uname -a Linux S-26-5-1-2 5.2.11-1.el7.elrepo.x86_64 #1 SMP Thu Aug 29 08:10:52 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdc bs=1M count=1000 oflag=sync 1048576000 bytes (1.0 GB) copied, 3.76334 s, 279 MB/s [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdd bs=1M count=1000 oflag=sync 1048576000 bytes (1.0 GB) copied, 4.54834 s, 231 MB/s sdc - SSD disk. sdd - HDD. It can be seen that ssd works somehow slowly, and hdd - too quickly. Reboot - nothing changes. And only poweroff/poweron cycle change behavior to normal: [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdc bs=1M count=1000 oflag=sync 1048576000 bytes (1.0 GB) copied, 3.24042 s, 324 MB/s [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdd bs=1M count=1000 oflag=sync 1048576000 bytes (1.0 GB) copied, 13.7709 s, 76.1 MB/s Absoluteli nothing in system and ceph log (this servers used for OSD) about that. Perhaps someone has encountered similar behavior? WBR, Fyodor.

4 years, 7 months

8
12
0 0

Nautilus packaging on stretch

by mjclark.00＠gmail.com

Hello, I'm trying to install nautilus on stretch following the directions here https://docs.ceph.com/docs/master/install/get-packages/ . However, it seems the stretch repo only includes ceph-deploy. Are the rest of the packages missing on purpose or have I missed something obvious? Thanks

4 years, 7 months

3
2
0 0

slow requests with the ceph osd dead lock?

by linghucongsong

Hi all! My ceph version is 10.2.11 and I use rgw and EC(7+3). When I use muliti clients to read and write on one rgw bucket .The bucket have 200 shards. There look like dead look on osd? Thanks for all! The osd block ops all have the similiar below logs and the block osds not always on the same osd. "description": "osd_op(client.3147989.0:2782587465 13.36f21cde .dir.517af746-28f1-454c-ba41-0c4fd51af270.896917.11.99 [call rgw.bucket_complete_op] snapc 0=[] ack+ondisk+write+known_if_redirected e26026)", "initiated_at": "2019-09-04 13:38:48.089766", "age": 3520.459786, "duration": 3521.432087, "type_data": [ "delayed", { "client": "client.3147989", "tid": 2782587465 }, [ { "time": "2019-09-04 13:38:48.089766", "event": "initiated" }, { "time": "2019-09-04 13:38:48.089794", "event": "queued_for_pg" }, { "time": "2019-09-04 13:38:48.180941", "event": "reached_pg" }, { "time": "2019-09-04 13:38:48.180962", "event": "waiting for rw locks" }, { "time": "2019-09-04 13:38:48.430598", "event": "reached_pg" }, { "time": "2019-09-04 13:38:48.430616", "event": "waiting for rw locks" }, { "time": "2019-09-04 13:38:49.150673", "event": "reached_pg" }, { "time": "2019-09-04 13:38:49.150691", "event": "waiting for rw locks" }, { "time": "2019-09-04 13:38:51.421887", "event": "reached_pg" }, { "time": "2019-09-04 13:38:51.421904", "event": "waiting for rw locks" }, { "time": "2019-09-04 13:38:51.990943", "event": "reached_pg" }, { "time": "2019-09-04 13:38:51.990961", "event": "waiting for rw locks" }, { "time": "2019-09-04 13:38:54.343921", "event": "reached_pg" }, { "time": "2019-09-04 13:38:54.343938", "event": "waiting for rw locks" }

4 years, 7 months

2
1
0 0

rgw auth error with self region name

by 黄明友

hi,all: I use the aws s3 java sdk , when make a new bucket , with the hostname " s3.my-self.mydomain.com" ; will get a auth error. but , when I use the hostname " s3.us-east-1.mydomian.com" ,will be ok, why ? 黄明友 IT基础架构部经理 V.Photos 云摄影移动电话: +86 13540630430 客服电话：400 - 806 - 5775 电子邮件: hmy(a)v.photos 官方网址: www.v.photos 上海黄浦区中山东二路88号外滩SOHO3Q F栋 2层北京朝阳区光华路9号光华路SOHO二期南二门SOHO3Q 1层广州天河区林和中路136号天誉花园二期3Wcoffice 天誉青创社区深圳南山区蛇口网谷科技大厦二期A座102网谷双创街 1层成都成华区建设路世贸广场 7层

4 years, 7 months

2
1
0 0

ceph mons stuck in electing state

by Nick

Hello, I have an old ceph 0.94.10 cluster that had 10 storage nodes with one extra management node used for running commands on the cluster. Over time we'd had some hardware failures on some of the storage nodes, so we're down to 6, with ceph-mon running on the management server and 4 of the storage nodes. We attempted deploying a ceph.conf change and restarted ceph-mon and ceph-osd services, but the cluster went down on us. We found all the ceph-mons are stuck in the electing state, I can't get any response from any ceph commands but I found I can contact the daemon directly and get this information (hostnames removed for privacy reasons): root@<mgmt1>:~# ceph daemon mon.<mgmt1> mon_status { "name": "<mgmt1>", "rank": 0, "state": "electing", "election_epoch": 4327, "quorum": [], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 10, "fsid": "69611c75-200f-4861-8709-8a0adc64a1c9", "modified": "2019-08-23 08:20:57.620147", "created": "0.000000", "mons": [ { "rank": 0, "name": "<mgmt1>", "addr": "[fdc4:8570:e14c:132d::15]:6789\/0" }, { "rank": 1, "name": "<mon1>", "addr": "[fdc4:8570:e14c:132d::16]:6789\/0" }, { "rank": 2, "name": "<mon2>", "addr": "[fdc4:8570:e14c:132d::28]:6789\/0" }, { "rank": 3, "name": "<mon3>", "addr": "[fdc4:8570:e14c:132d::29]:6789\/0" }, { "rank": 4, "name": "<mon4>", "addr": "[fdc4:8570:e14c:132d::151]:6789\/0" } ] } } Is there any way to force the cluster back into a quorum even if it's just one mon running to start it up? I've tried exporting the mgmt's monmap and injecting it into the other nodes, but it didn't make any difference. Thanks!

4 years, 7 months

4
5
0 0

Fwd: Applications slow in VMs running RBD disks

by Gesiel Galvão Bernardes

Hi, Em qui, 29 de ago de 2019 às 22:32, fengyd <fengyd81(a)gmail.com> escreveu: > Hi, > > The issue is still there? > Yes, yet. > I have met an IO peformance issue recently and found that the count of the > max fd for the Qemu/KVM was not bigger enough, the fd for Qemu/KVM was > exhausted, the issue was solved after increasing the count of the max fd. > > How check and increase max fd for qemu? Can you give-me way? Regards, Gesiel > > On Wed, 21 Aug 2019 at 20:53, Gesiel Galvão Bernardes < > gesiel.bernardes(a)gmail.com> wrote: > >> Hi Eliza, >> >> Em qua, 21 de ago de 2019 às 09:30, Eliza <eli(a)chinabuckets.com> >> escreveu: >> >>> Hi >>> >>> on 2019/8/21 20:25, Gesiel Galvão Bernardes wrote: >>> > I`m use a Qemu/kvm(Opennebula) with Ceph/RBD for running VMs, and I >>> > having problems with slowness in aplications that many times not >>> > consuming very CPU or RAM. This problem affect mostly Windows. >>> Appearly >>> > the problem is that normally the application load many short files >>> (ex: >>> > DLLs) and these files take a long time to load, generating a slowness. >>> >>> Did you check/test your network connection? >>> Do you have a fast network setup? >> >> >> I have a bond of two 10GB interfaces, with little use. >> >>> >>> >> regards. >>> _______________________________________________ >>> ceph-users mailing list >>> ceph-users(a)lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> _______________________________________________ >> ceph-users mailing list >> ceph-users(a)lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >

4 years, 7 months

2
1
0 0

Nautilus 14.2.3 packages appearing on the mirrors

by Sasha Litvak

Is there an actual release or an accident?

4 years, 7 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users September 2019