November 2020 - ceph-users

by Szabo, Istvan (Agoda)

Hi, I haven't really find any documentation about how to size radosgw. One redhat doc says we need to decide the ratio like 1:50 or 1:100 osd / rgw. I had an issue earlier where I had a user who source loadbalanced so always went to the same radosgateway and 1 time just maxed out. So the question is, how to monitor rgw, what kind of values or ... ? How to size RGW? Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 5 months

1
0
0 0

using fio tool in ceph development cluster (vstart.sh)

by Bobby

Hi, I am using the Ceph development cluster through vstart.sh script. I would like to measure/benchmark read and write performance (benchmark ceph at a low level). For that I want to use the fio tool. Can we use fio on the development cluster? AFAIK, we can..... I have seen the fio option in the CMakeLists.txt of the Ceph source code. Thanks in advance. BR

3 years, 5 months

2
1
0 0

v15.2.6 Octopus released

by David Galloway

This is the 6th backport release in the Octopus series. This releases fixes a security flaw affecting Messenger V2 for Octopus & Nautilus. We recommend users to update to this release. Notable Changes --------------- * CVE 2020-25660: Fix a regression in Messenger V2 replay attacks Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-15.2.6.tar.gz * For packages, see http://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: cb8c61a60551b72614257d632a574d420064c17a

3 years, 5 months

3
2
0 0

question about rgw index pool

by Adrian Nicolae

Hi guys, I'll have a future Ceph deployment with the following setup : - 7 powerful nodes running Ceph 15.2.x with mon, rgw and osd daemons colocated - 100+ SATA drives with EC 4+2 - every OSD will have a large NVME partition (300GB) for rocksdb - the storage will be dedicated for rgw traffic using Swift (no cephfs, no rbd) - we will probably have a lot of 4MB ceph objects (more than 400k millions) after the first year. Does it still matter to have the rgw index pool on dedicated SSD/NVME drives or it's good enough to spread it on many SATA drives with 3x replication and a large PG number ? Thanks.

3 years, 5 months

1
0
0 0

Manual bucket resharding problem

by Mateusz Skała

Hello Community. I need Your help. Few days ago I started manual resharding of one bucket with large objects. Unfortunately I interrupted this by Ctrl+c. At now I can’t start this process again. There is message: # radosgw-admin bucket reshard --bucket objects --num-shards 2 ERROR: the bucket is currently undergoing resharding and cannot be added to the reshard list at this time But list of reshard process is empty: # radosgw-admin reshard list [] # radosgw-admin reshard status --bucket objects [ { "reshard_status": "not-resharding", "new_bucket_instance_id": "", "num_shards": -1 } ] How can I fix this situation ? How to restore possibility resharding this bucket? And BTW is resharding process locking writes/reads on bucket? Regards Mateusz Skała

3 years, 5 months

1
0
0 0

Problems with mon

by Mateusz Skała

Hello Community, I have problems with ceph-mons in docker. Docker pods are starting but I got a lot of messages "e6 handle_auth_request failed to assign global_id” in log. 2 mons are up but I can’t send any ceph commands. Regards Mateusz

3 years, 5 months

2
4
0 0

Multisite design details

by Girish Aher

Hello All, I am looking to understand some of the internal details on how multisite is architected. On the Ceph user list, I see mentions of metadata logs, bucket index shard logs etc. but there is just no documentation anywhere I could find on how multisite works using these. Could someone please point me in the right direction here? Apart from the code, is there any resource that could help me with understanding the multisite internals? --Girish

3 years, 5 months

1
0
0 0

The serious side-effect of rbd cache setting

by norman

Hi All, We're testing the rbd cache setting for openstack(Ceph 14.2.5 Bluestore 3-replica), and an odd problem found: 1. Setting librbd cache [client] rbd cache = true rbd cache size = 16777216 rbd cache max dirty = 12582912 rbd cache target dirty = 8388608 rbd cache max dirty age = 1 rbd cache writethrough until flush = true 2. Running rbd bench rbd -c /etc/ceph/ceph.conf \ -k /etc/ceph/keyring2 \ -n client.rbd-openstack-002 bench \ --io-size 4K \ --io-threads 1 \ --io-pattern seq \ --io-type read \ --io-total 100G \ openstack-volumes/image-you-can-drop-me 3. Start another test rbd -c /etc/ceph/ceph.conf \ -k /etc/ceph/keyring2 \ -n client.rbd-openstack-002 bench \ --io-size 4K \ --io-threads 1 \ --io-pattern rand \ --io-type write \ --io-total 100G \ openstack-volumes/image-you-can-drop-me2 Running for minutes, I found the read test almost hung for a while: 69 152069 2375.21 9728858.72 70 153627 2104.63 8620569.93 71 155748 1956.04 8011953.10 72 157665 1945.84 7970177.24 73 159661 1947.64 7977549.44 74 161522 1890.45 7743277.01 75 163583 1991.04 8155301.58 76 165791 2008.44 8226566.26 77 168433 2153.43 8820438.66 78 170269 2121.43 8689377.16 79 172511 2197.62 9001467.33 80 174845 2252.22 9225091.00 81 177089 2259.42 9254579.83 82 179675 2248.22 9208708.30 83 182053 2356.61 9652679.11 84 185087 2515.00 10301433.50 99 185345 550.16 2253434.96 101 185346 407.76 1670187.73 102 185348 282.44 1156878.38 103 185350 162.34 664931.53 104 185353 12.86 52681.27 105 185357 1.93 7916.89 106 185361 2.74 11235.38 107 185367 3.27 13379.95 108 185375 5.08 20794.43 109 185384 6.93 28365.91 110 185403 9.19 37650.06 111 185438 17.47 71544.17 128 185467 4.94 20243.53 129 185468 4.45 18210.82 131 185469 3.89 15928.44 132 185493 4.09 16764.16 133 185529 4.16 17037.21 134 185578 18.64 76329.67 135 185631 27.78 113768.65 Why this happened? It's a unacceptable performance for read.

3 years, 5 months

2
3
0 0

CephFS error: currently failed to rdlock, waiting. clients crashing and evicted

by Thomas Hukkelberg

Hi all! Hopefully some of you can shed some light on this. We have big problems with samba crashing when macOS smb clients access certain/random folders/files over vfs_ceph. When browsing cephfs folder in question directly on a cephnode where cephfs is mouted we experience some issues like slow dir listing. We suspect that maybe macOS fetching of xattr metadata creates a lot of traffic, but it should not lockup the cluster like this. In logs we see both rdlock and wrlock, but mostly rdlocks. End clients experience spurious disconnects when issue occurs, roughly up to a handfull times a day. Is this a config issue? Have we hit a bug? It's certainly not a feature :/ Any pointers on how to troubleshoot or rectify this problem is most welcome. ceph version 14.2.11 samba version 4.12.10-SerNet-Ubuntu-10.focal Supermicro X11, Intel Silver 4110, 9 ceph nodes, 2x40gbe network, 150OSD spinners, NVMe db/journal -- 2020-11-17 22:09:07.525706 [WRN] evicting unresponsive client bo-samba-03 (3887652779), after 301.746 seconds 2020-11-17 22:09:07.525580 [INF] Evicting (and blacklisting) client session 3877970532 (10.40.30.133:0/3971626932) 2020-11-17 22:09:07.525536 [WRN] evicting unresponsive client bo-samba-03 (3877970532), after 302.034 seconds 2020-11-17 22:07:23.915412 [INF] Cluster is now healthy 2020-11-17 22:07:23.915381 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 2020-11-17 22:07:23.915330 [INF] Health check cleared: MDS_CLIENT_LATE_RELEASE (was: 1 clients failing to respond to capability release) 2020-11-17 22:07:23.064492 [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs 2020-11-17 22:07:23.064457 [INF] MDS health message cleared (mds.?): Client bo-samba-03 failing to respond to capability release 2020-11-17 22:07:17.524023 [WRN] client.3887663354 isn't responding to mclientcaps(revoke), ino 0x10001202b55 pending pAsLsXsFs issued pAsLsXsFsx, sent 63.325997 seconds ago 2020-11-17 22:07:17.523987 [INF] Evicting (and blacklisting) client session 3887663354 (10.40.30.133:0/3230547239) 2020-11-17 22:07:17.523967 [WRN] evicting unresponsive client bo-samba-03 (3887663354), after 64.5412 seconds 2020-11-17 22:07:17.523610 [WRN] slow request 63.325528 seconds old, received at 2020-11-17 22:06:14.197986: client_request(client.3878823430:4 lookup #0x100011f9a68/mappe uten navn 2020-11-17 22:06:14.197908 caller_uid=111139, caller_gid=110513{}) currently failed to rdlock, waiting 2020-11-17 22:07:17.523596 [WRN] 1 slow requests, 1 included below; oldest blocked for > 63.325529 secs 2020-11-17 22:07:19.255177 [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE) 2020-11-17 22:07:12.523453 [WRN] 1 slow requests, 0 included below; oldest blocked for > 58.325433 secs 2020-11-17 22:07:07.523382 [WRN] 1 slow requests, 0 included below; oldest blocked for > 53.325362 secs 2020-11-17 22:07:02.523360 [WRN] 1 slow requests, 0 included below; oldest blocked for > 48.325307 secs 2020-11-17 22:06:57.523218 [WRN] 1 slow requests, 0 included below; oldest blocked for > 43.325199 secs 2020-11-17 22:06:52.523203 [WRN] 1 slow requests, 0 included below; oldest blocked for > 38.325158 secs 2020-11-17 22:06:47.523105 [WRN] slow request 33.325065 seconds old, received at 2020-11-17 22:06:14.197986: client_request(client.3878823430:4 lookup #0x100011f9a68/mappe uten navn 2020-11-17 22:06:14.197908 caller_uid=111139, caller_gid=110513{}) currently failed to rdlock, waiting 2020-11-17 22:06:47.523100 [WRN] 1 slow requests, 1 included below; oldest blocked for > 33.325065 secs 2020-11-17 22:06:51.431745 [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST) 2020-11-17 22:06:20.045030 [INF] Cluster is now healthy 2020-11-17 22:06:20.045008 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 2020-11-17 22:06:20.044960 [INF] Health check cleared: MDS_CLIENT_LATE_RELEASE (was: 1 clients failing to respond to capability release) 2020-11-17 22:06:19.062307 [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs 2020-11-17 22:06:19.062253 [INF] MDS health message cleared (mds.?): Client bo-samba-03 failing to respond to capability release 2020-11-17 22:06:15.936150 [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE) 2020-11-17 22:06:12.522624 [WRN] client.3869410498 isn't responding to mclientcaps(revoke), ino 0x10001202b55 pending pAsLsXsFs issued pAsLsXsFsx, sent 64.045677 seconds ago --thomas -- Thomas Hukkelberg thomas(a)hovedkvarteret.no +47 971 81 192 -- support(a)hovedkvarteret.no +47 966 44 999

3 years, 5 months

2
1
0 0

one osd down / rgw damoen wont start

by Bernhard Krieger

Hello, today i came across a strange behaviour. After stoppping an osd, im not able to restart or /stop/start a radosgw daemon. The boot proccess will stuck until i have started the osd again. Specs: 3 ceph nodes 2 radosgw nautilus 14.2.13 CentOS7 Steps: * stopping radosgw daemon on rgw * stopping one osd on a ceph-node * starting radosgw daemon on rgw * rgw daemon stucks in boot proccess 2020-11-20 09:58:35.412 7f8a0b7c4900 0 framework: civetweb 2020-11-20 09:58:35.412 7f8a0b7c4900 0 framework conf key: port, val: 10.220.196.31:80 2020-11-20 09:58:35.412 7f8a0b7c4900 0 framework conf key: num_threads, val: 100 2020-11-20 09:58:35.412 7f8a0b7c4900 0 deferred set uid:gid to 167:167 (ceph:ceph) 2020-11-20 09:58:35.412 7f8a0b7c4900 0 ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable), process radosgw, pid 8145 -- > STUCKS/no log entries * As soon as i starting the osd, the boot sequence continues and everything works. 2020-11-20 09:58:35.412 7f8a0b7c4900 0 framework: civetweb 2020-11-20 09:58:35.412 7f8a0b7c4900 0 framework conf key: port, val: 10.220.196.31:80 2020-11-20 09:58:35.412 7f8a0b7c4900 0 framework conf key: num_threads, val: 100 2020-11-20 09:58:35.412 7f8a0b7c4900 0 deferred set uid:gid to 167:167 (ceph:ceph) 2020-11-20 09:58:35.412 7f8a0b7c4900 0 ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable), process radosgw, pid 8145 2020-11-20 10:00:23.895 7f8a0b7c4900 0 starting handler: civetweb 2020-11-20 10:00:23.913 7f8a0b7c4900 1 mgrc service_daemon_register rgw.rgw1 metadata {arch=x86_64,ceph_release=nautilus,ceph_version=ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable),ceph_version_short=14.2.13,cpu=Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS),distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=civetweb port=10.220.196.31:80 num_threads=100,frontend_type#0=civetweb,hostname=rgw1,kernel_description=#1 SMP Tue Oct 20 16:53:08 UTC 2020,kernel_version=3.10.0-1160.2.2.el7.x86_64,mem_swap_kb=1048572,mem_total_kb=1881836,num_handles=1,os=Linux,pid=8145,zone_id=ecf200a8-2c4a-4c96-96d8-4fcff5b2c8c3,zone_name=default,zonegroup_id=01d11ed1-6157-4c26-addf-ecba49820e20,zonegroup_name=default} 2020-11-20 10:00:25.551 7f89d3aac700 1 civetweb: 0x557f6a6f2000: 10.220.199.4 - - [20/Nov/2020:10:00:25 +0100] "GET / HTTP/1.0" 200 416 - - 2020-11-20 10:00:25.582 7f89d3aac700 1 civetweb: 0x557f6a6f2000: 10.220.199.3 - - [20/Nov/2020:10:00:25 +0100] "GET / HTTP/1.0" 200 416 - - 2020-11-20 10:00:27.555 7f89d3aac700 1 civetweb: 0x557f6a6f2000: 10.220.199.4 - - [20/Nov/2020:10:00:27 +0100] "GET / HTTP/1.0" 200 416 - - 2020-11-20 10:00:27.586 7f89d3aac700 1 civetweb: 0x557f6a6f2000: 10.220.199.3 - - [20/Nov/2020:10:00:27 +0100] "GET / HTTP/1.0" 200 416 - - regards Bernhard

3 years, 5 months

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users November 2020