May 2020 - Dev - lists.ceph.io

by Josh Durgin

Hi folks, at this time we recommend pausing OSD upgrades to 15.2.2. There have been a couple reports of OSDs crashing due to rocksdb corruption after upgrading to 15.2.2 [1] [2]. It's safe to upgrade monitors and mgr, but OSDs and everything else should wait. We're investigating and will get a fix out as soon as we can. You can follow progress on this tracker: https://tracker.ceph.com/issues/45613 Josh [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CX5PRFGL6UB… [2] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CWN7BNPGSRB…

3 years, 11 months

2
2
0 0

On librados and libcephfs command functions

by John Mulligan

Both librados and libcephfs have functions that are named like *_command that generally take json formatted input "commands". Across all the functions the json `{"prefix: "get_command_descriptions"}` appears to work everywhere, returning a json formatted listing of the available commands. Related to this I have a few questions that I have not been able to answer myself just by looking at the sources: There appears to be a distinction between 'tell' commands and those that are not 'tell' commands but I couldn't quite figure it all out on my own. What's the difference and how can I determine which commands in the listing are tell commands or not? Sometimes certain commands appear in more than one listing. An example that is relevant to some upcoming work for the ceph-csi project is subvolume creation for cephfs. By tracing/instrumenting the `ceph` cli tool I can see that (on my system) it sends the command to create the subvolume via mon_command, but the command ("fs subvolume create") appears for both mon_command and mgr_command. I would like some advice about which command function we should prefer to use even if both work. I tend to prefer doing what the cli does, as I assume that gets a lot of real work testing. However, if someone on the list who understands the fundamentals of this system better could provide some advice I'd greatly appreciate it. Somewhat related to the previous item, is it safe to "hard code" the json structure or should we always prefer to dynmically fetch the parameters of the commands via get_command_descriptions like the cli does? What are the risks if a code base were to "hard code" the json for a command? When I was testing the rados_mon_command_target function, I noticed some varying behavior when the server side was running binaries based on the octopus release vs. client libraries from nautilus. When I use nautilus libs to query the octopus-based server things seem to look like many of the other results of "get_command_descriptions". However, using octopus client libs to query the octopus-based server I see output that looks a bit different. The json returned has extra white space and there are odd signatures like "0" and "1". Example: "cmd000": { "sig": [ "0" ], "help": "" }, Poking around I saw that this might be related to the following items: https://github.com/ceph/ceph/pull/30217 https://github.com/ceph/ceph/pull/30859 https://github.com/ceph/ceph/pull/31138 But I couldn't say for sure. The formatting of the JSON does not bother me but the "0", "1", etc signatures don't look right and I'm thinking that it might be a bug but didn't want to file something without a hint that this isn't desired behavior. As a relative newcomer to ceph I'm doing more research into how to use this family of functions works (is expected to work) versus having absorbed institutional knowledge. So, I'd be happy to record some of what I'm learning somewhere more permanent for future folks in my current situation. If I were to contribute documentation about these functions what's the preferred place to do so? I could see this going into the comments in librados.h or somewhere in the sphinx based docs but I'd appreciate a few pointers before I make patches/PRs. FWIW, If there's already docs for it, I've managed to overlook it despite looking a number of times for such a thing so apologies if it is noise. Lastly, and possibly leastly, does the team have a better name/label for this collection of functions/general approach? I've been calling them *_command functions or command json functions. I would be happy to more widely known name for them if one exists. :-) Thanks in advance! --John M.

3 years, 11 months

2
1
0 0

05/28/2020 perf meeting is cancelled

by Mark Nelson

Hi Folks, There's a conflict so perf meeting will be cancelled today. Have a great week! Mark

3 years, 11 months

1
0
0 0

Rados JNI client

by Tristan Tarrant

Hi all, before I begin, a little introduction: I lead the Infinispan project [1] which is an in-memory data grid written in Java. Infinispan stores data in memory in "caches" which are k/v data structures where the "v" can either be a raw binary blob or a structured object (we default to ProtoBuf). Caches can be backed by a persistent store. We have a number of such stores (file, rocksdb, database, etc). I've been working on a Ceph cache store. The initial intention was to use the rados-java client [2] but unfortunately this uses JNA. While JNA is very convenient because it doesn't require building the additional native bits that JNI requires, its performance overhead is ... terrible. I have therefore started working on a JNI Rados client which is looking quite promising. I hope to share my initial implementation with you soon, including some benchmark comparisons against the JNA counterpart. In the meantime, I need some insights into how librados handles the "aio" set of functions, in particular related to whether there is any blocking involved and how the threading model works (if there is any). Thanks Tristan [1] https://infinispan.org [2] https://github.com/ceph/rados-java

3 years, 11 months

1
0
0 0

canceling this week's rgw refactoring call

by Casey Bodley

Several of us have a conflict and won't make the call, so I've removed it from this week's calendar.

3 years, 11 months

1
1
0 0

Segmentation fault in rgw_rados.cc

by Abhinav Singh

Hello everyone, I m facing strange issue of segmentation fault in rados.cc inside `int RGWRados::put_linked_bucket_info()` when trying to create a new bucket. I m making a span inside this function with its parent_span as req_state->root_span (root_span is a data member of req_state).On checking the logs I found that my root_span is not initialized an it is throwing segmentation fault, but when I debug this same program with gdb I m not getting any segmentation fault, Strange. I dont find this function to be running parallel in which race condition might corrupt the variable. Side Note: Inside RGWRados I m accessing req_state by first getting RGWRadosStore and inside that I having req_state variable. And I m setting the RGWRadosStore req_state variable inside function RGWCreateBucket::init(). An idea why is this happening?

3 years, 11 months

1
0
0 0

RGW tracing

by Abhinav Singh

Hello everyone I m trying to implement jaeger tracing in rgw. I m storing my spans in struct req_state, but some files rgw_sal.cc and rgw_user.cc dont have anything relation to req_state so I thought of using extern variable to store req_state, it is wrong because req can run in parellel, so another way of using it I included data members req_state in RGWRadosStore and RGWCtrl class then using those to access the req_state and storing the spans Is it the right move of adding extra data members in classes?

3 years, 11 months

3
2
0 0

DocuBetter Meeting This Week -- 27 May 2020

by John Zachary Dover

There is a general documentation meeting called the "DocuBetter Meeting", and it is held every two weeks. The next DocuBetter Meeting will be on May 27, 2020 at 1800 PST, and will run for thirty minutes. Everyone with a documentation-related request or complaint is invited. The meeting will be held here: https://bluejeans.com/908675367 Send documentation-related requests and complaints to me by replying to this email and CCing me at zac.dover(a)gmail.com. This message will be sent to dev(a)ceph.io every Monday morning, North American time. The next DocuBetter meeting is scheduled for: 27 May 2020 1800 PST 28 May 2020 0100 UTC 28 May 2020 1100 AEST Etherpad: https://pad.ceph.com/p/Ceph_Documentation Meeting: https://bluejeans.com/908675367 Thanks, everyone. Zac Dover

3 years, 11 months

1
0
0 0

[Teuthology] Unable to setup machines to run tests - need help

by Prasad Krishnan

Hi All, I was trying to run tests required as part of merging my PR: https://github.com/ceph/ceph/pull/33492 and have obtained Sepia lab access. However, I'm unable to create the requisite VMs to run the tests thereafter and I'm quite lost w.r.t. how to do this. I was trying to follow the instructions listed in the following webpages: https://wiki.sepia.ceph.com/doku.php?id=testnodeaccess - to create VMs from the teuthology machine and to follow it up with pulpito setup described here: https://docs.ceph.com/teuthology/docs/LAB_SETUP.html#introduction. I am getting stuck at the following step shown below: prasad@teuthology:~$ ifconfig ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 172.21.0.51 netmask 255.255.240.0 broadcast 172.21.15.255 inet6 fe80::21a:4aff:feab:830c prefixlen 64 scopeid 0x20<link> ether 00:1a:4a:ab:83:0c txqueuelen 1000 (Ethernet) RX packets 21592314245 bytes 92499523754648 (92.4 TB) RX errors 0 dropped 49 overruns 0 frame 0 TX packets 15986355816 bytes 109276128813378 (109.2 TB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 <snipped> prasad@teuthology:~$ teuthology-lock --lock-many 1 --machine-type vpm --os-type ubuntu --os-version 18.04 2020-05-24 16:23:27,641.641 ERROR:teuthology.lock:Insufficient nodes available to lock 1 vpm nodes. 2020-05-24 16:23:27,642.642 ERROR:teuthology.lock:{"message": "only 0 nodes available"} prasad@teuthology:~$ teuthology-lock --lock-many 1 --machine-type vpm --os-type ubuntu --os-version 16.04 2020-05-24 16:23:39,671.671 ERROR:teuthology.lock:Insufficient nodes available to lock 1 vpm nodes. 2020-05-24 16:23:39,672.672 ERROR:teuthology.lock:{"message": "only 0 nodes available"} prasad@teuthology:~$ teuthology-lock --lock-many 1 --machine-type vps --os-type ubuntu --os-version 18.04 2020-05-24 16:23:48,521.521 ERROR:teuthology.lock:Insufficient nodes available to lock 1 vps nodes. 2020-05-24 16:23:48,521.521 ERROR:teuthology.lock:{"message": "only 0 nodes available"} prasad@teuthology:~$ What am I doing wrong? Thanks, K.Prasad -- *-----------------------------------------------------------------------------------------* *This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute or copy this email. Please notify the sender immediately by email if you have received this email by mistake and delete this email from your system. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.***** **** *Any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the organization. Any information on shares, debentures or similar instruments, recommended product pricing, valuations and the like are for information purposes only. It is not meant to be an instruction or recommendation, as the case may be, to buy or to sell securities, products, services nor an offer to buy or sell securities, products or services unless specifically stated to be so on behalf of the Flipkart group. Employees of the Flipkart group of companies are expressly required not to make defamatory statements and not to infringe or authorise any infringement of copyright or any other legal right by email communications. Any such communication is contrary to organizational policy and outside the scope of the employment of the individual concerned. The organization will not accept any liability in respect of such communication, and the employee responsible will be personally liable for any damages or other liability arising.***** **** *Our organization accepts no liability for the content of this email, or for the consequences of any actions taken on the basis of the information *provided,* unless that information is subsequently confirmed in writing. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.* _-----------------------------------------------------------------------------------------_

3 years, 11 months

2
1
0 0

05/21/2020 perf meeting is on at 8AM PST!

by Mark Nelson

Hi Folks, Perf meeting starting in 5 minutes! Today we will like at some 2 OSD vs 1 OSD performance results on nautilus vs octopus vs master and also talk a little bit about bluestore memory usage growth in some situations. See you there! Etherpad: https://pad.ceph.com/p/performance_weekly Bluejeans: https://bluejeans.com/908675367 Thanks, Mark

3 years, 11 months

1
0
0 0

2024

2023

2022

2021

2020

2019

Dev May 2020