For developers submitting jobs using teuthology, we now have
recommendations on what priority level to use:
https://docs.ceph.com/docs/master/dev/developer_guide/#testing-priority
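As a rough illustration, a scheduling command at an explicit priority might
look like the following (a sketch only - the suite, branch, and machine type
are placeholders; check "teuthology-suite --help" for the exact flags):

    # schedule a suite run at a chosen priority level; pick the level
    # according to the recommendations in the doc above
    teuthology-suite --suite fs --ceph wip-my-branch \
        --machine-type smithi --priority 75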
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi all,
We conduct yearly user surveys to better understand how our users
utilize Ceph. The Ceph Foundation collects the data under the Community
Data License Agreement [0], which helps the community make a more
informed decision about where our development efforts for future
releases should go.
Back in August, I asked the community to help draft the next survey
[1]. I'm happy to now provide a draft of the user survey for 2019. I'm
sending this to the dev list in hopes of getting feedback before
sending it to the Ceph users list.
The first piece of feedback I received was a request to use something
other than SurveyMonkey, since it is not available in some regions. I
have been using another third-party service for our Ceph Days CFP
forms, and luckily it offers a survey service that isn't blocked.
A second question that came up was how to lay out questions for
multiple cluster deployments. An idea I had was to keep our general
Ceph user survey [2] separate from the deployment questions [3]. The
general questions only need to be answered once, and the deployment
survey can be answered multiple times to capture the different
configurations. I'm looking into a way to link the answers of both
surveys together.
Any feedback, corrections or ideas?
[0] - https://cdla.io/sharing-1-0/
[1] -
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Q3NCHOJN45D…
[2] -
https://ceph.io/wp-content/uploads/2019/10/Ceph-User-Survey-general.pdf
[3] -
https://ceph.io/wp-content/uploads/2019/10/Ceph-User-Survey-Clusters.pdf
--
Mike Perez
he/him
Ceph Community Manager
M: +1-951-572-2633
494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee <https://twitter.com/thingee>
FOSDEM is a free software event that offers open source communities a place
to meet, share ideas and collaborate. It is renowned for being highly
developer-oriented and brings together 8000+ participants from all over the
world. It is held in the city of Brussels (Belgium).
FOSDEM 2020 will take place during the weekend of February 1st-2nd 2020. More
details about the event can be found at http://fosdem.org/
** Call For Participation
The Software Defined Storage devroom will go into its fourth round of
talks around Open Source Software Defined Storage projects, management
tools and real-world deployments.
Presentation topics could include, but are not limited to:
- Your work on a SDS project like Ceph, Gluster, OpenEBS or LizardFS
- Your work on or with SDS related projects like SWIFT or Container Storage
Interface
- Management tools for SDS deployments
- Monitoring tools for SDS clusters
** Important dates:
- Nov 24th 2019: submission deadline for talk proposals
- Dec 15th 2019: announcement of the final schedule
- Feb 2nd 2020: Software Defined Storage dev room
Talk proposals will be reviewed by a steering committee:
- Niels de Vos (OpenShift Container Storage Developer - Red Hat)
- Jan Fajerski (Ceph Developer - SUSE)
- Kai Wagner (SUSE)
- Mike Perez (Ceph Community Manager, Red Hat)
Use the FOSDEM 'pentabarf' tool to submit your proposal:
https://penta.fosdem.org/submission/FOSDEM20
- If necessary, create a Pentabarf account and activate it.
Please reuse your account from previous years if you have
already created it.
https://penta.fosdem.org/user/new_account/FOSDEM20
- In the "Person" section, provide First name, Last name
(in the "General" tab), Email (in the "Contact" tab)
and Bio ("Abstract" field in the "Description" tab).
- Submit a proposal by clicking on "Create event".
- Important! Select the "Software Defined Storage devroom" track
(on the "General" tab).
- Provide the title of your talk ("Event title" in the "General" tab).
- Provide a description of the subject of the talk and the
intended audience (in the "Abstract" field of the "Description" tab)
- Provide a rough outline of the talk or goals of the session (a short
list of bullet points covering topics that will be discussed) in the
"Full description" field in the "Description" tab
- Provide the expected length of your talk in the "Duration" field. Please
  include at least 5 minutes of discussion in your proposal and allow
  5 minutes for the handover to the next presenter.
  Suggested talk lengths are 20+5+5 and 45+10+5 minutes. Note that
  shorter talks are preferred so that more topics can be presented during
  the day.
** Recording of talks
The FOSDEM organizers plan to have live streaming and recording fully working,
both for remote/later viewing of talks, and so that people can watch streams
in the hallways when rooms are full. This requires speakers to consent to
being recorded and streamed. If you plan to be a speaker, please understand
that by doing so you implicitly give consent for your talk to be recorded and
streamed. The recordings will be published under the same license as all
FOSDEM content (CC-BY).
Hope to hear from you soon! And please forward this announcement.
If you have any further questions, please write to the mailing list at
storage-devroom(a)lists.fosdem.org and we will try to answer as soon as
possible.
Thanks!
hi folks,
just want to share my findings regarding building ceph on RHEL8 here.
today, i was trying to build ceph on RHEL8, but it seems we are
missing some build dependencies on this distro: quite a few packages
were removed from RHEL8 [0], and these packages are still missing
from EPEL8:
No matching package to install: 'gperftools-devel >= 2.6.1'
No matching package to install: 'leveldb-devel > 1.2'
No matching package to install: 'libbabeltrace-devel'
No matching package to install: 'liboath-devel'
No matching package to install: 'python3-cherrypy'
No matching package to install: 'python3-coverage'
No matching package to install: 'python3-pecan'
No matching package to install: 'python3-routes'
No matching package to install: 'python3-tox'
No matching package to install: 'xmlstarlet'
i've added the EPEL8 repo by following [1]. packages being added to
EPEL8 in the future will probably help to ease the pain, but at this
moment, some necessary dependencies are just missing.
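for anyone trying the same, this is roughly what i did (a sketch; the
epel-release URL is the usual fedora one, adjust for your mirror):

    # enable the EPEL8 repo as described in [1]
    sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    # then let ceph's install-deps.sh pull in whatever is available.
    # it still fails on the packages listed above.
    cd ceph && ./install-deps.sh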
cheers,
---
[0] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/ht…
[1] https://fedoraproject.org/wiki/EPEL
--
Regards
Kefu Chai
Hi all, there seems to be some confusion concerning the backport tracker.
The purpose of the backport tracker is to track backports of fixes that need to
go into multiple stable branches, so they don't "fall through the cracks".
If you need to backport something from master *only* to nautilus *and no
farther*, there is no need to create tracker issues for backporting purposes.
Just open your nautilus PR with the cherry-pick [1].
(If there is already a master tracker issue, you can just mention the URL of the
nautilus backport in a comment on the tracker. You don't need to change the
status to Pending Backport or fill in the Backport field.)
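For reference, the mechanics of the cherry-pick are a short sketch like
this (the branch name and commit SHA are placeholders; see [1] for the
full rules, in particular the mandatory -x flag):

    # branch off the upstream stable branch, then cherry-pick with -x so
    # the commit message records which master commit it came from
    git checkout -b nautilus-my-fix upstream/nautilus
    git cherry-pick -x <sha1-of-master-commit>
    # push the branch and open the PR against ceph's nautilus branch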
Hope this helps,
Nathan
[1] of course, follow the cherry-picking rules
https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#ch…
--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
Hi,
Is anyone using the librados AIO APIs? I seem to have a problem where
the rados_aio_wait_for_complete() call just waits for a long period of
time before it finishes without error.
More info on my setup:
I am using Ceph 14.2.4 and write 8MB objects.
I run my AIO program on 24 nodes at the same time, each writing
different data (split into 8MB objects); each data set is about 2GB.
Normally, it takes about 10 mins for all of them to complete, but often
one or more nodes take considerably longer to finish. When looking at
one of those, I mostly see that the IO requests have been submitted and
the program waits at:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaaad0c8fa in rados_aio_wait_for_complete () from
/cgv/geovation/2/test/ceph/lib/librados.so.2
Then it eventually completes with no errors from
rados_aio_wait_for_complete() call.
The (pseudo) code looks like:
while (data remains to be written) {
    size_t aio_ops_count = 0;
    rados_completion_t aio_comp[12];

    // Submit up to 12 asynchronous full-object writes.
    for (size_t j = 0; j < 12; ++j) {
        int err = rados_aio_create_completion(NULL, NULL, NULL, &aio_comp[j]);
        if (err < 0) {
            cerr << "rados_aio_create_completion: " << strerror(-err) << endl;
            return 1;
        }
        string obj_ = getobjectid();
        err = rados_aio_write_full(io, obj_.c_str(), aio_comp[j],
                                   read_buf[j], bytes);
        if (err < 0) {
            cerr << "rados_aio_write_full: " << strerror(-err) << endl;
            return 1;
        }
        ++aio_ops_count;
    }

    // Wait for each write to complete and check its return value.
    for (size_t j = 0; j < aio_ops_count; ++j) {
        rados_aio_wait_for_complete(aio_comp[j]); // considerably longer delay here ??
        int err = rados_aio_get_return_value(aio_comp[j]);
        if (err < 0) {
            cerr << "rados_aio_get_return_value: " << strerror(-err) << endl;
            return 1;
        }
        rados_aio_release(aio_comp[j]);
    }
}
I ran it under Valgrind and saw no issues, and I also read the data
back and checksummed it to verify there were no corruption issues. So
everything appears to "work" as expected, except for the longer delays
at times.
I'm wondering if anyone is using the AIO APIs to write objects and has
experienced any similar problems.
--
Regards,
Ponnuvel P
On 07:31 Mon 28 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> I am using ceph version 12.2.8
> (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable).
>
> I have not checked the master branch; do you think this is an issue in
> luminous that has been fixed in later versions?
I haven't hit this problem on the master branch. Ceph/RDMA changed a
lot between luminous and master.
Is the configuration below really needed in your luminous ceph.conf?
> ms_async_rdma_local_gid = xxxx
On master branch, this parameter is not needed at all.
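If you do still need it on luminous, one way to list the candidate GIDs
of a device is via sysfs (the device name and port number below are only
examples, adjust for your hardware):

    # print every GID exposed by mlx4_0 port 1
    for g in /sys/class/infiniband/mlx4_0/ports/1/gids/*; do
        echo "$g: $(cat "$g")"
    done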
B.R.
Changcheng
> __________________________________________________________________
>
> From: Liu, Changcheng <changcheng.liu(a)intel.com>
> Sent: 25 October 2019 18:04
> To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI)
> <gabryel.mason-williams(a)diamond.ac.uk>
> Cc: ceph-users(a)ceph.com <ceph-users(a)ceph.com>; dev(a)ceph.io
> <dev(a)ceph.io>
> Subject: Re: RMDA Bug?
>
> What's your ceph version? Have you verified whether the problem could
> be reproduced on the master branch?
> On 08:33 Fri 25 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > I am currently trying to run Ceph on RDMA, either RoCE 1 or 2.
> > However, I am experiencing issues with this.
> >
> > When using Ceph on RDMA, I experience issues where OSDs will randomly
> > become unreachable even if the cluster is left alone. It is also not
> > properly talking over RDMA, using Ethernet instead even though the
> > config states it should, as shown by the same results in the
> > benchmarking of the two setups.
> >
> > After reloading the cluster:
> >
> > [inline image omitted]
> >
> > After 5m 9s the cluster went from being healthy to down.
> >
> > [inline image omitted]
> >
> > This problem even happens when running a benchmark test on the
> > cluster; OSDs will just fall over. Another curious issue is that it
> > is not properly talking over RDMA and is instead using the Ethernet.
> >
> > [inline image omitted]
> >
> > Next test:
> >
> > [inline image omitted]
> > The config used for RDMA is as follows:
> >
> > [global]
> > fsid = aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
> > mon_initial_members = node1, node2, node3
> > mon_host = xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > public_network = xxx.xxx.xxx.xxx/24
> > cluster_network = yyy.yyy.yyy.yyy/16
> > ms_cluster_type = async+rdma
> > ms_public_type = async+posix
> > ms_async_rdma_device_name = mlx4_0
> >
> > [osd.0]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.1]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.2]
> > ms_async_rdma_local_gid = xxxx
> > Tests to check the system is using RDMA:
> >
> > sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ms_cluster
> >
> > OUTPUT:
> >
> > "ms_cluster_type": "async+rdma",
> >
> > sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1
> >
> > OUTPUT:
> >
> > {
> >     "AsyncMessenger::RDMAWorker-1": {
> >         "tx_no_mem": 0,
> >         "tx_parital_mem": 0,
> >         "tx_failed_post": 0,
> >         "rx_no_registered_mem": 0,
> >         "tx_chunks": 9,
> >         "tx_bytes": 2529,
> >         "rx_chunks": 0,
> >         "rx_bytes": 0,
> >         "pending_sent_conns": 0
> >     }
> > }
> >
> > When running over Ethernet I have a completely stable system, with
> > the current benchmarks as follows:
> >
> > [inline image omitted]
> >
> > The config setup when using Ethernet is:
> >
> > [global]
> > fsid = aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
> > mon_initial_members = node1, node2, node3
> > mon_host = xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > public_network = xxx.xxx.xxx.xxx/24
> > cluster_network = yyy.yyy.yyy.yyy/16
> > ms_cluster_type = async+posix
> > ms_public_type = async+posix
> > ms_async_rdma_device_name = mlx4_0
> >
> > [osd.0]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.1]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.2]
> > ms_async_rdma_local_gid = xxxx
> > Tests to check the system is using async+posix:
> >
> > sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ms_cluster
> >
> > OUTPUT:
> >
> > "ms_cluster_type": "async+posix"
> >
> > sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1
> >
> > OUTPUT:
> >
> > {}
> >
> > This is clearly an issue with RDMA and not with the OSDs, shown by
> > the fact that the system is completely fine over Ethernet and not
> > over RDMA.
> >
> > Any guidance or ideas on how to approach this problem to make Ceph
> > work with RDMA would be greatly appreciated.
> >
> > Regards,
> >
> > Gabryel Mason-Williams, Placement Student
> >
> > Address: Diamond Light Source Ltd., Diamond House, Harwell Science &
> > Innovation Campus, Didcot, Oxfordshire OX11 0DE
> >
> > Email: gabryel.mason-williams(a)diamond.ac.uk
We are getting close to starting QE validation for the next point releases.
Dev leads, backport team - please tag all PRs that need to be included with
the appropriate labels.
The current plan is to start QE in the middle of next week, unless I hear
otherwise.
Thx
YuriW
Hi Folks,
Perf meeting is on in ~20 minutes! Discussion topics for today are
rocksdb sharding testing, a bluestore trim update, and possibly rgw
bucket sharding. Please feel free to add your own!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark