For developers submitting jobs using teuthology, we now have
recommendations on what priority level to use:
https://docs.ceph.com/docs/master/dev/developer_guide/#testing-priority
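As a rough illustration, a scheduling command at an explicit priority might
look like the following (a sketch only - the suite, branch, and machine type
are placeholders; check "teuthology-suite --help" for the exact flags):

    # schedule a suite run at a chosen priority level; pick the level
    # according to the recommendations in the doc above
    teuthology-suite --suite fs --ceph wip-my-branch \
        --machine-type smithi --priority 75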
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi all,
We conduct yearly user surveys to better understand how our users
utilize Ceph. The Ceph Foundation collects the data under the Community
Data License Agreement [0], which helps the community make a more
informed decision about where our development efforts for future
releases should go.
Back in August, I asked the community to help draft the next survey
[1]. I'm happy to now provide a draft of the user survey for 2019. I'm
sending this to the dev list in hopes of getting feedback before
sending it to the Ceph users list.
The first piece of feedback I received was a request to use something
other than SurveyMonkey, since it is not available in some regions. I
have been using another third-party service for our Ceph Days CFP
forms, and luckily it offers a survey service that isn't blocked.
A second question that came up was how to lay out questions for
multiple cluster deployments. An idea I had was to keep our general
Ceph user survey [2] separate from the deployment questions [3]. The
general questions only need to be answered once, and the deployment
survey can be answered multiple times to capture the different
configurations. I'm looking into a way to link the answers of both
surveys together.
Any feedback, corrections or ideas?
[0] - https://cdla.io/sharing-1-0/
[1] -
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Q3NCHOJN45D…
[2] -
https://ceph.io/wp-content/uploads/2019/10/Ceph-User-Survey-general.pdf
[3] -
https://ceph.io/wp-content/uploads/2019/10/Ceph-User-Survey-Clusters.pdf
--
Mike Perez
he/him
Ceph Community Manager
M: +1-951-572-2633
494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee <https://twitter.com/thingee>
FOSDEM is a free software event that offers open source communities a place
to meet, share ideas and collaborate. It is renowned for being highly
developer-oriented and brings together 8000+ participants from all over the
world. It is held in the city of Brussels (Belgium).
FOSDEM 2020 will take place during the weekend of February 1st-2nd 2020. More
details about the event can be found at http://fosdem.org/
** Call For Participation
The Software Defined Storage devroom will go into its fourth round of
talks around Open Source Software Defined Storage projects, management
tools and real-world deployments.
Presentation topics could include, but are not limited to:
- Your work on a SDS project like Ceph, Gluster, OpenEBS or LizardFS
- Your work on or with SDS related projects like SWIFT or Container Storage
Interface
- Management tools for SDS deployments
- Monitoring tools for SDS clusters
** Important dates:
- Nov 24th 2019: submission deadline for talk proposals
- Dec 15th 2019: announcement of the final schedule
- Feb 2nd 2020: Software Defined Storage dev room
Talk proposals will be reviewed by a steering committee:
- Niels de Vos (OpenShift Container Storage Developer - Red Hat)
- Jan Fajerski (Ceph Developer - SUSE)
- Kai Wagner (SUSE)
- Mike Perez (Ceph Community Manager, Red Hat)
Use the FOSDEM 'pentabarf' tool to submit your proposal:
https://penta.fosdem.org/submission/FOSDEM20
- If necessary, create a Pentabarf account and activate it.
Please reuse your account from previous years if you have
already created it.
https://penta.fosdem.org/user/new_account/FOSDEM20
- In the "Person" section, provide First name, Last name
(in the "General" tab), Email (in the "Contact" tab)
and Bio ("Abstract" field in the "Description" tab).
- Submit a proposal by clicking on "Create event".
- Important! Select the "Software Defined Storage devroom" track
(on the "General" tab).
- Provide the title of your talk ("Event title" in the "General" tab).
- Provide a description of the subject of the talk and the
intended audience (in the "Abstract" field of the "Description" tab)
- Provide a rough outline of the talk or goals of the session (a short
list of bullet points covering topics that will be discussed) in the
"Full description" field in the "Description" tab
- Provide the expected length of your talk in the "Duration" field. Please
  include at least 5 minutes of discussion in your proposal and allow
  5 minutes for the handover to the next presenter.
  Suggested talk lengths are 20+5+5 and 45+10+5 minutes. Note that
  shorter talks are preferred so that more topics can be presented during
  the day.
** Recording of talks
The FOSDEM organizers plan to have live streaming and recording fully working,
both for remote/later viewing of talks, and so that people can watch streams
in the hallways when rooms are full. This requires speakers to consent to
being recorded and streamed. If you plan to be a speaker, please understand
that by doing so you implicitly give consent for your talk to be recorded and
streamed. The recordings will be published under the same license as all
FOSDEM content (CC-BY).
Hope to hear from you soon! And please forward this announcement.
If you have any further questions, please write to the mailing list at
storage-devroom(a)lists.fosdem.org and we will try to answer as soon as
possible.
Thanks!
hi folks,
just want to share my findings regarding building ceph on RHEL8 here.
today, i was trying to build ceph on RHEL8, but it seems we are
missing some build dependencies on this distro: quite a few packages
were removed from RHEL8 [0], and these packages are still missing
from EPEL8:
No matching package to install: 'gperftools-devel >= 2.6.1'
No matching package to install: 'leveldb-devel > 1.2'
No matching package to install: 'libbabeltrace-devel'
No matching package to install: 'liboath-devel'
No matching package to install: 'python3-cherrypy'
No matching package to install: 'python3-coverage'
No matching package to install: 'python3-pecan'
No matching package to install: 'python3-routes'
No matching package to install: 'python3-tox'
No matching package to install: 'xmlstarlet'
i've added the EPEL8 repo by following [1]. packages being added to
EPEL8 in the future will probably help to ease the pain, but at this
moment, some necessary dependencies are just missing.
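for anyone trying the same, this is roughly what i did (a sketch; the
epel-release URL is the usual fedora one, adjust for your mirror):

    # enable the EPEL8 repo as described in [1]
    sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    # then let ceph's install-deps.sh pull in whatever is available.
    # it still fails on the packages listed above.
    cd ceph && ./install-deps.sh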
cheers,
---
[0] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/ht…
[1] https://fedoraproject.org/wiki/EPEL
--
Regards
Kefu Chai
Hi all, there seems to be some confusion concerning the backport tracker.
The purpose of the backport tracker is to track backports of fixes that need to
go into multiple stable branches, so they don't "fall through the cracks".
If you need to backport something from master *only* to nautilus *and no
farther*, there is no need to create tracker issues for backporting purposes.
Just open your nautilus PR with the cherry-pick [1].
(If there is already a master tracker issue, you can just mention the URL of the
nautilus backport in a comment on the tracker. You don't need to change the
status to Pending Backport or fill in the Backport field.)
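For reference, the mechanics of the cherry-pick are a short sketch like
this (the branch name and commit SHA are placeholders; see [1] for the
full rules, in particular the mandatory -x flag):

    # branch off the upstream stable branch, then cherry-pick with -x so
    # the commit message records which master commit it came from
    git checkout -b nautilus-my-fix upstream/nautilus
    git cherry-pick -x <sha1-of-master-commit>
    # push the branch and open the PR against ceph's nautilus branch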
Hope this helps,
Nathan
[1] of course, follow the cherry-picking rules
https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#ch…
--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
Hi,
Is anyone using the librados AIO APIs? I seem to have a problem where
the rados_aio_wait_for_complete() call just waits for a long period of
time before it finishes without error.
More info on my setup:
I am using Ceph 14.2.4 and write 8MB objects.
I run my AIO program on 24 nodes at the same time, each writing
different data (split into 8MB objects); each data set is about 2GB.
Normally, it takes about 10 mins for all of them to complete, but often
one or more nodes take considerably longer to finish. When looking at
one of those, I mostly see that the IO requests have been submitted and
the program waits at:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaaad0c8fa in rados_aio_wait_for_complete () from
/cgv/geovation/2/test/ceph/lib/librados.so.2
Then it eventually completes with no errors from
rados_aio_wait_for_complete() call.
The (pseudo) code looks like:
while (data remains to be written) {
    size_t aio_ops_count = 0;
    rados_completion_t aio_comp[12];

    // Submit up to 12 asynchronous full-object writes.
    for (size_t j = 0; j < 12; ++j) {
        int err = rados_aio_create_completion(NULL, NULL, NULL, &aio_comp[j]);
        if (err < 0) {
            cerr << "rados_aio_create_completion: " << strerror(-err) << endl;
            return 1;
        }
        string obj_ = getobjectid();
        err = rados_aio_write_full(io, obj_.c_str(), aio_comp[j],
                                   read_buf[j], bytes);
        if (err < 0) {
            cerr << "rados_aio_write_full: " << strerror(-err) << endl;
            return 1;
        }
        ++aio_ops_count;
    }

    // Wait for each write to complete and check its return value.
    for (size_t j = 0; j < aio_ops_count; ++j) {
        rados_aio_wait_for_complete(aio_comp[j]); // considerably longer delay here ??
        int err = rados_aio_get_return_value(aio_comp[j]);
        if (err < 0) {
            cerr << "rados_aio_get_return_value: " << strerror(-err) << endl;
            return 1;
        }
        rados_aio_release(aio_comp[j]);
    }
}
I ran it under Valgrind and saw no issues, and I also read the data
back and checksummed it to verify there were no corruption issues. So
everything appears to "work" as expected, except for the longer delays
at times.
I'm wondering if anyone is using the AIO APIs to write objects and has
experienced any similar problems.
--
Regards,
Ponnuvel P
On 07:31 Mon 28 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> I am using ceph version 12.2.8
> (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable).
>
> I have not checked the master branch; do you think this is an issue in
> luminous that has been fixed in later versions?
I haven't hit this problem on the master branch. Ceph/RDMA changed a
lot between luminous and master.
Is the configuration below really needed in your luminous ceph.conf?
> ms_async_rdma_local_gid = xxxx
On master branch, this parameter is not needed at all.
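If you do still need it on luminous, one way to list the candidate GIDs
of a device is via sysfs (the device name and port number below are only
examples, adjust for your hardware):

    # print every GID exposed by mlx4_0 port 1
    for g in /sys/class/infiniband/mlx4_0/ports/1/gids/*; do
        echo "$g: $(cat "$g")"
    done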
B.R.
Changcheng
> __________________________________________________________________
>
> From: Liu, Changcheng <changcheng.liu(a)intel.com>
> Sent: 25 October 2019 18:04
> To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI)
> <gabryel.mason-williams(a)diamond.ac.uk>
> Cc: ceph-users(a)ceph.com <ceph-users(a)ceph.com>; dev(a)ceph.io
> <dev(a)ceph.io>
> Subject: Re: RMDA Bug?
>
> What's your ceph version? Have you verified whether the problem could
> be reproduced on the master branch?
> On 08:33 Fri 25 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > I am currently trying to run Ceph on RDMA, either RoCE 1 or 2.
> > However, I am experiencing issues with this.
> >
> > When using Ceph on RDMA, I experience issues where OSDs will randomly
> > become unreachable even if the cluster is left alone. It is also not
> > properly talking over RDMA, using Ethernet instead even though the
> > config states it should, as shown by the same results in the
> > benchmarking of the two setups.
> >
> > After reloading the cluster:
> >
> > [inline image omitted]
> >
> > After 5m 9s the cluster went from being healthy to down.
> >
> > [inline image omitted]
> >
> > This problem even happens when running a benchmark test on the
> > cluster; OSDs will just fall over. Another curious issue is that it
> > is not properly talking over RDMA and is instead using the Ethernet.
> >
> > [inline image omitted]
> >
> > Next test:
> >
> > [inline image omitted]
> > The config used for RDMA is as follows:
> >
> > [global]
> > fsid = aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
> > mon_initial_members = node1, node2, node3
> > mon_host = xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > public_network = xxx.xxx.xxx.xxx/24
> > cluster_network = yyy.yyy.yyy.yyy/16
> > ms_cluster_type = async+rdma
> > ms_public_type = async+posix
> > ms_async_rdma_device_name = mlx4_0
> >
> > [osd.0]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.1]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.2]
> > ms_async_rdma_local_gid = xxxx
> > Tests to check the system is using RDMA:
> >
> > sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ms_cluster
> >
> > OUTPUT:
> >
> > "ms_cluster_type": "async+rdma",
> >
> > sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1
> >
> > OUTPUT:
> >
> > {
> >     "AsyncMessenger::RDMAWorker-1": {
> >         "tx_no_mem": 0,
> >         "tx_parital_mem": 0,
> >         "tx_failed_post": 0,
> >         "rx_no_registered_mem": 0,
> >         "tx_chunks": 9,
> >         "tx_bytes": 2529,
> >         "rx_chunks": 0,
> >         "rx_bytes": 0,
> >         "pending_sent_conns": 0
> >     }
> > }
> >
> > When running over Ethernet I have a completely stable system, with
> > the current benchmarks as follows:
> >
> > [inline image omitted]
> >
> > The config setup when using Ethernet is:
> >
> > [global]
> > fsid = aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
> > mon_initial_members = node1, node2, node3
> > mon_host = xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > public_network = xxx.xxx.xxx.xxx/24
> > cluster_network = yyy.yyy.yyy.yyy/16
> > ms_cluster_type = async+posix
> > ms_public_type = async+posix
> > ms_async_rdma_device_name = mlx4_0
> >
> > [osd.0]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.1]
> > ms_async_rdma_local_gid = xxxx
> >
> > [osd.2]
> > ms_async_rdma_local_gid = xxxx
> > Tests to check the system is using async+posix:
> >
> > sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ms_cluster
> >
> > OUTPUT:
> >
> > "ms_cluster_type": "async+posix"
> >
> > sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1
> >
> > OUTPUT:
> >
> > {}
> >
> > This is clearly an issue with RDMA and not with the OSDs, shown by
> > the fact that the system is completely fine over Ethernet and not
> > over RDMA.
> >
> > Any guidance or ideas on how to approach this problem to make Ceph
> > work with RDMA would be greatly appreciated.
> >
> > Regards,
> >
> > Gabryel Mason-Williams, Placement Student
> >
> > Address: Diamond Light Source Ltd., Diamond House, Harwell Science &
> > Innovation Campus, Didcot, Oxfordshire OX11 0DE
> >
> > Email: gabryel.mason-williams(a)diamond.ac.uk
We are getting close to starting QE validation for the next point releases.
Dev leads, backport team - please tag all PRs that need to be included with
the appropriate labels.
The current plan is to start QE in the middle of next week, unless I hear
otherwise.
Thx
YuriW
Hi Folks,
Perf meeting is on in ~20 minutes! Discussion topics for today are
rocksdb sharding testing, a bluestore trim update, and possibly rgw
bucket sharding. Please feel free to add your own!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark