While poking through one of our Nautilus clusters, I noticed OSDs have
heartbeat (HB) peers that are not sharing PGs.
Nautilus added OSDMap::get_random_up_osds_by_subtree() to select
random OSDs of type mon_osd_reporter_subtree_level even if
mon_osd_min_down_reporters is already met.
If you have multiple types of hardware mapped to different pools, OSDs
between these pools will HB each other, which is not necessarily
expected from an operations point of view. This also has the potential
to wrongly mark OSDs down if one type of hardware is having issues.
The more HB peers the better, but couldn't we increase the default for
mon_osd_min_down_reporters instead, and call
get_random_up_osds_by_subtree only if it isn't met? I initially made a
patch to exclude any OSD not part of the same crush root, but this
wouldn't work universally since it's possible to have a crush rule
spanning multiple trees. I'm not sure what other alternatives there are.
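The alternative could be sketched roughly like this (hypothetical Python pseudocode of the selection logic, not the actual OSDMap/OSD C++ code; the function name and arguments are my own illustration):

```python
# Hypothetical sketch of the proposed peer selection: prefer PG-sharing
# peers, and only fall back to random subtree peers when
# mon_osd_min_down_reporters is not yet met.
def select_hb_peers(pg_peers, random_subtree_peers, min_down_reporters):
    peers = list(dict.fromkeys(pg_peers))  # de-duplicate, keep order
    if len(peers) >= min_down_reporters:
        return peers
    for osd in random_subtree_peers:
        if osd not in peers:
            peers.append(osd)
        if len(peers) >= min_down_reporters:
            break
    return peers
```

With this shape, OSDs from unrelated pools would only show up as HB peers when an OSD doesn't already have enough PG-sharing reporters.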
Another bit from pre-Nautilus: OSD id-1 and id+1 are added to the HB
peers in order to have a "fully-connected set" [1]. I'm not sure I
understand that comment; could somebody briefly explain how this
creates a fully connected set, and what set we're talking about?
Thanks!
[1] https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L5141
Hi Ceph maintainers and developers,
The objective of this email is to discuss our work on dmClock-based client QoS management for CephFS.
Our group at LINE maintains Ceph storage clusters (RGW, RBD, and CephFS) to internally support OpenStack and K8S based private cloud environments for various applications and platforms, including LINE messenger. We have seen that the RGW and RBD services can provide consistent performance to multiple active users, since RGW employs the dmClock QoS scheduler for S3 clients and hypervisors internally utilize an I/O throttler for VM block storage clients. Unfortunately, unlike RGW and RBD, CephFS clients can directly issue metadata requests to MDSs and file data requests to OSDs as they want. This situation happens occasionally (or frequently), and other clients' performance may be degraded by the noisy neighbor. In the end, consistent performance cannot be guaranteed in our environment. From this observation and motivation, we are now considering a client QoS scheduler using the dmClock library for CephFS.
A few things about how to realize the QoS scheduler.
- Per-subvolume QoS management. IOPS resources are shared only among the clients that mount the same root directory. QoS parameters can be easily configured through extended attributes (similar to quota). Each dmClock scheduler can manage clients' requests using client session information.
- MDS QoS management. Client metadata requests such as create and lookup are managed by a dmClock scheduler placed between the dispatcher and the main request handler (e.g., Server::handle_client_request()). We have observed that two active MDSs provide approximately 20 KIOPS. As this capacity is sometimes scarce for large numbers of clients, QoS management is needed for the MDS.
- OSD QoS management. We would like to reopen and improve the previous work available at https://github.com/ceph/ceph/pull/20235.
- Client QoS management. Each client maintains a dmClock tracker to keep track of both rho and delta, which are packed into client request messages.
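To make the rho/delta tracking above concrete, here is a simplified sketch of dmClock-style tag assignment, following the formulas in the dmClock paper (the function name and signature are our own illustration, not the ceph dmclock library's actual API):

```python
# Simplified dmClock-style tag assignment (illustration only).
# rho/delta are the per-client counters piggybacked on each request
# in the distributed scheme; reservation/weight/limit are the QoS
# parameters configured per subvolume.
def update_tags(prev, now, rho, delta, reservation, weight, limit):
    """Return the (R, P, L) tags for an incoming request."""
    prev_r, prev_p, prev_l = prev
    R = max(prev_r + rho / reservation, now)    # reservation tag
    P = max(prev_p + delta / weight, now)       # proportional-share tag
    L = max(prev_l + delta / limit, now)        # limit tag
    return (R, P, L)
```

The scheduler would then serve requests whose R tag is due first (to honor reservations), and otherwise pick by P tag among requests whose L tag permits service.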
In the case of the CLI, QoS parameters are configured using extended attributes on each subvolume directory. Specifically, separate QoS configurations are provided for MDSs and OSDs.
setfattr -n ceph.dmclock.mds_reservation -v 200 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.mds_weight -v 500 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.mds_limit -v 1000 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.osd_reservation -v 500 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.osd_weight -v 1000 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.osd_limit -v 2000 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
Our QoS work kicked off last month. Our first step has been to go over the prior work and the dmClock algorithm/library. We are now actively checking the feasibility of our idea with some modifications to the MDS and ceph-fuse. Our development is planned as follows.
- dmClock scheduler will be integrated into MDS and ceph-fuse by December 2020.
- dmClock scheduler will be integrated into the OSD in the first half of next year.
Does the community have any plans to develop per-client QoS management? Are there any other issues related to our QoS work? We look forward to hearing your valuable comments and feedback at this early stage.
Thanks
Yongseok Oh
I'm happy to announce another release of the go-ceph API
bindings. This is a regular release following our every-two-months release
cadence.
https://github.com/ceph/go-ceph/releases/tag/v0.7.0
Changes in the release are detailed in the link above.
The bindings aim to play a similar role to the "pybind" python bindings in the
ceph tree but for the Go language. These API bindings require the use of cgo.
There are already a few consumers of this library in the wild, including the
ceph-csi project.
Specific questions, comments, bugs etc are best directed at our github issues
tracker.
--
John Mulligan
phlogistonjohn(a)asynchrono.us
jmulligan(a)redhat.com
Hi all,
Just out of curiosity: considering that vector machines are being used in HPC
applications to accelerate certain kernels, do you think there are some
workloads in Ceph that could be good candidates to be offloaded to and
accelerated on vector machines?
Thanks in advance!
BR
FOSDEM is a free software event that offers open source communities a place to
meet, share ideas and collaborate. It is well known for being highly
developer-oriented and in the past brought together 8000+ participants from all
over the world. Its home is in the city of Brussels (Belgium).
FOSDEM 2021 will take place as an online event during the weekend of February
6-7, 2021. More details about the event can be found at http://fosdem.org/
** Call For Participation
The Software Defined Storage devroom will go into its fifth round for talks
around Open Source Software Defined Storage projects, management tools
and real-world deployments.
Presentation topics could include, but are not limited to:
- Your work on a SDS project like Ceph, Gluster, OpenEBS, CORTX or Longhorn
- Your work on or with SDS related projects like OpenStack SWIFT or Container
Storage Interface
- Management tools for SDS deployments
- Monitoring tools for SDS clusters
** Important dates:
- Dec 27th 2020: submission deadline for talk proposals
- Dec 31st 2020: announcement of the final schedule
- Feb 6th 2021: Software Defined Storage dev room
Talk proposals will be reviewed by a steering committee:
- Niels de Vos (OpenShift Container Storage Developer - Red Hat)
- Jan Fajerski (Ceph Developer - SUSE)
- TBD
Use the FOSDEM 'pentabarf' tool to submit your proposal:
https://penta.fosdem.org/submission/FOSDEM21
- If necessary, create a Pentabarf account and activate it.
Please reuse your account from previous years if you have
already created it.
https://penta.fosdem.org/user/new_account/FOSDEM21
- In the "Person" section, provide First name, Last name
(in the "General" tab), Email (in the "Contact" tab)
and Bio ("Abstract" field in the "Description" tab).
- Submit a proposal by clicking on "Create event".
- If you plan to register your proposal in several tracks to increase your chances,
don't! Register your talk once, in the most accurate track.
- Presentations have to be pre-recorded before the event and will be streamed on
the event weekend.
- Important! Select the "Software Defined Storage devroom" track
(on the "General" tab).
- Provide the title of your talk ("Event title" in the "General" tab).
- Provide a description of the subject of the talk and the
intended audience (in the "Abstract" field of the "Description" tab)
- Provide a rough outline of the talk or goals of the session (a short
list of bullet points covering topics that will be discussed) in the
"Full description" field in the "Description" tab
- Provide an expected length of your talk in the "Duration" field.
We suggest a length between 15 and 45 minutes.
** For accepted talks
Once your proposal is accepted, we will assign you a volunteer deputy who will
help you produce the talk recording. The volunteer will also try to ensure
the recording is of good quality, help with uploading it to the system and
broadcasting it during the event, and moderate the Q&A session after the
broadcast. Please note that as a presenter you're expected to be available
online during and especially after the broadcast of your talk. The schedule will
be available under
https://fosdem.org/2021/schedule/track/software_defined_storage/
Hope to hear from you soon! And please forward this announcement.
If you have any further questions, please write to the mailing list at
storage-devroom(a)lists.fosdem.org and we will try to answer as soon as
possible.
Thanks!
Hi Folks,
The weekly performance meeting will start in approx 10 minutes! The only
topic we have for today so far is discussing the excessive PGLog memory
usage some folks on the mailing list have been reporting recently.
Please feel free to add your own topic as well.
Hope to see you there!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark
Hi, everyone.
Recently, one of our online clusters encountered a problem. When we
tried to add machines to an existing crush root, multiple PGs got
stuck in unknown, activating, or peering states. For now, we have
worked around the problem by restarting the OSDs related to the
inactive PGs, but the root cause is still unknown.
On the other hand, we found that our online configuration contains the
entry "bluestore_min_alloc_size_hdd = 262144", while
"bluefs_shared_alloc_size" is not configured, which means it is at its
default value of 64K. Normally, this combination would trigger an error
when creating OSDs. However, our online systems run version 14.2.4,
which does not trigger that error.
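For reference, the combination in question would look like this in ceph.conf (the second line is commented out to reflect that it is unset in our deployment and therefore at its 64 KiB default):

```ini
[osd]
# Our online setting: 256 KiB minimum allocation unit on HDD.
bluestore_min_alloc_size_hdd = 262144
# Unset in our config, so it stays at the 64 KiB default; as noted
# above, newer releases reject this mismatch at OSD creation time,
# but 14.2.4 does not.
#bluefs_shared_alloc_size = 65536
```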
My question is: could this misconfiguration be the root cause of the
problem mentioned above? Thanks:-)
Hi all,
I'm studying Paxos in Ceph because I need to add a new PaxosService.
In Ceph, Paxos is based on a single proposer and multiple acceptors,
so the quorum needs to choose the single proposer (leader) first.
It seems that there are two ways to choose one monitor as leader:
I. Normal way:
Elector::handle_ack
  |--> logic.receive_ack(peer_rank, m->epoch);
  |--> declare_victory();
// Note: the pre-condition is:
electing_me && (acked_me.size() == elector->paxos_size())
II. Another way, when a timeout event happens:
Elector::_start() or Elector::_defer_to
  |--> reset_timer();
  |--> expire_event = mon->timer.add_event_after(
         g_conf()->mon_election_timeout + plus,
         new C_MonContext{
           mon, [this](int) {
             logic.end_election_period();
           }
         });
When the timeout fires:
ElectionLogic::end_election_period()
  |--> declare_victory();
// Note: the pre-condition is:
electing_me && acked_me.size() > (elector->paxos_size() / 2)
I'm curious why their pre-conditions are different. This doesn't block
my development work; I just want to know why.
Does anyone know the reason?
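For comparison, the two pre-conditions boil down to the following (a Python restatement of the quoted C++ for illustration only; paxos_size is the number of monitors in the map, acked_me the set of monitors that acked this candidate):

```python
# Normal path: every monitor in the map must have acked (unanimity).
def victory_normal(electing_me, acked_me, paxos_size):
    return electing_me and len(acked_me) == paxos_size

# Timeout path: a strict majority of the map is enough, mirroring
# C++ integer division in paxos_size() / 2.
def victory_timeout(electing_me, acked_me, paxos_size):
    return electing_me and len(acked_me) > paxos_size // 2
```

So the normal path requires all monitors to respond, while the timeout path settles for a majority once the election period expires.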
B.R.
Changcheng
https://github.com/ceph/ceph/pull/38403
I'm not sure what's causing make check and the API tests to fail,
since this is only a documentation update and the changes made don't
touch any of the API-related or make-check-related parts of the code in
the repo, so I figured it wouldn't be a terrible idea to report it here.
Is anyone else getting similar failures?
Zac
Docs
Ceph Upstream