Hello Ceph Devs,
I looked into the CephFS kernel client and now I am confused about what happens on the cluster side when the client sends its requests. In particular I am interested in finding out what code is called then.
For the longest time I thought that client-side calls like read() and write() get mapped to the functions ceph_read() and ceph_write() in libcephfs.cc.
But now I think that the kernel client and libcephfs.cc are completely disjoint from one another. Is that correct?
And is there any relation between the kernel client and the client found in src/client/client.cc ?
Greetings
Hi guys!

This is a function from the MDS (src/mds/MDCache.cc) that prints out the
rss and other memory stats every couple of seconds at debug level 2. I am
trying to create a similar function for printing memory and other stats
for the OSD (I will be adding the code in the OSD::tick() function in
src/osd/OSD.cc). Can anyone provide basic guidelines about which classes
and functions I will need to use, and what else I will require to achieve
this? Any help would be extremely valuable. Thanks for your time!
void MDCache::check_memory_usage()
{
  static MemoryModel mm(g_ceph_context);
  static MemoryModel::snap last;
  mm.sample(&last);
  static MemoryModel::snap baseline = last;

  // check client caps
  ceph_assert(CInode::count() == inode_map.size() +
              snap_inode_map.size() + num_shadow_inodes);
  double caps_per_inode = 0.0;
  if (CInode::count())
    caps_per_inode = (double)Capability::count() / (double)CInode::count();

  dout(2) << "Memory usage: "
          << " total " << last.get_total()
          << ", rss " << last.get_rss()
          << ", heap " << last.get_heap()
          << ", baseline " << baseline.get_heap()
          << ", " << num_inodes_with_caps << " / " << CInode::count()
          << " inodes have caps"
          << ", " << Capability::count() << " caps, " << caps_per_inode
          << " caps per inode"
          << dendl;

  mds->update_mlogger();
  mds->mlogger->set(l_mdm_rss, last.get_rss());
  mds->mlogger->set(l_mdm_heap, last.get_heap());
}
[ Moving from ceph-devel@vger to dev@ceph ]
Do we need to change the protocol so you can provide a limit on the
number of extents when you send a sparse read op?
I rather suspect the workloads generating this are the worst-case
scenario, and that we may want server-side limits in any case.
On Mon, Nov 6, 2023 at 5:02 AM Xiubo Li <xiubli(a)redhat.com> wrote:
>
>
> On 11/6/23 20:32, Ilya Dryomov wrote:
> > On Mon, Nov 6, 2023 at 1:15 PM Xiubo Li <xiubli(a)redhat.com> wrote:
> >>
> >> On 11/6/23 19:38, Ilya Dryomov wrote:
> >>> On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli(a)redhat.com> wrote:
> >>>> On 11/3/23 18:07, Ilya Dryomov wrote:
> >>>>
> >>>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli(a)redhat.com> wrote:
> >>>>
> >>>> From: Xiubo Li <xiubli(a)redhat.com>
> >>>>
> >>>> There is no limit on the extent array size, so it can become very
> >>>> large when reading a big range. When that happens the messenger
> >>>> will fail by resetting the connection and will keep resending the
> >>>> in-flight IOs.
> >>>>
> >>>> URL: https://tracker.ceph.com/issues/62081
> >>>> Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
> >>>> ---
> >>>> net/ceph/osd_client.c | 12 ------------
> >>>> 1 file changed, 12 deletions(-)
> >>>>
> >>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >>>> index 7af35106acaf..177a1d92c517 100644
> >>>> --- a/net/ceph/osd_client.c
> >>>> +++ b/net/ceph/osd_client.c
> >>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
> >>>> }
> >>>> #endif
> >>>>
> >>>> -#define MAX_EXTENTS 4096
> >>>> -
> >>>> static int osd_sparse_read(struct ceph_connection *con,
> >>>> struct ceph_msg_data_cursor *cursor,
> >>>> char **pbuf)
> >>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
> >>>>
> >>>> if (count > 0) {
> >>>> if (!sr->sr_extent || count > sr->sr_ext_len) {
> >>>> - /*
> >>>> - * Apply a hard cap to the number of extents.
> >>>> - * If we have more, assume something is wrong.
> >>>> - */
> >>>> - if (count > MAX_EXTENTS) {
> >>>> - dout("%s: OSD returned 0x%x extents in a single reply!\n",
> >>>> - __func__, count);
> >>>> - return -EREMOTEIO;
> >>>> - }
> >>>> -
> >>>> /* no extent array provided, or too short */
> >>>> kfree(sr->sr_extent);
> >>>> sr->sr_extent = kmalloc_array(count,
> >>>> --
> >>>> 2.39.1
> >>>>
> >>>> Hi Xiubo,
> >>>>
> >>>> As noted in the tracker ticket, there are many "sanity" limits like
> >>>> that in the messenger and other parts of the kernel client. First,
> >>>> let's change that dout to pr_warn_ratelimited so that it's immediately
> >>>> clear what is going on.
> >>>>
> >>>> Sounds good to me.
> >>>>
> >>>> Then, if the limit actually gets hit, let's
> >>>> dig into why and see if it can be increased rather than just removed.
> >>>>
> >>>> The test result in https://tracker.ceph.com/issues/62081#note-16 is from just changing 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate random writes to a file followed by a read with a large size:
> >>>>
> >>>> [ RUN ] LibRadosIoPP.SparseReadExtentArrayOpPP
> >>>> extents array size : 4297
> >>>>
> >>>> The 'extents array size' could reach a very large number, so what should the limit be? Any ideas?
> >>> Hi Xiubo,
> >>>
> >>> I don't think it can be a very large number in practice.
> >>>
> >>> CephFS uses sparse reads only in the fscrypt case, right?
> >> Yeah, it is.
> >>
> >>
> >>> With
> >>> fscrypt, sub-4K writes to the object store aren't possible, right?
> >> Yeah.
> >>
> >>
> >>> If the answer to both is yes, then the maximum number of extents
> >>> would be
> >>>
> >>> 64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384
> >>>
> >>> even if the object store does tracking at byte granularity.
> >> So yeah, just for the fscrypt case if we set the max extent number to
> >> 16384 it should be fine. But the sparse read could also be enabled in
> >> non-fscrypt case.
> > Ah, I see that it's also exposed as a mount option. If we expect
> > people to use it, then dropping MAX_EXTENTS altogether might be the
> > best approach after all -- it doesn't make sense to warn about
> > something that we don't really control.
> >
> > How about printing the number of extents only if the allocation fails?
> > Something like:
> >
> > if (!sr->sr_extent) {
> > pr_err("failed to allocate %u sparse read extents\n", count);
> > return -ENOMEM;
> > }
>
> Yeah, this also looks good to me.
>
> I will fix it.
>
> Thanks
>
> - Xiubo
>
>
> > Thanks,
> >
> > Ilya
> >
>
>
IMHO we don't need yet another place to look for information, especially one that some operators never see. ymmv.
>
>> Hello,
>>
>> We wanted to get some feedback on one of the features that we are planning
>> to bring in for upcoming releases.
>>
>> On the Ceph GUI, we thought it could be interesting to show information
>> regarding the community events, ceph release information (Release notes and
>> changelogs) and maybe even notify about new blog post releases and also
>> inform regarding the community group meetings. There would be options to
>> subscribe to the events that you want to get notified.
>>
>> Before proceeding with its implementation, we thought it'd be good to get
>> some community feedback around it. So please let us know what you think
>> (the goods and the bads).
>>
>> Regards,
>> --
>>
>> Nizamudeen A
Hi Folks,
I won't be able to make it this morning. We'll reconvene next week.
Thanks,
Mark
--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson(a)clyso.com
We are hiring: https://www.clyso.com/jobs/
Hi folks,
I am reading RocksDBStore.cc and trying to understand how the iterator works. I found myself positively puzzled by the implementation of lower_bound() and upper_bound() in CFIteratorImpl. It seems that upper_bound() doesn't really set the upper bound for iteration. Instead, it just resets the lower bound, presumably to a more advanced position than the initial lower_bound() call, since it calls lower_bound(). It doesn't set the true upper bound, which would be set in iterate_upper_bound and used by RocksDB.
Is my understanding correct or am I missing something?
Thanks,
Yixin
Hello all,
Here are the minutes from today's meeting.
- New time for CDM APAC to increase participation
- 9.30 - 11.30 pm PT seems like the most popular based on
https://doodle.com/meeting/participate/id/aM9XGZ3a/vote
- One more week for more feedback; please ask more APAC folks to suggest
their preferred times.
- [Ernesto] Revamp Ansible/Ceph-Ansible for non-containerized users?
- open nebula / proxmox
- solicit maintainers for ceph-ansible on the ML
- 18.2.1
- yuri: approval email sent out a few days ago; waiting on some approvals
- Blocker:
- https://tracker.ceph.com/issues/63391
- lab upgrades (Laura will help Yuri coordinate)
- Next Pacific release being worked on in background by Yuri.
- https://pad.ceph.com/p/pacific_16.2.15
- Try v16.2.15 milestone to help prune PRs
- https://github.com/ceph/ceph/milestone/17
- [Nizam] Ceph News Ticker - Ceph Dashboard
- Notify when new release is available (display changelogs)
- Display important ceph events
- CVEs, critical bug fixes
- Maybe newly added blog posts or information regarding the upcoming
group meetings?
- User + Dev meeting next week
- Topics include migration between EC profiles and challenges related to
RGW zone replication
- Casey can attend end of meeting
- open nebula folks planning to do webinar; looking for speakers
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi,
When Reef was released, the announcement said that Debian packages would
be built once the blocking bug in Bookworm was fixed. As I noted on the
tracker item https://tracker.ceph.com/issues/61845 a couple of weeks
ago, that is now the case after the most recent Bookworm point release.
I also opened a PR to make the minimal change that would build Reef
packages on Bookworm[0]. I subsequently opened another PR to fix some
low-hanging fruit in terms of packaging errors - missing #! in
maintscripts, syntax errors in debian/control, erroneous dependencies on
Essential packages[1]. Neither PR has had any feedback/review as far as
I can see.
Those packages (and the previous state of the debian/ tree) had some
significant problems - no copyright file, and some of them contain
python scripts without declaring a python dependency, so I've today
submitted a slightly larger PR that brings the dh compatibility level up
to what I think the latest lowest-common-denominator level is, as well
as fixing these errors[2].
I believe these changes all ought to go into the reef branch, but
obviously you might prefer to just make the bare-minimum-to-build change
in the first PR.
Is there any chance of having some reef packages for Bookworm, please?
Relatedly, is there interest in further packaging fixes for future
branches? lintian still has quite a lot to say about the .debs for Ceph,
and while you might reasonably not want to care about crossing every t
of Debian policy, I think there are still changes that would be worth
doing...
I should declare a bit of an interest here - I'd like to evaluate
cephadm for work use, which would require us to be able to build our own
packages per local policy[3], which in turn would mean I'd want to get
Debian-based images going again. But that requires Reef .debs being
available to install onto said images :)
Thanks,
Matthew
[0] https://github.com/ceph/ceph/pull/53342
[1] https://github.com/ceph/ceph/pull/53397
[2] https://github.com/ceph/ceph/pull/53546
[3] https://wikitech.wikimedia.org/wiki/Kubernetes/Images#Production_images
Hi folks,
RocksDB has a feature to help with use cases that involve common prefixes in the object names. It requires the option prefix_extractor to be defined so RocksDB knows how to derive prefixes, and auto_prefix_mode in ReadOptions is currently the recommended way to use the feature. I can't find anywhere in the Ceph code that tries to make use of it. Has this feature been explored by the team?
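For reference, opting in looks roughly like this (a configuration sketch based on the RocksDB API, not anything in the Ceph tree; the 8-byte fixed prefix length is an arbitrary example value):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/slice_transform.h>

// Sketch only -- not from the Ceph tree. Define how keys map to
// prefixes so RocksDB can build prefix bloom filters/indexes...
rocksdb::Options options;
options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));

// ...and let reads auto-detect when prefix seek is applicable.
rocksdb::ReadOptions read_options;
read_options.auto_prefix_mode = true;
```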
Thanks,
Yixin