Hello Ceph Devs,
I looked into the CephFS kernel client and now I am confused about what happens on the cluster side when the client sends its requests. In particular I am interested in finding out what code is called then.
For the longest time I thought that client-side calls like read() and write() get mapped to the functions ceph_read() and ceph_write() in libcephfs.cc.
But now I think that the kernel client and libcephfs.cc are completely disjoint from one another. Is that correct?
And is there any relation between the kernel client and the client found in src/client/client.cc ?
Greetings
Hi guys!

This is a function from the MDS (src/mds/MDCache.cc) that prints out the
rss and other memory stats every couple of seconds at debug level 2. I am
trying to create a similar function for printing memory and other stats
for the OSD (I will be adding the code in the OSD::tick() function in
src/osd/OSD.cc). Can anyone provide basic guidelines about which classes
and functions I will need to use, and what else I will require to achieve
this? Any help would be extremely valuable. Thanks for your time!
void MDCache::check_memory_usage()
{
  static MemoryModel mm(g_ceph_context);
  static MemoryModel::snap last;
  mm.sample(&last);
  static MemoryModel::snap baseline = last;

  // check client caps
  ceph_assert(CInode::count() == inode_map.size() +
              snap_inode_map.size() + num_shadow_inodes);
  double caps_per_inode = 0.0;
  if (CInode::count())
    caps_per_inode = (double)Capability::count() / (double)CInode::count();

  dout(2) << "Memory usage: "
          << " total " << last.get_total()
          << ", rss " << last.get_rss()
          << ", heap " << last.get_heap()
          << ", baseline " << baseline.get_heap()
          << ", " << num_inodes_with_caps << " / " << CInode::count()
          << " inodes have caps"
          << ", " << Capability::count() << " caps, " << caps_per_inode
          << " caps per inode"
          << dendl;

  mds->update_mlogger();
  mds->mlogger->set(l_mdm_rss, last.get_rss());
  mds->mlogger->set(l_mdm_heap, last.get_heap());
}
[ Moving from ceph-devel@vger to dev@ceph ]
Do we need to change the protocol so you can provide a limit on the
number of extents when you send a sparse read op?
I rather suspect the workloads generating this are the worst-case
scenario, and that we may want server-side limits in any case.
On Mon, Nov 6, 2023 at 5:02 AM Xiubo Li <xiubli(a)redhat.com> wrote:
>
>
> On 11/6/23 20:32, Ilya Dryomov wrote:
> > On Mon, Nov 6, 2023 at 1:15 PM Xiubo Li <xiubli(a)redhat.com> wrote:
> >>
> >> On 11/6/23 19:38, Ilya Dryomov wrote:
> >>> On Mon, Nov 6, 2023 at 1:14 AM Xiubo Li <xiubli(a)redhat.com> wrote:
> >>>> On 11/3/23 18:07, Ilya Dryomov wrote:
> >>>>
> >>>> On Fri, Nov 3, 2023 at 4:41 AM <xiubli(a)redhat.com> wrote:
> >>>>
> >>>> From: Xiubo Li <xiubli(a)redhat.com>
> >>>>
> >>>> There is no limit on the extent array size, so it can become very
> >>>> large when reading a big range. When that happens the messenger
> >>>> will fail by resetting the connection and will keep resending the
> >>>> in-flight IOs.
> >>>>
> >>>> URL: https://tracker.ceph.com/issues/62081
> >>>> Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
> >>>> ---
> >>>> net/ceph/osd_client.c | 12 ------------
> >>>> 1 file changed, 12 deletions(-)
> >>>>
> >>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >>>> index 7af35106acaf..177a1d92c517 100644
> >>>> --- a/net/ceph/osd_client.c
> >>>> +++ b/net/ceph/osd_client.c
> >>>> @@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
> >>>> }
> >>>> #endif
> >>>>
> >>>> -#define MAX_EXTENTS 4096
> >>>> -
> >>>> static int osd_sparse_read(struct ceph_connection *con,
> >>>> struct ceph_msg_data_cursor *cursor,
> >>>> char **pbuf)
> >>>> @@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
> >>>>
> >>>> if (count > 0) {
> >>>> if (!sr->sr_extent || count > sr->sr_ext_len) {
> >>>> - /*
> >>>> - * Apply a hard cap to the number of extents.
> >>>> - * If we have more, assume something is wrong.
> >>>> - */
> >>>> - if (count > MAX_EXTENTS) {
> >>>> - dout("%s: OSD returned 0x%x extents in a single reply!\n",
> >>>> - __func__, count);
> >>>> - return -EREMOTEIO;
> >>>> - }
> >>>> -
> >>>> /* no extent array provided, or too short */
> >>>> kfree(sr->sr_extent);
> >>>> sr->sr_extent = kmalloc_array(count,
> >>>> --
> >>>> 2.39.1
> >>>>
> >>>> Hi Xiubo,
> >>>>
> >>>> As noted in the tracker ticket, there are many "sanity" limits like
> >>>> that in the messenger and other parts of the kernel client. First,
> >>>> let's change that dout to pr_warn_ratelimited so that it's immediately
> >>>> clear what is going on.
> >>>>
> >>>> Sounds good to me.
> >>>>
> >>>> Then, if the limit actually gets hit, let's
> >>>> dig into why and see if it can be increased rather than just removed.
> >>>>
> >>>> The test result in https://tracker.ceph.com/issues/62081#note-16 is from just changing 'len' to 5000 in ceph PR https://github.com/ceph/ceph/pull/54301 to emulate random writes to a file followed by a read with a large size:
> >>>>
> >>>> [ RUN ] LibRadosIoPP.SparseReadExtentArrayOpPP
> >>>> extents array size : 4297
> >>>>
> >>>> The 'extents array size' could reach a very large number, so what should the limit be? Any ideas?
> >>> Hi Xiubo,
> >>>
> >>> I don't think it can be a very large number in practice.
> >>>
> >>> CephFS uses sparse reads only in the fscrypt case, right?
> >> Yeah, it is.
> >>
> >>
> >>> With
> >>> fscrypt, sub-4K writes to the object store aren't possible, right?
> >> Yeah.
> >>
> >>
> >>> If the answer to both is yes, then the maximum number of extents
> >>> would be
> >>>
> >>> 64M (CEPH_MSG_MAX_DATA_LEN) / 4K = 16384
> >>>
> >>> even if the object store does tracking at byte granularity.
> >> So yeah, just for the fscrypt case if we set the max extent number to
> >> 16384 it should be fine. But the sparse read could also be enabled in
> >> non-fscrypt case.
> > Ah, I see that it's also exposed as a mount option. If we expect
> > people to use it, then dropping MAX_EXTENTS altogether might be the
> > best approach after all -- it doesn't make sense to warn about
> > something that we don't really control.
> >
> > How about printing the number of extents only if the allocation fails?
> > Something like:
> >
> > if (!sr->sr_extent) {
> > pr_err("failed to allocate %u sparse read extents\n", count);
> > return -ENOMEM;
> > }
>
> Yeah, this also looks good to me.
>
> I will fix it.
>
> Thanks
>
> - Xiubo
>
>
> > Thanks,
> >
> > Ilya
> >
>
>
IMHO we don't need yet another place to look for information, especially one that some operators never see. ymmv.
>
>> Hello,
>>
>> We wanted to get some feedback on one of the features that we are planning
>> to bring in for upcoming releases.
>>
>> On the Ceph GUI, we thought it could be interesting to show information
>> regarding the community events, ceph release information (Release notes and
>> changelogs) and maybe even notify about new blog post releases and also
>> inform regarding the community group meetings. There would be options to
>> subscribe to the events that you want to get notified.
>>
>> Before proceeding with its implementation, we thought it'd be good to get
>> some community feedback around it. So please let us know what you think
>> (the goods and the bads).
>>
>> Regards,
>> --
>>
>> Nizamudeen A
Hi Folks,
I won't be able to make it this morning. We'll reconvene next week.
Thanks,
Mark
--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson(a)clyso.com
We are hiring: https://www.clyso.com/jobs/
Hi folks,
I am reading RocksDBStore.cc and trying to understand how the iterator works. I found myself positively puzzled by the implementation of lower_bound() and upper_bound() in CFIteratorImpl. It seems that upper_bound() doesn't really set the upper bound for iteration. Instead, it just resets the lower bound, presumably to a more advanced position than the initial lower_bound() call, since it calls lower_bound(). It doesn't set the true upper bound, which would be set in iterate_upper_bound and used by RocksDB.
Is my understanding correct or am I missing something?
Thanks,
Yixin
Hello all,
Here are the minutes from today's meeting.
- New time for CDM APAC to increase participation
- 9.30 - 11.30 pm PT seems like the most popular based on
https://doodle.com/meeting/participate/id/aM9XGZ3a/vote
- One more week for more feedback; please ask more APAC folks to suggest
their preferred times.
- [Ernesto] Revamp Ansible/Ceph-Ansible for non-containerized users?
- open nebula / proxmox
- solicit maintainers for ceph-ansible on the ML
- 18.2.1
- yuri: approval email sent out a few days ago; waiting on some approvals
- Blocker:
- https://tracker.ceph.com/issues/63391
- lab upgrades (Laura will help Yuri coordinate)
- Next Pacific release being worked on in background by Yuri.
- https://pad.ceph.com/p/pacific_16.2.15
- Try v16.2.15 milestone to help prune PRs
- https://github.com/ceph/ceph/milestone/17
- [Nizam] Ceph News Ticker - Ceph Dashboard
- Notify when new release is available (display changelogs)
- Display important ceph events
- CVEs, critical bug fixes
- Maybe newly added blog posts or information regarding the upcoming
group meetings?
- User + Dev meeting next week
- Topics include migration between EC profiles and challenges related to
RGW zone replication
- Casey can attend end of meeting
- open nebula folks planning to do webinar; looking for speakers
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi,
When Reef was released, the announcement said that Debian packages would
be built once the blocking bug in Bookworm was fixed. As I noted on the
tracker item https://tracker.ceph.com/issues/61845 a couple of weeks
ago, that is now the case after the most recent Bookworm point release.
I also opened a PR to make the minimal change that would build Reef
packages on Bookworm[0]. I subsequently opened another PR to fix some
low-hanging fruit in terms of packaging errors - missing #! in
maintscripts, syntax errors in debian/control, erroneous dependencies on
Essential packages[1]. Neither PR has had any feedback/review as far as
I can see.
Those packages (and the previous state of the debian/ tree) had some
significant problems - no copyright file, and some of them contain
python scripts without declaring a python dependency, so I've today
submitted a slightly larger PR that brings the dh compatibility level up
to what I think the latest lowest-common-denominator level is, as well
as fixing these errors[2].
I believe these changes all ought to go into the reef branch, but
obviously you might prefer to just make the bare-minimum-to-build change
in the first PR.
Is there any chance of having some reef packages for Bookworm, please?
Relatedly, is there interest in further packaging fixes for future
branches? lintian still has quite a lot to say about the .debs for Ceph,
and while you might reasonably not want to care about crossing every t
of Debian policy, I think there are still changes that would be worth
doing...
I should declare a bit of an interest here - I'd like to evaluate
cephadm for work use, which would require us to be able to build our own
packages per local policy[3], which in turn would mean I'd want to get
Debian-based images going again. But that requires Reef .debs being
available to install onto said images :)
Thanks,
Matthew
[0] https://github.com/ceph/ceph/pull/53342
[1] https://github.com/ceph/ceph/pull/53397
[2] https://github.com/ceph/ceph/pull/53546
[3] https://wikitech.wikimedia.org/wiki/Kubernetes/Images#Production_images
Hi folks,
RocksDB has a feature to help with use cases that involve common prefixes in the object names. It requires the option prefix_extractor to be defined so RocksDB knows how to derive prefixes, and auto_prefix_mode in ReadOptions is currently the recommended way to use the feature. I can't find anywhere in the Ceph code that tries to make use of it. Has this feature been explored by the team?
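For reference, opting in looks roughly like this (a configuration sketch based on the RocksDB API, not anything in the Ceph tree; the 8-byte fixed prefix length is an arbitrary example value):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/slice_transform.h>

// Sketch only -- not from the Ceph tree. Define how keys map to
// prefixes so RocksDB can build prefix bloom filters/indexes...
rocksdb::Options options;
options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));

// ...and let reads auto-detect when prefix seek is applicable.
rocksdb::ReadOptions read_options;
read_options.auto_prefix_mode = true;
```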
Thanks,
Yixin