Hi everyone,
Mark your calendars for April 6 - 7 for the Ceph Developer Summit!
The plan is to run it as a virtual event with a loose schedule built
around technical, developer-focused discussions for the next software
release, Quincy. Each day (2-3 hours) will have development
discussions focused on a particular Ceph component.
The format for these sessions is primarily discussion, but
presentations are fine where visuals such as diagrams help.
Birds-of-a-feather and other hallway-style tracks are welcome as well.
Please follow this etherpad and the Ceph Dev mailing list for further
updates on exact start times and the meeting link. Session proposals will
also be collected on this etherpad:
https://pad.ceph.com/p/cds-quincy
--
Mike Perez
Hello everyone,
I'm a newbie Ceph developer, and I've recently been reading the snapshot
code. From pg_pool_t::add_unmanaged_snap, it looks obvious that the first
RBD snapshot id should start from 2, but in reality it starts from 4. I
wonder whether there are some mechanisms in RBD snapshots that increment
snap_seq. Could anyone help me? Thanks in advance! Below is the code of
pg_pool_t::add_unmanaged_snap.
void pg_pool_t::add_unmanaged_snap(uint64_t& snapid)
{
  ceph_assert(!is_pool_snaps_mode());
  if (snap_seq == 0) {
    // kludge for pre-mimic tracking of pool vs selfmanaged snaps. after
    // mimic this field is not decoded but our flag is set; pre-mimic, we
    // have a non-empty removed_snaps to signifiy a non-pool-snaps pool.
    removed_snaps.insert(snapid_t(1));
    snap_seq = 1;
  }
  flags |= FLAG_SELFMANAGED_SNAPS;
  snapid = snap_seq = snap_seq + 1;
}
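To illustrate my reading, here is a small standalone simulation of just the
arithmetic above (not the real Ceph types, only a sketch; the real code uses
snapid_t and the pool's removed_snaps). If nothing else bumped snap_seq, the
ids handed out would start at 2:

#include <cstdint>
#include <iostream>
#include <set>

// Standalone simulation of the snap_seq arithmetic in
// pg_pool_t::add_unmanaged_snap() above; only the id allocation is mimicked.
struct fake_pool {
  uint64_t snap_seq = 0;
  std::set<uint64_t> removed_snaps;

  void add_unmanaged_snap(uint64_t& snapid) {
    if (snap_seq == 0) {
      removed_snaps.insert(1);  // the pre-mimic kludge reserves id 1
      snap_seq = 1;
    }
    snapid = snap_seq = snap_seq + 1;
  }
};

int main() {
  fake_pool pool;
  for (int i = 0; i < 3; ++i) {
    uint64_t snapid = 0;
    pool.add_unmanaged_snap(snapid);
    std::cout << "allocated snap id " << snapid << "\n";  // prints 2, 3, 4
  }
}

So if the first RBD snapshot actually shows up as id 4, my guess is that
something else on the pool advances snap_seq a couple of times before it,
which is exactly the mechanism I am asking about.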
Best regards,
ghost
Greetings,
I'm Mohamed Mansour, a Data Engineer at IBM and a Master's degree student
in the Computer Engineering Department - Faculty of Engineering - Cairo
University.
I would like to apply to Google Summer of Code to work on one of these 3
projects with you:
1- OPTIMIZING CEPH TESTING
2- REPLACE CEPHADM'S SSH PYTHON LIBRARY
3- RGW: S3 SDK COMPATIBILITY
Kindly find my CV attached, and let me know which project you think fits
me best; I will then send the proposal for it.
Thanks in advance
Eng. Mohamed Mansour
Data Engineer
(EG) +20 112 003 3329
(EG) +20 106 352 6328
Dear Sirs,
My name is Magzum Assanbayev, I am a Master Student at KIMEP University in
Kazakhstan, expected to graduate in Spring 2022.
Having done some research into your organization, I believe that my
current skill set might be suitable to your needs.
As for what I can offer, I have been practicing data analytics for 2.5
years at the PwC Competency Centre, in the Audit and Assurance department,
working on audit automation with the Alteryx data analytics software. The
software allows seamless manipulation and output of large data sets, and
has its own community for sharing ideas; please see the link:
https://community.alteryx.com/?category.id=external
As a track record, the workflows developed under my supervision have cut
significant hours of repetitive work for audit teams, ranging from 5-20%
per audit engagement, with positive user feedback. The work done ranges
from automating mathematical accuracy checks of consolidation reports to
disclosure recompilation, as well as journal entry analysis based on
predefined audit criteria.
The software is popular among the largest corporate brands in the world,
which speaks to its value for money and relevance in a competitive market
of data analytics software. Well-known users of the software include the
Big 4 audit firms, Google itself, Coca-Cola, Deutsche Bank, etc.
In addition, I have an entry-level acquaintance with Python and Excel VBA,
having completed the 'Crash Course on Python' and 'Excel/VBA for Creative
Problem Solving, Part 1' courses on Coursera (see certificates attached).
Please note that both courses were completed during the busy season, under
a heavy audit workload, while employed full-time at PwC. My eagerness and
ability to learn are backed up by my Bachelor's GPA of 4.31/4.33 at KIMEP,
where I studied Finance and Accounting. I was also awarded scholarships and
stipends for academic achievement for several years.
If this letter has caught your eye and sparked your interest, I am happy
to brainstorm any potential projects that we could do together during the
upcoming summer!
Please let me know by replying to this email.
Thank you!
On Thu, Apr 8, 2021 at 11:24 AM Robert LeBlanc <robert(a)leblancnet.us> wrote:
>
> On Thu, Apr 8, 2021 at 10:22 AM Robert LeBlanc <robert(a)leblancnet.us> wrote:
> >
> > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and converted the last batch of FileStore OSDs to BlueStore about 36 hours ago. Yesterday our monitor cluster went nuts and started constantly calling elections because monitor nodes were at 100% CPU and wouldn't respond to heartbeats. I reduced the monitor cluster to one to prevent the constant elections, and that let the system limp along until the backfills finished. There are long stretches where ceph commands hang while the CPU is at 100%; when the CPU drops, I see a lot of work getting done in the monitor logs, which stops as soon as the CPU is at 100% again.
> >
> > I did a `perf top` on the node to see what's taking all the time and it appears to be in the rocksdb code path. I've set `mon_compact_on_start = true` in the ceph.conf but that does not appear to help. The `/var/lib/ceph/mon/` directory is 311 MB, down from 3.0 GB while the backfills were going on. I've tried adding a second monitor, but it goes back to the constant elections. I tried restarting all the services without luck. I also pulled the monitor from the network and tried restarting the mon service while isolated (this helped a couple of weeks ago when `ceph -s` would cause 100% CPU and lock up the service much worse than this) and didn't see the high CPU load. So I'm guessing it's triggered by some external source.
> >
> > I'm happy to provide more info, just let me know what would be helpful.
>
> Sent this to the dev list, but forgot it needed to be plain text. Here
> is text output of the `perf top` taken a bit later, so not exactly the
> same as the screenshot earlier.
>
> Samples: 20M of event 'cycles', 4000 Hz, Event count (approx.): 61966526527 lost: 0/0 drop: 0/0
> Overhead  Shared Object         Symbol
>   11.52%  ceph-mon              [.] rocksdb::MemTable::KeyComparator::operator()
>    6.80%  ceph-mon              [.] rocksdb::MemTable::KeyComparator::operator()
>    4.75%  ceph-mon              [.] rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::FindGreaterOrEqual
>    2.89%  libc-2.27.so          [.] vfprintf
>    2.54%  libtcmalloc.so.4.3.0  [.] tc_deletearray_nothrow
>    2.31%  ceph-mon              [.] TLS init function for rocksdb::perf_context
>    2.14%  ceph-mon              [.] rocksdb::DBImpl::GetImpl
>    1.53%  libc-2.27.so          [.] 0x000000000018acf8
>    1.44%  libc-2.27.so          [.] _IO_default_xsputn
>    1.34%  ceph-mon              [.] memcmp@plt
>    1.32%  libtcmalloc.so.4.3.0  [.] tc_malloc
>    1.28%  ceph-mon              [.] rocksdb::Version::Get
>    1.27%  libc-2.27.so          [.] 0x000000000018abf4
>    1.17%  ceph-mon              [.] RocksDBStore::get
>    1.08%  ceph-mon              [.] 0x0000000000639a33
>    1.04%  ceph-mon              [.] 0x0000000000639a0e
>    0.89%  ceph-mon              [.] 0x0000000000639a46
>    0.86%  ceph-mon              [.] rocksdb::TableCache::Get
>    0.72%  libc-2.27.so          [.] 0x000000000018abfe
>    0.68%  libceph-common.so.0   [.] ceph_str_hash_rjenkins
>    0.66%  ceph-mon              [.] rocksdb::Hash
>    0.63%  ceph-mon              [.] rocksdb::MemTable::Get
>    0.62%  ceph-mon              [.] 0x00000000006399ff
>    0.57%  libc-2.27.so          [.] 0x000000000018abf0
>    0.57%  ceph-mon              [.] rocksdb::GetContext::GetContext
>    0.57%  ceph-mon              [.] rocksdb::BlockBasedTable::Get
>    0.57%  ceph-mon              [.] rocksdb::BlockBasedTable::GetFilter
>    0.55%  [vdso]                [.] __vdso_clock_gettime
>    0.54%  ceph-mon              [.] 0x00000000005afa17
>    0.53%  ceph-mgr              [.] std::_Rb_tree<pg_t, pg_t, std::_Identity<pg_t>, std::less<pg_t>, std::allocator<pg_t> >::equal_range
>    0.51%  libceph-common.so.0   [.] PerfCounters::tinc
>    0.50%  ceph-mon              [.] OSDMonitor::make_snap_epoch_key[abi:cxx11]
Okay, I think I sent it to the old dev list. Trying again.
Thank you,
Robert LeBlanc
Hi,
Context: one of our users is mounting 350 ceph kernel PVCs per 30GB VM
and they notice "memory pressure".
When planning for k8s hosts, what would be a reasonable limit on the
number of ceph kernel PVCs to mount per host? If one kernel mounts the
same cephfs several times (with different prefixes), we observed that
each mount is a distinct client session. But does the ceph module globally
share a single copy of cluster metadata, e.g. osdmaps, or is that all
duplicated per session? Can anyone estimate how much memory is
consumed by each mount (assuming it is a client of an O(1k) osd ceph
cluster)?
Also, k8s makes it trivial for a user to mount a single PVC from
hundreds or thousands of clients. Suppose we wanted to be able to
limit the number of clients per PVC -- Do you think a new
`max_sessions=N` cephx cap would be the best approach for this?
Best Regards,
Dan
Hi everyone,
I cleaned up the CFP coordination etherpad with some events coming up.
Please add other events where you think the community should consider
proposing content on Ceph or adjacent projects like Rook.
The KubeCon NA CFP, for example, closes April 11. Take a look:
https://pad.ceph.com/p/cfp-coordination
I have also added this to our wiki for discovery.
https://tracker.ceph.com/projects/ceph/wiki/Community
--
Mike Perez
Hi all,
This email talks about how to design:
1) ReplicaDaemon:
The daemon, running on a host with DCPMM & an RNIC (RDMA NIC), and
what kind of info it reports to the Ceph Monitor.
2) ReplicaMonitor:
A new PaxosService in the Ceph Monitor that manages the
ReplicaDaemons' info and handles librbd's requests by selecting
the appropriate ReplicaDaemons' info to return to librbd.
This email doesn't talk about:
how librbd will communicate with the ReplicaDaemons and finish the
replication after librbd gets the ReplicaDaemons' info.
RFC PR: [WIP] aggregate client state and route info
https://github.com/ceph/ceph/pull/37931
Detail:
+-----------------------------------+ +-----------------------------------------------+
|+---------------------------------+| | +--------------------+|
|| ReplicaDaemonInfo: || | |PaxosServiceMessage ||
|| || |+---------------------------------------------+|
|| daemon_id; || ||MReplicaDaemonBlink(MSG_REPLICADAEMON_BLINK):||
|| rnic_bind_port; || || ||
|| rnic_addr; || ||ReplicaDaemonInfo; ||
|| free_size; || |+---------------------------------------------+|
|+---------------------------------+| | +--------------------+|
|+---------------------------------+| | |PaxosServiceMessage ||
|| ReqReplicaDaemonInfo: || |+---------------------------------------------+|
|| || ||MMonGetReplicaDaemonMap(CEPH_MSG_MON_GET_REPL||
|| replicas; || ||ICADAEMONMAP): ||
|| replica_size; || || ||
|+---------------------------------+| ||ReqReplicaDaemonInfo; ||
|+---------------------------------+| |+---------------------------------------------++
|| ReplicaDaemonMap: || | +-------+|
|| || | |Message||
|| std::vector<ReplicaDaemonInfo>; || |+---------------------------------------------+|
|+---------------------------------+| ||MReplicaDaemonMap(CEPH_MSG_REPLICADAEMON_MAP)||
| MetaData(need encode/decode) | || ||
| | || ||
| | ||ReplicaDaemonMap; ||
| | |+---------------------------------------------+|
| | | |
| | | Three messages defined for the MetaData |
+-----------------------------------+ +-----------------------------------------------+
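As a rough standalone sketch, the MetaData in the left box might look like the
following (field names are taken from the diagram; the field widths are only
guesses, and the real structs would also carry the encode/decode support
mentioned above):

#include <cstdint>
#include <string>
#include <vector>

// Info one ReplicaDaemon reports via MReplicaDaemonBlink.
struct ReplicaDaemonInfo {
  uint32_t    daemon_id;       // which ReplicaDaemon this entry describes
  uint16_t    rnic_bind_port;  // port the RNIC listens on
  std::string rnic_addr;       // RNIC address
  uint64_t    free_size;       // free DCPMM capacity on that host
};

// Request carried by MMonGetReplicaDaemonMap: how many replicas the
// client wants and how large each replica needs to be.
struct ReqReplicaDaemonInfo {
  uint32_t replicas;
  uint64_t replica_size;
};

// Payload of MReplicaDaemonMap: the (possibly filtered) daemon list.
struct ReplicaDaemonMap {
  std::vector<ReplicaDaemonInfo> daemons;
};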
+--------+ +------------+
|Dispatch| |PaxosService|
+---------------------+ Update ReplicaDaemonInfo +---------------------------+
| ReplicaDaemon: | through | ReplicaMonitor: |
| | MReplicaDaemonBlink | |
| ReplicaDaemonInfo; -----------------------------------> ReplicaDaemonMap; |
| | | |
| ms_dispatch; | | //Need implement some APIs|
+---------------------+ +------^-------------|------+
Request ReplicaDaemonMap Feedback ReplicaDaemonMap
through | |through
MMonGetReplicaDaemonMap MReplicaDaemonMap
+------|-------------v------+
| librbd |
+---------------------------+
The ReplicaDaemon reports its ReplicaDaemonInfo to the ReplicaMonitor via the
MReplicaDaemonBlink message. The ReplicaMonitor stores all the
ReplicaDaemonInfo in the ReplicaDaemonMap after it goes through Paxos.
The client (librbd) sends MMonGetReplicaDaemonMap to the ReplicaMonitor; the
ReplicaMonitor then chooses the appropriate ReplicaDaemons, packs their info
into a new ReplicaDaemonMap, and sends it back to the client via the
MReplicaDaemonMap message.
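To make the selection step concrete, here is a minimal sketch of one possible
policy on the monitor side, using the structs sketched earlier (the real
selection policy and the PaxosService plumbing are not fixed yet;
select_replica_daemons is only an illustrative helper name):

#include <algorithm>
#include <optional>
#include <vector>

// One possible way the ReplicaMonitor could build the reply for
// MMonGetReplicaDaemonMap: prefer daemons with the most free DCPMM space
// and stop once the requested number of suitable daemons is found.
std::optional<ReplicaDaemonMap>
select_replica_daemons(const ReplicaDaemonMap& all,
                       const ReqReplicaDaemonInfo& req)
{
  std::vector<ReplicaDaemonInfo> candidates = all.daemons;
  std::sort(candidates.begin(), candidates.end(),
            [](const ReplicaDaemonInfo& a, const ReplicaDaemonInfo& b) {
              return a.free_size > b.free_size;
            });

  ReplicaDaemonMap reply;
  for (const auto& d : candidates) {
    if (d.free_size >= req.replica_size) {
      reply.daemons.push_back(d);
      if (reply.daemons.size() == req.replicas) {
        return reply;  // found enough suitable daemons
      }
    }
  }
  return std::nullopt;  // not enough capacity for the requested replicas
}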
B.R.
Changcheng
Hi Folks,
This week we are having the Quincy Ceph Developer Summit and the perf
meeting overlaps with some of the sessions on Thursday. Let's cancel
and give those sessions priority. Have a great week everyone!
Thanks,
Mark