For developers submitting jobs using teuthology, we now have
recommendations on what priority level to use:
https://docs.ceph.com/docs/master/dev/developer_guide/#testing-priority
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi Dimitri,
Right now we're using CentOS for the base image for Ceph containers.
Now that CentOS is moving to a rolling-release-style model with CentOS
Stream, it's an open question whether we should stick with it.
A more stable base image that gets reliable security fixes would be
preferable. One thought is to use Red Hat's Universal Base Image (UBI)
[1] which is just RHEL-lite with a target audience of upstream
projects. Or perhaps we can select another base image.
What do you think?
[1] https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
We're happy to announce the 22nd and likely final backport release in the Nautilus series. Ultimately, we recommend all users upgrade to newer Ceph releases.
For detailed release notes with links and the changelog, please refer to the official blog entry at https://ceph.io/en/news/blog/2021/v14-2-22-nautilus-released
Notable Changes
---------------
* This release sets `bluefs_buffered_io` to true by default to improve performance
for metadata-heavy workloads. Enabling this option has been reported to
occasionally cause excessive kernel swapping under certain workloads.
Currently, the most consistently performing combination is to enable
`bluefs_buffered_io` and disable system-level swap.
* The default value of `bluestore_cache_trim_max_skip_pinned` has been
increased to 1000 to control memory growth due to onodes.
* Several other bug fixes in BlueStore, including a fix for an unexpected
ENOSPC bug in Avl/Hybrid allocators.
* The trimming logic in the monitor has been made dynamic, with the
introduction of `paxos_service_trim_max_multiplier`, a factor by which
`paxos_service_trim_max` is multiplied to make trimming faster,
when required. Setting it to 0 disables the upper bound check for trimming
and makes the monitors trim at the maximum rate.
* A `--max <n>` option is available with the `osd ok-to-stop` command to
provide up to N OSDs that can be stopped together without making PGs
unavailable (see the example at the end of this section).
* OSD: the option `osd_fast_shutdown_notify_mon` has been introduced to allow
the OSD to notify the monitor that it is shutting down even if `osd_fast_shutdown`
is enabled. This helps with the monitor logs on larger clusters, which may
otherwise get many 'osd.X reported immediately failed by osd.Y' messages and
confuse tools.
* A long-standing bug that prevented 32-bit and 64-bit client/server
interoperability under msgr v2 has been fixed. In particular, mixing armv7l
(armhf) and x86_64 or aarch64 servers in the same cluster now works.
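As a quick illustration of the `bluefs_buffered_io` default and the new
`--max` option mentioned above (the OSD id below is a placeholder; adjust
it for your cluster):

    # explicitly set the new default for BlueFS buffered I/O
    ceph config set osd bluefs_buffered_io true

    # starting from osd.1, report a set of up to 16 OSDs that can be
    # stopped together without making any PGs unavailable
    ceph osd ok-to-stop 1 --max 16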
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-14.2.22.tar.gz
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: ca74598065096e6fcbd8433c8779a2be0c889351
Hi Cephers,
These are the topics discussed at today's meeting (
https://pad.ceph.com/p/clt-weekly-minutes):
- *Attendees:*
- Casey, David, Ilya, Josh, Neha, Patrick, Sage, Sebastian, Yehuda,
Yuri, Zac, Ernesto.
- *Releases:*
- Nautilus release notes: https://github.com/ceph/ceph/pull/41958
- Pacific release status:
- Yuri: PRs to get merged today/tomorrow.
- For OpenStack team to start testing it.
- *Community*:
- *Website:*
- New Ceph website is out: (https://github.com/ceph/ceph.io).
- Old one is still available at https://old.ceph.com
- *Ceph-users threads:*
- Cephadm Containers
- Sage: What can we do there? Identify the most impactful issues.
- Non-linux platforms? FreeBSD?
- Sebastian: FreeBSD could support non-systemd
containerization.
- Patrick: Concerns raised about debuggability. To ensure
legacy experience (logs, etc.).
- *Other:*
- Ceph on Windows contract
- Already signed.
- Biweekly sync (David).
- Pipeline to produce Windows artifacts.
- *Developers:*
- *Lab:*
- CY22 Red Hat budget planning due 7/23
- Smithee replacement budget (issues with NVMe cards).
- Patrick: More developer machines? VMs (needed for package
isolation)? Containers are not enough for advanced use cases.
- David: That falls under Linux Foundation capex.
- Sebastian: proposed to explore developing and building inside
a container.
- David: All flash systems available (
https://wiki.sepia.ceph.com/doku.php?id=hardware:folio)
- *PRs:*
- Dashboard help on removing CRUSH ruleset references:
https://github.com/ceph/ceph/pull/42041
- *Redmine:*
- Ilya: can't assign new contributors to specific trackers.
- Patrick: there are 2 developer groups: the Ceph one (everyone)
and the Core one (mostly Red Hat, SUSE, etc.), which has access
to private/embargoed trackers.
- *Events:*
- Josh: Kubecon? No clear travel guidelines issued yet.
Kind Regards,
Ernesto
+ dev(a)ceph.io
On Tue, Jun 29, 2021 at 5:19 PM liucx changxi <liucxer(a)gmail.com> wrote:
>
> dear kchai:
hi Changxi,
> I am a new developer of Ceph, and I have recently been working on QoS-related features. Can you help me?
> dmclock implements QoS by controlling the priority of the queue.
This is not accurate. In addition to "priority", i.e. weight or
proportion, there are two other parameters in this picture --
reservation and limit. Each client is configured with its own weight,
reservation, and limit. That's why "mClock" is prefixed with "m"
(multiple), I think, as there are multiple clocks driven by these
three parameters.
> The read op is awaited synchronously.
> BlueStore::_do_read() {
> ....
> bdev->aio_submit(&ioc);
> dout(20) << __func__ << " waiting for aio" << dendl;
> ioc.aio_wait();
> ....
> }
> But write I/O is performed asynchronously.
> How does dmclock achieve QoS when writing I/O?
Currently, we are only using mClock for QoS in the OSD; its
distributed variant, dmClock, is not used yet at the time of writing.
Also, we don't do QoS in BlueStore; instead, it's implemented at a
higher level in the pipeline. You might want to take a look at how
OSDShard and mClockScheduler interact. Basically, they prioritize
requests based on the predefined profile [0] and settings, so that the
ones which need higher QoS have a better chance of being executed
sooner. Please note that the requests scheduled by OSDShard's
scheduler include not only I/O requests but also internal requests
such as PG peering.
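To make this concrete, here is a rough, illustrative sketch (not the
actual dmclock or OSD data structures; the class names and numbers are
made up) of how the reservation/weight/limit knobs might be grouped per
op class, which is roughly what a profile boils down to:

    // Illustrative only -- not the real dmclock/OSD API.
    // reservation = minimum share guaranteed to a class,
    // weight      = proportional share of the remaining capacity,
    // limit       = hard upper bound (0 meaning "no limit" here).
    #include <map>
    #include <string>

    struct QosParams {
      double reservation;
      double weight;
      double limit;
    };

    // Hypothetical profile: client I/O gets a guaranteed floor and a
    // larger weight, background work is capped so it cannot starve
    // client requests. The scheduler tags every request with values
    // derived from these parameters and dequeues the request with the
    // earliest eligible tag.
    std::map<std::string, QosParams> make_profile() {
      return {
        {"client",                 {1000.0, 2.0, 0.0}},
        {"background_recovery",    { 100.0, 1.0, 500.0}},
        {"background_best_effort", {   0.0, 1.0, 100.0}},
      };
    }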
HTH
---
[0] https://ceph.io/community/qos-study-with-mclock-and-wpq-schedulers/
[1] https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/
>
>
> --- liucxer
Details of this release summarized here:
https://tracker.ceph.com/issues/51268#note-1
As part of the approval and review process, we'd also like to review
and approve the release notes:
https://github.com/ceph/ceph/pull/41958
Seeking review/approval:
rgw - Casey?
rbd - Ilya?
krbd - Ilya?
fs, kcephfs, multimds - Patrick, Ramana (please note the unusually high
amount of red in fs, maybe infra - David FYI)
ceph-deploy - Josh ?
upgrade/client-upgrade-* - Josh, do you concur with excluding rhel 7.6?
upgrade/mimic-x (nautilus) - Josh?
upgrade/nautilus-x (octopus) - Josh, Casey?
ceph-volume - Guillaume ?
Thx
YuriW
[Correct subject]
Hi all,
Does anyone know how to claim-append part of a bufferlist? After the head part has been claim-appended, the remaining part can still be claim-appended later.
I know there's the bufferlist::claim_append API; however, it claim-appends the whole bufferlist.
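For illustration, what I'd like is something along these lines --
assuming bufferlist::splice(off, len, claim_by) behaves as in current
Ceph trees (it detaches the selected extent from the source list without
copying); claim_head is a made-up helper, not an existing API, and the
include path may differ by tree:

    // Sketch only: move up to `want` bytes from the front of `src` into
    // `out`, leaving the remainder in `src` for a later call.
    #include <algorithm>
    #include "include/buffer.h"   // Ceph's bufferlist

    unsigned claim_head(ceph::bufferlist& src, ceph::bufferlist& out, unsigned want)
    {
      unsigned n = std::min(want, src.length());
      if (n == 0) {
        return 0;
      }
      ceph::bufferlist head;
      src.splice(0, n, &head);   // detach the first n bytes from src
      out.claim_append(head);    // hand the detached buffers to out
      return n;
    }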
The background is:
I use a function, handle_io_am_write_request, to receive the network data sent by the peer node, then append it into the cached data space (e.g. the bufferlist object recv_pending_bl).
Then I trigger the upper software layer to read the received data in recv_pending_bl.
The code is below (you can also click the link above to read it; it is no more than 40 lines).
There are several bugs in the code below. I'm looking for an efficient way to receive the data and trigger the upper software layer to read it in the right way.
Any suggestion on an efficient method to do this is welcome.
B.R.
Changcheng
Recently (or not so recently, it's been almost 2 years), the nfs-ganesha
project implemented the capability to utilize asynchronous, non-blocking
I/O to storage backends to prevent thread starvation. The assumption is
that the backend provides non-blocking I/O with a callback mechanism to
notify nfs-ganesha when the I/O is complete, so that nfs-ganesha can then
respond to the client asynchronously, indicating I/O completion.
Ceph looks like it is structured to allow for this, with Context objects
having finish and complete methods that let the I/O path signal
completion. In general, libcephfs seems to use some form of
condition-variable Context to block and wait for this notification. This
would be relatively easy to replace with a callback Context.
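To sketch what I mean -- assuming the usual shape of Ceph's Context (a
virtual finish(int r) that complete(r) invokes once before deleting the
object), with made-up names like C_AsyncIODone and ganesha_io_done, and
an include path that may differ by tree:

    // Illustrative sketch, not existing code: a Context that forwards
    // the completion result to a callback instead of signalling a
    // condition variable that a blocked caller is waiting on.
    #include <functional>
    #include <utility>
    #include "include/Context.h"   // Ceph's Context base class

    class C_AsyncIODone : public Context {
    public:
      explicit C_AsyncIODone(std::function<void(int)> cb)
        : callback(std::move(cb)) {}

    protected:
      void finish(int r) override {
        // Runs on the I/O completion path; hand the result back to the
        // caller (e.g. nfs-ganesha) rather than waking a waiting thread.
        callback(r);
      }

    private:
      std::function<void(int)> callback;
    };

    // Usage sketch: wherever libcephfs currently builds a
    // condition-variable Context, pass something like
    //   new C_AsyncIODone([op](int r) { ganesha_io_done(op, r); });
    // and return to the caller immediately.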
However, libcephfs does use ObjectCacher and sets the
block_writes_upfront flag, which seems to make any writes that go
through ObjectCacher block on an internal condition variable rather
than use the onfreespace Context object (which maybe should have been
named onfinish?).
I'm wondering what the implications of setting block_writes_upfront to
false would be for libcephfs, beyond needing to ensure an onfreespace
Context object is passed.
Thanks
Frank Filz