For developers submitting jobs using teuthology, we now have
recommendations on what priority level to use:
https://docs.ceph.com/docs/master/dev/developer_guide/#testing-priority
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
We're happy to announce the fourth bugfix release in the Octopus series.
In addition to a security fix in RGW, this release brings a range of fixes
across all components. We recommend that all Octopus users upgrade to this
release. For detailed release notes with links and a changelog, please
refer to the official blog entry at https://ceph.io/releases/v15-2-4-octopus-released
Notable Changes
---------------
* CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration's ExposeHeader
(William Bowling, Adam Mohammed, Casey Bodley)
* Cephadm: There were a lot of small usability improvements and bug fixes:
* Grafana when deployed by Cephadm now binds to all network interfaces.
* `cephadm check-host` now prints all detected problems at once.
* Cephadm now calls `ceph dashboard set-grafana-api-ssl-verify false`
when generating an SSL certificate for Grafana.
* The Alertmanager is now correctly pointed to the Ceph Dashboard.
* `cephadm adopt` now supports adopting an Alertmanager.
* `ceph orch ps` now supports filtering by service name.
* `ceph orch host ls` now marks hosts as offline, if they are not
accessible.
* Cephadm can now deploy NFS Ganesha services. For example, to deploy NFS with
a service id of mynfs, that will use the RADOS pool nfs-ganesha and namespace
nfs-ns::
ceph orch apply nfs mynfs nfs-ganesha nfs-ns
* Cephadm: `ceph orch ls --export` now returns all service specifications in
yaml representation that is consumable by `ceph orch apply`. In addition,
the commands `orch ps` and `orch ls` now support `--format yaml` and
`--format json-pretty` (a minimal export/apply round trip is sketched after these notes).
* Cephadm: `ceph orch apply osd` supports a `--preview` flag that prints a preview of
the OSD specification before deploying OSDs. This makes it possible to
verify that the specification is correct, before applying it.
* RGW: The `radosgw-admin` sub-commands dealing with orphans --
`radosgw-admin orphans find`, `radosgw-admin orphans finish`, and
`radosgw-admin orphans list-jobs` -- have been deprecated. They have
not been actively maintained and they store intermediate results on
the cluster, which could fill a nearly-full cluster. They have been
replaced by a tool, currently considered experimental,
`rgw-orphan-list`.
* RBD: The name of the rbd pool object that is used to store
rbd trash purge schedule is changed from "rbd_trash_trash_purge_schedule"
to "rbd_trash_purge_schedule". Users that have already started using
`rbd trash purge schedule` functionality and have per pool or namespace
schedules configured should copy the "rbd_trash_trash_purge_schedule"
object to "rbd_trash_purge_schedule" before the upgrade and remove
"rbd_trash_trash_purge_schedule" using the following commands in every RBD
pool and namespace where a trash purge schedule was previously
configured::
rados -p <pool-name> [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
rados -p <pool-name> [-N namespace] rm rbd_trash_trash_purge_schedule
or use any other convenient way to restore the schedule after the
upgrade.
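As a quick illustration of the `ceph orch ls --export` change above, a minimal export/apply round trip might look like this (the file name is illustrative)::
ceph orch ls --export > specs.yaml   # dump all current service specifications as yaml
# review or edit specs.yaml as needed, then feed it back to the orchestrator:
ceph orch apply -i specs.yaml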
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.4.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 7447c15c6ff58d7fce91843b705a268a1917325c
--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway
Hi Ceph maintainers and developers,
The objective of this mail is to discuss our work on dmClock-based client QoS management for CephFS.
Our group at LINE maintains Ceph storage clusters such as RGW, RBD, and CephFS to internally support an OpenStack and K8S based private cloud environment for various applications and platforms, including LINE messenger. We have seen that the RGW and RBD services can provide consistent performance to multiple active users, since RGW employs the dmClock QoS scheduler for S3 clients and hypervisors internally utilize an I/O throttler for VM block storage clients. Unfortunately, unlike RGW and RBD, CephFS clients can directly issue metadata requests to MDSs and file data requests to OSDs as they please. As a result, a noisy neighbor can occasionally (or frequently) degrade the performance of other clients. In the end, consistent performance cannot be guaranteed in our environment. From this observation, we are now considering a client QoS scheduler using the dmClock library for CephFS.
A few notes on how we plan to realize the QoS scheduler:
- Per-subvolume QoS management. IOPS resources are shared only among the clients that mount the same root directory. QoS parameters can be easily configured through extended attributes (similar to quotas). Each dmClock scheduler can manage clients' requests using client session information.
- MDS QoS management. Client metadata requests such as create and lookup are managed by a dmClock scheduler placed between the dispatcher and the main request handler (e.g., Server::handle_client_request()). We have observed that two active MDSs provide approximately 20K IOPS. Since this capacity can be scarce when there are many clients, QoS management is needed for the MDS as well.
- OSD QoS management. We would like to reopen and improve the previous work available at https://github.com/ceph/ceph/pull/20235.
- Client QoS management. Each client maintains a dmClock tracker to keep track of the rho and delta values that are packed into client request messages.
From the CLI, QoS parameters are configured using extended attributes on each subvolume directory. Specifically, separate QoS configurations are considered for MDSs and OSDs:
setfattr -n ceph.dmclock.mds_reservation -v 200 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.mds_weight -v 500 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.mds_limit -v 1000 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.osd_reservation -v 500 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.osd_weight -v 1000 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
setfattr -n ceph.dmclock.osd_limit -v 2000 /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55
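The configured values could then be read back with getfattr in the usual way, for example (assuming the proposed attributes above are implemented):
getfattr -n ceph.dmclock.mds_reservation /volumes/_nogroup/fdffc126-7961-4bbc-add2-2675b9e35a55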
Our QoS work kicked off last month. Our first step has been to go over the prior work and the dmClock algorithm/library. We are now actively checking the feasibility of our idea with some modifications to the MDS and ceph-fuse. Our development is planned as follows:
- The dmClock scheduler will be integrated into the MDS and ceph-fuse by December 2020.
- The dmClock scheduler will be incorporated into the OSD in the first half of next year.
Does the community have any plans to develop per-client QoS management? Are there any other issues related to our QoS work? We look forward to your comments and feedback at this early stage.
Thanks
Yongseok Oh
Asks:
-----
This mail is to trigger a discussion on a potential solution, provided
below, for the issue in the subject line, and to gather other
ideas/options to enable the use case described.
Use case/Background:
--------------------
Ceph is used by kubernetes to provide persistent storage (block and
file, via RBD and CephFS respectively) to pods, via the CSI interface
implemented in ceph-csi [1].
One of the use cases that we want to solve is when multiple kubernetes
clusters access the same Ceph storage cluster [2], and further these
kubernetes clusters provide for DR (disaster recovery) of workloads,
when a peer kubernetes cluster becomes unavailable.
IOW, if a workload is running on kubernetes cluster-a and has access to
persistent storage, it can be migrated to cluster-b in case of a DR
event in cluster-a, ensuring workload continuity and with it access to
the same persistent storage (as the Ceph cluster is shared and available).
Problem:
--------
The exact status of all clients/nodes in kubernetes cluster-a on a DR
event is unknown: all may be down, or some may still be up and running,
still accessing storage.
This brings about the need to fence all IO from all
nodes/container-networks on cluster-a, on a DR event, prior to migrating
the workloads to cluster-b.
Existing solutions and issues:
------------------------------
Current schemes to fence IO are per client [3], and further per image
for RBD. This makes it a prerequisite that all client addresses in
cluster-a are known and are unique across peer kubernetes
clusters, for a fence/blocklist to be effective.
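For reference, the per-client fencing in [3] boils down to blocklisting a specific client address, roughly as follows (the address is illustrative; depending on the release the command is spelled blacklist or blocklist):
ceph osd blacklist add 192.168.12.34:0/3418625233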
Also, during recovery of kubernetes cluster-a, as kubernetes uses the
current known state of the world (i.e. the workload "was" running on this
cluster) and eventually reconciles to the desired state of the world, it
is possible that re-mounts occur prior to reaching the desired state of
the world (which would be to not run the said workloads on this cluster).
The recovery may hence cause the existing connection-based blocklists to
be reset, as newer mounts/maps of the fs/image are performed on the
recovering cluster.
The issues above make the existing blocklist scheme either
unreliable or cumbersome to deal with across all possible nodes in the
respective kubernetes clusters.
Potential solution:
-------------------
On discussing the above with Jason, he pointed out a potential
solution (quoted below) to the problem:
<snip>
My suggestion would be to utilize CephX to revoke access to the cluster
from site A when site B is promoted. The one immediate issue with this
approach is that any clients with existing tickets will keep their
access to the cluster until the ticket expires. Therefore, for this to
be effective, we would need a transient CephX revocation list capability
to essentially blocklist CephX clients for X period of time until we can
be sure that their tickets have expired and are therefore no longer usable.
</snip>
The above is quite trivial from a kubernetes and ceph-csi POV, as each
peer kubernetes cluster can be configured to use a different cephx
identity, which can then be independently revoked and later reinstated,
solving the issues laid out above.
The ability to revoke credentials for an existing cephx identity is
readily available today, since it can be done by changing the identity's
existing authorization (a rough sketch follows).
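For illustration only (the identity name is made up, and this is just one way of changing authorization), a per-cluster identity could be revoked and later reinstated along these lines:
ceph auth export client.k8s-cluster-a > k8s-cluster-a.keyring  # keep a copy of the key and caps
ceph auth rm client.k8s-cluster-a                              # revoke cluster-a's authorization
# once it is safe to restore cluster-a's access:
ceph auth import -i k8s-cluster-a.keyring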
The ability to provide a revocation list for existing valid tickets,
that clients already have, would need to be developed.
Thoughts and other options?
Thanks,
Shyam
[1] Ceph-csi: https://github.com/ceph/ceph-csi
[2] DR use case in ceph-csi: https://github.com/ceph/ceph-csi/pull/1558
[3] RBD exclusive locks and blocklists:
https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/
CephFS client eviction and blocklists:
https://docs.ceph.com/en/latest/cephfs/eviction/
Hi,
We noticed that if one sets an osd crush weight using the command
ceph osd crush set $id $weight host=$host
it updates the osd weight in the $host bucket, but does not update it in
the "class" bucket (${host}~hdd or ${host}~ssd), and as a result the
old weight is still used until one runs `ceph osd crush reweight-all`
or does some other change that causes a crushmap recalculation.
The same behaviour occurs with the `ceph osd crush reweight-subtree
<name> <weight>` command.
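For example (the OSD id, weight, and host are made up; the shadow class buckets can be inspected with --show-shadow):
ceph osd crush set osd.3 2.0 host=node1
ceph osd crush tree --show-shadow   # node1~hdd (or ~ssd) still lists the old weight for osd.3
ceph osd crush reweight-all         # recalculates weights, including the class buckets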
Is it expected behavior or should I report a bug to the tracker?
I would consider this OK if I knew a way to set a weight for a
"class" (xyz~hdd or xyz~ssd) bucket directly. When I try to use ${host}~ssd,
it complains about the invalid character '~' in the bucket name.
--
Mykola Golub
I'm working on an F_SETLEASE implementation for kcephfs, and am hitting a
deadlock of sorts, due to a truncate triggering a cap revoke at an
inopportune time.
The issue is that truncates to a smaller size are always done via a
synchronous call to the MDS, whereas a truncate to a larger size does not
require one if Fx caps are held. That synchronous call causes the MDS to
issue the client a cap revoke for caps that the lease holds references on
(Frw, in particular).
The client code has been this way since its inception, and I haven't been
able to locate any rationale for it. Some questions about this:
1) Why doesn't the client ever buffer a truncate to smaller size? It
seems like that is something that could be done without a synchronous
MDS call if we hold Fx caps.
2) The client setattr implementations set inode_drop values in the
MetaRequest, but as far as I can tell, those values end up being ignored
by the MDS. What purpose does inode_drop actually serve? Is this field
vestigial?
Thanks,
--
Jeff Layton <jlayton(a)redhat.com>
Hi Folks,
The weekly performance meeting is starting now! This week we have a
number of topics including pg autoscaling, onode memory usage,
io_uring, and continued discussion of rocksdb and pglog.
Hope to see you there!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark
Hi,
See this issue: https://tracker.ceph.com/issues/47951
PR for Nautilus: https://github.com/ceph/ceph/pull/37816
This breaks a lot of Nautilus deployments I know of and might cause
problems for many other users if they upgrade to .12.
I would say this fix is important enough to warrant a quick .13 release.
What do others say?
Wido