Dev December 2020

dev@ceph.io

39 participants
36 discussions

by Willem Jan Withagen

Hi, Probably the first thing one learns in programming 1.01: do not direct compare floats for equal..... One of the tests fails on FreeBSD: 2 DecayCounter /tmp/typ-C2ukWb5CA /tmp/typ-VIQ7UBigy differ: char 16, line 2 **** DecayCounter test 1 dump_json check failed **** ceph-dencoder type DecayCounter select_test 1 dump_json > /tmp/typ-C2ukWb5CA ceph-dencoder type DecayCounter select_test 1 encode decode dump_json > /tmp/typ-VIQ7UBigy 2c2 < "value": 3, --- > "value": 2.9896058693769652, Nice job to fix this weekend. --WjW

3 years, 4 months

master branch 'make check' broken by 'dashboard-frontend-unittest'

by Ernesto Puerta

Hi all, We have this issue identified (https://tracker.ceph.com/issues/48441) and have a fix under CI testing (https://github.com/ceph/ceph/pull/38409). It only affects the 'master' branch. Issue analysis: - This PR (https://github.com/ceph/ceph/pull/37198) passed CI in Oct. 22. - This other PR (https://github.com/ceph/ceph/pull/37918) was merged later (Nov. 12). That one brought breaking changes to the previous. - The first PR was merged today. Ideas to avoid future issues like this: - Retrigger CI on CI-passing PRs without recent activity - Run master through CI for every PR merged (post-merge hook) Sorry for the inconvenience! Kind regards, Ernesto

3 years, 4 months

provisioning clients in teuthology with an extra local filesystem

by Jeff Layton

I've been working on a patch series to overhaul the fscache code in the kclient. I also have this (really old) tracker to add fscache testing to teuthology: https://tracker.ceph.com/issues/6373 It would be ideal if the clients in such testing had a dedicated filesystem mounted on /var/cache/fscache, so that if it fills up it doesn't take down the rootfs with it. We'll also need to have cachefilesd installed and running in the client hosts. Is it possible to do this in teuthology? How would I approach this? Thanks, -- Jeff Layton <jlayton(a)redhat.com>

3 years, 4 months

Fencing an entire client cluster from access to Ceph (in kubernetes)

by Shyam Ranganathan

Asks: ----- This mail is to trigger a discussion on the potential solution, provided later below, for the issue as per the subject, and to possibly gather other ideas/options, to enable the use case as described. Use case/Background: -------------------- Ceph is used by kubernetes to provide persistent storage (block and file, via RBD and CephFS respectively) to pods, via the CSI interface implemented in ceph-csi [1]. One of the use cases that we want to solve is when multiple kubernetes clusters access the same Ceph storage cluster [2], and further these kubernetes clusters provide for DR (disaster recovery) of workloads, when a peer kubernetes cluster becomes unavailable. IOW, if a workload is running on kubernetes cluster-a and has access to persistent storage, it can be migrated to cluster-b in case of a DR event in cluster-a, ensuring workload continuity and with it access to the same persistent storage (as the Ceph cluster is shared and available). Problem: -------- The exact status of all client/nodes in kubernetes cluster-a on a DR event is unknown, all maybe down or some may still be up and running, still accessing storage. This brings about the need to fence all IO from all nodes/container-networks on cluster-a, on a DR event, prior to migrating the workloads to cluster-b. Existing solutions and issues: ------------------------------ Current schemes to fence IO are, per client [3] and further per image for RBD. This makes it a prerequisite that all client addresses in cluster-a are known and are further unique across peer kubernetes clusters, for a fence/blocklist to be effective. Also, during recovery of kubernetes cluster-a, as kubernetes uses current known state of the world (i.e workload "was" running on this cluster) and reconciles to the desired state of the world eventually, it is possible that re-mounts may occur prior to reaching desired state of the world (which would be not to run the said workloads on this cluster). The recovery may hence cause the existing connection based blocklists to be reset, as newer mounts/maps of the fs/image are performed on the recovering cluster. The issues as above, makes the existing blocklist scheme either unreliable or cumbersome to deal with for all possible nodes in the respective kubernetes clusters. Potential solution: ------------------- On discussing the above with Jason, he pointed out to a potential solution (as follows) to resolve the problem, <snip> My suggestion would be to utilize CephX to revoke access to the cluster from site A when site B is promoted. The one immediate issue with this approach is that any clients with existing tickets will keep their access to the cluster until the ticket expires. Therefore, for this to be effective, we would need a transient CephX revocation list capability to essentially blocklist CephX clients for X period of time until we can be sure that their tickets have expired and are therefore no longer usable. </snip> The above is quite trivial from a kubernetes and ceph-csi POV, as each peer kubernetes cluster can be configured to use different cephx identities, and thus independently revoked and later reinstated, solving the issues laid out above. The ability to revoke credentials for an existing cephx identity can be done if we change its existing authorization and hence is readily available. The ability to provide a revocation list for existing valid tickets, that clients already have, would need to be developed. Thoughts and other options? Thanks, Shyam [1] Ceph-csi: https://github.com/ceph/ceph-csi [2] DR use case in ceph-csi: https://github.com/ceph/ceph-csi/pull/1558 [3] RBD exclusive locks and blocklists: https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/ CephFS client eviction and blocklists: https://docs.ceph.com/en/latest/cephfs/eviction/

3 years, 4 months

Proper build and test steps

by Duncan Bellamy

Hi, I am trying to enable tests for alpine linux and split in to separate build and test steps. If I run cmake with "-DWITH_TESTS=ON” It compiles the tests with "make all”, but it runs them as well. I have looked through the generated makefile and can’t see a target to just build ceph, the gentoo buildscript uses "make all" for build and "make check” for test. Is there any target that is just building ceph? Thanks, Duncan

3 years, 4 months

v15.2.7 Octopus released

by David Galloway

This is the 7th backport release in the Octopus series. This release fixes a serious bug in RGW that has been shown to cause data loss when a read of a large RGW object (i.e., one with at least one tail segment) takes longer than one half the time specified in the configuration option `rgw_gc_obj_min_wait`. The bug causes the tail segments of that read object to be added to the RGW garbage collection queue, which will in turn cause them to be deleted after a period of time. Changelog --------- * rgw: during GC defer, prevent new GC enqueue Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-15.2.7.tar.gz * For packages, see http://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: 88e41c6c49beb18add4fdb6b4326ca466d931db8

3 years, 4 months

2024

2023

2022

2021

2020

2019

Dev December 2020