All,
The teuthology VM's disk keeps filling up. I'm cleaning up ceph.git
clones pretty aggressively, but it's still not enough. I need to grow
the VM's disk and will need to reboot the host to do so.
Does tomorrow work? Friday?
@Josh, this could be a good opportunity to update paddles too.
Preferably tomorrow if we're going to do that.
Thanks,
--
David Galloway
Senior Systems Administrator
Ceph Engineering
Hi everyone,
Our next Ceph Code Walkthrough for March will be on RADOS Snapshots,
presented by Samuel Just.
The stream starts on March 23rd at 18:00 UTC / 19:00 CET / 1:00 PM EST
/ 10:00 AM PST
https://tracker.ceph.com/projects/ceph/wiki/Code_Walkthroughs
See you then!
--
Mike Perez
There will be a DocuBetter meeting on Thursday, 25 Mar 2021 at 0100 UTC.
We will discuss the Google Season of Docs proposal (the Comprehensive
Contribution Guide), the rewriting of the cephadm documentation, and the new
section of the Teuthology Guide.
DocuBetter Meeting -- APAC
25 Mar 2021
0100 UTC
https://bluejeans.com/908675367
https://pad.ceph.com/p/Ceph_Documentation
Hey all,
I made the mistake of trying to debug the Satellite server on a Friday and
now it's worse off than it was earlier. RHEL jobs are likely to fail.
I'll try to poke at it more tomorrow.
Keep an eye on https://status.sepia.ceph.com/incidents/3899 for updates.
Sorry for the inconvenience.
--
David Galloway
Senior Systems Administrator
Ceph Engineering
tl;dr version: in cephfs, the MDS handles truncating object data when
inodes are truncated. This is problematic with fscrypt.
Longer version:
I've been working on a patchset to add fscrypt support to kcephfs, and
have hit a problem with the way that truncation is handled. The main
issue is that fscrypt uses block-based ciphers, so we must ensure that
we read and write complete crypto blocks on the OSDs.
I'm currently using 4k crypto blocks, but we may want to allow this to
be tunable eventually (though it will need to be smaller than and align
with the OSD object size). For simplicity's sake, I'm planning to
disallow custom layouts on encrypted inodes. We could consider adding
that later (but it doesn't sound likely to be worthwhile).
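To make the alignment requirement concrete, here's a minimal sketch of
the arithmetic involved (assuming 4k blocks; the names are mine, not
from the patchset):

    #include <cstdint>

    constexpr uint64_t CRYPTO_BLOCK_SIZE = 4096;  // assumed 4k crypto blocks

    // round an offset down to the start of its crypto block
    constexpr uint64_t crypto_block_start(uint64_t off) {
      return off & ~(CRYPTO_BLOCK_SIZE - 1);
    }

    // round an offset up to the next crypto block boundary
    constexpr uint64_t crypto_block_end(uint64_t off) {
      return (off + CRYPTO_BLOCK_SIZE - 1) & ~(CRYPTO_BLOCK_SIZE - 1);
    }

    // e.g. an I/O touching offset 5000 must cover the block [4096, 8192)
    static_assert(crypto_block_start(5000) == 4096);
    static_assert(crypto_block_end(5000) == 8192);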
Normally, when a file is truncated (usually via a SETATTR MDS call), the
MDS handles truncating or deleting objects on the OSDs. This is done
somewhat lazily in that the MDS replies to the client before this
process is complete (AFAICT).
Once we add fscrypt support, the MDS handling truncation becomes a
problem, in that we need to be able to deal with complete crypto blocks.
Letting the MDS truncate away part of a block will leave us with a block
that can't be decrypted.
There are a number of possible approaches to fixing this, but ultimately
the client will have to zero-pad, encrypt and write the blocks at the
edges since the MDS doesn't have access to the keys.
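For a truncate that lands mid-block, that means something like the
following on the client (a sketch with made-up names, not code from the
series):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    constexpr uint64_t CRYPTO_BLOCK_SIZE = 4096;  // assumed 4k crypto blocks

    // given the decrypted contents of the crypto block containing the
    // new EOF, zero the plaintext tail beyond new_size; the caller then
    // re-encrypts the block and writes it back to the OSD as a complete
    // block
    void zero_pad_eof_block(std::vector<uint8_t>& block, uint64_t new_size) {
      uint64_t off_in_block = new_size % CRYPTO_BLOCK_SIZE;
      if (off_in_block == 0)
        return;  // block-aligned truncate: no partial block to fix up
      std::fill(block.begin() + off_in_block, block.end(), 0);
    }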
There are several possible approaches that I've identified:
1/ We could teach the MDS the crypto blocksize, and ensure that it
doesn't truncate away partial blocks. The client could tell the MDS what
blocksize it's using on the inode and the MDS could ensure that
truncates align to the blocks. The client will still need to write
partial blocks at the edges of holes or at the EOF, and it probably
shouldn't do that until it gets the unstable reply from the MDS. We
could handle this by adding a new truncate op or extending the existing
one.
2/ We could cede the object truncate/delete to the client altogether.
The MDS is aware when an inode is encrypted, so it could simply skip
truncation for those inodes. We also already handle hole punching completely on the
client (though the size doesn't change there). Truncate could be a
special case of that. Probably, the client would issue the truncate and
then be responsible for deleting/rewriting blocks after that reply comes
in. We'd have to consider how to handle delinquent clients that don't
clean up correctly.
3/ We could maintain a separate field in the inode for the real
inode->i_size that crypto-enabled clients would use. The client would
always communicate a size to the MDS that is rounded up to the end of
the last crypto block, such that the "true" size of the inode on disk
would always be represented in the rstats. Only crypto-enabled clients
would care about the "realsize" field. In fact, this value could
_itself_ be encrypted too, so that the i_size of the file is masked from
clients that don't have keys.
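A sketch of the size handling under this scheme (the field names are
hypothetical):

    #include <cstdint>

    constexpr uint64_t CRYPTO_BLOCK_SIZE = 4096;  // assumed 4k crypto blocks

    struct size_update {
      uint64_t mds_size;   // block-aligned; what the MDS and rstats track
      uint64_t real_size;  // the "realsize" field; could itself be stored
                           // encrypted to mask it from keyless clients
    };

    // the client always reports a size rounded up to a crypto block
    // boundary, so the MDS never has a reason to touch a partial block
    size_update make_size_update(uint64_t i_size) {
      uint64_t rounded =
          (i_size + CRYPTO_BLOCK_SIZE - 1) & ~(CRYPTO_BLOCK_SIZE - 1);
      return {rounded, i_size};
    }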
Ceph's truncation machinery is pretty complex in general, so I could
have missed other approaches or something that makes these ideas
impossible. I'm leaning toward #3 here since I think it has the most
benefit and keeps the MDS out of the whole business.
What should we do here?
--
Jeff Layton <jlayton(a)redhat.com>
hi Kefu,
continuing our discussion from https://github.com/ceph/ceph/pull/40230
on the future of this BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT define
to summarize the issue:
in 1.66, boost::asio made a lot of changes for 'Networking TS
compatibility', including the executors proposed therein. i raised
this on ceph-devel in the thread "coming in boost 1.66" (see
https://www.spinics.net/lists/ceph-devel/msg39243.html)
meanwhile, the c++ standards committee was working on 'unified
executors' proposals outside of the Networking TS, and networking was
left out of c++20 so it could wait for a unified executor model
instead of adding its own
in 1.74, boost::asio added support for this new executor model, which
its docs summarize well at
https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/std_executors.html.
a BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT option was added to preserve
compatibility with existing code, so ceph now relies on this in
several places to build against boost 1.74+
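for anyone who hasn't looked at this, a minimal sketch of what the
macro preserves (assuming boost >= 1.74; the define has to come before
any asio header):

    // with the macro defined, any_io_executor is a typedef for the
    // TS-style polymorphic wrapper boost::asio::executor, so pre-1.74
    // code that stores executors keeps compiling unchanged
    #define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT
    #include <boost/asio.hpp>
    #include <type_traits>

    static_assert(std::is_same_v<boost::asio::any_io_executor,
                                 boost::asio::executor>,
                  "TS executors are the default");

    int main() {
      boost::asio::io_context ctx;
      boost::asio::executor ex = ctx.get_executor();  // TS-style storage
      boost::asio::post(ex, [] { /* runs via the io_context */ });
      ctx.run();
    }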
i've been hesitant to push for a conversion to this new model for two
main reasons:
* it's mostly internal to asio, so i don't see much benefit to
changing as long as boost continues to support the TS executors
* it's hard to tell how close it is to the 'final form' that we'll see
in a future c++ standard, so later changes may require us to do
another conversion
does anyone else have a stake in this? if there's interest in working
on it, i'm happy to help with review
Hi everyone,
The non-core daemon registrations in servicemap vs cephadm came up
twice in the last couple of weeks:
First, https://github.com/ceph/ceph/pull/40035 changed rgw to register
as rgw.$id.$gid and made cephadm complain about stray unmanaged
daemons. The motivation was that the PR allows multiple radosgw
daemons to share the same auth name + key and still show up in the
servicemap.
Then, today, I noticed that cephfs-mirror caused the same cephadm
error because it was registering as cephfs-mirror.$gid instead of the
cephfs-mirror.$id that cephadm expected. I went to fix that in
cephfs-mirror, but noticed that the behavior was copied from
rbd-mirror, which wasn't causing any cephadm error. It turns out
that cephadm has some special code for rbd-mirror to identify daemons
in the servicemap:
https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L4…
So to fix cephfs-mirror, I opted to keep the existing behavior and
adjust cephadm:
https://github.com/ceph/ceph/pull/40220/commits/30d87f3746ff9daf219366354f2…
For now, at least, that solves the problem. But, as things stand, rgw
and {cephfs,rbd}-mirror behave a bit differently with the
servicemap. The registrations look like so:
{
    "epoch": 538,
    "modified": "2021-03-18T17:28:12.500356-0400",
    "services": {
        "cephfs-mirror": {
            "daemons": {
                "summary": "",
                "4220": {
                    "start_epoch": 501,
                    "start_stamp": "2021-03-18T12:49:32.929888-0400",
                    "gid": 4220,
                    "addr": "10.3.64.25:0/3521332238",
                    "metadata": {
                        ...
                        "id": "dael.csfspq",
                        "instance_id": "4220",
                        ...
                    },
                    "task_status": {}
                }
            }
        },
        "rbd-mirror": {
            "daemons": {
                "summary": "",
                "4272": {
                    "start_epoch": 531,
                    "start_stamp": "2021-03-18T16:31:26.540108-0400",
                    "gid": 4272,
                    "addr": "10.3.64.25:0/2576541551",
                    "metadata": {
                        ...
                        "id": "dael.kfenmm",
                        "instance_id": "4272",
                        ...
                    },
                    "task_status": {}
                },
                "4299": {
                    "start_epoch": 534,
                    "start_stamp": "2021-03-18T16:52:59.027580-0400",
                    "gid": 4299,
                    "addr": "10.3.64.25:0/600966616",
                    "metadata": {
                        ...
                        "id": "dael.yfhmmq",
                        "instance_id": "4299",
                        ...
                    },
                    "task_status": {}
                }
            }
        },
        "rgw": {
            "daemons": {
                "summary": "",
                "foo.dael.hwyogi": {
                    "start_epoch": 537,
                    "start_stamp": "2021-03-18T17:27:58.998535-0400",
                    "gid": 4319,
                    "addr": "10.3.64.25:0/3084463187",
                    "metadata": {
                        ...
                        "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
                        "zone_name": "default",
                        "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
                        "zonegroup_name": "default"
                    },
                    "task_status": {}
                },
                "foo.dael.pyvurh": {
                    "start_epoch": 537,
                    "start_stamp": "2021-03-18T17:27:58.999620-0400",
                    "gid": 4318,
                    "addr": "10.3.64.25:0/2303221705",
                    "metadata": {
                        ...
                        "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
                        "zone_name": "default",
                        "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
                        "zonegroup_name": "default"
                    },
                    "task_status": {}
                },
                "foo.dael.rqipjp": {
                    "start_epoch": 538,
                    "start_stamp": "2021-03-18T17:28:10.866327-0400",
                    "gid": 4330,
                    "addr": "10.3.64.25:0/4039152887",
                    "metadata": {
                        ...
                        "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
                        "zone_name": "default",
                        "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
                        "zonegroup_name": "default"
                    },
                    "task_status": {}
                }
            }
        }
    }
}
With the *-mirror approach, the servicemap "key" is always the gid,
and you have to look at the "id" to see how the daemon is
named/authenticated. With rgw, the name is the key and there is no
"id" key.
I'm inclined to just go with the gid-as-key for rgw too and add the
"id" key so that we are behaving consistently. This would have the
side-effect of also solving the original goal of allowing many rgw
daemons to share the same auth identity and still show up in the
servicemap.
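Concretely, rgw could register along these lines (a sketch against the
librados service_daemon_register() interface; the exact metadata keys
are illustrative):

    #include <map>
    #include <string>

    #include <rados/librados.hpp>

    // key the servicemap entry by gid, like the mirror daemons do, and
    // carry the auth name in the "id" metadata so it is still visible
    int register_rgw(librados::Rados& rados, const std::string& auth_id) {
      std::string gid = std::to_string(rados.get_instance_id());
      std::map<std::string, std::string> metadata = {
        {"id", auth_id},       // e.g. "foo.dael.hwyogi"
        {"instance_id", gid},
      };
      return rados.service_daemon_register("rgw", gid, metadata);
    }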
The downside is that interpreting the servicemap for the running daemons
is a bit more work. For example, ceph -s currently shows:
  services:
    mon: 1 daemons, quorum a (age 2d)
    mgr: x(active, since 58m)
    osd: 1 osds: 1 up (since 2d), 1 in (since 2d)
    cephfs-mirror: 1 daemon active (4220)
    rbd-mirror: 2 daemons active (4272, 4299)
    rgw: 2 daemons active (foo.dael.rqipjp, foo.dael.sajkvh)
Showing the gids there is clearly not what we want. But similarly,
showing the daemon names is probably also a bad idea, since it won't
scale beyond ~3 or so; we probably just want a simple count.
Reasonable?
sage
Hi folks,
I'm seeing some of our internal Red Hat builders going OOM and killing
ceph builds. This is happening across architectures.
Upstream, our braggi builders have 48 vCPUs and 256GB of RAM. That's not small.
What is the minimum memory and CPU requirement for building pacific?
Internally, to use one ppc64le example, we're running with 14GB RAM
and 16 CPUs, and the RPM spec file chooses -j5, hitting OOM. We tuned
mem_per_process from 2500 to 2700 a while back to alleviate this, but
we're still hitting OOM consistently with the pacific branch now.
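For reference, my understanding of the job-count arithmetic the spec
file does (a sketch of the calculation, not the actual spec logic):

    #include <algorithm>
    #include <cstdio>

    int main() {
      long mem_mb = 14 * 1024;      // ~14GB RAM on the ppc64le builder
      long mem_per_process = 2700;  // MB; bumped from 2500 a while back
      long cpus = 16;
      // parallel jobs are capped by memory as well as CPU count
      long jobs = std::min(cpus, mem_mb / mem_per_process);
      std::printf("-j%ld\n", jobs);  // 14336 / 2700 -> 5, i.e. -j5
    }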
- Ken
Terribly sorry for the mistake. There was a bug in the script I use to
sync packages to download.ceph.com that wasn't listing directories in
the desired order. That meant the download.ceph.com/{rpm,deb}-octopus
symlinks still pointed to 15.2.9. This is fixed.
I'm re-running the container jobs to get those pushed too.
On 3/18/21 10:45 AM, David Orman wrote:
> Hi David,
>
> The "For Packages" link in your email/the blog posts do not appear to
> work. Additionally, we browsed the repo, and it doesn't appear the
> packages are uploaded, at least for debian-octopus:
> http://download.ceph.com/debian-octopus/pool/main/c/ceph/. We only use
> the release packages for cephadm bootstrapping, so it's not a
> deal-breaker for us, just wanted to give you a heads-up.
>
> Cheers,
> David Orman
>
> On Thu, Mar 18, 2021 at 9:11 AM David Galloway <dgallowa(a)redhat.com> wrote:
>> [snipped: v15.2.10 release announcement, quoted in full below]
We're happy to announce the 10th backport release in the Octopus series.
We recommend that users update to this release. For detailed release
notes with links and a changelog, please refer to the official blog entry at
https://ceph.io/releases/v15-2-10-octopus-released
Notable Changes
---------------
* The containers include an updated tcmalloc that avoids crashes seen on
  15.2.9. See `issue#49618 <https://tracker.ceph.com/issues/49618>`_ for
  details.
* RADOS: BlueStore handling of huge (>4GB) writes from RocksDB to BlueFS
  has been fixed.
* When upgrading from a previous cephadm release, systemctl may hang when
  trying to start or restart the monitoring containers. (This is caused by
  a change in the systemd unit to use `type=forking`.) After the upgrade,
  please run::
  ceph orch redeploy nfs
  ceph orch redeploy iscsi
  ceph orch redeploy node-exporter
  ceph orch redeploy prometheus
  ceph orch redeploy grafana
  ceph orch redeploy alertmanager
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.10.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 27917a557cca91e4da407489bbaa64ad4352cc02