All,
The teuthology VM's disk keeps filling up. I'm cleaning up ceph.git
clones pretty aggressively, but it's still not enough. I need to grow
the VM's disk and will need to reboot the host to do so.
Does tomorrow work? Friday?
@Josh, this could be a good opportunity to update paddles too.
Preferably tomorrow if we're going to do that.
Thanks,
--
David Galloway
Senior Systems Administrator
Ceph Engineering
Hi everyone,
Our next Ceph Code Walkthrough for March will be on RADOS Snapshots,
presented by Samuel Just.
The stream starts on March 23rd at 18:00 UTC / 19:00 CET / 1:00 PM EST
/ 10:00 AM PST
https://tracker.ceph.com/projects/ceph/wiki/Code_Walkthroughs
See you then!
--
Mike Perez
There will be a DocuBetter meeting on Thursday, 25 Mar 2021 at 0100 UTC.
We will discuss the Google Season of Docs proposal (the Comprehensive
Contribution Guide), the rewriting of the cephadm documentation, and the new
section of the Teuthology Guide.
DocuBetter Meeting -- APAC
25 Mar 2021
0100 UTC
https://bluejeans.com/908675367
https://pad.ceph.com/p/Ceph_Documentation
Hey all,
I made the mistake of trying to debug the Satellite server on a Friday and
now it's worse off than it was earlier. RHEL jobs are likely to fail.
I'll try to poke at it more tomorrow.
Keep an eye on https://status.sepia.ceph.com/incidents/3899 for updates.
Sorry for the inconvenience.
--
David Galloway
Senior Systems Administrator
Ceph Engineering
tl;dr version: in cephfs, the MDS handles truncating object data when
inodes are truncated. This is problematic with fscrypt.
Longer version:
I've been working on a patchset to add fscrypt support to kcephfs, and
have hit a problem with the way that truncation is handled. The main
issue is that fscrypt uses block-based ciphers, so we must ensure that
we read and write complete crypto blocks on the OSDs.
I'm currently using 4k crypto blocks, but we may want to allow this to
be tunable eventually (though it will need to be smaller than and align
with the OSD object size). For simplicity's sake, I'm planning to
disallow custom layouts on encrypted inodes. We could consider adding
that later (but it doesn't sound likely to be worthwhile).
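To make the alignment requirement concrete, here's a minimal sketch of
the arithmetic involved (assuming 4k blocks; the names are mine, not
from the patchset):

    #include <cstdint>

    constexpr uint64_t CRYPTO_BLOCK_SIZE = 4096;  // assumed 4k crypto blocks

    // round an offset down to the start of its crypto block
    constexpr uint64_t crypto_block_start(uint64_t off) {
      return off & ~(CRYPTO_BLOCK_SIZE - 1);
    }

    // round an offset up to the next crypto block boundary
    constexpr uint64_t crypto_block_end(uint64_t off) {
      return (off + CRYPTO_BLOCK_SIZE - 1) & ~(CRYPTO_BLOCK_SIZE - 1);
    }

    // e.g. an I/O touching offset 5000 must cover the block [4096, 8192)
    static_assert(crypto_block_start(5000) == 4096);
    static_assert(crypto_block_end(5000) == 8192);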
Normally, when a file is truncated (usually via a SETATTR MDS call), the
MDS handles truncating or deleting objects on the OSDs. This is done
somewhat lazily in that the MDS replies to the client before this
process is complete (AFAICT).
Once we add fscrypt support, the MDS handling truncation becomes a
problem, in that we need to be able to deal with complete crypto blocks.
Letting the MDS truncate away part of a block will leave us with a block
that can't be decrypted.
There are a number of possible approaches to fixing this, but ultimately
the client will have to zero-pad, encrypt and write the blocks at the
edges since the MDS doesn't have access to the keys.
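For a truncate that lands mid-block, that means something like the
following on the client (a sketch with made-up names, not code from the
series):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    constexpr uint64_t CRYPTO_BLOCK_SIZE = 4096;  // assumed 4k crypto blocks

    // given the decrypted contents of the crypto block containing the
    // new EOF, zero the plaintext tail beyond new_size; the caller then
    // re-encrypts the block and writes it back to the OSD as a complete
    // block
    void zero_pad_eof_block(std::vector<uint8_t>& block, uint64_t new_size) {
      uint64_t off_in_block = new_size % CRYPTO_BLOCK_SIZE;
      if (off_in_block == 0)
        return;  // block-aligned truncate: no partial block to fix up
      std::fill(block.begin() + off_in_block, block.end(), 0);
    }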
There are several possible approaches that I've identified:
1/ We could teach the MDS the crypto blocksize, and ensure that it
doesn't truncate away partial blocks. The client could tell the MDS what
blocksize it's using on the inode and the MDS could ensure that
truncates align to the blocks. The client will still need to write
partial blocks at the edges of holes or at the EOF, and it probably
shouldn't do that until it gets the unstable reply from the MDS. We
could handle this by adding a new truncate op or extending the existing
one.
2/ We could cede the object truncate/delete to the client altogether.
The MDS is aware when an inode is encrypted, so it could simply skip
truncation for those inodes. We also already handle hole punching completely on the
client (though the size doesn't change there). Truncate could be a
special case of that. Probably, the client would issue the truncate and
then be responsible for deleting/rewriting blocks after that reply comes
in. We'd have to consider how to handle delinquent clients that don't
clean up correctly.
3/ We could maintain a separate field in the inode for the real
inode->i_size that crypto-enabled clients would use. The client would
always communicate a size to the MDS that is rounded up to the end of
the last crypto block, such that the "true" size of the inode on disk
would always be represented in the rstats. Only crypto-enabled clients
would care about the "realsize" field. In fact, this value could
_itself_ be encrypted too, so that the i_size of the file is masked from
clients that don't have keys.
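A sketch of the size handling under this scheme (the field names are
hypothetical):

    #include <cstdint>

    constexpr uint64_t CRYPTO_BLOCK_SIZE = 4096;  // assumed 4k crypto blocks

    struct size_update {
      uint64_t mds_size;   // block-aligned; what the MDS and rstats track
      uint64_t real_size;  // the "realsize" field; could itself be stored
                           // encrypted to mask it from keyless clients
    };

    // the client always reports a size rounded up to a crypto block
    // boundary, so the MDS never has a reason to touch a partial block
    size_update make_size_update(uint64_t i_size) {
      uint64_t rounded =
          (i_size + CRYPTO_BLOCK_SIZE - 1) & ~(CRYPTO_BLOCK_SIZE - 1);
      return {rounded, i_size};
    }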
Ceph's truncation machinery is pretty complex in general, so I could
have missed other approaches or something that makes these ideas
impossible. I'm leaning toward #3 here since I think it has the most
benefit and keeps the MDS out of the whole business.
What should we do here?
--
Jeff Layton <jlayton(a)redhat.com>
hi Kefu,
continuing our discussion from https://github.com/ceph/ceph/pull/40230
on the future of this BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT define
to summarize the issue:
in 1.66, boost::asio made a lot of changes for 'Networking TS
compatibility', including the executors proposed therein. i raised
this on ceph-devel in the thread "coming in boost 1.66" (see
https://www.spinics.net/lists/ceph-devel/msg39243.html)
meanwhile, the c++ standards committee was working on 'unified
executors' proposals outside of the Networking TS, and networking was
left out of c++20 so it could wait for a unified executor model
instead of adding its own
in 1.74, boost::asio added support for this new executor model, which
its docs summarize well at
https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/std_executors.html.
a BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT option was added to preserve
compatibility with existing code, so ceph now relies on this in
several places to build against boost 1.74+
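for anyone who hasn't looked at this, a minimal sketch of what the
macro preserves (assuming boost >= 1.74; the define has to come before
any asio header):

    // with the macro defined, any_io_executor is a typedef for the
    // TS-style polymorphic wrapper boost::asio::executor, so pre-1.74
    // code that stores executors keeps compiling unchanged
    #define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT
    #include <boost/asio.hpp>
    #include <type_traits>

    static_assert(std::is_same_v<boost::asio::any_io_executor,
                                 boost::asio::executor>,
                  "TS executors are the default");

    int main() {
      boost::asio::io_context ctx;
      boost::asio::executor ex = ctx.get_executor();  // TS-style storage
      boost::asio::post(ex, [] { /* runs via the io_context */ });
      ctx.run();
    }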
i've been hesitant to push for a conversion to this new model for two
main reasons:
* it's mostly internal to asio, so i don't see much benefit to
changing as long as boost continues to support the TS executors
* it's hard to tell how close it is to the 'final form' that we'll see
in a future c++ standard, so later changes may require us to do
another conversion
does anyone else have a stake in this? if there's interest in working
on it, i'm happy to help with review
Hi everyone,
The non-core daemon registrations in servicemap vs cephadm came up
twice in the last couple of weeks:
First, https://github.com/ceph/ceph/pull/40035 changed rgw to register
as rgw.$id.$gid and made cephadm complain about stray unmanaged
daemons. The motivation was that the PR allows multiple radosgw
daemons to share the same auth name + key and still show up in the
servicemap.
Then, today, I noticed that cephfs-mirror caused the same cephadm
error because it was registering as cephfs-mirror.$gid instead of the
cephfs-mirror.$id that cephadm expected. I went to fix that in
cephfs-mirror, but noticed that the behavior was copied from
rbd-mirror, which wasn't causing any cephadm error. It turns out
that cephadm has some special code for rbd-mirror to identify daemons
in the servicemap:
https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L4…
So to fix cephfs-mirror, I opted to keep the existing behavior and
adjust cephadm:
https://github.com/ceph/ceph/pull/40220/commits/30d87f3746ff9daf219366354f2…
For now, at least, that solves the problem. But, as things stand, rgw
and {cephfs,rbd}-mirror behave a bit differently with the
servicemap. The registrations look like so:
{
    "epoch": 538,
    "modified": "2021-03-18T17:28:12.500356-0400",
    "services": {
        "cephfs-mirror": {
            "daemons": {
                "summary": "",
                "4220": {
                    "start_epoch": 501,
                    "start_stamp": "2021-03-18T12:49:32.929888-0400",
                    "gid": 4220,
                    "addr": "10.3.64.25:0/3521332238",
                    "metadata": {
                        ...
                        "id": "dael.csfspq",
                        "instance_id": "4220",
                        ...
                    },
                    "task_status": {}
                }
            }
        },
        "rbd-mirror": {
            "daemons": {
                "summary": "",
                "4272": {
                    "start_epoch": 531,
                    "start_stamp": "2021-03-18T16:31:26.540108-0400",
                    "gid": 4272,
                    "addr": "10.3.64.25:0/2576541551",
                    "metadata": {
                        ...
                        "id": "dael.kfenmm",
                        "instance_id": "4272",
                        ...
                    },
                    "task_status": {}
                },
                "4299": {
                    "start_epoch": 534,
                    "start_stamp": "2021-03-18T16:52:59.027580-0400",
                    "gid": 4299,
                    "addr": "10.3.64.25:0/600966616",
                    "metadata": {
                        ...
                        "id": "dael.yfhmmq",
                        "instance_id": "4299",
                        ...
                    },
                    "task_status": {}
                }
            }
        },
        "rgw": {
            "daemons": {
                "summary": "",
                "foo.dael.hwyogi": {
                    "start_epoch": 537,
                    "start_stamp": "2021-03-18T17:27:58.998535-0400",
                    "gid": 4319,
                    "addr": "10.3.64.25:0/3084463187",
                    "metadata": {
                        ...
                        "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
                        "zone_name": "default",
                        "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
                        "zonegroup_name": "default"
                    },
                    "task_status": {}
                },
                "foo.dael.pyvurh": {
                    "start_epoch": 537,
                    "start_stamp": "2021-03-18T17:27:58.999620-0400",
                    "gid": 4318,
                    "addr": "10.3.64.25:0/2303221705",
                    "metadata": {
                        ...
                        "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
                        "zone_name": "default",
                        "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
                        "zonegroup_name": "default"
                    },
                    "task_status": {}
                },
                "foo.dael.rqipjp": {
                    "start_epoch": 538,
                    "start_stamp": "2021-03-18T17:28:10.866327-0400",
                    "gid": 4330,
                    "addr": "10.3.64.25:0/4039152887",
                    "metadata": {
                        ...
                        "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
                        "zone_name": "default",
                        "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
                        "zonegroup_name": "default"
                    },
                    "task_status": {}
                }
            }
        }
    }
}
With the *-mirror approach, the servicemap "key" is always the gid,
and you have to look at the "id" to see how the daemon is
named/authenticated. With rgw, the name is the key and there is no
"id" key.
I'm inclined to just go with the gid-as-key for rgw too and add the
"id" key so that we are behaving consistently. This would have the
side-effect of also solving the original goal of allowing many rgw
daemons to share the same auth identity and still show up in the
servicemap.
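Concretely, rgw could register along these lines (a sketch against the
librados service_daemon_register() interface; the exact metadata keys
are illustrative):

    #include <map>
    #include <string>

    #include <rados/librados.hpp>

    // key the servicemap entry by gid, like the mirror daemons do, and
    // carry the auth name in the "id" metadata so it is still visible
    int register_rgw(librados::Rados& rados, const std::string& auth_id) {
      std::string gid = std::to_string(rados.get_instance_id());
      std::map<std::string, std::string> metadata = {
        {"id", auth_id},       // e.g. "foo.dael.hwyogi"
        {"instance_id", gid},
      };
      return rados.service_daemon_register("rgw", gid, metadata);
    }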
The downside is that interpreting the servicemap for the running daemons
is a bit more work. For example, ceph -s currently shows:
  services:
    mon: 1 daemons, quorum a (age 2d)
    mgr: x(active, since 58m)
    osd: 1 osds: 1 up (since 2d), 1 in (since 2d)
    cephfs-mirror: 1 daemon active (4220)
    rbd-mirror: 2 daemons active (4272, 4299)
    rgw: 2 daemons active (foo.dael.rqipjp, foo.dael.sajkvh)
Showing the gids there is clearly not what we want. But similarly,
showing the daemon names is probably also a bad idea, since it won't
scale beyond ~3 or so; we probably just want a simple count.
Reasonable?
sage
Hi folks,
I'm seeing some of our internal Red Hat builders going OOM and killing
ceph builds. This is happening across architectures.
Upstream, our braggi builders have 48 vCPUs and 256GB of RAM. That's not small.
What is the minimum memory and CPU requirement for building pacific?
Internally, to use one ppc64le example, we're running with 14GB RAM
and 16 CPUs, and the RPM spec file chooses -j5, hitting OOM. We tuned
mem_per_process from 2500 to 2700 a while back to alleviate this, but
we're still hitting OOM consistently with the pacific branch now.
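For reference, my understanding of the job-count arithmetic the spec
file does (a sketch of the calculation, not the actual spec logic):

    #include <algorithm>
    #include <cstdio>

    int main() {
      long mem_mb = 14 * 1024;      // ~14GB RAM on the ppc64le builder
      long mem_per_process = 2700;  // MB; bumped from 2500 a while back
      long cpus = 16;
      // parallel jobs are capped by memory as well as CPU count
      long jobs = std::min(cpus, mem_mb / mem_per_process);
      std::printf("-j%ld\n", jobs);  // 14336 / 2700 -> 5, i.e. -j5
    }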
- Ken
Terribly sorry for the mistake. There was a bug in the script I use to
sync packages to download.ceph.com that wasn't listing directories in
the desired order. That meant the download.ceph.com/{rpm,deb}-octopus
symlinks still pointed to 15.2.9. This is fixed.
I'm re-running the container jobs to get those pushed too.
On 3/18/21 10:45 AM, David Orman wrote:
> Hi David,
>
> The "For Packages" link in your email/the blog posts do not appear to
> work. Additionally, we browsed the repo, and it doesn't appear the
> packages are uploaded, at least for debian-octopus:
> http://download.ceph.com/debian-octopus/pool/main/c/ceph/. We only use
> the release packages for cephadm bootstrapping, so it's not a
> deal-breaker for us, just wanted to give you a heads-up.
>
> Cheers,
> David Orman
>
> On Thu, Mar 18, 2021 at 9:11 AM David Galloway <dgallowa(a)redhat.com> wrote:
>> [snipped: v15.2.10 release announcement, quoted in full below]
We're happy to announce the 10th backport release in the Octopus series.
We recommend that users update to this release. For detailed release
notes with links and a changelog, please refer to the official blog entry at
https://ceph.io/releases/v15-2-10-octopus-released
Notable Changes
---------------
* The containers include an updated tcmalloc that avoids crashes seen on
  15.2.9. See `issue#49618 <https://tracker.ceph.com/issues/49618>`_ for
  details.
* RADOS: BlueStore handling of huge (>4GB) writes from RocksDB to BlueFS
  has been fixed.
* When upgrading from a previous cephadm release, systemctl may hang when
  trying to start or restart the monitoring containers. (This is caused by
  a change in the systemd unit to use `type=forking`.) After the upgrade,
  please run::
  ceph orch redeploy nfs
  ceph orch redeploy iscsi
  ceph orch redeploy node-exporter
  ceph orch redeploy prometheus
  ceph orch redeploy grafana
  ceph orch redeploy alertmanager
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.10.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 27917a557cca91e4da407489bbaa64ad4352cc02