On Jun 2, 2023, at 10:09, Stefan Kooman
<stefan(a)bit.nl> wrote:
On 5/26/23 23:09, Alexander E. Patrakov wrote:
Hello Frank,
On Fri, May 26, 2023 at 6:27 PM Frank Schilder <frans(a)dtu.dk> wrote:
Hi all,
jumping on this thread as we have requests for which per-client fs mount encryption makes
a lot of sense:
What kind of security do you want to achieve with encryption keys stored
on the server side?
One of the use cases is a user requesting a share with encryption at rest. Since
encryption has an unavoidable performance impact, it is impractical to make 100% of users
pay for a requirement that only 1% of users really have. Instead of all-OSD back-end
encryption hitting everyone for little reason, encrypting only the relevant user buckets /
fs shares at the front-end application level still ensures that the data is encrypted at rest.
I would disagree about the unavoidable performance impact of at-rest
encryption of OSDs. Read the Cloudflare blog article, which shows how
they make the encryption impact on their (non-Ceph) drives negligible:
https://blog.cloudflare.com/speeding-up-linux-disk-encryption/. The
main part of their improvements (the ability to disable dm-crypt
workqueues) is already in the mainline kernel. There is also a Ceph
pull request that disables dm-crypt workqueues on certain drives:
https://github.com/ceph/ceph/pull/49554
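For anyone who wants to try this outside of Ceph first: recent cryptsetup (2.3.4+) exposes the workqueue bypass as per-device flags. A sketch, assuming a LUKS2 device at /dev/nvme0n1 and the mapping name osd-data (both hypothetical):

```shell
# Open a LUKS2 device with the dm-crypt workqueues bypassed (recommended
# for flash only). --persistent stores the flags in the LUKS2 metadata,
# so subsequent opens keep them without repeating the options.
cryptsetup open --type luks2 \
    --perf-no_read_workqueue --perf-no_write_workqueue \
    --persistent /dev/nvme0n1 osd-data

# Verify the flags took effect on the active mapping:
cryptsetup status osd-data | grep flags
```

This requires root and a real device, so treat it as a sketch of the CLI, not a copy-paste recipe.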
Indeed. With the workqueue bypass option enabled for flash devices, the overhead of crypto
is really low. Here is a partial repost from an email I sent earlier:
I repeated the tests from Cloudflare and drew the same conclusions. TL;DR:
performance increases a lot and less CPU is used. Some fio 4k write, iodepth=1,
performance numbers on a Samsung PM983 3.84 TB drive (Ubuntu 22.04 with HWE kernel
5.15.0-52-generic, AMD EPYC 7302P 16-Core Processor, C-state pinning, CPU performance mode
on, Samsung PM983 firmware EDA5702Q):
Unencrypted NVMe:
write: IOPS=63.3k, BW=247MiB/s (259MB/s)(62.6GiB/259207msec); 0 zone resets
clat (nsec): min=13190, max=56400, avg=15397.89, stdev=1506.45
lat (nsec): min=13250, max=56940, avg=15462.03, stdev=1507.88
Encrypted (without no_write_workqueue / no_read_workqueue):
write: IOPS=34.8k, BW=136MiB/s (143MB/s)(47.4GiB/357175msec); 0 zone resets
clat (usec): min=24, max=1221, avg=28.12, stdev= 2.98
lat (usec): min=24, max=1221, avg=28.37, stdev= 2.99
Encrypted (with no_write_workqueue / no_read_workqueue enabled):
write: IOPS=55.7k, BW=218MiB/s (228MB/s)(57.3GiB/269574msec); 0 zone resets
clat (nsec): min=15710, max=87090, avg=17550.99, stdev=875.72
lat (nsec): min=15770, max=87150, avg=17614.82, stdev=876.85
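In case anyone wants to repeat the measurement: the numbers above look like a plain 4k write job at queue depth 1. A sketch of such a job, assuming the dm-crypt mapping is at /dev/mapper/osd-data (hypothetical name; adjust the runtime and target as needed):

```shell
# 4k writes, iodepth=1, direct I/O -- measures per-I/O latency through
# the dm-crypt layer. WARNING: this writes to the raw device; only run
# it against a scratch disk you can afford to destroy.
fio --name=4kwrite --filename=/dev/mapper/osd-data \
    --rw=write --bs=4k --iodepth=1 --direct=1 \
    --ioengine=libaio --runtime=60 --time_based
```

Run it once against the plain device and once against each dm-crypt configuration to reproduce the three result blocks.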
So encryption does have a performance impact, but the added latency is only a few
microseconds. And these tests are on bare NVMe drives, not Ceph OSDs. Compared to the
latency Ceph itself adds to (client) IO this seems negligible. At least, when the
workqueues are bypassed; otherwise a lot of CPU is involved (loads of kcryptd threads),
and that might hurt maximum performance on a system (especially if it is CPU bound).
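To put the avg clat values from the fio output above side by side (converted to microseconds):

```shell
# Per-I/O latency overhead of dm-crypt, from the avg clat values above:
# plain 15.40 us, encrypted with bypass 17.55 us, encrypted with
# workqueues 28.12 us.
awk 'BEGIN {
    plain = 15.40; bypass = 17.55; wq = 28.12
    printf "bypass overhead:    %.2f us (+%.0f%%)\n", bypass - plain, (bypass - plain) / plain * 100
    printf "workqueue overhead: %.2f us (+%.0f%%)\n", wq - plain, (wq - plain) / plain * 100
}'
```

That works out to roughly +2 us (+14%) with the bypass versus +13 us (+83%) through the workqueues, which is where the "few microseconds" above comes from.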
So, today I did a comparison on a production cluster while draining an OSD with 10
concurrent backfills:
without no_write_workqueue / no_read_workqueue: 32 kcryptd threads each using on average
5.5% CPU, and the dmcrypt write thread using ~9% CPU. So that's almost two CPU cores.
with no_write_workqueue / no_read_workqueue: dmcrypt / cryptd threads do not even show up
in top ...
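A quick way to sum up what the crypto threads cost, instead of eyeballing top (a sketch; thread names can differ slightly between kernel versions, and ps reports lifetime-average CPU rather than a momentary sample):

```shell
# Sum the CPU usage of all dm-crypt kernel threads (kcryptd workers and
# the dmcrypt_write thread). Prints 0.0% when no dm-crypt device is
# active or the workqueues are bypassed.
ps -eo pcpu,comm | awk '/kcryptd|dmcrypt/ { total += $1 }
    END { printf "dm-crypt threads: %.1f%% CPU\n", total + 0 }'
```

On the draining OSD above this would have shown close to 200% without the bypass flags.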
So if encryption is important, even for only a subset of your users, I'm pretty sure you
can enable it cluster-wide without a noticeable negative impact.
It does require re-provisioning all of your OSDs ... which is no small feat. That said,
this thread started with "per user" encryption: if your users do not trust your
Ceph cluster, client-side encryption (i.e. CephFS fscrypt) with a key _they_ manage is
still the only way to go.
Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io