Stefan,
The material in the documentation stating that Ceph works only with LUKS
version 1 has been removed.
https://tracker.ceph.com/issues/58354
Zac Dover
Upstream Docs
Ceph Foundation
On Wed, Dec 28, 2022 at 8:04 PM John Zachary Dover <zac.dover(a)gmail.com>
wrote:
> Stefan,
>
> The documentation portion of this complaint is tracked here:
>
> https://tracker.ceph.com/issues/58354
>
> I will make this change when everyone is back from Christmas break and I
> can get technical verification from the ceph-volume team.
>
> Zac Dover
> Upstream Docs
> Ceph Foundation
>
> On Tue, Dec 20, 2022 at 9:00 PM Stefan Kooman <stefan(a)bit.nl> wrote:
>
>> Hi,
>>
>> I'm working on some patches to add additional options to ceph-volume's
>> encryption support (related to encryption performance, see [1]).
>> While looking into the current ceph-volume code base and the Ceph
>> encryption docs [2], I found that some basic (internal) functionality
>> is missing, some code no longer works, and the docs are partly
>> incorrect.
>>
>> Missing functionality: ceph-volume has no code to look up settings from
>> the mon config database, i.e. the equivalent of "ceph config get osd
>> <param>". Instead, at least in encryption.py, it only checks the Ceph
>> config file on disk. I'm pretty sure this is no longer sufficient, as
>> operators have started to rely on the central (mon) config database.
>> Moreover, one of the parameters you can configure,
>> "osd_dmcrypt_key_size", cannot be set there:
>> ceph config set osd dmcrypt_key_size 256
>> Error EINVAL: unrecognized config option 'dmcrypt_key_size'
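>>
>> For illustration, the lookup order I would expect (just a sketch of a
>> possible approach, not what ceph-volume does today) would be: central
>> config database first, then the on-disk ceph.conf, then a built-in
>> default, e.g.:
>>
>> # illustrative only - the exact option name and the 512-bit fallback
>> # default are assumptions on my part
>> key_size="$(ceph config get osd osd_dmcrypt_key_size 2>/dev/null)" ||
>> key_size="$(ceph-conf --lookup osd_dmcrypt_key_size 2>/dev/null)" ||
>> key_size=512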
>>
>> Broken code: one of the parameters that can be fed to cryptsetup is
>> "osd_dmcrypt_key_size". When it is set, however, ceph-volume throws the
>> following error:
>>
>> "TypeError: get_safe() got an unexpected keyword argument 'check_valid'".
>>
>> See [3] for the line of code. If I remove that check, it works and
>> ceph-volume uses the value from the config file. I can (should?) log a
>> tracker ticket for this. Apparently nobody is using a non-default key
>> size ... or using crypto ;-).
>>
>> Wrong docs: the docs claim "only LUKS (version 1) is used", but this no
>> longer seems to be true in all cases. Modern distros default to LUKS
>> version 2, and ceph-volume's encryption.py does not explicitly force the
>> LUKS format version, so the default (2) gets used. That does seem to
>> work fine, however, as I have a test cluster fully encrypted with LUKS 2
>> OSDs, see:
>>
>> cryptsetup luksDump
>>
>> ceph--e3cf57cd--27dc--4cf2--9784--b2b5198dfcbb-osd--block--5a743d7f--2f60--47da--b9f3--46aa6f5df284
>> LUKS header information
>> Version: 2
>> ...
>>
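>> If ceph-volume ever wanted to pin the format explicitly, cryptsetup can
>> force a version at format time, e.g. (illustrative only - the device
>> path is made up and this is not what ceph-volume currently passes):
>>
>> cryptsetup luksFormat --type luks1 /dev/<vg>/<osd-block-lv>
>>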
>> All in all, it looks like the ceph-volume code base needs some love :-).
>> Some functionality, like checking mon config variables, seems to be
>> present in cephadm. I'm not sure whether the cephadm / ceph-volume code
>> can / should be refactored so that both make use of it. My changes will
>> be minor, but with ceph-volume in its current state I wonder how best to
>> proceed.
>>
>> I would appreciate any comments,
>>
>> Gr. Stefan
>>
>> [1]:
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/D5URLVPVGX5…
>> [2]: https://docs.ceph.com/en/latest/ceph-volume/lvm/encryption/#
>> [3]:
>>
>> https://github.com/ceph/ceph/blob/main/src/ceph-volume/ceph_volume/util/enc…
>> _______________________________________________
>> Dev mailing list -- dev(a)ceph.io
>> To unsubscribe send an email to dev-leave(a)ceph.io
>>
>
Hi Frank,
IMO all the logic below is a bit of overkill, and no one can provide 100%
valid guidance on specific numbers at the moment. Generally I agree with
Dongdong's point that a crash is effectively an OSD restart, and hence
there is not much sense in performing such a restart manually - although
the rationale might be to do it gracefully and avoid some potential
issues...
Anyway, I'd rather recommend performing periodic(!) manual OSD restarts,
e.g. on a daily basis at off-peak hours, instead of using tricks based on
mempool stats analysis.
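If you do go that route, a minimal sketch of what a staggered nightly
restart could look like (assumptions: classic "ceph-osd@<id>" systemd
units rather than cephadm-managed ones, a CRUSH host bucket that matches
the short hostname, and touching only one OSD at a time after an
ok-to-stop check):

for id in $(ceph osd ls-tree "$(hostname -s)"); do
    if ceph osd ok-to-stop "$id"; then
        systemctl restart "ceph-osd@$id"
        # wait until the cluster has settled before touching the next OSD
        while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
    fi
done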
Thanks,
Igor
On 1/10/2023 1:15 PM, Frank Schilder wrote:
> Hi Dongdong and Igor,
>
> thanks for pointing to this issue. I guess if it's a memory leak issue (well, a cache pool trim issue), checking for some indicator and restarting the OSD should be a work-around? Dongdong promised a work-around but talks only about a patch (fix).
>
> Looking at the tracker items, my conclusion is that unusually low values of .mempool.by_pool.bluestore_cache_onode.items of an OSD might be such an indicator. I just ran a very simple check on all our OSDs:
>
> for o in $(ceph osd ls); do n_onode="$(ceph tell "osd.$o" dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items")"; echo -n "$o: "; ((n_onode<100000)) && echo "$n_onode"; done; echo ""
>
> and found 2 with seemingly very unusual values:
>
> 1111: 3098
> 1112: 7403
>
> Comparing two OSDs with the same disk on the same host gives:
>
> # ceph daemon osd.1111 dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_onode.bytes,.mempool.by_pool.bluestore_cache_other.items,.mempool.by_pool.bluestore_cache_other.bytes"
> 3200
> 1971200
> 260924
> 900303680
>
> # ceph daemon osd.1030 dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_onode.bytes,.mempool.by_pool.bluestore_cache_other.items,.mempool.by_pool.bluestore_cache_other.bytes"
> 60281
> 37133096
> 8908591
> 255862680
>
> OSD 1111 does look somewhat bad. Shortly after restarting this OSD I get
>
> # ceph daemon osd.1111 dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_onode.bytes,.mempool.by_pool.bluestore_cache_other.items,.mempool.by_pool.bluestore_cache_other.bytes"
> 20775
> 12797400
> 803582
> 24017100
>
> So, the above procedure seems to work and, yes, there seems to be a leak of items in cache_other that pushes other pools down to 0. There seem to be 2 useful indicators:
>
> - very low .mempool.by_pool.bluestore_cache_onode.items
> - very high .mempool.by_pool.bluestore_cache_other.bytes/.mempool.by_pool.bluestore_cache_other.items
>
> Here a command to get both numbers with OSD ID in an awk-friendly format:
>
> for o in $(ceph osd ls); do printf "%6d %8d %7.2f\n" "$o" $(ceph tell "osd.$o" dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_other.bytes/.mempool.by_pool.bluestore_cache_other.items"); done
>
> Pipe it to a file and do things like:
>
> awk '$2<50000 || $3>200' FILE
>
> For example, I still get:
>
> # awk '$2<50000 || $3>200' cache_onode.txt
> 1092 49225 43.74
> 1093 46193 43.70
> 1098 47550 43.47
> 1101 48873 43.34
> 1102 48008 43.31
> 1103 48152 43.29
> 1105 49235 43.59
> 1107 46694 43.35
> 1109 48511 43.08
> 1113 14612 739.46
> 1114 13199 693.76
> 1116 45300 205.70
>
> flagging 3 more outliers.
>
> Would it be possible to provide a bit of guidance to everyone about when to consider restarting an OSD? What values of the above variables are critical and what are tolerable? Of course a proper fix would be better, but I doubt that everyone is willing to apply a patch. Therefore, some guidance on how to mitigate this problem to an acceptable level might be useful. I'm thinking here of how few onode items are acceptable before performance drops painfully.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Igor Fedotov <igor.fedotov(a)croit.io>
> Sent: 09 January 2023 13:34:42
> To: Dongdong Tao; ceph-users(a)ceph.io
> Cc: dev@ceph.io
> Subject: [ceph-users] Re: OSD crash on Onode::put
>
> Hi Dongdong,
>
> thanks a lot for your post, it's really helpful.
>
>
> Thanks,
>
> Igor
>
> On 1/5/2023 6:12 AM, Dongdong Tao wrote:
>> I've seen many users recently reporting that they have been struggling
>> with this Onode::put race condition issue [1] on both the latest
>> Octopus and Pacific.
>> Igor opened a PR [2] to address this issue; I've reviewed it
>> carefully and it looks good to me. I'm hoping this could get some
>> priority from the community.
>>
>> For those who had been hitting this issue, I would like to share a
>> workaround that could unblock you:
>>
>> During the investigation of this issue, I found this race condition
>> always happens after the bluestore onode cache size becomes 0.
>> Setting debug_bluestore = 1/30 will allow you to see the cache size
>> after the crash:
>> ---
>> 2022-10-25T00:47:26.562+0000 7f424f78e700 30
>> bluestore.MempoolThread(0x564a9dae2a68) _resize_shards
>> max_shard_onodes: 0 max_shard_buffer: 8388608
>> ---
>>
>> This is apparently wrong, as it means the bluestore metadata cache is
>> basically disabled, but it goes a long way toward explaining why we hit
>> the race condition so easily -- an onode will be trimmed right away
>> after it's unpinned.
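>>
>> (If it helps, I believe the debug level can also be bumped at runtime,
>> e.g. with "ceph config set osd debug_bluestore 1/30" or
>> "ceph tell osd.* config set debug_bluestore 1/30" - please double-check
>> the exact syntax on your release.)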
>>
>> Continuing the investigation, it turns out the culprit for the
>> 0-sized cache is a leak in the bluestore_cache_other mempool.
>> Please refer to the bug tracker [3], which has the details of the leak
>> issue; it was already fixed by [4], and the next Pacific point
>> release will have it.
>> But it was never backported to Octopus.
>> So if you are hitting the same issue:
>> For those who are on Octopus, you can manually backport this patch to
>> fix the leak and prevent the race condition from happening.
>> For those who are on Pacific, you can wait for the next Pacific point
>> release.
>>
>> By the way, I'm backporting the fix to the Ubuntu Octopus and Pacific
>> packages through this SRU [5], so it will land in Ubuntu's packages soon.
>>
>> [1] https://tracker.ceph.com/issues/56382
>> [2] https://github.com/ceph/ceph/pull/47702
>> [3] https://tracker.ceph.com/issues/56424
>> [4] https://github.com/ceph/ceph/pull/46911
>> [5] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1996010
>>
>> Cheers,
>> Dongdong
>>
>>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> |
Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> |
Twitter <https://twitter.com/croit_io>
Here's a summary of yesterday's Ceph Leadership Team meeting:
* Introducing Sage McTaggart, Product Security specialist. Sage
coordinates CVEs and security issue communication for Ceph within the
upstream community, IBM, and Red Hat.
* We will drop the recommendation to PGP-encrypt emails to
security(a)ceph.io. The benefits do not outweigh the hassle. Please send
plain, unencrypted emails to that address, and we'll update our website
accordingly.
* Sepia infrastructure updates from Adam. Adam, Dan and Zack recently
moved several services from clustered VMs to bare metal and OpenShift.
* lists.ceph.io had significant data loss, and we had to restore from
a backup taken approximately mid-2021. Mike is drafting broader
communications about what this means for our mailing list users.
* Brief discussion about Ceph Days from Mike: NYC - Feb 21; Southern
California - Mar 9 (CFP open).
- Ken
Hi Folks,
Happy New Year! The weekly performance meeting will be starting in
approximately 5 minutes, at 8AM PST. The etherpad is mostly working
again, huzzah! Nothing specific is on the agenda today, but if Adam and
Igor can make it we can talk about ongoing bluestore changes and
updated RocksDB improvements.
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Mark
Hi all,
As you all know, we’ve had a lot of issues in sepia recently. One of them
is that we’re having trouble with Java paths during builds (I haven’t
followed the details, tbh). Radek pointed out [1] that CephFS seems to be
the only user of Java in the codebase, and when I ran a “git grep” the only
occurrences were for cephfs-java, which AFAIK only exists to support the
(very defunct) Hadoop plug-in.
Are there no other users? Can we disable this temporarily to get builds
working in the lab and continue testing? Is anybody going to notice if we
remove it?
If the answers are “yes”, let’s do as much as we can and get some builds
going ASAP?
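(For reference, I believe the bindings are gated behind a cmake switch,
so temporarily turning them off should just be a matter of configuring
with -DWITH_CEPHFS_JAVA=OFF - someone should double-check that exact
option name.)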
Thanks,
-Greg
Dear Ceph Community
My name is Deep and I recently got into the field of DevOps. I have
experience working with Linux, YAML, Kubernetes, and Docker, and I am
currently learning AWS.
I am very interested in contributing to the Ceph project and would like to
know more about how I can do so.
Can you please suggest what areas of the project would be most suitable for
my skillset and what additional areas I should focus on learning in order
to make meaningful contributions to the Ceph project?
Thank you for your time and I look forward to hearing from you.
Best,
Deep
Re-adding the dev list and adding the user list because others might
benefit from this information.
Thanks,
Neha
On Tue, Jan 10, 2023 at 10:21 AM Wyll Ingersoll <
wyllys.ingersoll(a)keepertech.com> wrote:
> Also, it was only my ceph-users account that was lost, dev account was
> still active.
> ------------------------------
> *From:* Wyll Ingersoll <wyllys.ingersoll(a)keepertech.com>
> *Sent:* Tuesday, January 10, 2023 1:20 PM
> *To:* Neha Ojha <nojha(a)redhat.com>; Adam Kraitman <akraitma(a)redhat.com>;
> Dan Mick <dan.mick(a)redhat.com>
> *Subject:* Re: What's happening with ceph-users?
>
> I ended up re-subscribing this morning. But it might be worth
> investigating if others are having similar issues.
> ------------------------------
> *From:* Neha Ojha <nojha(a)redhat.com>
> *Sent:* Tuesday, January 10, 2023 1:14 PM
> *To:* Wyll Ingersoll <wyllys.ingersoll(a)keepertech.com>; Adam Kraitman <
> akraitma(a)redhat.com>; Dan Mick <dan.mick(a)redhat.com>
> *Subject:* Re: What's happening with ceph-users?
>
> +Adam Kraitman <akraitma(a)redhat.com> +Dan Mick <dan.mick(a)redhat.com> Is
> this expected?
>
> On Tue, Jan 10, 2023 at 6:15 AM Wyll Ingersoll <
> wyllys.ingersoll(a)keepertech.com> wrote:
>
>
> All of my subscriptions to the ceph.io lists (users and developers) seem
> to have been deleted. Do we need to re-subscribe or is this something
> that is being fixed?
> ------------------------------
> *From:* Neha Ojha <nojha(a)redhat.com>
> *Sent:* Monday, January 9, 2023 2:40 PM
> *To:* Dan van der Ster <dvanders(a)gmail.com>
> *Cc:* Ceph Developers <dev(a)ceph.io>; Josh Durgin <jdurgin(a)redhat.com>;
> Mike Perez <miperez(a)redhat.com>; Adam Kraitman <akraitma(a)redhat.com>
> *Subject:* Re: What's happening with ceph-users?
>
> Our mailing lists were down due to the recent lab issues. They should be
> back up now. Please let us know if you see any issues.
>
> Thanks,
> Neha
>
> On Sun, Jan 8, 2023 at 9:53 AM Dan van der Ster <dvanders(a)gmail.com>
> wrote:
>
> Hi,
>
> Has ceph-users been down a few days? And now it seems to have been
> reverted to an old backup? (I'm referring mail on an address I unsubbed
> many months ago)
>
> Thanks, Dan
>
>
Our mailing lists were down due to the recent lab issues. They should be
back up now. Please let us know if you see any issues.
Thanks,
Neha
On Sun, Jan 8, 2023 at 9:53 AM Dan van der Ster <dvanders(a)gmail.com> wrote:
> Hi,
>
> Has ceph-users been down a few days? And now it seems to have been
> reverted to an old backup? (I'm referring mail on an address I unsubbed
> many months ago)
>
> Thanks, Dan
>
>
On Mon, Dec 26, 2022 at 5:54 PM John Zachary Dover <zac.dover(a)gmail.com> wrote:
>
> Documentation backports to quincy aren't being picked up on docs.ceph.com.
>
> Here's an example of a merged docs PR that doesn't appear on the quincy branch of docs.ceph.com:
>
> https://github.com/ceph/ceph/pull/49574
>
> And here is a link to the page on which this change should appear but doesn't:
>
> https://docs.ceph.com/en/quincy/glossary/
Hi Zac, this appears to be resolved?
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D