Hello there,
Thank you in advance.
My cluster is running Ceph version 14.2.9.
I have a repair issue as well.
ceph health detail
HEALTH_WARN Too many repaired reads on 2 OSDs
OSD_TOO_MANY_REPAIRS Too many repaired reads on 2 OSDs
osd.29 had 38 reads repaired
osd.16 had 17 reads repaired
~# ceph tell osd.16 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 7.1486738159999996,
"bytes_per_sec": 150201541.10217974,
"iops": 35.81083800844663
}
~# ceph tell osd.29 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 6.9244327500000002,
"bytes_per_sec": 155065672.9246161,
"iops": 36.970537406114602
}
But it looks like those OSDs are OK. How can I clear this warning?
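For reference, these are the commands I was considering. I'm not sure muting is the right fix, and I'm assuming both that health mute is available on this Nautilus build and that the threshold option is mon_osd_warn_num_repaired (default 10):
~# ceph health mute OSD_TOO_MANY_REPAIRS 1w
~# ceph config set mon mon_osd_warn_num_repaired 50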
Best regards
JG
Hi,
I want to set an alert on a user's pool before it gets full, but on Nautilus I still haven't figured out which value in the ceph df detail output reflects their data usage.
POOL            ID  STORED   OBJECTS  USED     %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY    USED COMPR  UNDER COMPR
k8s-dbss-w-mdb  16  1.8 TiB  488.61k  2.6 TiB  2.87   45 TiB     N/A            1.8 TiB      488.61k  0 B         0 B
This output is still confusing :/
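For context, this is roughly the extraction I had in mind; I'm assuming the per-pool JSON fields on Nautilus are named stored, max_avail and percent_used:
ceph df detail -f json | jq -r '.pools[] | select(.name=="k8s-dbss-w-mdb")
  | [.name, .stats.stored, .stats.max_avail, .stats.percent_used] | @tsv'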
I've been banging on my ceph octopus test cluster for a few days now.
8 nodes. Each node has 2 SSDs and 8 HDDs.
They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as a db partition.
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
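(For reference, a spec like the one above would be applied and checked with something like the following; the file name is just an example:)
ceph orch apply osd -i osd_spec_default.yml
ceph orch ls osd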
Things were going pretty well, until... yesterday... I noticed TWO of the OSDs were "down".
I went to check the logs with
journalctl -u ceph-xxxx@osd.xxx
and all it showed was a bunch of generic debug info, the fact that it stopped,
and various automatic attempts to restart,
but no indication of what was wrong, or why the restarts KEEP failing.
sample output:
systemd[1]: Stopped Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00.
systemd[1]: Starting Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00...
bash[9340]: ceph-e51eb2fa-7f82-11eb-94d5-78e3b5148f00-osd.33-activate
bash[9340]: WARNING: The same type, major and minor should not be used for multiple devices.
bash[9340]: WARNING: The same type, major and minor should not be used for multiple devices.
podman[9369]: 2021-03-07 16:00:15.543010794 -0800 PST m=+0.318475882 container create
podman[9369]: 2021-03-07 16:00:15.73461926 -0800 PST m=+0.510084288 container init
.....
bash[1611473]: --> ceph-volume lvm activate successful for osd ID: 33
podman[1611501]: 2021-03-18 10:23:02.564242824 -0700 PDT m=+1.379793448 container died
bash[1611473]: ceph-xx-xx-xx-xx-osd.33
bash[1611473]: WARNING: The same type, major and minor should not be used for multiple devices.
(repeat, repeat...)
podman[1611615]: 2021-03-18 10:23:03.530992487 -0700 PDT m=+0.333130660 container create
....
systemd[1]: Started Ceph osd.33 for xx-xx-xx-xx
systemd[1]: ceph-xx-xx-xx-xx@osd.33.service: main process exited, code=exited, status=1/FAILURE
bash[1611797]: ceph-xx-xx-xx-xx-osd.33-deactivate
and eventually it just gives up.
smartctl -a doesn't show any errors on the HDD.
dmesg doesn't show anything.
So... what do I do?
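The only other places I can think to look, assuming cephadm's "logs" subcommand exists in this Octopus build and reusing the container name pattern from the journal above:
cephadm logs --name osd.33
podman logs ceph-e51eb2fa-7f82-11eb-94d5-78e3b5148f00-osd.33
(podman ps -a should list the exited container if that name doesn't resolve)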
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown@medata.com | www.medata.com
Good evening,
I've seen the shift in ceph to focus more on LVM than on plain (direct)
access to disks. I was wondering what the motivation is for that.
From my point of view OSD disk layouts never change (they are re-added
if they do), so the dynamic approach of LVM is probably not the
motivation.
LVM also adds another layer of indirection, which seems to be a
disadvantage performance-wise and to add complexity for
management. The former is probably only a minor degradation; the latter
is something I see more as an obstacle for maintenance.
At ungleich we are using a custom script [0] to format a disk with two
partitions, one for the metadata and one for the rest, which seems
simpler.
I assume there are good reasons not to do as we do, but I was wondering
what the practical reasons actually are.
Best regards,
Nico
[0] https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/ceph/…
--
Sustainable and modern Infrastructures by ungleich.ch
Hi,
I have a small cluster of 3 nodes. Each node has 10 or 11 OSDs, mostly HDDs
with a couple of SSDs for faster pools. I am trying to set up an erasure
coded pool with m=6 k=6, with each node storing 4 chunks on separate OSDs.
Since this does not seem to be possible with the CLI tooling, I have written
my own CRUSH rule to achieve this, which looks like this:
```
rule 3host4osd {
    id 3
    type erasure
    min_size 12
    max_size 12
    step set_chooseleaf_tries 20
    step set_choose_tries 100
    step take default class hdd
    step choose indep 3 type host
    step choose indep 4 type osd
    step emit
}
```
I've set up my erasure code profile and pool:
```
root@virt02:~# ceph osd pool get rbd_erasure crush_rule
crush_rule: 3host4osd
root@virt02:~# ceph osd pool get rbd_erasure size
size: 12
root@virt02:~# ceph osd pool get rbd_erasure min_size
min_size: 7
root@virt02:~# ceph osd pool get rbd_erasure erasure_code_profile
erasure_code_profile: default
root@virt02:~# ceph osd erasure-code-profile get default
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=6
plugin=jerasure
technique=reed_sol_van
w=8
```
Based on my understanding of Ceph, this should pick 3 hosts, then pick 4
OSDs for each of those hosts. This is *almost* the case. However, when
testing taking a host out after putting a bunch of data on the pool, 5 PGs
(out of 512) seem to have more than 4 chunks placed on the same host. In
all cases it's the same host that gets the extra pieces. When that host is
out, I see errors:
```
[WRN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive, 5 pgs down
pg 2.87 is down, acting [2147483647,2147483647,2147483647,2147483647,22,2147483647,2147483647,20,16,2147483647,17,18]
pg 2.f3 is down, acting [2147483647,22,2147483647,2147483647,23,2147483647,18,17,2147483647,2147483647,2147483647,2147483647]
pg 2.100 is down, acting [2147483647,18,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,9,20,22,4]
pg 2.141 is down, acting [2147483647,18,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,20,7,4,22]
pg 2.1bb is down, acting [20,2147483647,2147483647,2147483647,18,2147483647,23,17,2147483647,2147483647,2147483647,2147483647]
```
As an example, PG 2.87 (rbd_erasure has pool ID 2 according to
`ceph osd lspools`):
```
root@virt02:~# ceph pg 2.87 query
[...]
"up": [
2,
0,
6,
5,
22,
1,
8,
20,
16,
14,
17,
18
],
"acting": [
2,
0,
6,
5,
22,
1,
8,
20,
16,
14,
17,
18
],
[...]
```
OSDs 0, 1, 2, 5, 6, 8 and 14 are all running on the same
OSD host.
All hosts are running ceph octopus 15.2.9.
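As a sanity check, the rule can also be exercised offline with crushtool to look at the mappings it generates (assuming rule id 3 as above):
```
root@virt02:~# ceph osd getcrushmap -o crushmap.bin
root@virt02:~# crushtool -i crushmap.bin --test --rule 3 --num-rep 12 --show-mappings
```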
I've put the output of various diagnostic commands into files accessible over
HTTPS here:
https://dsg.is/ceph_placement_problem_data/ceph_osd_crush_rule_dump_3host4o…
https://dsg.is/ceph_placement_problem_data/ceph_osd_lspools.txt
https://dsg.is/ceph_placement_problem_data/ceph_osd_pool_get_rbd_erasure_al…
https://dsg.is/ceph_placement_problem_data/ceph_pg_2.87_query.txt
https://dsg.is/ceph_placement_problem_data/ceph_pg_dump_all.txt
https://dsg.is/ceph_placement_problem_data/ceph_pg_ls.txt
Any thoughts or ideas on what I'm doing wrong?
Kind regards,
Davíð
Hi
I'm currently having a bit of an issue with setting up end user authentication and I would be thankful for any tips I could get.
The general scenario is this: end users are authorised through a web app and a mobile app via Keycloak. Users have to be able to upload and download data using the web interface and the mobile app. In order to do that I need to get AssumeRoleWithWebIdentity working.
I followed the steps outlined in https://docs.ceph.com/en/latest/radosgw/STS/. Following that guide I was able to get the AssumeRole example to work, but not AssumeRoleWithWebIdentity.
This is the behaviour I'm getting (logged in aws-cli as TESTER):
Username TESTER
Full name TestUser
Suspended No
System No
Maximum buckets 1000
Capabilities
oidc-provider (*)
roles (*)
$ aws --endpoint=http://10.10.xx.xx iam list-roles
{
"Roles": [
{
"Path": "/",
"RoleName": "S3Access",
"RoleId": "d1b84ec1-cceb-4c32-a605-f208b30123e2",
"Arn": "arn:aws:iam:::role/S3Access",
"CreateDate": "2021-03-24T13:08:20.522Z",
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": [
"arn:aws:iam:::oidc-provider/xxxxx.xxxxnt.com/auth/realms/xxxxnt"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringEquals": {
"xxxxx.xxxxnt.com/auth/realms/xxxxnt:app_id": "xxxxnt_xxxx_backend"
}
}
}
]
},
"MaxSessionDuration": 3600
}
]
}
$ aws --endpoint=http://10.10.xx.xx iam list-open-id-connect-providers
{
"OpenIDConnectProviderList": [
{
"Arn": "arn:aws:iam:::oidc-provider/xxxxx.xxxxnt.com/auth/realms/xxxxnt"
}
]
}
$ aws --endpoint=http://10.10.xx.xx iam get-open-id-connect-provider --open-id-connect-provider-arn "arn:aws:iam:::oidc-provider/xxxxx.xxxxnt.com/auth/realms/xxxxnt"
{
"Url": "https://xxxxx.xxxxnt.com/auth/realms/xxxxnt",
"ClientIDList": [
"test_ceph"
],
"ThumbprintList": [
"02DC870BD9E72360C090Fxxxxxxxxxxxxxxxxxxx"
],
"CreateDate": "2021-03-24T12:26:38.173Z"
}
$ curl -X POST https://xxxxx.xxxxnt.com/auth/realms/xxxxnt/protocol/openid-connect/token -H "Content-Type: application/x-www-form-urlencoded" -d "username=admin" -d "password=omitted" -d "grant_type=password" -d "client_id=test_ceph" -d "client_secret=d01eafe2-xxxx-xxxx-xxxx-xxxxxx7b7dad"
{"access_token":"eyJhbGc.........tTRy1bA","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbG......RU","token_type":"Bearer","not-before-policy":0,"session_state":"3a57b32a-b17c-4b29-bd68-8ce06b6bd2a8","scope":"email account ..... xxxxnt_xxxx_backend profile"}
$ aws --debug --endpoint=http://10.10.xx.xx sts assume-role-with-web-identity --role-arn "arn:aws:iam:::role/S3Access" --role-session-name "test" --web-identity-token "eyJhbGc.........tTRy1bA"
.....
2021-03-25 10:17:45,309 - MainThread - botocore.parsers - DEBUG - Response body:
b'<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><RequestId>tx000000000000000000032-00605c5538-2cf3a-pl</RequestId><HostId>2cf3a-pl-default</HostId></Error>'|
.....
An error occurred (Unknown) when calling the AssumeRoleWithWebIdentity operation: Unknown
The JWT token returned by Keycloak contains the fields
"iss": "https://xxxxx.xxxxnt.com/auth/realms/flexgent",
"aud": "xxxxnt_xxxx_backend",
"azp": "test_ceph",
The thumbprint was generated using the example from the Ceph documentation (curl from jwks_uri).
I'm not really sure what might be wrong. I'd be thankful for any hints, including debugging hints, because so far I haven't been able to get useful logs for this.
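In the meantime, this is the logging bump I was going to try in order to get more detail out of radosgw. I'm guessing at the config target; it may need to be the exact daemon name (e.g. client.rgw.<id>):
$ ceph config set client.rgw debug_rgw 20
$ ceph config set client.rgw debug_ms 1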
Softgent Sp. z o.o., Budowlanych 31d, 80-298 Gdansk, POLAND
KRS: 0000674406, NIP: 9581679801, REGON: 367090912
www.softgent.com
District Court Gdańsk-Północ in Gdańsk, 7th Commercial Division of the National Court Register
KRS 0000674406, share capital: PLN 25,000.00, paid in full.
Hi,
There is a default limit of 1 TiB for max_file_size in CephFS. I raised that to 2 TiB, but I now have a request to store a file of up to 7 TiB.
I'd expect the limit to be there for a reason, but what is the risk of setting that value to, say, 10 TiB?
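For reference, the change itself is a one-liner; 'cephfs' below stands in for the actual filesystem name, and 10995116277760 is 10 TiB in bytes:
ceph fs set cephfs max_file_size 10995116277760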
--
Mark Schouten <mark@tuxis.nl>
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208
Hello Ceph Users,
Has anyone come across this error when converting to cephadm?
2021-03-25 20:41:05,616 DEBUG /bin/podman: stderr Error: error getting image "ceph-375dcabe-574f-4002-b322-e7f89cf199e1-rgw.COMPANY.LOCATION.NAS-COMPANY-RK2-CEPH06.pcckdr": repository name must be lowercase
It seems that cephadm needs to ensure the hostname / realm / zone is in lowercase when passing through to podman.
Any known workarounds? All our realms are uppercase.
It seems to work fine on Ubuntu 20.04 but not on CentOS 7 (I assume it's a podman issue).
Thanks
Glen
There are just a couple remaining issues before the final release.
Please test it out and report any bugs.
The full release notes are in progress here [0].
Notable Changes
---------------
* New ``bluestore_rocksdb_options_annex`` config
parameter. Complements ``bluestore_rocksdb_options`` and allows
setting rocksdb options without repeating the existing defaults.
* CephFS adds two new CDentry tags, 'I' --> 'i' and 'L' --> 'l',
and the on-RADOS metadata is no longer backwards compatible after
upgrading to Pacific or a later release.
* $pid expansion in config paths like ``admin_socket`` will now
properly expand to the daemon pid for commands like ``ceph-mds`` or
``ceph-osd``. Previously only ``ceph-fuse``/``rbd-nbd`` expanded
``$pid`` with the actual daemon pid.
* The allowable options for some ``radosgw-admin`` commands have been
changed.
  * ``mdlog-list``, ``datalog-list`` and ``sync-error-list`` no longer
    accept start and end dates, but do accept a single optional start
    marker.
  * ``mdlog-trim``, ``datalog-trim`` and ``sync-error-trim`` only accept
    a single marker giving the end of the trimmed range.
  * Similarly, the date ranges and marker ranges have been removed from
    the RESTful DATALog and MDLog list and trim operations.
* ceph-volume: The ``lvm batch`` subcommand received a major
rewrite. This closed a number of bugs and improves usability in
terms of size specification and calculation, as well as idempotency
behaviour and disk replacement process. Please refer to
https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for more
detailed information.
* Configuration variables for permitted scrub times have changed. The
legal values for ``osd_scrub_begin_hour`` and ``osd_scrub_end_hour``
are 0 - 23. The use of 24 is now illegal. Specifying ``0`` for
both values causes every hour to be allowed. The legal values for
``osd_scrub_begin_week_day`` and ``osd_scrub_end_week_day`` are 0 -
6. The use of 7 is now illegal. Specifying ``0`` for both values
causes every day of the week to be allowed.
* Multiple file systems in a single Ceph cluster is now stable. New
Ceph clusters enable support for multiple file systems by
default. Existing clusters must still set the "enable_multiple" flag
on the fs. Please see the CephFS documentation for more information.
* volume/nfs: The "ganesha-" prefix was recently removed from the cluster id
and the nfs-ganesha common config object, to ensure a consistent
namespace across different orchestrator backends. Please delete any
existing nfs-ganesha clusters prior to upgrading and redeploy new
clusters after upgrading to Pacific.
* A new health check, DAEMON_OLD_VERSION, will warn if different
versions of Ceph are running on daemons. It will generate a health
error if multiple versions are detected. This condition must exist
for over mon_warn_older_version_delay (set to 1 week by default) in
order for the health condition to be triggered. This allows most
upgrades to proceed without falsely seeing the warning. If an upgrade
is paused for an extended time period, health mute can be used like
this: "ceph health mute DAEMON_OLD_VERSION --sticky". In this case,
after the upgrade has finished, use "ceph health unmute
DAEMON_OLD_VERSION".
* MGR: the progress module can now be turned on/off using the commands
``ceph progress on`` and ``ceph progress off`` (see the CLI sketch after
this list).
* An AWS-compliant API, "GetTopicAttributes", was added to replace the
existing "GetTopic" API. The new API should be used to fetch information
about topics used for bucket notifications.
* librbd: The shared, read-only parent cache's config option
``immutable_object_cache_watermark`` now has been updated to
properly reflect the upper cache utilization before space is
reclaimed. The default ``immutable_object_cache_watermark`` now is
``0.9``. If the capacity reaches 90% the daemon will delete cold
cache.
* OSD: the option ``osd_fast_shutdown_notify_mon`` has been introduced
to allow the OSD to notify the monitor it is shutting down even if
``osd_fast_shutdown`` is enabled. This helps with the monitor logs
on larger clusters, which may get many 'osd.X reported immediately
failed by osd.Y' messages that confuse tools.
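A few of the new knobs above, as they would be typed on the CLI (the
rocksdb option value is illustrative only)::

  ceph config set osd bluestore_rocksdb_options_annex "compaction_readahead_size=2097152"
  ceph progress off
  ceph progress on
  ceph health mute DAEMON_OLD_VERSION --sticky
  ceph health unmute DAEMON_OLD_VERSION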
[0] https://github.com/ceph/ceph/pull/40265