Hello there,
Thank you in advance.
My cluster is running Ceph version 14.2.9.
I have a repair issue as well.
ceph health detail
HEALTH_WARN Too many repaired reads on 2 OSDs
OSD_TOO_MANY_REPAIRS Too many repaired reads on 2 OSDs
osd.29 had 38 reads repaired
osd.16 had 17 reads repaired
~# ceph tell osd.16 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 7.1486738159999996,
"bytes_per_sec": 150201541.10217974,
"iops": 35.81083800844663
}
~# ceph tell osd.29 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 6.9244327500000002,
"bytes_per_sec": 155065672.9246161,
"iops": 36.970537406114602
}
But it looks like those OSDs are OK. How can I clear this warning?
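For reference, these are the commands I was considering. I'm not sure muting is the right fix, and I'm assuming both that health mute is available on this Nautilus build and that the threshold option is mon_osd_warn_num_repaired (default 10):
~# ceph health mute OSD_TOO_MANY_REPAIRS 1w
~# ceph config set mon mon_osd_warn_num_repaired 50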
Best regards
JG
Hi,
I want to set an alert on a user's pool before it gets full, but on Nautilus I still haven't figured out which value in the ceph df detail output reflects their data usage.
POOL            ID  STORED   OBJECTS  USED     %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY    USED COMPR  UNDER COMPR
k8s-dbss-w-mdb  16  1.8 TiB  488.61k  2.6 TiB  2.87   45 TiB     N/A            1.8 TiB      488.61k  0 B         0 B
This output is still confusing :/
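For context, this is roughly the extraction I had in mind; I'm assuming the per-pool JSON fields on Nautilus are named stored, max_avail and percent_used:
ceph df detail -f json | jq -r '.pools[] | select(.name=="k8s-dbss-w-mdb")
  | [.name, .stats.stored, .stats.max_avail, .stats.percent_used] | @tsv'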
I've been banging on my ceph octopus test cluster for a few days now.
8 nodes. Each node has 2 SSDs and 8 HDDs.
They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as a db partition.
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
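(For reference, a spec like the one above would be applied and checked with something like the following; the file name is just an example:)
ceph orch apply osd -i osd_spec_default.yml
ceph orch ls osd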
Things were going pretty well, until... yesterday... I noticed TWO of the OSDs were "down".
I went to check the logs with
journalctl -u ceph-xxxx@osd.xxx
and all it showed was a bunch of generic debug info, the fact that it stopped,
and various automatic attempts to restart,
but no indication of what was wrong, or why the restarts KEEP failing.
sample output:
systemd[1]: Stopped Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00.
systemd[1]: Starting Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00...
bash[9340]: ceph-e51eb2fa-7f82-11eb-94d5-78e3b5148f00-osd.33-activate
bash[9340]: WARNING: The same type, major and minor should not be used for multiple devices.
bash[9340]: WARNING: The same type, major and minor should not be used for multiple devices.
podman[9369]: 2021-03-07 16:00:15.543010794 -0800 PST m=+0.318475882 container create
podman[9369]: 2021-03-07 16:00:15.73461926 -0800 PST m=+0.510084288 container init
.....
bash[1611473]: --> ceph-volume lvm activate successful for osd ID: 33
podman[1611501]: 2021-03-18 10:23:02.564242824 -0700 PDT m=+1.379793448 container died
bash[1611473]: ceph-xx-xx-xx-xx-osd.33
bash[1611473]: WARNING: The same type, major and minor should not be used for multiple devices.
(repeat, repeat...)
podman[1611615]: 2021-03-18 10:23:03.530992487 -0700 PDT m=+0.333130660 container create
....
systemd[1]: Started Ceph osd.33 for xx-xx-xx-xx
systemd[1]: ceph-xx-xx-xx-xx@osd.33.service: main process exited, code=exited, status=1/FAILURE
bash[1611797]: ceph-xx-xx-xx-xx-osd.33-deactivate
and eventually it just gives up.
smartctl -a doesn't show any errors on the HDD.
dmesg doesn't show anything.
So... what do I do?
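The only other places I can think to look, assuming cephadm's "logs" subcommand exists in this Octopus build and reusing the container name pattern from the journal above:
cephadm logs --name osd.33
podman logs ceph-e51eb2fa-7f82-11eb-94d5-78e3b5148f00-osd.33
(podman ps -a should list the exited container if that name doesn't resolve)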
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown@medata.com | www.medata.com
Good evening,
I've seen the shift in ceph to focus more on LVM than on plain (direct)
access to disks. I was wondering what the motivation is for that.
From my point of view OSD disk layouts never change (they are re-added
if they do), so the dynamic approach of LVM is probably not the
motivation.
LVM also adds another layer of indirection, which seems to be a
disadvantage performance-wise and to add complexity for
management. The former is probably only a minor degradation; the latter
is something I see more as an obstacle for maintenance.
At ungleich we are using a custom script [0] to format a disk with two
partitions, one for the metadata and one for the rest, which seems
simpler.
I assume there are good reasons not to do as we do, but I was wondering
what the practical reasons actually are.
Best regards,
Nico
[0] https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/ceph/…
--
Sustainable and modern Infrastructures by ungleich.ch
Hi,
I have a small cluster of 3 nodes. Each node has 10 or 11 OSDs, mostly HDDs
with a couple of SSDs for faster pools. I am trying to set up an erasure
coded pool with m=6 k=6, with each node storing 4 chunks on separate OSDs.
Since this does not seem to be possible with the CLI tooling, I have written
my own CRUSH rule to achieve this, which looks like this:
```
rule 3host4osd {
    id 3
    type erasure
    min_size 12
    max_size 12
    step set_chooseleaf_tries 20
    step set_choose_tries 100
    step take default class hdd
    step choose indep 3 type host
    step choose indep 4 type osd
    step emit
}
```
I've set up my erasure code profile and pool:
```
root@virt02:~# ceph osd pool get rbd_erasure crush_rule
crush_rule: 3host4osd
root@virt02:~# ceph osd pool get rbd_erasure size
size: 12
root@virt02:~# ceph osd pool get rbd_erasure min_size
min_size: 7
root@virt02:~# ceph osd pool get rbd_erasure erasure_code_profile
erasure_code_profile: default
root@virt02:~# ceph osd erasure-code-profile get default
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=6
plugin=jerasure
technique=reed_sol_van
w=8
```
Based on my understanding of Ceph, this should pick 3 hosts, then pick 4
OSDs for each of those hosts. This is *almost* the case. However, when
testing taking a host out after putting a bunch of data on the pool, 5 PGs
(out of 512) seem to have more than 4 chunks placed on the same host. In
all cases it's the same host that gets the extra pieces. When that host is
out, I see errors:
```
[WRN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive, 5 pgs down
pg 2.87 is down, acting [2147483647,2147483647,2147483647,2147483647,22,2147483647,2147483647,20,16,2147483647,17,18]
pg 2.f3 is down, acting [2147483647,22,2147483647,2147483647,23,2147483647,18,17,2147483647,2147483647,2147483647,2147483647]
pg 2.100 is down, acting [2147483647,18,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,9,20,22,4]
pg 2.141 is down, acting [2147483647,18,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,20,7,4,22]
pg 2.1bb is down, acting [20,2147483647,2147483647,2147483647,18,2147483647,23,17,2147483647,2147483647,2147483647,2147483647]
```
As an example, PG 2.87 (rbd_erasure has pool ID 2 according to
`ceph osd lspools`):
```
root@virt02:~# ceph pg 2.87 query
[...]
"up": [
2,
0,
6,
5,
22,
1,
8,
20,
16,
14,
17,
18
],
"acting": [
2,
0,
6,
5,
22,
1,
8,
20,
16,
14,
17,
18
],
[...]
```
OSDs 0, 1, 2, 5, 6, 8 and 14 are all running on the same
OSD host.
All hosts are running ceph octopus 15.2.9.
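As a sanity check, the rule can also be exercised offline with crushtool to look at the mappings it generates (assuming rule id 3 as above):
```
root@virt02:~# ceph osd getcrushmap -o crushmap.bin
root@virt02:~# crushtool -i crushmap.bin --test --rule 3 --num-rep 12 --show-mappings
```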
I've put the output of various diagnostic commands into files accessible over
HTTPS here:
https://dsg.is/ceph_placement_problem_data/ceph_osd_crush_rule_dump_3host4o…
https://dsg.is/ceph_placement_problem_data/ceph_osd_lspools.txt
https://dsg.is/ceph_placement_problem_data/ceph_osd_pool_get_rbd_erasure_al…
https://dsg.is/ceph_placement_problem_data/ceph_pg_2.87_query.txt
https://dsg.is/ceph_placement_problem_data/ceph_pg_dump_all.txt
https://dsg.is/ceph_placement_problem_data/ceph_pg_ls.txt
Any thoughts or ideas on what I'm doing wrong?
Kind regards,
Davíð
Hi
I'm currently having a bit of an issue with setting up end user authentication and I would be thankful for any tips I could get.
The general scenario is this: end users are authorised through a web app and a mobile app via Keycloak. Users have to be able to upload and download data using the web interface and the mobile app. In order to do that I need to get AssumeRoleWithWebIdentity working.
I followed the steps outlined in https://docs.ceph.com/en/latest/radosgw/STS/. Following that guide I was able to get the AssumeRole example to work, but not AssumeRoleWithWebIdentity.
This is the behaviour I'm getting (logged in aws-cli as TESTER):
Username TESTER
Full name TestUser
Suspended No
System No
Maximum buckets 1000
Capabilities
oidc-provider (*)
roles (*)
$ aws --endpoint=http://10.10.xx.xx iam list-roles
{
"Roles": [
{
"Path": "/",
"RoleName": "S3Access",
"RoleId": "d1b84ec1-cceb-4c32-a605-f208b30123e2",
"Arn": "arn:aws:iam:::role/S3Access",
"CreateDate": "2021-03-24T13:08:20.522Z",
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": [
"arn:aws:iam:::oidc-provider/xxxxx.xxxxnt.com/auth/realms/xxxxnt"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringEquals": {
"xxxxx.xxxxnt.com/auth/realms/xxxxnt:app_id": "xxxxnt_xxxx_backend"
}
}
}
]
},
"MaxSessionDuration": 3600
}
]
}
$ aws --endpoint=http://10.10.xx.xx iam list-open-id-connect-providers
{
"OpenIDConnectProviderList": [
{
"Arn": "arn:aws:iam:::oidc-provider/xxxxx.xxxxnt.com/auth/realms/xxxxnt"
}
]
}
$ aws --endpoint=http://10.10.xx.xx iam get-open-id-connect-provider --open-id-connect-provider-arn "arn:aws:iam:::oidc-provider/xxxxx.xxxxnt.com/auth/realms/xxxxnt"
{
"Url": "https://xxxxx.xxxxnt.com/auth/realms/xxxxnt",
"ClientIDList": [
"test_ceph"
],
"ThumbprintList": [
"02DC870BD9E72360C090Fxxxxxxxxxxxxxxxxxxx"
],
"CreateDate": "2021-03-24T12:26:38.173Z"
}
$ curl -X POST https://xxxxx.xxxxnt.com/auth/realms/xxxxnt/protocol/openid-connect/token -H "Content-Type: application/x-www-form-urlencoded" -d "username=admin" -d "password=omitted" -d "grant_type=password" -d "client_id=test_ceph" -d "client_secret=d01eafe2-xxxx-xxxx-xxxx-xxxxxx7b7dad"
{"access_token":"eyJhbGc.........tTRy1bA","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbG......RU","token_type":"Bearer","not-before-policy":0,"session_state":"3a57b32a-b17c-4b29-bd68-8ce06b6bd2a8","scope":"email account ..... xxxxnt_xxxx_backend profile"}
$ aws --debug --endpoint=http://10.10.xx.xx sts assume-role-with-web-identity --role-arn "arn:aws:iam:::role/S3Access" --role-session-name "test" --web-identity-token "eyJhbGc.........tTRy1bA"
.....
2021-03-25 10:17:45,309 - MainThread - botocore.parsers - DEBUG - Response body:
b'<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><RequestId>tx000000000000000000032-00605c5538-2cf3a-pl</RequestId><HostId>2cf3a-pl-default</HostId></Error>'|
.....
An error occurred (Unknown) when calling the AssumeRoleWithWebIdentity operation: Unknown
The JWT token returned by Keycloak contains the fields
"iss": "https://xxxxx.xxxxnt.com/auth/realms/flexgent",
"aud": "xxxxnt_xxxx_backend",
"azp": "test_ceph",
The thumbprint was generated using the example from the Ceph documentation (curl from jwks_uri).
I'm not really sure what might be wrong. I'd be thankful for any hints, including debugging hints, because so far I haven't been able to get useful logs for this.
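In the meantime, this is the logging bump I was going to try in order to get more detail out of radosgw. I'm guessing at the config target; it may need to be the exact daemon name (e.g. client.rgw.<id>):
$ ceph config set client.rgw debug_rgw 20
$ ceph config set client.rgw debug_ms 1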
Softgent Sp. z o.o., Budowlanych 31d, 80-298 Gdansk, POLAND
KRS: 0000674406, NIP: 9581679801, REGON: 367090912
www.softgent.com
District Court Gdańsk-Północ in Gdańsk, 7th Commercial Division of the National Court Register
KRS 0000674406, share capital: PLN 25,000.00, paid in full.
Hi,
There is a default limit of 1 TiB for max_file_size in CephFS. I raised that to 2 TiB, but I now have a request to store a file of up to 7 TiB.
I'd expect the limit to be there for a reason, but what is the risk of setting that value to, say, 10 TiB?
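For reference, the change itself is a one-liner; 'cephfs' below stands in for the actual filesystem name, and 10995116277760 is 10 TiB in bytes:
ceph fs set cephfs max_file_size 10995116277760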
--
Mark Schouten <mark@tuxis.nl>
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208
Hello Ceph Users,
Has anyone come across this error when converting to cephadm?
2021-03-25 20:41:05,616 DEBUG /bin/podman: stderr Error: error getting image "ceph-375dcabe-574f-4002-b322-e7f89cf199e1-rgw.COMPANY.LOCATION.NAS-COMPANY-RK2-CEPH06.pcckdr": repository name must be lowercase
It seems that cephadm needs to ensure the hostname / realm / zone is in lowercase when passing through to podman.
Any known workarounds? All our realms are uppercase.
It seems to work fine on Ubuntu 20.04 but not on CentOS 7 (I assume it's a podman issue).
Thanks
Glen
There are just a couple remaining issues before the final release.
Please test it out and report any bugs.
The full release notes are in progress here [0].
Notable Changes
---------------
* New ``bluestore_rocksdb_options_annex`` config
parameter. Complements ``bluestore_rocksdb_options`` and allows
setting rocksdb options without repeating the existing defaults.
* CephFS adds two new CDentry tags, 'I' --> 'i' and 'L' --> 'l',
and the on-RADOS metadata is no longer backwards compatible after
upgrading to Pacific or a later release.
* $pid expansion in config paths like ``admin_socket`` will now
properly expand to the daemon pid for commands like ``ceph-mds`` or
``ceph-osd``. Previously only ``ceph-fuse``/``rbd-nbd`` expanded
``$pid`` with the actual daemon pid.
* The allowable options for some ``radosgw-admin`` commands have been
changed.
  * ``mdlog-list``, ``datalog-list`` and ``sync-error-list`` no longer
    accept start and end dates, but do accept a single optional start
    marker.
  * ``mdlog-trim``, ``datalog-trim`` and ``sync-error-trim`` only accept
    a single marker giving the end of the trimmed range.
  * Similarly, the date ranges and marker ranges have been removed from
    the RESTful DATALog and MDLog list and trim operations.
* ceph-volume: The ``lvm batch`` subcommand received a major
rewrite. This closed a number of bugs and improves usability in
terms of size specification and calculation, as well as idempotency
behaviour and disk replacement process. Please refer to
https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for more
detailed information.
* Configuration variables for permitted scrub times have changed. The
legal values for ``osd_scrub_begin_hour`` and ``osd_scrub_end_hour``
are 0 - 23. The use of 24 is now illegal. Specifying ``0`` for
both values causes every hour to be allowed. The legal values for
``osd_scrub_begin_week_day`` and ``osd_scrub_end_week_day`` are 0 -
6. The use of 7 is now illegal. Specifying ``0`` for both values
causes every day of the week to be allowed.
* Multiple file systems in a single Ceph cluster is now stable. New
Ceph clusters enable support for multiple file systems by
default. Existing clusters must still set the "enable_multiple" flag
on the fs. Please see the CephFS documentation for more information.
* volume/nfs: The "ganesha-" prefix was recently removed from the cluster id
and the nfs-ganesha common config object, to ensure a consistent
namespace across different orchestrator backends. Please delete any
existing nfs-ganesha clusters prior to upgrading and redeploy new
clusters after upgrading to Pacific.
* A new health check, DAEMON_OLD_VERSION, will warn if different
versions of Ceph are running on daemons. It will generate a health
error if multiple versions are detected. This condition must exist
for over mon_warn_older_version_delay (set to 1 week by default) in
order for the health condition to be triggered. This allows most
upgrades to proceed without falsely seeing the warning. If an upgrade
is paused for an extended time period, health mute can be used like
this: "ceph health mute DAEMON_OLD_VERSION --sticky". In this case,
after the upgrade has finished, use "ceph health unmute
DAEMON_OLD_VERSION".
* MGR: the progress module can now be turned on/off using the commands
``ceph progress on`` and ``ceph progress off`` (see the CLI sketch after
this list).
* An AWS-compliant API, "GetTopicAttributes", was added to replace the
existing "GetTopic" API. The new API should be used to fetch information
about topics used for bucket notifications.
* librbd: The shared, read-only parent cache's config option
``immutable_object_cache_watermark`` now has been updated to
properly reflect the upper cache utilization before space is
reclaimed. The default ``immutable_object_cache_watermark`` now is
``0.9``. If the capacity reaches 90% the daemon will delete cold
cache.
* OSD: the option ``osd_fast_shutdown_notify_mon`` has been introduced
to allow the OSD to notify the monitor it is shutting down even if
``osd_fast_shutdown`` is enabled. This helps with the monitor logs
on larger clusters, which may get many 'osd.X reported immediately
failed by osd.Y' messages that confuse tools.
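A few of the new knobs above, as they would be typed on the CLI (the
rocksdb option value is illustrative only)::

  ceph config set osd bluestore_rocksdb_options_annex "compaction_readahead_size=2097152"
  ceph progress off
  ceph progress on
  ceph health mute DAEMON_OLD_VERSION --sticky
  ceph health unmute DAEMON_OLD_VERSION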
[0] https://github.com/ceph/ceph/pull/40265