I'm trying to follow the directions in ceph-ansible for having it automatically set up the crush map.
I've also looked at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/i…
But my setup isn't working. Ansible complains:
[WARNING]: While constructing a mapping from /etc/ansible/hosts.yml, line 5, column 9, found a duplicate dict key (hostA2).
Using last defined value only.
Could anyone explain what I'm doing wrong, please?
Here are the relevant hosts.yml snippets:
all:
  children:
    osds:
      hosts:
        hostA1:
          osd_crush_location:
            host: 'hostA1'
            chassis: 'hostA'
            root: 'default'
        hostA2:
          osd_crush_location:
            host: 'hostA2'
            chassis: 'hostA'
            root: 'default'
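For what it's worth, the warning means the loader found hostA2 defined twice in the same mapping and kept only the second definition. A minimal sketch of that last-one-wins behaviour, using a plain Python dict as a stand-in for the YAML mapping (the second, conflicting hostA2 entry here is hypothetical, not from the file above):

```python
# A mapping with a duplicate key silently keeps only the last
# definition -- Python dict literals behave the same way as the
# YAML loader does in the warning above.
inventory = {
    "hostA1": {"osd_crush_location": {"chassis": "hostA"}},
    "hostA2": {"osd_crush_location": {"chassis": "hostA"}},  # first definition
    "hostA2": {"osd_crush_location": {"chassis": "hostB"}},  # duplicate: this one wins
}

print(len(inventory))                                        # 2 keys survive
print(inventory["hostA2"]["osd_crush_location"]["chassis"])  # hostB
```

So if hostA2 really appears twice under hosts: in the full hosts.yml (the warning points at line 5, column 9), only the later block takes effect, and the earlier one is silently discarded.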
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Hello,
I would like to install Ceph 15.2.10 using cephadm and just found the following table by checking the requirements on the host:
https://docs.ceph.com/en/latest/cephadm/compatibility/#compatibility-with-p…
Do I understand this table correctly that I should be using podman version 2.1?
And what happens if I use the latest podman version, 3.0?
Best regards,
Mabi
Hi
I'm in a bit of a panic :-(
Recently we started attempting to configure a radosgw for our ceph
cluster, which was until now only doing cephfs (and rbd was working as
well). We were messing about with ceph-ansible, as this was how we
originally installed the cluster. Anyway, it installed nautilus 14.2.18
on the radosgw, and I thought it would be good to pull the rest of the
cluster up to that level as well using our tried and tested ceph upgrade
script (it basically updates all ceph nodes one by one and
checks whether ceph is OK again before doing the next).
After the 3rd mon/mgr was done, all pg's were unavailable :-(
obviously, the script is not continuing, but ceph is also broken now...
The message is, deceptively:
HEALTH_WARN Reduced data availability: 5568 pgs inactive
That's all PGs!
As a desperate measure I tried to upgrade one ceph OSD node, but that
broke as well; the osd service on that node gets an interrupt from the
kernel...
The versions are now:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}
12 OSDs are down
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            Reduced data availability: 5568 pgs inactive

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
    mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

  data:
    pools:   12 pools, 5568 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5568 unknown

  progress:
    Rebalancing after osd.103 marked in
      [..............................]
Hi,
I noticed while using rclone to migrate some data from a Swift
cluster into a RADOSGW cluster that sometimes when listing a
bucket RADOSGW will not always return as many results as specified
by the "limit" parameter, even when more objects remain to list.
This results in rclone believing on subsequent runs that the
objects do not exist, since it performs an initial comparison
based on bucket listings, and so it needlessly recopies data.
This seems contrary to how pagination is specified by Swift:
https://docs.openstack.org/swift/latest/api/pagination.html
Is this known behaviour, or should I go ahead and file a bug?
I believe the cluster is running 15.2.8 or so, but will confirm.
Thanks,
Paul
---
Further observations:
* Here's a summary of the reply lengths I got when listing
various buckets in our RADOSGW cluster. (This is not all of
the buckets in the tenant; the other 100 or so are fine.)
reply lengths: 1000 999 1000 1000 1000 1000 1000 1000 1000 1000 119
reply lengths: 1000 992 1000 1000 1000 1000 1000 935 1000 1000 257
reply lengths: 1000 1000 1000 1000 1000 975 1000 948
reply lengths: 953 1000 1000 1000 1000 1000 954 1000 1000 70
reply lengths: 1000 1000 1000 1000 998 15
reply lengths: 1000 1000 1000 1000 974 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 939 1000 1000 1000 1000 949 1000 1000 1000 644
reply lengths: 1000 1000 1000 1000 999 1000 1000 937 1000 1000 538
reply lengths: 1000 998 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 551
reply lengths: 1000 1000 1000 1000 1000 1000 1000 931 1000 986 1000 1000 1000 975 1000 989 1000 1000 1000 966 1000 998 921 994 1000 1000 973 58
reply lengths: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 976 1000 366
reply lengths: 1000 1000 1000 1000 1000 983 1000 1000 1000 1000 1000 1000 1000 517
reply lengths: 1000 1000 1000 984 1000 1000 971 1000 1000 401
reply lengths: 949 1000 1000 1000 1000 1000 1000 403
reply lengths: 1000 998 532
reply lengths: 951 1000 1000 1000 1000 1000 976 1000 877
* rclone uses a default $limit of 1,000, in contrast to the
Python swiftclient's default of 10,000.
* The Swift API doc seems clear that $limit results should always
be returned if at least $limit results are available, and that
receiving less than $limit results indicates no more exist.
(It doesn't *explicitly* say the last, but the document could
be a lot shorter if it were not intended for that to follow.)
* When swiftclient is asked to fetch a listing, and full_listing
is set to True, instead of implementing pagination as
described in the document above, swiftclient simply keeps
fetching pages until it receives an empty page.
So Swift API implementations that don't strictly implement
paging per the docs may not even be noticed by most users.
* From a review of its code, swiftclient seems to have done this
since the very beginning. Perhaps the code was written first
and then pagination on the server side was nailed down later?
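To make the failure mode concrete, here is a toy sketch (not rclone's or swiftclient's actual code; make_backend and list_page are made up) of the two termination strategies against a hypothetical backend whose second page comes back short even though more objects remain, as in the listings above:

```python
def make_backend(objects):
    """Toy listing backend; its second page is artificially short."""
    calls = {"n": 0}
    def list_page(marker, limit):
        calls["n"] += 1
        start = objects.index(marker) + 1 if marker else 0
        page = objects[start:start + limit]
        if calls["n"] == 2 and len(page) == limit:
            page = page[:-1]          # a short page mid-listing
        return page
    return list_page

def list_all_strict(list_page, limit):
    """Stop at the first page shorter than `limit` -- what the Swift
    pagination doc implies is safe, and roughly what rclone assumes."""
    out, marker = [], None
    while True:
        page = list_page(marker, limit)
        out += page
        if len(page) < limit:
            return out
        marker = page[-1]

def list_all_until_empty(list_page, limit):
    """Keep fetching until an empty page -- what swiftclient does."""
    out, marker = [], None
    while True:
        page = list_page(marker, limit)
        if not page:
            return out
        out += page
        marker = page[-1]

objects = ["obj%04d" % i for i in range(35)]
print(len(list_all_strict(make_backend(objects), 10)))       # 19 -- objects silently missed
print(len(list_all_until_empty(make_backend(objects), 10)))  # 35 -- complete
```

The empty-page strategy survives a server that returns short pages mid-listing; the strict strategy silently truncates the listing, which would explain rclone deciding the remaining objects don't exist.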
--
Paul Collins
Wellington, New Zealand
Now that the hybrid allocator appears to be enabled by default in
Octopus, is it safe to change bluestore_min_alloc_size_hdd to 4k from
64k on Octopus 15.2.10 clusters, and then redeploy every OSD to switch
to the smaller allocation size, without massive performance impact for
RBD? We're seeing a lot of storage usage amplification on EC 8+3
clusters which are HDD backed that lines up with a lot of the mailing
list posts we've seen here. Upgrading to Pacific before making this
change is also a possibility once a more stable release arrives, if
that's necessary.
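For a sense of scale, here is my own back-of-envelope model of the amplification (not authoritative BlueStore accounting): assume each object is split into k data chunks plus m coding chunks, and each chunk on disk is rounded up to bluestore_min_alloc_size.

```python
def ceil_div(a, b):
    return -(-a // b)

def ec_allocated(object_size, k, m, min_alloc):
    """Bytes allocated for one EC k+m object if every chunk is
    rounded up to min_alloc (simplified model)."""
    chunk = ceil_div(object_size, k)
    return (k + m) * ceil_div(chunk, min_alloc) * min_alloc

KiB, MiB = 2**10, 2**20

# A full 4 MiB RBD object under EC 8+3: overhead is just the nominal 11/8.
print(ec_allocated(4 * MiB, 8, 3, 64 * KiB))   # 5767168 (5.5 MiB)

# A small 16 KiB object: every 2 KiB chunk rounds up to 64 KiB.
print(ec_allocated(16 * KiB, 8, 3, 64 * KiB))  # 720896 (704 KiB, ~44x)

# The same small object with min_alloc of 4 KiB.
print(ec_allocated(16 * KiB, 8, 3, 4 * KiB))   # 45056 (44 KiB, ~2.75x)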
Second part of this question - we are using RBDs currently on the
clusters impacted. These have XFS filesystems on top, which detect the
sector size of the RBD as 512byte, and XFS has a block size of 4k.
With the default of 64k for bluestore_min_alloc_size_hdd, let's say a
1G file is written out to the XFS filesystem backed by the RBD. On the
ceph side, is this seen as a lot of 4k objects, wasting significant
space, or is RBD able to coalesce these into 64k objects even though
XFS is using a 4k block size?
XFS details below; you can see the allocation groups are quite large:
meta-data=/dev/rbd0              isize=512    agcount=501, agsize=268435440 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=134217728000, imaxpct=1
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
I'm curious if people have been tuning XFS on RBD for better
performance, as well.
Thank you!
We just updated one of our ceph clusters from 14.2.15 to 14.2.19, and
see some unexpected behavior by radosgw - it seems to ignore parameters
set by the ceph config database. Specifically this is making it start up
listening only on port 7480, and not the configured 80 and 443 (ssl) ports.
Downgrading ceph on the rgw nodes back to 14.2.15 restores the expected
behavior (I haven't yet tried any intermediate versions). The host OS is
CentOS 7, if that matters...
Here's a ceph config dump for one of the affected nodes, along with the
radosgw startup log:
> # ceph config dump|grep tier2-gw02
> client.rgw.tier2-gw02 basic log_file /var/log/ceph/radosgw.log *
> client.rgw.tier2-gw02 advanced rgw_dns_name s3.msi.umn.edu *
> client.rgw.tier2-gw02 advanced rgw_enable_usage_log true
> client.rgw.tier2-gw02 basic rgw_frontends beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/civetweb.pem *
> client.rgw.tier2-gw02 basic rgw_thread_pool_size 512
> # tail /var/log/ceph/radosgw.log
> 2021-04-08 11:51:07.956 7f420b78f700 -1 received signal: Terminated from /usr/lib/systemd/systemd --switched-root --system --deserialize 22 (PID: 1) UID: 0
> 2021-04-08 11:51:07.956 7f420b78f700 1 handle_sigterm
> 2021-04-08 11:51:07.956 7f4220bc5900 -1 shutting down
> 2021-04-08 11:51:07.956 7f420b78f700 1 handle_sigterm set alarm for 120
> 2021-04-08 11:51:08.010 7f4220bc5900 1 final shutdown
> 2021-04-08 11:51:08.159 7f2ac6105900 0 deferred set uid:gid to 167:167 (ceph:ceph)
> 2021-04-08 11:51:08.159 7f2ac6105900 0 ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable), process radosgw, pid 88256
> 2021-04-08 11:51:08.300 7f2ac6105900 0 starting handler: beast
> 2021-04-08 11:51:08.302 7f2ac6105900 0 set uid:gid to 167:167 (ceph:ceph)
> 2021-04-08 11:51:08.317 7f2ac6105900 1 mgrc service_daemon_register rgw.tier2-gw02 metadata {arch=x86_64,ceph_release=nautilus,ceph_version=ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable),ceph_version_short=14.2.19,cpu=AMD EPYC 7302P 16-Core Processor,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=beast port=7480,frontend_type#0=beast,hostname=tier2-gw02.msi.umn.edu,kernel_description=#1 SMP Tue Mar 16 18:28:22 UTC 2021,kernel_version=3.10.0-1160.21.1.el7.x86_64,mem_swap_kb=4194300,mem_total_kb=131754828,num_handles=1,os=Linux,pid=88256,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}
BTW I can also change "rgw_frontends" to specify a civetweb frontend
instead and it will still start the default beast...
I haven't seen anyone else report such a problem so I wonder if this is
something local to us - like perhaps I'm using "ceph config" incorrectly
in a way which happened to be accepted before? Has anyone else seen this
behavior?
Graham
--
Graham Allan - gta(a)umn.edu
Associate Director of Operations - Minnesota Supercomputing Institute
I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
converted the last batch of FileStore OSDs to BlueStore about 36 hours ago.
Yesterday our monitor cluster went nuts and started constantly calling
elections because monitor nodes were at 100% and wouldn't respond to
heartbeats. I reduced the monitor cluster to one to prevent the constant
elections and that let the system limp along until the backfills finished.
There are long stretches of time where ceph commands hang while the CPU is at
100%; when the CPU drops I see a lot of work getting done in the monitor
logs, which stops as soon as the CPU is at 100% again.
I did a `perf top` on the node to see what's taking all the time and it
appears to be in the rocksdb code path. I've set `mon_compact_on_start =
true` in the ceph.conf but that does not appear to help. The
`/var/lib/ceph/mon/` directory is 311MB which is down from 3.0 GB while the
backfills were going on. I've tried adding a second monitor, but it goes
back to the constant elections. I tried restarting all the services without
luck. I also pulled the monitor off the network and tried restarting
the mon service isolated (this helped a couple of weeks ago when `ceph -s`
would cause 100% CPU and lock up the service much worse than this) and
didn't see the high CPU load. So I'm guessing it's triggered by some
external source.
I'm happy to provide more info, just let me know what would be helpful.
Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hi everyone,
I cleaned up the CFP coordination etherpad with some events coming up.
Please add other events you think the community should be considering
proposing content on Ceph or adjacent projects like Rook.
KubeCon NA CFP, for example, is ending April 11. Take a look:
https://pad.ceph.com/p/cfp-coordination
I have also added this to our wiki for discovery.
https://tracker.ceph.com/projects/ceph/wiki/Community
--
Mike Perez
Hi everyone,
We encountered an issue mounting a KRBD device after mapping it to the host with the read-only option.
We have tried to pinpoint where the problem is, but have not been able to.
The image mounts fine if we map it without the "read-only" option.
This leads to an issue where a pod in k8s cannot use the snapshotted persistent volume created by the ceph-csi rbd provisioner.
Thank you for reading.
I have reported the bug here: Bug #50234: krbd failed to mount after map image with read-only option - Ceph - Ceph<https://tracker.ceph.com/issues/50234>
Context
- Using admin keyring
- Linux Kernel: 3.10.0-1160.15.2.el7.x86_64
- Linux Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
- Ceph version: "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)"
rbd image 'csi-vol-85919409-9797-11eb-80ba-720b2b57c790':
        size 10 GiB in 2560 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 533a03bba388ea
        block_name_prefix: rbd_data.533a03bba388ea
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Wed Apr 7 13:51:02 2021
        access_timestamp: Wed Apr 7 13:51:02 2021
        modify_timestamp: Wed Apr 7 13:51:02 2021
Bug Reproduction
# Map RBD image WITH read-only option: CANNOT mount either read-only or read-write
sudo rbd device map -p k8s-sharedpool csi-vol-85919409-9797-11eb-80ba-720b2b57c790 -ro
/dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
mount: cannot mount /dev/rbd0 read-only
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
mount: /dev/rbd0 is write-protected, mounting read-only
mount: cannot mount /dev/rbd0 read-only
# Map RBD image WITHOUT read-only option: CAN mount both read-only and read-write
sudo rbd device map -p k8s-sharedpool csi-vol-85919409-9797-11eb-80ba-720b2b57c790
/dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
mount: /mnt/test1 does not contain SELinux labels.
You just mounted an file system that supports labels which does not
contain labels, onto an SELinux box. It is likely that confined
applications will generate AVC messages and not be allowed access to
this file system. For more details see restorecon(8) and mount(8).
mount: /dev/rbd0 mounted on /mnt/test1.
sudo mount -v -t ext4 /dev/rbd0 /mnt/test1
mount: /mnt/test1 does not contain SELinux labels.
You just mounted an file system that supports labels which does not
contain labels, onto an SELinux box. It is likely that confined
applications will generate AVC messages and not be allowed access to
this file system. For more details see restorecon(8) and mount(8).
mount: /dev/rbd0 mounted on /mnt/test1.
With my best regards,
Son Hai Ha
--
KPMG IT Service GmbH
Hi! I have a single-machine ceph installation, and after trying to upgrade to
Pacific the upgrade is stuck with:
ceph -s
  cluster:
    id:     d9f4c810-8270-11eb-97a7-faa3b09dcf67
    health: HEALTH_WARN
            Upgrade: Need standby mgr daemon

  services:
    mon: 1 daemons, quorum sev.spacescience.ro (age 3w)
    mgr: sev.spacescience.ro.wpozds(active, since 2w)
    mds: sev-ceph:1 {0=sev-ceph.sev.vmvwrm=up:active}
    osd: 2 osds: 2 up (since 2w), 2 in (since 2w)

  data:
    pools:   4 pools, 194 pgs
    objects: 32 objects, 8.4 KiB
    usage:   2.0 GiB used, 930 GiB / 932 GiB avail
    pgs:     194 active+clean

  progress:
    Upgrade to docker.io/ceph/ceph:v16.2.0 (0s)
      [............................]
How can I get a standby mgr daemon? So far I did not find anything relevant...
Thanks a lot!
Adrian