- ceph-users - lists.ceph.io

by Amudhan P

Hi, I am using ceph version 13.2.6 (mimic) on test setup trying with cephfs. My current setup: 3 nodes, 1 node contain two bricks and other 2 nodes contain single brick each. Volume is a 3 replica, I am trying to simulate node failure. I powered down one host and started getting msg in other systems when running any command "-bash: fork: Cannot allocate memory" and system not responding to commands. what could be the reason for this? at this stage, I could able to read some of the data stored in the volume and some just waiting for IO. output from "sudo ceph -s" cluster: id: 7c138e13-7b98-4309-b591-d4091a1742b4 health: HEALTH_WARN 1 osds down 2 hosts (3 osds) down Degraded data redundancy: 5313488/7970232 objects degraded (66.667%), 64 pgs degraded services: mon: 1 daemons, quorum mon01 mgr: mon01(active) mds: cephfs-tst-1/1/1 up {0=mon01=up:active} osd: 4 osds: 1 up, 2 in data: pools: 2 pools, 64 pgs objects: 2.66 M objects, 206 GiB usage: 421 GiB used, 3.2 TiB / 3.6 TiB avail pgs: 5313488/7970232 objects degraded (66.667%) 64 active+undersized+degraded io: client: 79 MiB/s rd, 24 op/s rd, 0 op/s wr output from : sudo ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 0 hdd 1.81940 0 0 B 0 B 0 B 0 0 0 3 hdd 1.81940 0 0 B 0 B 0 B 0 0 0 1 hdd 1.81940 1.00000 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00 0 2 hdd 1.81940 1.00000 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00 64 TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31 MIN/MAX VAR: 1.00/1.00 STDDEV: 0.03 regards Amudhan

4 years, 8 months

1
0
0 0

ceph -openstack -kolla-ansible deployed using docker containers - One OSD is down out of 4- how can I bringt it up

by Reddi Prasad Yendluri

Hi Team, I have a production Openstack which was deployed using kolla-ansible and docker containerization has been used to deploy Ceph cluster storage. Now, 1/4 OSD is down. How can I bring it up. Basically, management of docker containerized application and restart the services or bring up the OSD services on one of the Ceph node. Can you please support me if any of you aware of this? Thanks, Reddi Prasad YENDLURI Cloud Specialist M +65 8345 9599 | D +65 6220 9908 Office: *51B Circular Road Singapore 049406* [image: PALO IT]

4 years, 8 months

1
1
0 0

Unable to replace OSDs deployed with ceph-volume lvm batch

by Burkhard Linke

Hi, we had a failing hard disk, and I replace it and want to create a new OSD on it now. But ceph-volume fails under these circumstances. In the original setup, the OSDs were created with ceph-volume lvm batch using a bunch of drives and a NVMe device for bluestore db. The batch mode uses a volume group on the NVMe device instead of partitions.I have removed the former db logical volume, the lvm setup for the former hard disk and all other remainders. Creating a new OSD with any combination of devices now fails: --data /dev/sda --block.db <nvme device> --data /dev/sda --block.db <nvme volume group> --data /dev/sda --block.db <lv created manually in nvme volume group> # ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/ceph-block-dbs-ea684aa8-544e-4c4a-8664-6cb50b3116b8/osd-block-db-a8f1489a-d97b-479e-b9a7-30fc9fa99cb5 Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 55cfb9f8-aa30-4f8b-8b95-a43d3f97fe5b Running command: /sbin/vgcreate -s 1G --force --yes ceph-1a2ddc14-780b-45e4-a036-e928862e6ccb /dev/sda stdout: Physical volume "/dev/sda" successfully created. stdout: Volume group "ceph-1a2ddc14-780b-45e4-a036-e928862e6ccb" successfully created Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-55cfb9f8-aa30-4f8b-8b95-a43d3f97fe5b ceph-1a2ddc14-780b-45e4-a036-e928862e6ccb stdout: Logical volume "osd-block-55cfb9f8-aa30-4f8b-8b95-a43d3f97fe5b" created. --> blkid could not detect a PARTUUID for device: /dev/ceph-block-dbs-ea684aa8-544e-4c4a-8664-6cb50b3116b8/osd-block-db-a8f1489a-d97b-479e-b9a7-30fc9fa99cb5 --> Was unable to complete a new OSD, will rollback changes Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.136 --yes-i-really-mean-it stderr: purged osd.136 --> RuntimeError: unable to use device In all cases ceph-volume is not able to detect a partition uuid for the db device (which is correct, since the device is a logical volume....). Running 'ceph-volume lvm batch' again results in a OSD without using the NVMe device as db. So what is the recommended way to manually create an OSD with a certain hard disk and an existing logical volume as db partition? I would like to avoid to zap all other OSDs using the NVMe device and recreate them in a single run with 'ceph-volume lvm batch ...'. Regards, Burkhard

4 years, 8 months

2
2
0 0

ceph-iscsi and tcmu-runner RPMs for CentOS?

by Robert Sander

Hi, In the Documentation on https://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ it is stated that you need at least CentOS 7.5 with at least kernel 4.16 and to install tcmu-runner and ceph-iscsi "from your Linux distribution's software repository". CentOS does not know about tcmu-runner nor ceph-iscsi. Where do I get these RPMs from? Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Zwangsangaben lt. §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Geschäftsführer: Peer Heinlein -- Sitz: Berlin

4 years, 8 months

2
1
0 0

Bucket policies with OpenStack integration and limiting access

by shubjero＠gmail.com

Good day, We have a Ceph cluster and make use of object-storage and integrate with OpenStack. Each OpenStack project/tenant is given a radosgw user which allows all keystone users of that project to access the object-storage as that single radosgw user. The radosgw user is the project id of the OpenStack project/tenant. Sometimes we have use cases where we want to access the object-storage outside of the swift-api and use tools like the aws-cli or homebrew java applications to access the object storage. For this use case what we do is generate S3 access/secret key for the specific radosgw user and they have full access to the object storage for that OpenStack project/tenant. What we want to know is if it is possible to provide granular access to containers within a single OpenStack project using S3 access keys or S3 sub-users? I know that the Swift API has ACL's that can limit by keystone user but we are exploring the possibility of doing this using S3 and S3 bucket policies so that the tools our team are developing (open source) are more transferrable to AWS S3 and Rados GW. Thanks all, Jared Baker Cloud Architect, OICR

4 years, 8 months

1
0
0 0

Out of memory

by Sylvain PORTIER

Hi, On my ceph osd servers I have lot of "out of memory messages". My servers are configured with : - 32 G of memory - 11 HDD (3,5 T each) (+ 2 HDD for the system) And the error messages are : /[101292.017968] Out of memory: Kill process 2597 (ceph-osd) score 102 or sacrifice child// //[101292.018836] Killed process 2597 (ceph-osd) total-vm:5048008kB, anon-rss:3002648kB, file-rss:0kB, shmem-rss:0kB// / Top result : PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 75469 ceph 20 0 4982324 3,988g 0 S 0,7 12,7 3:45.05 ceph-osd 75499 ceph 20 0 5095896 3,710g 0 S 0,3 11,8 4:08.93 ceph-osd 65848 ceph 20 0 5713748 3,329g 0 S 0,0 10,6 68:49.12 ceph-osd 67237 ceph 20 0 5580720 3,155g 0 S 0,0 10,1 57:15.99 ceph-osd 71113 ceph 20 0 5557608 3,101g 0 S 0,3 9,9 36:47.69 ceph-osd 74745 ceph 20 0 5117212 3,062g 0 S 3,7 9,8 8:13.56 ceph-osd 72494 ceph 20 0 5621156 2,828g 0 S 0,3 9,0 27:19.97 ceph-osd 70954 ceph 20 0 5765016 2,571g 0 S 0,3 8,2 40:02.00 ceph-osd 74817 ceph 20 0 5139328 2,510g 0 S 0,3 8,0 7:24.33 ceph-osd 76523 ceph 20 0 3324820 2,422g 0 S 0,3 7,7 0:55.54 ceph-osd Is there a way to limit or reduce the memory usage of each osd deamon ? Thank you, Regards, Sylvain PORTIER. --- L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast. https://www.avast.com/antivirus

4 years, 8 months

4
5
0 0

How to test PG mapping with reweight

by Robert LeBlanc

I'd like to test how reweighting an OSD will change how the PGs map in the cluster. I suspect that I'd dump the CRUSH map and PGs in the cluster that I'm interested in then use osdmaptool. I'm not understanding how to use osdmaptool to set the reweight, then query a PG or the entire set of PGs that I'm interested in. I then suspect that if I'm okay with the new map that I could inject it into the cluster instead of having to run reweight on the OSD(s). This is a Jewel cluster and I'm trying to calculate OSD usage offline, then inject a map that is more distributed instead of doing a reweight, move the PGs which take a long time to just rinse and repeat over and over again. Thanks, Robert LeBlanc ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

4 years, 8 months

2
3
0 0

Ceph client failed to mount RBD device after reboot

by Vang Le-Quy

I am suspecting that there is a bug in 12.2.8-0ubuntu0.18.04.1. We have too ceph client n01 and n03 with following software versions. # n01 ceph-common: Installed: 12.2.8-0ubuntu0.18.04.1 Candidate: 12.2.12-0ubuntu0.18.04.2 # n03 ceph-common: Installed: 12.2.11-0ubuntu0.18.04.2 Candidate: 12.2.12-0ubuntu0.18.04.2 # n01 uname -a Linux n01 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux # n03 uname -a Linux n03 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux Problem: N01 can mount its RBD after reboot whereas N03 can’t. To verify that it is something on client side, I upgrade n01 to have the same package version to n03 including ceph-common. And n01 does indeed fail to mount its CEPH RBD partition. Syslog shows messages like the following: Sep 6 12:52:33 n01 kernel: [ 875.160275] rbd: rbd0: no lock owners detected Sep 6 12:52:33 n01 kernel: [ 875.161069] rbd: rbd0: client2508785 seems dead, breaking lock Sep 6 12:52:33 n01 kernel: [ 875.161723] rbd: rbd0: blacklist of client2508785 failed: -13 Sep 6 12:52:33 n01 kernel: [ 875.161724] rbd: rbd0: failed to acquire lock: -13 Sep 6 12:52:33 n01 kernel: [ 875.161956] rbd: rbd0: no lock owners detected Sep 6 12:52:33 n01 kernel: [ 875.162755] rbd: rbd0: client2508785 seems dead, breaking lock Sep 6 12:52:33 n01 kernel: [ 875.163415] rbd: rbd0: blacklist of client2508785 failed: -13 Sep 6 12:52:33 n01 kernel: [ 875.163415] rbd: rbd0: failed to acquire lock: -13 Sep 6 12:52:33 n01 kernel: [ 875.163645] rbd: rbd0: no lock owners detected Sep 6 12:52:33 n01 kernel: [ 875.164431] rbd: rbd0: client2508785 seems dead, breaking lock Sep 6 12:52:33 n01 kernel: [ 875.165082] rbd: rbd0: blacklist of client2508785 failed: -13 Sep 6 12:52:33 n01 kernel: [ 875.165082] rbd: rbd0: failed to acquire lock: -13 Sep 6 12:52:33 n01 kernel: [ 875.165355] rbd: rbd0: no lock owners detected [signature_1482564062] Vang Quy Le Special Consultant in Data Science and Infrastructure T: (+45) 9940 7710 | Email: vle(a)its.aau.dk<mailto:vle@its.aau.dk> Kontor 0-1-91 | Selma Lagerløfs Vej 300 | DK-9220 Aalborg Ø |

4 years, 8 months

4
5
0 0

Re: using non client.admin user for ceph-iscsi gateways

by Jason Dillaman

On Fri, Sep 6, 2019 at 12:00 PM Wesley Dillingham <wdillingham(a)godaddy.com> wrote: > > the iscsi-gateway.cfg seemingly allows for an alternative cephx user other than client.admin to be used, however the comments in the documentations says specifically to use client.admin. Hmm, can you point out where this is in the docs? Originally, tcmu-runner didn't support the ability to change the user id, but that has been available for about a year now [1]. > Other than having the cfg file point to the appropriate key/user with "gateway_keyring" and giving that client read caps on the mons and full access to the pool configured to be used for iscsi are any other particular steps / settings / actions needed? Just use "profile rbd" for your caps to keep it simple. > It seems prudent to not use client.admin but I don't want to have unstable behavior or untested setup. > > Thanks. > > Respectfully, > > Wes Dillingham > wdillingham(a)godaddy.com > Site Reliability Engineer IV - Platform Storage / Ceph > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io [1] https://github.com/open-iscsi/tcmu-runner/commit/c85ccdcfb7f4b17926eda1df89… -- Jason

4 years, 8 months

2
1
0 0

using non client.admin user for ceph-iscsi gateways

by Wesley Dillingham

the iscsi-gateway.cfg seemingly allows for an alternative cephx user other than client.admin to be used, however the comments in the documentations says specifically to use client.admin. Other than having the cfg file point to the appropriate key/user with "gateway_keyring" and giving that client read caps on the mons and full access to the pool configured to be used for iscsi are any other particular steps / settings / actions needed? It seems prudent to not use client.admin but I don't want to have unstable behavior or untested setup. Thanks. Respectfully, Wes Dillingham wdillingham(a)godaddy.com Site Reliability Engineer IV - Platform Storage / Ceph

4 years, 8 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users