Hi,
I am using ceph version 13.2.6 (mimic) on a test setup, trying out CephFS.
My current setup:
3 nodes; one node contains two bricks (OSDs) and the other two nodes contain a
single brick each.
The volume is 3-replica, and I am trying to simulate a node failure.
I powered down one host, and now the other systems report
"-bash: fork: Cannot allocate memory" when running any command and stop
responding. What could be the reason for this?
At this stage I can still read some of the data stored in the volume, while
other reads just hang waiting for I/O.
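For what it's worth, below is a rough sketch of what I plan to check next on the
surviving OSD host, to see whether the OSD daemons themselves are eating the
memory during recovery (osd.1 is just an example id, and the injectargs value is
only a guess):

# overall memory pressure on the node
free -h
# per-OSD memory breakdown via the admin socket
sudo ceph daemon osd.1 dump_mempools
# temporarily cap the per-OSD memory target (bytes); osd_memory_target
# should exist in Mimic 13.2.3+ if I read the release notes correctly
sudo ceph tell osd.* injectargs '--osd_memory_target 2147483648'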
output from "sudo ceph -s"
cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
1 osds down
2 hosts (3 osds) down
Degraded data redundancy: 5313488/7970232 objects degraded
(66.667%), 64 pgs degraded
services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
mds: cephfs-tst-1/1/1 up {0=mon01=up:active}
osd: 4 osds: 1 up, 2 in
data:
pools: 2 pools, 64 pgs
objects: 2.66 M objects, 206 GiB
usage: 421 GiB used, 3.2 TiB / 3.6 TiB avail
pgs: 5313488/7970232 objects degraded (66.667%)
64 active+undersized+degraded
io:
client: 79 MiB/s rd, 24 op/s rd, 0 op/s wr
Output from "sudo ceph osd df":
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
0 hdd 1.81940 0 0 B 0 B 0 B 0 0 0
3 hdd 1.81940 0 0 B 0 B 0 B 0 0 0
1 hdd 1.81940 1.00000 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00 0
2 hdd 1.81940 1.00000 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00 64
TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
MIN/MAX VAR: 1.00/1.00 STDDEV: 0.03
regards
Amudhan
Hi Team,
I have a production OpenStack deployment that was installed with kolla-ansible;
the Ceph cluster storage was also deployed in Docker containers.
Now one of the four OSDs is down. How can I bring it up?
Basically, I am looking for how to manage the Docker containerized services and
restart or bring up the OSD service on one of the Ceph nodes.
Can you please help if any of you are familiar with this?
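Something along these lines is what I imagine is needed, but I am not sure of
the container names kolla uses for the OSDs (ceph_osd_2 below is only a guess):

# find the OSD containers on the affected node
docker ps -a | grep ceph_osd
# check why the container stopped
docker logs --tail 100 ceph_osd_2
# try restarting it
docker restart ceph_osd_2
# then confirm the OSD came back up from a mon node
docker exec ceph_mon ceph osd tree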
Thanks,
Reddi Prasad YENDLURI
Cloud Specialist
M +65 8345 9599 | D +65 6220 9908
Office: 51B Circular Road Singapore 049406
Hi,
we had a failing hard disk; I have replaced it and now want to create a new
OSD on it.
But ceph-volume fails under these circumstances. In the original setup,
the OSDs were created with ceph-volume lvm batch, using a bunch of drives
and an NVMe device for the bluestore db. Batch mode uses a volume group
on the NVMe device instead of partitions. I have removed the former db
logical volume, the LVM setup for the former hard disk and all other
remainders. Creating a new OSD with any combination of devices now fails:
--data /dev/sda --block.db <nvme device>
--data /dev/sda --block.db <nvme volume group>
--data /dev/sda --block.db <lv created manually in nvme volume group>
# ceph-volume lvm create --bluestore --data /dev/sda --block.db
/dev/ceph-block-dbs-ea684aa8-544e-4c4a-8664-6cb50b3116b8/osd-block-db-a8f1489a-d97b-479e-b9a7-30fc9fa99cb5
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
-i - osd new 55cfb9f8-aa30-4f8b-8b95-a43d3f97fe5b
Running command: /sbin/vgcreate -s 1G --force --yes
ceph-1a2ddc14-780b-45e4-a036-e928862e6ccb /dev/sda
stdout: Physical volume "/dev/sda" successfully created.
stdout: Volume group "ceph-1a2ddc14-780b-45e4-a036-e928862e6ccb"
successfully created
Running command: /sbin/lvcreate --yes -l 100%FREE -n
osd-block-55cfb9f8-aa30-4f8b-8b95-a43d3f97fe5b
ceph-1a2ddc14-780b-45e4-a036-e928862e6ccb
stdout: Logical volume
"osd-block-55cfb9f8-aa30-4f8b-8b95-a43d3f97fe5b" created.
--> blkid could not detect a PARTUUID for device:
/dev/ceph-block-dbs-ea684aa8-544e-4c4a-8664-6cb50b3116b8/osd-block-db-a8f1489a-d97b-479e-b9a7-30fc9fa99cb5
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
osd purge-new osd.136 --yes-i-really-mean-it
stderr: purged osd.136
--> RuntimeError: unable to use device
In all cases ceph-volume is not able to detect a partition UUID for the
db device (which is correct, since the device is a logical volume...).
Running 'ceph-volume lvm batch' again results in an OSD that does not use
the NVMe device as db.
So what is the recommended way to manually create an OSD with a given
hard disk and an existing logical volume as the db device? I would like
to avoid having to zap all other OSDs that use the NVMe device and
recreate them in a single run with 'ceph-volume lvm batch ...'.
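In case someone else runs into the same thing: the one variant I have not tried
yet is passing the db LV in plain vg/lv notation instead of a /dev/... path; if
I understood older threads correctly, ceph-volume only recognises it as a
logical volume in that form, but I may well be wrong about that:

# db LV given as <vg>/<lv> rather than /dev/<vg>/<lv>
ceph-volume lvm create --bluestore --data /dev/sda \
    --block.db ceph-block-dbs-ea684aa8-544e-4c4a-8664-6cb50b3116b8/osd-block-db-a8f1489a-d97b-479e-b9a7-30fc9fa99cb5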
Regards,
Burkhard
Hi,
In the documentation at
https://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ it is stated
that you need at least CentOS 7.5 with at least kernel 4.16, and that you
should install tcmu-runner and ceph-iscsi "from your Linux distribution's
software repository".
The CentOS repositories do not contain tcmu-runner or ceph-iscsi.
Where do I get these RPMs from?
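The only lead I have found so far are the upstream CI builds on shaman, but I do
not know whether those are intended for production use, and the repo URLs below
are only my guess based on the pattern used for Ceph dev packages, so please
correct me if there is a proper release repository:

# URLs are an assumption on my part, verify on shaman.ceph.com first
curl -L -o /etc/yum.repos.d/tcmu-runner.repo \
    https://shaman.ceph.com/api/repos/tcmu-runner/master/latest/centos/7/repo
curl -L -o /etc/yum.repos.d/ceph-iscsi.repo \
    https://shaman.ceph.com/api/repos/ceph-iscsi/master/latest/centos/7/repo
yum install -y tcmu-runner ceph-iscsi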
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosure per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
Good day,
We have a Ceph cluster and make use of object-storage and integrate
with OpenStack. Each OpenStack project/tenant is given a radosgw user
which allows all keystone users of that project to access the
object-storage as that single radosgw user. The radosgw user is the
project id of the OpenStack project/tenant.
Sometimes we have use cases where we want to access the object storage
outside of the Swift API and use tools like the aws-cli or home-grown
Java applications. For this use case what we do is generate an S3
access/secret key pair for the specific radosgw user, which then has full
access to the object storage for that OpenStack project/tenant.
What we want to know is whether it is possible to provide granular access
to containers within a single OpenStack project using S3 access keys
or S3 sub-users. I know that the Swift API has ACLs that can limit by
keystone user, but we are exploring the possibility of doing this using
S3 and S3 bucket policies so that the tools our team is developing
(open source) are more transferable between AWS S3 and RadosGW.
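To make the question concrete, this is roughly what we are hoping works; the
bucket name, endpoint and the user in the principal are made up, and I am not
sure whether RGW honours per-user principals for keystone-backed users:

# policy.json - allow one other user in the same tenant read-only access
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::<project-id>:user/<keystone-user-id>"]},
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
  }]
}

# applied with the aws-cli against the RGW endpoint
aws --endpoint-url https://rgw.example.com s3api put-bucket-policy \
    --bucket mybucket --policy file://policy.json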
Thanks all,
Jared Baker
Cloud Architect, OICR
Hi,
On my Ceph OSD servers I get a lot of out-of-memory messages.
The servers are configured with:
- 32 GB of memory
- 11 HDDs (3.5 TB each) (+ 2 HDDs for the system)
And the error messages are:
[101292.017968] Out of memory: Kill process 2597 (ceph-osd) score 102 or sacrifice child
[101292.018836] Killed process 2597 (ceph-osd) total-vm:5048008kB, anon-rss:3002648kB, file-rss:0kB, shmem-rss:0kB
Top result:
  PID  USER  PR  NI    VIRT     RES  SHR S %CPU %MEM    TIME+ COMMAND
75469  ceph  20   0 4982324  3,988g    0 S  0,7 12,7  3:45.05 ceph-osd
75499  ceph  20   0 5095896  3,710g    0 S  0,3 11,8  4:08.93 ceph-osd
65848  ceph  20   0 5713748  3,329g    0 S  0,0 10,6 68:49.12 ceph-osd
67237  ceph  20   0 5580720  3,155g    0 S  0,0 10,1 57:15.99 ceph-osd
71113  ceph  20   0 5557608  3,101g    0 S  0,3  9,9 36:47.69 ceph-osd
74745  ceph  20   0 5117212  3,062g    0 S  3,7  9,8  8:13.56 ceph-osd
72494  ceph  20   0 5621156  2,828g    0 S  0,3  9,0 27:19.97 ceph-osd
70954  ceph  20   0 5765016  2,571g    0 S  0,3  8,2 40:02.00 ceph-osd
74817  ceph  20   0 5139328  2,510g    0 S  0,3  8,0  7:24.33 ceph-osd
76523  ceph  20   0 3324820  2,422g    0 S  0,3  7,7  0:55.54 ceph-osd
Is there a way to limit or reduce the memory usage of each OSD daemon?
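The only setting I have come across so far is osd_memory_target, but I am not
sure it is the right knob or whether my version supports it (it seems to require
BlueStore and a recent Luminous/Mimic point release, if I read the release notes
correctly). Something like this in ceph.conf is what I had in mind; the value is
only a guess, roughly 2 GB per daemon given 32 GB for 11 OSDs:

[osd]
# target resident memory per OSD daemon, in bytes (~2 GiB here)
osd memory target = 2147483648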
Thank you,
Regards,
Sylvain PORTIER.
I'd like to test how reweighting an OSD will change how the PGs map in the
cluster.
I suspect that I'd dump the CRUSH map and the PGs I'm interested in from the
cluster and then use osdmaptool. What I don't understand is how to use
osdmaptool to set the reweight and then query a PG, or the entire set of PGs,
that I'm interested in. I also suspect that if I'm okay with the new map, I
could inject it into the cluster instead of having to run reweight on the
OSD(s).
This is a Jewel cluster, and I'm trying to calculate OSD usage offline and then
inject a map that is more evenly distributed, instead of doing a reweight,
moving the PGs (which takes a long time), and rinsing and repeating over and
over again.
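To be concrete, this is the workflow I have pieced together so far, although it
only lets me change the CRUSH weight offline, not the 0..1 reweight override,
which is part of why I'm asking (osd.12, the weight and the pool id are just
examples):

# grab the current osdmap from the cluster
ceph osd getmap -o osdmap.bin
# pull the crush map out, change a weight, and put it back
osdmaptool osdmap.bin --export-crush crush.bin
crushtool -i crush.bin --reweight-item osd.12 1.5 -o crush.new
osdmaptool osdmap.bin --import-crush crush.new
# see how the PGs of pool 1 would map with the modified map
osdmaptool osdmap.bin --test-map-pgs --pool 1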
Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Fri, Sep 6, 2019 at 12:00 PM Wesley Dillingham
<wdillingham(a)godaddy.com> wrote:
>
> the iscsi-gateway.cfg seemingly allows for an alternative cephx user other than client.admin to be used, however the comments in the documentation say specifically to use client.admin.
Hmm, can you point out where this is in the docs? Originally,
tcmu-runner didn't support the ability to change the user id, but that
has been available for about a year now [1].
> Other than having the cfg file point to the appropriate key/user with "gateway_keyring", and giving that client read caps on the mons and full access to the pool configured to be used for iSCSI, are any other particular steps / settings / actions needed?
Just use "profile rbd" for your caps to keep it simple.
> It seems prudent not to use client.admin, but I don't want to end up with unstable behavior or an untested setup.
>
> Thanks.
>
> Respectfully,
>
> Wes Dillingham
> wdillingham(a)godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
[1] https://github.com/open-iscsi/tcmu-runner/commit/c85ccdcfb7f4b17926eda1df89…
--
Jason
the iscsi-gateway.cfg seemingly allows for an alternative cephx user other than client.admin to be used, however the comments in the documentation say specifically to use client.admin.
Other than having the cfg file point to the appropriate key/user with "gateway_keyring", and giving that client read caps on the mons and full access to the pool configured to be used for iSCSI, are any other particular steps / settings / actions needed?
It seems prudent not to use client.admin, but I don't want to end up with unstable behavior or an untested setup.
Thanks.
Respectfully,
Wes Dillingham
wdillingham(a)godaddy.com
Site Reliability Engineer IV - Platform Storage / Ceph