July 2020 - ceph-users - lists.ceph.io

Adding OpenStack Keystone integrated radosGWs to an existing radosGW cluster

by Thomas Byrne - UKRI STFC

Hi all, We run a simple single zone nautilus radosGW instance with a few gateway machines for some of our users. I've got some more gateway machines earmarked for the purpose of adding some OpenStack Keystone integrated RadosGW gateways to the cluster. I'm not sure how best to add them alongside the exiting radosGWs gateways/infrastructure. The options I think I have are: 1) Add keystone integration to all radosGWs gateways. Simplest, but I have (possibly unfounded) concerns about issues with keystone causing problems for non OpenStack users (added authentication latency), and I'm not sure I fully understand how the OpenStack users/buckets will interact with our existing users. 2) Add keystone integration to separate gateways. This keeps the radosGW servers separate, and deals with one of my concerns above. 3) Add a separate radosGW zone/instance (not sure what the correct term is), and have separate gateways for this instance. Seems very heavyweight for what I'm trying to achieve, but that may be my inexperience talking. 4) Something else entirely? Any advice would be greatly appreciated! Cheers, Tom This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI.

3 years, 9 months

1
0
0 0

Poor Windows performance on ceph RBD.

by Frank Schilder

Dear all, maybe someone can give me a pointer here. We are running OpenNebula with ceph RBD as a back-end store. We have a pool of spinning disks to create large low-demand data disks, mainly for backups and other cold storage. Everything is fine when using linux VMs. However, Windows VMs perform poorly, they are like a factor 20 slower than a similarly created linux VM. If anyone has pointers what to look for, we would be very grateful. The OpenNebula installation is more or less default. The current OS and libvirt versions we use are: Centos 7.6 with stock kernel 3.10.0-1062.1.1.el7.x86_64 libvirt-client.x86_64 4.5.0-23.el7_7.1 @updates qemu-kvm-ev.x86_64 10:2.12.0-33.1.el7 @centos-qemu-ev Some benchmark results from good to worse workloads: rbd bench --io-size 4M --io-total 4G --io-pattern seq --io-type write --io-threads 16 : 450MB/s rbd bench --io-size 4M --io-total 4G --io-pattern seq --io-type write --io-threads 1 : 230MB/s rbd bench --io-size 1M --io-total 4G --io-pattern seq --io-type write --io-threads 1 : 190MB/s rbd bench --io-size 64K --io-total 4G --io-pattern seq --io-type write --io-threads 1 : 150MB/s rbd bench --io-size 64K --io-total 1G --io-pattern rand --io-type write --io-threads 1 : 26MB/s dd with conv=fdatasync gives awesome 500MB/s inside linux VM for sequential write of 4GB. We copied a couple of large ISO files inside the Windows VM and for the first ca. 1 to 1.5G it performs as expected. Thereafter, however, write speed drops rapidly to ca. 25MB/s and does not recover. It is almost as if Windows translates large sequential writes to small random writes. If anyone has seen and solved this before, please let us know. Thanks and best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

3 years, 9 months

5
9
0 0

Error on upgrading to 15.2.4 / invalid service name using containers

by Mario J. Barchéin Molina

Hello. I'm trying to upgrade to ceph 15.2.4 from 15.2.3. The upgrade is almost finished, but it has entered in a service start/stop loop. I'm using a container deployment over Debian 10 with 4 nodes. The problem is with a service named literally "mds.label:mds". It has the colon character, which is of special use in docker. This character can't appear in the name of the container and also breaks the volumen binding syntax. I have seen in the /var/lib/ceph/UUID/ the files for this service: root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca# ls -la total 48 drwx------ 12 167 167 4096 jul 10 02:54 . drwxr-x--- 3 ceph ceph 4096 jun 24 16:36 .. drwx------ 3 nobody nogroup 4096 jun 24 16:37 alertmanager.ceph-admin drwx------ 3 167 167 4096 jun 24 16:36 crash drwx------ 2 167 167 4096 jul 10 01:35 crash.ceph-admin drwx------ 4 998 996 4096 jun 24 16:38 grafana.ceph-admin drwx------ 2 167 167 4096 jul 10 02:55 mds.label:mds.ceph-admin.rwmtkr drwx------ 2 167 167 4096 jul 10 01:33 mgr.ceph-admin.doljkl drwx------ 3 167 167 4096 jul 10 01:34 mon.ceph-admin drwx------ 2 nobody nogroup 4096 jun 24 16:38 node-exporter.ceph-admin drwx------ 4 nobody nogroup 4096 jun 24 16:38 prometheus.ceph-admin drwx------ 4 root root 4096 jul 3 02:43 removed root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr# ls -la total 32 drwx------ 2 167 167 4096 jul 10 02:55 . drwx------ 12 167 167 4096 jul 10 02:54 .. -rw------- 1 167 167 295 jul 10 02:55 config -rw------- 1 167 167 152 jul 10 02:55 keyring -rw------- 1 167 167 38 jul 10 02:55 unit.configured -rw------- 1 167 167 48 jul 10 02:54 unit.created -rw------- 1 root root 24 jul 10 02:55 unit.image -rw------- 1 root root 0 jul 10 02:55 unit.poststop -rw------- 1 root root 981 jul 10 02:55 unit.run root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr# cat unit.run /usr/bin/install -d -m0770 -o 167 -g 167 /var/run/ceph/0ce93550-b628-11ea-9484-f6dc192416ca /usr/bin/docker run --rm --net=host --ipc=host --name ceph-0ce93550-b628-11ea-9484-f6dc192416ca-mds.label:mds.ceph-admin.rwmtkr -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=ceph-admin -v /var/ru n/ceph/0ce93550-b628-11ea-9484-f6dc192416ca:/var/run/ceph:z -v /var/log/ceph/0ce93550-b628-11ea-9484-f6dc192416ca:/var/log/ceph:z -v /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/crash:/var/lib/ceph/c rash:z -v /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr:/var/lib/ceph/mds/ceph-label:mds.ceph-admin.rwmtkr:z -v /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.l abel:mds.ceph-admin.rwmtkr/config:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph-mds docker.io/ceph/ceph:v15 -n mds.label:mds.ceph-admin.rwmtkr -f --setuser ceph --setgroup ceph --default-log-to-file=fal se --default-log-to-stderr=true --default-log-stderr-prefix="debug " If I try to manually run the docker command, this is the error: docker: Error response from daemon: Invalid container name (ceph-0ce93550-b628-11ea-9484-f6dc192416ca-mds.label:mds.ceph-admin.rwmtkr), only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed. If I try with a different container name, then the volume binding error rises: docker: Error response from daemon: invalid volume specification: '/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr:/var/lib/ceph/mds/ceph-label:mds.ceph-admin.rwmtkr:z'. This mds is not needed and I would be happy simply removing it, but I don't know how to do it. The documentation says how to do it for "normal" services, but my installation is a container deployment. I have tried to remove the directory and restart the upgrading process but then the directory with this service appears again. Please, how can I remove or rename this service so I can complete the upgrading? Also, I think it's a bug to allow docker-forbidden characters in the service names when using container deployment and it should be checked. Thank you very much. -- *Mario J. Barchéin Molina* *Departamento de I+D+i* mario(a)intelligenia.com Madrid: +34 911 86 35 46 US: +1 (918) 856 - 3838 Granada: +34 958 07 70 70 ―― intelligenia · Intelligent Engineering · Web & APP & Intranet www.intelligenia.com · @intelligenia <http://twitter.com/intelligenia> · fb.com/intelligenia · blog.intelligenia.com Madrid · C/ de la Alameda 22, 28014, Madrid, Spain <https://maps.google.com/?q=C/+de+la+Alameda+22,+28014,+Madrid,+Spain&entry=…> Miami · 2153 Coral Way #400, Miami, FL, US, 33145 <https://www.google.es/maps/place/2153+Coral+Way+%23400,+Miami,+FL+33145/@25…> Granada‎ · C/ Luis Amador nº 24, 18014, Granada, Spain <https://www.google.es/maps/place/intelligenia/@37.1947393,-3.6170297,17z/da…> *PROTECCIÓN DE DATOS*: le informamos que los datos personales y dirección de correo electrónico, recabados del propio interesado, serán tratados bajo la responsabilidad de Intelligenia Soluciones Informáticas, S.L. para el envío de comunicaciones sobre nuestros servicios y se conservarán mientras exista un interés mutuo para ello. Los datos no serán comunicados a terceros, salvo obligación legal. Le informamos que puede ejercer los derechos de acceso, rectificación, portabilidad y supresión de sus datos y los de limitación y oposición a su tratamiento dirigiéndose a C/ Luis Amador (Centro de negocios Cámara) 24 , 18014 Granada. Si considera que el tratamiento no se ajusta a la normativa vigente, podrá presentar una reclamación ante la autoridad de control en aepd.es. En cumplimiento de lo previsto en el artículo 21 de la Ley 34/2002 de Servicios de la Sociedad de la Información y Comercio Electrónico (LSSICE), si usted no desea recibir más información sobre nuestros servicios, puede darse de baja enviando un correo electrónico a info(a)intelligenia.com indicando en el *Asunto *"*BAJA*" o "*NO ENVIAR*".

3 years, 9 months

2
2
0 0

Ceph `realm pull` permission denied error

by Alex Hussein-Kershaw

Hi Ceph Users, I'm struggling with an issue that I'm hoping someone can point me towards a solution. We are using Nautilus (14.2.9) deploying Ceph in containers, in VMs. The setup that I'm working with has 3 VMs, but of-course our design expects this to be scaled by a user as appropriate. I have a cluster deployed and it's functioning happily as storage for our product, the error occurs when I go to setup a second cluster and pair it with the first. I'm using ceph-ansible to deploy. I get the following error about 20 minutes into running the site-container playbook. 2020-07-09 14:21:10,966 p=2134 u=qs-admin | TASK [ceph-rgw : fetch the realm] *********************************************************************************************** ************************************************************************************ 2020-07-09 14:21:10,966 p=2134 u=qs-admin | Thursday 09 July 2020 14:21:10 +0000 (0:00:00.410) 0:16:18.245 ********* 2020-07-09 14:21:11,901 p=2134 u=qs-admin | fatal: [10.225.21.213 -> 10.225.21.213]: FAILED! => changed=true cmd: - docker - exec - ceph-mon-albamons_sc2 - radosgw-admin - realm - pull - --url=https://10.225.36.197:7480 - --access-key=2CQ006Lereqpysbr0l0s - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp delta: '0:00:00.545895' end: '2020-07-09 14:21:11.516539' msg: non-zero return code rc: 13 start: '2020-07-09 14:21:10.970644' stderr: |- request failed: (13) Permission denied If the realm has been changed on the master zone, the master zone's gateway may need to be restarted to recognize this user. stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> Re-running the command manually reproduces the error. I understand that the permission denied error appears to indicate the keys are not valid, suggested by https://tracker.ceph.com/issues/36619. However, I've triple checked the keys are correct on the other site. I'm at a loss of where to look for debugging, I've turned up logs on both the local and remote site for RGW and MON processes but neither seem to yield anything related. I've tried restarting everything as suggested in the error text from all the processes to a full reboot of all the VMs. I've no idea why the keys are being declined either, as they are correct (or atleast `radosgw-admin period get` on the primary site thinks so). Thanks for your help, Alex

3 years, 9 months

1
0
0 0

compaction_threads and flusher_threads can not used

by 精灵王

Hello!    For ceph nautilus v14.2.10, I cannot used "compaction_threads and flusher_threads" 。     Why restrict bluestore_rocksdb_options to setting parameters?

3 years, 9 months

1
0
0 0

Re: about replica size

by Zhenshi Zhou

Thank you all for helping me understand clearly about the 'size'. Ml Ml <mliebherr99(a)googlemail.com> 于2020年7月10日周五下午11:08写道： > If size is 2 and one disks fails you are already going to be in error > state with read only. > > Let's say you reboot one node, you will instantly get into trouble. > > If you are going to reboot one node and at the same time the other disk > fails, then you very like loose data. > > Just never ever use size 2. Not even temporary :) > > > Zhenshi Zhou <deaderzzs(a)gmail.com> schrieb am Fr., 10. Juli 2020, 04:11: > >> Hi, >> >> As we all know, the default replica setting of 'size' is 3 which means >> there >> are 3 copies of an object. What is the disadvantages if I set it to 2, >> except >> I get fewer copies? >> >> Thanks >> _______________________________________________ >> ceph-users mailing list -- ceph-users(a)ceph.io >> To unsubscribe send an email to ceph-users-leave(a)ceph.io >> >

3 years, 9 months

1
0
0 0

ceph install with Ansible

by Hauke Homburg

Hello, Ich need to install in CentOS 7 Ceph with Ansible. I searched at Ansible Galaxy and some Websites for a good Howto and Playbook. Does anyone have a good Howto to do this? Thanks for help. Regards Hauke -- www.compi-creative.net

3 years, 9 months

2
2
0 0

Spillover warning log file?

by Lindsay Mathieson

Dumb question - in what log file are the rocks db spillover warnings posted? Thanks. -- Lindsay

3 years, 9 months

2
2
0 0

Research and Industrial conferences for Ceph research results

by Bobby

Hi Cephers, Can someone please share about the research and industrial conferences where one can publish Ceph related new research results? Additionally, are there any conferences which are particularly interested in Ceph results? I would like to know all suitable conferences. Thanks :-) Looking forward to hearing from you. BR Bobby !!

3 years, 9 months

1
0
0 0

MON store.db keeps growing with Octopus

by Michael Fladischer

Hi, our cluster is on Octopus 15.2.4. We noticed that our MON all ran out of space yesterday because the store.db folder kept growing until it filled up the filesystem. We added more space to the MON nodes but store.db keeps growing. Right now it's ~220GiB on the two MON nodes that are active. We shut down on MON node when it hit ~98GiB and it seems that it trimmed its local store.db down to 102MiB and now also keeps growing again. Checking the keys in store.db while the MON is offline shows a lot of "logm" and "osdmap" keys: ceph-monstore-tool <path> dump-keys|awk '{print $1}'|uniq -c 86 auth 2 config 11 health 275929 logm 55 mds_health 1 mds_metadata 602 mdsmap 599 mgr 1 mgr_command_descs 3 mgr_metadata 209 mgrstat 461 mon_config_key 1 mon_sync 7 monitor 1 monitor_store 7 monmap 454 osd_metadata 1 osd_pg_creating 4804 osd_snap 138366 osdmap 538 paxos 5 pgmap I already tried compacting it with "ceph tell ..." and "ceph-monstore-tool <path> compact" but it stayed the same size. Also copying it with "ceph-monstore-tool <path> store-copy <new-path>" just created a copy of the same size. Out cluster is currently in WARN status because we are low on space and several OSDs are in a backfill_full state. Could this be related? Regards, Michael

3 years, 9 months

2
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users July 2020