Hi,
I have a test cluster now running on Pacific with the cephadm
orchestrator and upstream container images.
In the Dashboard on the services tab I created a new service for NFS.
The containers got deployed.
But when I go to the NFS tab and try to create a new NFS share, the
Dashboard only returns a 500 error:
Apr 05 19:38:49 ceph01 bash[35064]: debug 2021-04-05T17:38:49.146+0000 7f64468d1700 0 [dashboard ERROR exception] Internal Server Error
Apr 05 19:38:49 ceph01 bash[35064]: Traceback (most recent call last):
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 46, in dashboard_exception_handler
Apr 05 19:38:49 ceph01 bash[35064]: return handler(*args, **kwargs)
Apr 05 19:38:49 ceph01 bash[35064]: File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
Apr 05 19:38:49 ceph01 bash[35064]: return self.callable(*self.args, **self.kwargs)
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 694, in inner
Apr 05 19:38:49 ceph01 bash[35064]: ret = func(*args, **kwargs)
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/controllers/nfsganesha.py", line 265, in fsals
Apr 05 19:38:49 ceph01 bash[35064]: return Ganesha.fsals_available()
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/ganesha.py", line 154, in fsals_available
Apr 05 19:38:49 ceph01 bash[35064]: if RgwClient.admin_instance().is_service_online() and \
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 301, in admin_instance
Apr 05 19:38:49 ceph01 bash[35064]: return RgwClient.instance(daemon_name=daemon_name)
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 241, in instance
Apr 05 19:38:49 ceph01 bash[35064]: RgwClient._daemons = _get_daemons()
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 53, in _get_daemons
Apr 05 19:38:49 ceph01 bash[35064]: raise NoRgwDaemonsException
Apr 05 19:38:49 ceph01 bash[35064]: dashboard.services.rgw_client.NoRgwDaemonsException: No RGW service is running.
Apr 05 19:38:49 ceph01 bash[35064]: debug 2021-04-05T17:38:49.150+0000 7f64468d1700 0 [dashboard ERROR request] [::ffff:10.0.44.42:39898] [GET] [500] [0.030s] [admin] [513.0B] /ui-api/nfs-ganesha/fsals
Apr 05 19:38:49 ceph01 bash[35064]: debug 2021-04-05T17:38:49.150+0000 7f64468d1700 0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "e89b8519-352f-4e44-a364-6e6faf9dc533"} ']
I have no RADOS Gateways in that cluster (currently). The pools for
radosgw (.rgw.root etc.) exist, but there is no running instance.
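For completeness, the orchestrator also shows no RGW service or daemon
(standard orch commands; nothing is listed here):

[root@ceph01 ~]# ceph orch ls rgw
[root@ceph01 ~]# ceph orch ps --daemon_type rgw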
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin
Hello everyone,
I currently have a CephFS running with about 60TB of data. I created it
with a replicated pool as the default pool and an erasure-coded one as
an additional data pool, as described in the docs. Now I want to
migrate the data from the replicated pool to the new erasure-coded one.
I couldn't find any docs on this and was wondering if it's even
possible currently.
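To make the question concrete: since the EC pool is already attached as
a data pool, my understanding is that a directory layout can point
newly created files at it, but existing files would have to be
rewritten. Is something along these lines the right direction? (A
sketch; the pool name cephfs_ec and the mount point are placeholders.)

# point a directory at the EC pool; this only affects *new* files
setfattr -n ceph.dir.layout.pool -v cephfs_ec /mnt/cephfs/data
# existing files would then have to be rewritten one by one, e.g.
cp -a /mnt/cephfs/data/file /mnt/cephfs/data/file.tmp && \
    mv /mnt/cephfs/data/file.tmp /mnt/cephfs/data/file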
Thank you very much,
Fionera
Good morning all,
I'm experimenting with ceph orchestration and cephadm after using
ceph-deploy for several years, and I have a hopefully simple question.
I've converted a basic nautilus cluster over to cephadm+orchestration
and I tried adding, then removing a monitor. However, when I removed the
host using 'ceph orch host rm', it removed two mons. I may have missed
something in the adoption/upgrade that has left the cluster in a bad
state. Any advice/pointers/clarification would be appreciated.
Details:
A nautilus cluster with two mons (I know this is not correct for
quorum), a mgr, and a handful of osds. I went through the adoption
process and enabled the ceph orch backend.
[root@osdev-ctrl2 ~]# ceph orch ps
NAME             HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                    IMAGE ID      CONTAINER ID
mgr.osdev-ctrl2  osdev-ctrl2  running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  e73c19b51a09
mon.osdev-ctrl2  osdev-ctrl2  running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  a6bfc27221f0
mon.osdev-net1   osdev-net1   running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  f66e2bef3d44
osd.0            osdev-stor1  running (17h)  50s ago    17h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  ac59dbdc267c
...
[root@osdev-ctrl2 ~]# ceph orch status
Backend: cephadm
Available: True
[root@osdev-ctrl2 ~]# ceph orch host ls
HOST         ADDR         LABELS   STATUS
osdev-ctrl2  osdev-ctrl2  mon mgr
osdev-net1   osdev-net1   mon
osdev-stor1  osdev-stor1  osd
[root@osdev-ctrl2 ~]# ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                    IMAGE ID
mgr   1/1      9m ago     20h  label:mgr  docker.io/ceph/ceph:v15.2.10  5b724076c58f
mon   2/2      9m ago     20h  label:mon  docker.io/ceph/ceph:v15.2.10  5b724076c58f
I then added a new mon host:
[root@osdev-ctrl2 ~]# ceph orch host add osdev-ctrl3 mon
It did not spawn a mon container on osdev-ctrl3 until I defined the
public network in the config:
[root@osdev-ctrl2 ~]# ceph config set global public_network 10.10.10.0/24
At this point all is good with three running mons as expected. Now I
wanted to delete the mon using:
[root@osdev-ctrl2 ~]# ceph orch host rm osdev-ctrl3
This had the effect of:
1. The osdev-ctrl3 mon was removed from 'ceph orch ls' and 'ceph orch ps'.
2. The mon on osdev-ctrl3 is still running and still shows up in
'ceph -s', but is reported as not managed by cephadm.
3. (Big issue) The mon running on osdev-net1 was completely destroyed.
Any ideas what is going on? Sorry for the long post, but I tried to be
as clear as possible.
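For reference, the removal sequence I would have expected from reading
the docs looks roughly like this (I have not re-tested it since the
failure, so treat it as a sketch only):

# drop the mon label so the 'label:mon' placement no longer matches
[root@osdev-ctrl2 ~]# ceph orch host label rm osdev-ctrl3 mon
# remove the stray daemon, then the host itself
[root@osdev-ctrl2 ~]# ceph orch daemon rm mon.osdev-ctrl3 --force
[root@osdev-ctrl2 ~]# ceph orch host rm osdev-ctrl3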
--
Gary Molenkamp Computer Science/Science Technology Services
Systems Administrator University of Western Ontario
molenkam(a)uwo.ca http://www.csd.uwo.ca
(519) 661-2111 x86882 (519) 661-3566
Hello. I had a one-way multisite S3 cluster, and we've seen issues
with RGW sync due to sharding problems, so I've stopped the multisite
sync. This is not the topic, just some background to my story.
I have some leftover 0-byte objects in the destination and I'm trying
to overwrite them with rclone "path to path". But somehow I cannot
overwrite these objects. If I delete one with rclone or "rados rm" and
do "rclone copy" again, I get the result below. Rclone reports an
error, but the object is created again as "0 byte" with pending attrs.
Why is this happening?
I think I somehow need to clean these objects and copy them from the
source again, but how?
What is "user.rgw.olh.pending"?
[root@SRV1]# radosgw-admin --id radosgw.prod1 object stat
--bucket=mybucket
--object=images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
{
"name": "images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f",
"size": 0,
"tag": "713li30rvcrjfwhctx894mj7vf1wa1a8",
"attrs": {
"user.rgw.manifest": "",
"user.rgw.olh.idtag": "v1m9jy4cjck38ptel09qebsbb10pe2af",
"user.rgw.olh.info": "\u0001\u0001�",
"user.rgw.olh.pending.00000000606b04728gs23ecq11b3i3l1":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.00000000606b0472bfhdzxeb9wesd8t7":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.00000000606b0472fv06t1dob3vmo4da":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.00000000606b0472lql6c9o88rt211r9":
"\u0001\u0001\u0008",
"user.rgw.olh.ver": ""
}
}
[root@SRV1]# rados listxattr -p prod.rgw.buckets.data
c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
user.rgw.idtag
user.rgw.olh.idtag
user.rgw.olh.info
user.rgw.olh.ver
[root@SRV1]# rados -p prod.rgw.buckets.data stat
c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
prod.rgw.buckets.data/c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
mtime 2021-04-05 17:10:55.000000, size 0
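To make the second question concrete, this is the kind of cleanup I am
considering, assuming the leftover head object is really orphaned
(untested, please correct me if this is dangerous):

# drop the stale object via RGW, which should also update the bucket index
[root@SRV1]# radosgw-admin --id radosgw.prod1 object rm --bucket=mybucket --object=images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
# or remove the leftover RADOS head object directly
[root@SRV1]# rados -p prod.rgw.buckets.data rm c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f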
Hi! How/where can I change the image configured for a service?
I tried to modify /var/lib/ceph/<fsid>/<service_name>/unit.{image,run},
but after restarting, ceph orch ps shows that the service uses the
same old image.
What other configuration locations are there for the Ceph components
besides /etc/ceph (which is quite sparse) and /var/lib/ceph/<fsid>?
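In case it helps to answer: are any of these the intended way? These
are guesses from skimming the docs, untested on my cluster:

# set the image cephadm uses for (re)deployed daemons
ceph config set global container_image docker.io/ceph/ceph:v15.2.10
# redeploy a single daemon so it picks up the new image
ceph orch daemon redeploy <daemon_name>
# or move the whole cluster to a new image
ceph orch upgrade start --image docker.io/ceph/ceph:v15.2.10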
Thank you!
Adrian
Hello,
I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu nodes and have a few questions, mostly regarding planning of the infra.
1) The OS requirements in the documentation mention Ubuntu 18.04 LTS; is it ok to use Ubuntu 20.04 instead, or should I stick with 18.04?
2) The documentation recommends using cephadm for new deployments, so I will use that. But I read that with cephadm everything runs in containers, so is this the new way to go? Or is Ceph in containers still somewhat experimental?
3) As I will be needing CephFS, I will also need MDS servers, so with a total of 6 nodes I am planning the following layout (see the spec sketch after these questions):
Node 1: MGR+MON+MDS
Node 2: MGR+MON+MDS
Node 3: MGR+MON+MDS
Node 4: OSD
Node 5: OSD
Node 6: OSD
Does this make sense? I am mostly interested in stability and HA with this setup.
4) Is there any special kind of demand in terms of disks on the MGR+MON+MDS nodes? Or can I just use my OS disks on these nodes? As far as I understand, the MDS will create a metadata pool on the OSDs.
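Regarding question 3, this is the kind of MDS placement spec I have in
mind (a sketch with placeholder hostnames and filesystem name, to be
applied with 'ceph orch apply -i mds.yaml'):

service_type: mds
service_id: cephfs
placement:
  hosts:
    - node1
    - node2
    - node3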
Thanks for the hints.
Best,
Mabi
Good morning,
I was wondering if there are any timing indications as to how long a PG
should "usually" stay in a certain state?
For instance, how long should a pg stay in
- peering (seconds - minutes?)
- activating (seconds?)
- scrubbing (+deep)
The scrub process obviously depends on the number of objects in the PG;
however, is the same true for peering and activation? Since Nautilus we
see longer (minutes-long) peering states in the cluster, which we did
not see before.
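For reference, such PGs can be spotted with the standard commands, e.g.:

# PGs stuck in an inactive state (covers peering/activating)
ceph pg dump_stuck inactive
# quick per-state summary
ceph pg stat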
Thanks for your input and have a good start into the week!
Best regards,
Nico
--
Sustainable and modern Infrastructures by ungleich.ch
Hi,
I hope this mailing list is ok for this kind of question; if not,
please ignore.
I'm currently in the process of planning a smaller Ceph cluster, mostly
for CephFS use.
The budget still allows for some SSDs in addition to the required hard
disks.
I see two options for how to use them:
a) Make SSD-only pools for the CephFS metadata
b) Give every OSD an SSD for the BlueStore cache (i.e. as a DB/WAL device)
I was not able to find any suggestions or benchmarks so far; does
anyone have further resources or insight into these options?
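To make option a) concrete, my understanding is that it boils down to a
device-class based CRUSH rule, roughly like this (a sketch; the pool
name cephfs_metadata is a placeholder):

# replicated rule that only selects OSDs with device class 'ssd'
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule replicated-ssd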
Greetings,
Kai