Hi - this just started happening in the past few days on Ceph Pacific 16.2.4 deployed via cephadm (Podman containers)
The dashboard is returning
No active ceph-mgr instance is currently running the dashboard. A failover may be in progress. Retrying in 5 seconds...
And ceph status returns
  cluster:
    id:     fe3a7cb0-69ca-11eb-8d45-c86000d08867
    health: HEALTH_WARN
            Module 'dashboard' has failed dependency: cannot import name 'AuthManager'
            clock skew detected on mon.cube

  services:
    mon: 3 daemons, quorum story,cube,rhel1 (age 46h)
    mgr: cube.tvlgnp(active, since 47h), standbys: rhel1.zpzsjc, story.gffann
    mds: 2/2 daemons up, 1 standby
    osd: 13 osds: 13 up (since 46h), 13 in (since 46h)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 497 pgs
    objects: 1.50M objects, 2.1 TiB
    usage:   6.2 TiB used, 32 TiB / 38 TiB avail
    pgs:     497 active+clean

  io:
    client: 255 B/s rd, 2.7 KiB/s wr, 0 op/s rd, 0 op/s wr
The only thing that has happened on the cluster is that one of the servers was rebooted. No configuration changes were performed.
Any suggestions?
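Would forcing a mgr failover be a reasonable first step? A sketch of what I'm considering (assuming cephadm shell access; only the last command changes anything):

# list mgr modules and see whether the dashboard reports an error
ceph mgr module ls
# inspect the mgr daemons cephadm knows about
ceph orch ps | grep mgr
# fail over to a standby mgr to see if the dashboard recovers
ceph mgr fail cube.tvlgnp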
Thanks,
rob
Hi,
I am trying to connect my new Ceph cluster (Octopus) to my Kubernetes
system. To do so, I followed the setup guide from the official documentation:
https://docs.ceph.com/en/octopus/rbd/rbd-kubernetes/
The csi-rbdplugin-provisioner is running successfully on all my Kubernetes
worker nodes (as far as I can see).
Now I am trying to deploy the nginx example:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph
---
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbd-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: ceph-pvc
        readOnly: false
A persistent volume is created:
$ kubectl get pv
....
pvc-5c64ed45-adde-4fe7-9b38-9d4c7a8f7d34   1Gi   RWO   Delete   Bound   default/ceph-pvc   ceph   7m15s
and also the persistent volume claim:
$ kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-pvc   Bound    pvc-5c64ed45-adde-4fe7-9b38-9d4c7a8f7d34   1Gi        RWO            ceph           8m14s
But the pod is not deployed, because of the following error message:
MountVolume.MountDevice failed for volume "pvc-5c64ed45-adde-4fe7-9b38-9d4c7a8f7d34" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name rbd.csi.ceph.com not found in the list of registered CSI drivers
Can someone help me understand the meaning of this error message?
Do I need to install something else? Maybe a ceph-fs plugin...?
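Is the node plugin perhaps not registered? A sketch of the checks I would run (the app label is my assumption based on the ceph-csi manifests):

# list the CSI drivers registered in the cluster
kubectl get csidrivers.storage.k8s.io
# check whether the rbd node-plugin pods are running on every worker
kubectl get pods -l app=csi-rbdplugin -o wide
# per-node driver registration
kubectl get csinodes -o yaml | grep -B2 -A2 rbd.csi.ceph.com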
Thanks for any help
===
Ralph
--
Hi all
We are getting a new Samba fileserver that will mount our CephFS and export some shares. What would be a good network setup for that server?
Should I configure two interfaces - one for the SMB share export towards our workstations and desktops, and one towards the Ceph cluster?
Or would it be "ok" for all traffic to be on one interface?
The server has 40G ports.
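If I went the two-interface route, I assume the Samba side would be pinned to the client-facing NIC roughly like this (a sketch; the interface name is a placeholder), while the CephFS mount would simply reach the mons over the other interface:

# /etc/samba/smb.conf (sketch; eth0 = client-facing NIC)
[global]
    interfaces = lo eth0
    bind interfaces only = yes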
Thanks for your suggestions and feedback. Regards, Götz
I cannot enable cephadm because it cannot find the remoto lib.
This happens even after I installed it using "pip3 install remoto", and then installed it
from the deb package built from the git sources at
https://github.com/alfredodeza/remoto/
If I type "import remoto" in a python3 prompt, it works.
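Could this be an interpreter mismatch, i.e. the mgr running under a different Python than the one pip3 installed into? A sketch of how one might check (the systemd unit name is my assumption for a package-based install):

# where did pip3 put remoto, and which interpreter resolves it?
pip3 show remoto
python3 -c "import remoto; print(remoto.__file__)"
# look for the import error the mgr itself logs (unit name may differ)
journalctl -u ceph-mgr@$(hostname -s) | grep -i remoto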
--
Alfrenovsky
I am running the ceph-ansible playbook to install Ceph stable-6.0
(Pacific).
When running the sample yml file supplied by the GitHub repo, it runs
fine up until the "ceph-mon : check if monitor initial keyring already
exists" step, where it hangs for 30-40 minutes before failing.
From my understanding, ceph-ansible should create this keyring and
use it for communication between monitors, so does anyone know why the
playbook would have a hard time with this step?
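Would rerunning with more verbosity be the right way to see what the task is actually waiting on? A sketch (the keyring path is my assumption based on standard mon data dirs):

# re-run with high verbosity to see the exact command the task executes
ansible-playbook -vvv -i hosts site.yml
# on a mon host, test whether the mon responds at all
ceph --connect-timeout 10 --name mon. \
    --keyring /var/lib/ceph/mon/ceph-$(hostname -s)/keyring status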
Thanks in advance!
Hi
We're running Ceph Nautilus 14.2.21 (going to latest Octopus in a few
weeks) as the volume and instance backend for our OpenStack VMs. Our
clusters run somewhere between 500 - 1000 OSDs on SAS HDDs with NVMes
as journal and DB devices.
Currently we do not have our VMs capped on IOPS and throughput. We
regularly get slow-ops warnings (once or twice per day) and wonder
whether there are other users with roughly the same setup who do
throttle their OpenStack VMs.
- What kind of numbers are used in the field for IOPS and throughput
limiting? (A sketch of the kind of setup I mean follows below.)
- As a side question, is there an easy way to get rid of the slow-ops
warning besides restarting the involved OSD? Otherwise the warning seems
to stay forever.
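On the first question, the kind of throttling I have in mind is front-end QoS via Cinder volume types, roughly like this (names and values are hypothetical):

# create a QoS spec enforced on the hypervisor side (front-end)
openstack volume qos create hdd-capped --consumer front-end \
    --property total_iops_sec=500 --property total_bytes_sec=104857600
# attach it to an existing volume type (name is hypothetical)
openstack volume qos associate hdd-capped hdd-volumes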
Regards
Marcel
Hello,
I have installed and bootstrapped a Ceph manager node via cephadm with the
options:
--initial-dashboard-user admin --initial-dashboard-password [PASSWORD] --dashboard-password-noupdate
Everything works fine. I also have the Grafana board to monitor my
cluster. But access to Grafana is open for anonymous users because
of the grafana.ini template with the option:
[auth.anonymous]
enabled = true
I can't figure out how to tweak the default grafana.ini file. Can
someone tell me how to do this?
I tried to do this with the commands:
# ceph config-key set mgr/cephadm/services/grafana/grafana.ini \
-i /tmp//grafana.ini.j2
# ceph orch reconfig grafana
But it had no effect. I also do not really understand where I should
place the grafana.ini file on my host.
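For completeness, the change I am trying to get into the rendered grafana.ini is just this (sketch), and I wonder whether a redeploy rather than a reconfig is needed to re-render the template (that part is my assumption):

# desired content in the custom template
[auth.anonymous]
enabled = false

# force a redeploy so the new template is rendered
# ceph orch redeploy grafana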
Thanks for any help
===
Ralph
--
Hi,
we are currently running into an issue where an rbd ls for a namespace returns ENOENT for some of the images in that namespace.
/usr/bin/rbd --conf=XXX --id XXX ls 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' -l --format=json
2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 0x55cacccc2390 fail: (2) No such file or directory
2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 0x55caccd2b920 fail: (2) No such file or directory
2021-06-09 11:03:34.920 7f2225ffb700 -1 librbd::io::AioCompletion: 0x55caccd9b4e0 fail: (2) No such file or directory
rbd: error opening 34810ac2-3112-4fef-938c-b76338b0eeaf.raw: (2) No such file or directory
rbd: error opening c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw: (2) No such file or directory
rbd: error opening 5d5251d1-f017-4382-845c-65e504683742.raw: (2) No such file or directory
2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 0x55cacce07b00 fail: (2) No such file or directory
rbd: error opening c625b898-ec34-4446-9455-d2b70d9e378f.raw: (2) No such file or directory
2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 0x55caccd7cce0 fail: (2) No such file or directory
rbd: error opening 990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw: (2) No such file or directory
2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 0x55cacce336f0 fail: (2) No such file or directory
rbd: error opening 7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw: (2) No such file or directory
[{"image":"108600c6-2312-4d61-9f5b-35b351112512.raw","size":31457280000,"format":2,"lock_type":"exclusive"},{"image":"1292ef0c-2333-44f1-be30-39105f7d176e.raw","size":262149242880,"format":2,"lock_type":"exclusive"},{"image":"8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","size":262149242880,"format":2,"lock_type":"exclusive"}]
rbd: listing images failed: (2) No such file or directory
This state was triggered when the images that now show "No such file or directory" were deleted with rbd rm, but the operation was interrupted (the rbd process was killed) due to a timeout.
What is the best way to recover from this, and how do we properly clean up?
Release is nautilus 14.2.20
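Would simply re-running the removal be the right approach? A sketch of what I have in mind (names taken from the output above; whether rbd rm resumes a partial delete is my assumption):

# retry the interrupted delete for one of the affected images
rbd --conf=XXX --id XXX rm \
    'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c/34810ac2-3112-4fef-938c-b76338b0eeaf.raw'
# if that still fails, inspect the leftover objects in the namespace
rados --conf=XXX --id XXX -p mypool --namespace 28ef9470-76eb-4f77-bc1b-99077764ff7c ls | grep 34810ac2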
Thanks,
Peter
Hello,
I replaced an OSD disk on one of my Nautilus OSD nodes, which created a new OSD number. Now ceph shows one cephadm stray daemon (the old osd.1 which I replaced), which I can't remove, as you can see below:
# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon osd.1 on host ceph1e not managed by cephadm
# ceph orch daemon rm osd.1 --force
Error EINVAL: Unable to find daemon(s) ['osd.1']
Is there another command I am missing?
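Would the right cleanup be to purge the stale OSD id from the cluster maps instead? A sketch of what I'm considering (assuming osd.1 is fully down and out; purge is destructive):

# remove the stale id from crush, auth and the osd map in one step
ceph osd purge 1 --yes-i-really-mean-it
# then check cephadm's inventory for that host again
ceph orch ps ceph1e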
Best regards,
Mabi