Hi,
I'm really lost with my Ceph system. I built a small cluster for home
use, which serves two purposes for me: I want to replace an old NAS and I want
to learn about Ceph so that I get hands-on experience. We're using it
at our company, but I need some real-life experience without risking any
company or customer data. That's my preferred way of learning.
The cluster consists of 3 Raspberry Pis plus a few VMs running on
Proxmox. I'm not using Proxmox's built-in Ceph because I want to focus on
Ceph itself and not just use it as a preconfigured tool.
All hosts are running Fedora (x86_64 and arm64), and during an upgrade
from F36 to F37 my cluster suddenly showed all PGs as unavailable. I
worked for nearly a week to get it back online, and I learned a lot about
Ceph management and recovery. The cluster is back, but I still can't
access my data. Maybe you can help me?
Here are my versions:
[ceph: root@ceph04 /]# ceph versions
{
    "mon": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    },
    "osd": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 5
    },
    "mds": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4
    },
    "overall": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 15
    }
}
Here's the status output of one MDS:
[ceph: root@ceph04 /]# ceph tell mds.mds01.ceph05.pqxmvt status
2023-01-14T15:30:28.607+0000 7fb9e17fa700  0 client.60986454 ms_handle_reset on v2:192.168.23.65:6800/2680651694
2023-01-14T15:30:28.640+0000 7fb9e17fa700  0 client.60986460 ms_handle_reset on v2:192.168.23.65:6800/2680651694
{
    "cluster_fsid": "ff6e50de-ed72-11ec-881c-dca6325c2cc4",
    "whoami": 0,
    "id": 60984167,
    "want_state": "up:replay",
    "state": "up:replay",
    "fs_name": "cephfs",
    "replay_status": {
        "journal_read_pos": 0,
        "journal_write_pos": 0,
        "journal_expire_pos": 0,
        "num_events": 0,
        "num_segments": 0
    },
    "rank_uptime": 1127.54018615,
    "mdsmap_epoch": 98056,
    "osdmap_epoch": 12362,
    "osdmap_epoch_barrier": 0,
    "uptime": 1127.957307273
}
It's been staying like that for days now. If a counter were moving I
would just wait, but nothing changes, and all the stats say the
MDSs aren't doing any work at all.
The symptom I have is that the dashboard and all other tools I use say it's
more or less OK (some old messages about failed daemons and scrubbing
aside). But I can't mount anything. When I try to start a VM whose disk is on
RBD, I just get a timeout. And when I try to mount a CephFS, mount just
hangs forever.
Whatever command I give the MDS or the journal just hangs. The only thing I
could do was take all of CephFS offline, kill the MDSs and do a "ceph fs
reset <fs name> --yes-i-really-mean-it". After that I rebooted all
nodes, just to be sure, but I still have no access to my data.
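For reference, that sequence was roughly the following (the filesystem is called cephfs; exact invocations reconstructed from memory):
ceph fs fail cephfs                            # take the filesystem offline
# stop/kill the MDS daemons on every host (via systemctl / cephadm)
ceph fs reset cephfs --yes-i-really-mean-it    # reset the FS map, keeping only rank 0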
Could you please help me? I'm kinda desperate. If you need any more
information, just let me know.
Cheers,
Thomas
--
Thomas Widhalm
Lead Systems Engineer
NETWAYS Professional Services GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
Tel: +49 911 92885-0 | Fax: +49 911 92885-77
CEO: Julian Hein, Bernd Erk | AG Nuernberg HRB34510
https://www.netways.de | thomas.widhalm(a)netways.de
Good morning everyone.
This Thursday night we had an incident: someone accidentally renamed the .data pool of a file system, making it instantly inaccessible. After renaming it back to the correct name it was possible to mount and list the files, but not to read or write. When trying to write, the FS came back as read-only; when trying to read, it returned "Operation not allowed".
After racking my brain for a while, I tried mounting with the admin user and everything worked correctly.
I tried removing the current user's credentials with `ceph auth rm` and creating a new user with `ceph fs authorize <fs_name> client.<user> / rw`, but the behaviour was the same. I also tried recreating it with `ceph auth get-or-create`, and nothing changed; it stayed exactly the same.
Only after setting `allow *` for mon, mds and osd was I able to mount, read and write again with the new user.
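To make that concrete, the sequence was roughly the following (the client name here is a placeholder, and the `allow *` step was done via `ceph auth caps` as far as I remember):
ceph auth rm client.cephfs_user
ceph fs authorize <fs_name> client.cephfs_user / rw
# mounting still worked, but reads/writes kept failing as described above
ceph auth caps client.cephfs_user mon 'allow *' mds 'allow *' osd 'allow *'
# only after this could the new user mount, read and write again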
I can understand why the file system stopped working after the pool was renamed; what I don't understand is why users were unable to perform operations on the FS even with RW caps, including any newly created user.
What could have happened behind the scenes that prevented IO even with seemingly correct permissions? Or did I apply incorrect permissions that caused this problem?
Right now everything is working, but I would really like to understand what happened, because I couldn't find anything documented about this type of incident.
Hi,
I have a healthy (test) cluster running 17.2.5:
root@cephtest20:~# ceph status
  cluster:
    id:     ba37db20-2b13-11eb-b8a9-871ba11409f6
    health: HEALTH_OK

  services:
    mon:         3 daemons, quorum cephtest31,cephtest41,cephtest21 (age 2d)
    mgr:         cephtest22.lqzdnk(active, since 4d), standbys: cephtest32.ybltym, cephtest42.hnnfaf
    mds:         1/1 daemons up, 1 standby, 1 hot standby
    osd:         48 osds: 48 up (since 4d), 48 in (since 4M)
    rgw:         2 daemons active (2 hosts, 1 zones)
    tcmu-runner: 6 portals active (3 hosts)

  data:
    volumes: 1/1 healthy
    pools:   17 pools, 513 pgs
    objects: 28.25k objects, 4.7 GiB
    usage:   26 GiB used, 4.7 TiB / 4.7 TiB avail
    pgs:     513 active+clean

  io:
    client: 4.3 KiB/s rd, 170 B/s wr, 5 op/s rd, 0 op/s wr
CephFS is mounted and can be used without any issue.
But I get an error when querying its status:
root@cephtest20:~# ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1757, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/status/module.py", line 159, in handle_fs_status
    assert metadata
AssertionError
The dashboard's filesystem page shows no error and displays
all information about cephfs.
Where does this AssertionError come from?
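In case it helps with triage, this is what I can collect next (my assumption being that the assertion fires because the mgr has no metadata for one of the MDS daemons):
ceph mds metadata      # is metadata present for every MDS daemon?
ceph mgr fail          # fail over to a standby mgr, then retry 'ceph fs status'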
Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de
Tel: 030-405051-43
Fax: 030-405051-19
Mandatory information per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin
Hello,
I'm asking for help with an issue; maybe someone has a clue about what's
going on.
We're using Ceph 15.2.17 on Proxmox 7.3. A big VM had a snapshot and I removed
it. A bit later, nearly half of the PGs of the pool entered the snaptrim and
snaptrim_wait states, as expected. The problem is that these operations
ran extremely slowly and client I/O dropped to nearly nothing, so all VMs in the
cluster got stuck as they could not do any I/O to the storage. Taking and
removing big snapshots is a normal operation that we do often, and this
is the first time I have seen this issue in any of my clusters.
The disks are all Samsung PM1733 and the network is 25G. This gives us plenty of
performance for the use case, and we have never had an issue with the hardware.
Both disk I/O and network I/O were very low. Still, client I/O seemed to
get queued forever. Disabling snaptrim (ceph osd set nosnaptrim) stops
any active snaptrim operation and client I/O returns to normal.
Enabling snaptrim again makes client I/O almost halt again.
I've been playing with some settings:
ceph tell 'osd.*' injectargs '--osd-max-trimming-pgs 1'
ceph tell 'osd.*' injectargs '--osd-snap-trim-sleep 30'
ceph tell 'osd.*' injectargs '--osd-snap-trim-sleep-ssd 30'
ceph tell 'osd.*' injectargs '--osd-pg-max-concurrent-snap-trims 1'
None of them really seemed to help. I also tried restarting the OSD services.
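(Side note: injectargs only changes the running daemons, so restarting the OSD services reverts those values again; if needed, the persistent equivalents would be something like the following.)
ceph config set osd osd_snap_trim_sleep_ssd 30
ceph config set osd osd_max_trimming_pgs 1
ceph config set osd osd_pg_max_concurrent_snap_trims 1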
This cluster was upgraded from 14.2.x to 15.2.17 a couple of months ago. Is
there any setting that must be changed after such an upgrade which may be
causing this problem?
I have scheduled a maintenance window; what should I look for to
diagnose this problem?
Any help is very appreciated. Thanks in advance.
Victor
My Ceph cluster became unstable yesterday after zincati (CoreOS's
auto-updater) updated one of my nodes from 37.20221225.3.0 to
37.20230110.3.1(*). The symptom was slow ops in my CephFS MDS, which
started as soon as the OSDs on this node became up and in. Excluding
(marking out) the OSDs on this node worked around the problem. Note that the
node is also running a mon and client workloads which use Ceph. Also note that
the OSDs came up and (IIUC) were participating in recovering their data
to other OSDs; the problem only started when I allowed them to be marked in.
I rolled back the OS update and the problem was immediately resolved.
Unfortunately I didn't keep the OSD logs, but they led me to this
thread from ceph-users:
https://www.mail-archive.com/ceph-users@ceph.io/msg18474.html . I
wonder if we have an issue with a very recent kernel update.
I should be able to reproduce if it's likely to be of use to anybody,
but for now I've rolled back this OS update and disabled automatic
updating on my other nodes.
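For reference, "excluding" the OSDs above just means marking them out, and the rollback was a plain rpm-ostree rollback (the OSD ID below is a placeholder):
ceph osd out <osd-id>          # keep this node's OSDs out of the data path
rpm-ostree rollback --reboot   # boot the node back into the previous deployment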
Matt
(*) The complete list of changes:
$ rpm-ostree db diff d477f98d52bf707d4282f6835b85bed3d60e305a0cf6eb8effd4db4b89607f05 fc214c16d248686d4cf2bb3050b59c559f091692d7af3b07ef896f1b8ab2f161
ostree diff commit from: d477f98d52bf707d4282f6835b85bed3d60e305a0cf6eb8effd4db4b89607f05
ostree diff commit to: fc214c16d248686d4cf2bb3050b59c559f091692d7af3b07ef896f1b8ab2f161
Upgraded:
bash 5.2.9-3.fc37 -> 5.2.15-1.fc37
btrfs-progs 6.0.2-1.fc37 -> 6.1.2-1.fc37
clevis 18-12.fc37 -> 18-14.fc37
clevis-dracut 18-12.fc37 -> 18-14.fc37
clevis-luks 18-12.fc37 -> 18-14.fc37
clevis-systemd 18-12.fc37 -> 18-14.fc37
container-selinux 2:2.193.0-1.fc37 -> 2:2.198.0-1.fc37
containerd 1.6.12-1.fc37 -> 1.6.14-2.fc37
containers-common 4:1-73.fc37 -> 4:1-76.fc37
containers-common-extra 4:1-73.fc37 -> 4:1-76.fc37
coreutils 9.1-6.fc37 -> 9.1-7.fc37
coreutils-common 9.1-6.fc37 -> 9.1-7.fc37
crun 1.7.2-2.fc37 -> 1.7.2-3.fc37
curl 7.85.0-4.fc37 -> 7.85.0-5.fc37
dnsmasq 2.87-3.fc37 -> 2.88-1.fc37
ethtool 2:6.0-1.fc37 -> 2:6.1-1.fc37
fwupd 1.8.8-1.fc37 -> 1.8.9-1.fc37
git-core 2.38.1-1.fc37 -> 2.39.0-1.fc37
grub2-common 1:2.06-63.fc37 -> 1:2.06-72.fc37
grub2-efi-x64 1:2.06-63.fc37 -> 1:2.06-72.fc37
grub2-pc 1:2.06-63.fc37 -> 1:2.06-72.fc37
grub2-pc-modules 1:2.06-63.fc37 -> 1:2.06-72.fc37
grub2-tools 1:2.06-63.fc37 -> 1:2.06-72.fc37
grub2-tools-minimal 1:2.06-63.fc37 -> 1:2.06-72.fc37
kernel 6.0.15-300.fc37 -> 6.0.18-300.fc37
kernel-core 6.0.15-300.fc37 -> 6.0.18-300.fc37
kernel-modules 6.0.15-300.fc37 -> 6.0.18-300.fc37
libcurl-minimal 7.85.0-4.fc37 -> 7.85.0-5.fc37
libgpg-error 1.45-2.fc37 -> 1.46-1.fc37
libgusb 0.4.2-1.fc37 -> 0.4.3-1.fc37
libksba 1.6.2-1.fc37 -> 1.6.3-1.fc37
libpcap 14:1.10.1-4.fc37 -> 14:1.10.2-1.fc37
libpwquality 1.4.4-11.fc37 -> 1.4.5-1.fc37
libsmbclient 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
libwbclient 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
moby-engine 20.10.20-1.fc37 -> 20.10.21-1.fc37
ncurses 6.3-3.20220501.fc37 -> 6.3-4.20220501.fc37
ncurses-base 6.3-3.20220501.fc37 -> 6.3-4.20220501.fc37
ncurses-libs 6.3-3.20220501.fc37 -> 6.3-4.20220501.fc37
net-tools 2.0-0.63.20160912git.fc37 -> 2.0-0.64.20160912git.fc37
rpm-ostree 2022.16-2.fc37 -> 2022.19-2.fc37
rpm-ostree-libs 2022.16-2.fc37 -> 2022.19-2.fc37
samba-client-libs 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
samba-common 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
samba-common-libs 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
selinux-policy 37.16-1.fc37 -> 37.17-1.fc37
selinux-policy-targeted 37.16-1.fc37 -> 37.17-1.fc37
tpm2-tss 3.2.0-3.fc37 -> 3.2.1-1.fc37
Removed:
cracklib-dicts-2.9.7-30.fc37.x86_64
--
Matthew Booth
Due to the ongoing South African energy crisis
<https://en.wikipedia.org/wiki/South_African_energy_crisis> our datacenter
experienced a sudden power loss. We are running Ceph 17.2.5 deployed with
cephadm. Two of our OSDs did not start correctly, with the error:
# ceph-bluestore-tool fsck --path /var/lib/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508/osd.27/
2023-01-15T08:38:04.289+0200 7f2a2a03c540 -1 bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)' thread 7f2a2a03c540 time 2023-01-15T08:39:31.304968+0200
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: 18968: FAILED ceph_assert(collection_ref)
2023-01-15T08:39:31.298+0200 7f2a2a03c540 -1 bluestore::NCB::read_allocation_from_onodes::stray object 2#55:ffffffff:::2000055f327.00002287:head# not owned by any collection
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f2a2acc07c6]
 2: /usr/lib/ceph/libceph-common.so.2(+0x27c9d8) [0x7f2a2acc09d8]
 3: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0xa24) [0x560d6baf5754]
 4: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x5f) [0x560d6baf66ff]
 5: (BlueStore::read_allocation_from_drive_on_startup()+0x99) [0x560d6baf68b9]
 6: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0xaca) [0x560d6bb0c15a]
 7: (BlueStore::_open_db_and_around(bool, bool)+0x35c) [0x560d6bb380dc]
 8: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x250) [0x560d6bb3a8c0]
 9: main()
 10: __libc_start_main()
 11: _start()
*** Caught signal (Aborted) **
 in thread 7f2a2a03c540 thread_name:ceph-bluestore-
2023-01-15T08:39:31.306+0200 7f2a2a03c540 -1 /build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)' thread 7f2a2a03c540 time 2023-01-15T08:39:31.304968+0200
/build/ceph-17.2.5/src/os/bluestore/BlueStore.cc: 18968: FAILED ceph_assert(collection_ref)
(complete log
https://gist.github.com/pvanheus/5c57455cacdc91afc9ce27fd489cae25)
Is there a way to recover from this? Or should I accept the OSDs as lost
and rebuild them?
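For reference, this is the tooling I have at hand on the node; I have not attempted a repair yet and am not sure whether it would even be safe here:
ceph-bluestore-tool fsck --path /var/lib/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508/osd.27/
ceph-bluestore-tool repair --path /var/lib/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508/osd.27/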
Thanks,
Peter
Dear Ceph users,
my cluster is built with old hardware on a gigabit network, so I often
experience warnings like OSD_SLOW_PING_TIME_BACK. These in turn trigger
alert mails too often, forcing me to disable alerts altogether, which is not
sustainable. So my question is: is it possible to tell Ceph to ignore
(or at least not send alerts for) a given class of warnings?
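A minimal sketch of what I have in mind, assuming "ceph health mute" is the intended mechanism for this:
ceph health mute OSD_SLOW_PING_TIME_BACK 4h    # silence this specific health code for a few hours
ceph health unmute OSD_SLOW_PING_TIME_BACK     # bring it back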
Thank you,
Nicola
Dear Ceph-Users,
I am struggling to replace a disk. My Ceph cluster is not replacing the
old OSD even though I did:
ceph orch osd rm 232 --replace
OSD 232 is still shown in the OSD list, but the new HDD gets added as a
brand-new OSD. That wouldn't bother me much if the new OSD also got its
BlueStore DB placed on the NVMe, but it doesn't.
My steps:
- "ceph orch osd rm 232 --replace"
- remove the failed HDD
- add the new one
- convert the disk in the server's BIOS so that the node has direct access to it; it shows up as /dev/sdt
- enter maintenance mode
- reboot the server
- the drive is now /dev/sdm (which the old drive had)
- "ceph orch device zap node-x /dev/sdm"
- a new OSD is placed on the cluster
Can you give me a hint where I took a wrong turn? Why is the disk
not being reused as OSD 232?
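In case it helps, this is roughly what I can check on my side (commands as far as I recall from the cephadm docs):
ceph orch osd rm status        # is OSD 232 still listed as pending replacement?
ceph orch ls osd --export      # the OSD service spec, to check whether db_devices matches the NVMe
ceph orch device ls node-x     # is /dev/sdm reported as available?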
Best
Ken