Hi,
Today while debugging something we had a few questions that might lead
to improving the cephfs forward scrub docs:
https://docs.ceph.com/en/latest/cephfs/scrub/
tldr:
1. Should we document which sorts of issues the forward scrub is able to fix?
2. Can we make it more visible (in docs) that scrubbing is not
supported with multi-mds?
3. Isn't the new `ceph -s` scrub task status misleading with multi-mds?
Details here:
1) We found a CephFS directory with a number of zero sized files:
# ls -l
...
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:58 upload_fc501199e3e7abe6b574101cf34aeefb.png
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 12:23 upload_fce4f55348185fefa0abdd8d11095ba8.gif
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:54 upload_fd95b8358851f0dac22fb775046a6163.png
...
The user claims that those files were non-zero-sized last week. The sequence of zero-sized files includes *all* files written between Nov 2 and 9.
The user says his client was running out of memory, but that has since been fixed, so I suspect his ceph client (kernel 3.10.0-1127.19.1.el7.x86_64) was not behaving well.
Anyway, I noticed that even though the dentries list 0 bytes, the
underlying rados objects have data, and the data looks good. E.g.
# rados get -p cephfs_data 200212e68b5.00000000 --namespace=xxx 200212e68b5.00000000
# file 200212e68b5.00000000
200212e68b5.00000000: PNG image data, 960 x 815, 8-bit/color RGBA, non-interlaced
So I managed to recover the files with something like the script below (using an input file mapping inode to filename) [see [0] at the end].
But I'm wondering if a forward scrub is able to fix this sort of
problem directly?
Should we document which sorts of issues the forward scrub is able to fix?
Anyway, I tried to scrub it, which led to:
# ceph tell mds.cephflax-mds-xxx scrub start /volumes/_nogroup/xxx recursive repair
Scrub is not currently supported for multiple active MDS. Please reduce max_mds to 1 and then scrub.
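If I read the error correctly, the workaround would be to temporarily drop to a single active MDS and then scrub, roughly like this (<fs_name> and the original max_mds value are placeholders):
# ceph fs set <fs_name> max_mds 1
# ceph status    # wait until only one MDS rank is active
# ceph tell mds.cephflax-mds-xxx scrub start /volumes/_nogroup/xxx recursive repair
# ceph fs set <fs_name> max_mds <original value>    # restore afterwards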
So ...
2) Shouldn't we update the doc to mention loud and clear that scrub is
not currently supported for multiple active MDS?
3) I was somehow surprised by this, because I had thought that the new
`ceph -s` multi-mds scrub status implied that multi-mds scrubbing was
now working:
  task status:
    scrub status:
        mds.x: idle
        mds.y: idle
        mds.z: idle
Is it worth reporting this task status for CephFS if we can't even scrub with multiple active MDS?
Thanks!!
Dan
[0]
# Reads "inode filename" pairs from inones_fnames.txt and prints the rados/cat/mv
# commands needed to reassemble each file from its rados objects into ./recovered/
mkdir -p recovered
while read -r a b; do
  for i in {0..9}; do
    echo "rados stat --cluster=flax --pool=cephfs_data --namespace=xxx" $(printf "%x" $a).0000000$i "&&" "rados get --cluster=flax --pool=cephfs_data --namespace=xxx" $(printf "%x" $a).0000000$i $(printf "%x" $a).0000000$i
  done
  echo cat $(printf "%x" $a).* ">" $(printf "%x" $a)
  echo mv $(printf "%x" $a) recovered/$b
done < inones_fnames.txt
Hi,
We have a problem with a PG that was inconsistent; PGs in our cluster currently have 3 copies.
We were not able to repair this PG with "ceph pg repair" (the PG is on OSDs 14, 1 and 2), so we deleted the copy on osd 14 with the following command:
ceph-objectstore-tool --data-path /var/lib/ceph/osd.14/ --pgid 22.f --op remove --force
This triggered an automatic attempt to recreate the missing copy, and the PG entered the backfilling state, but doing so crashed OSDs 1 and 2 and dropped IOPS to 0, freezing the cluster.
Is there any way to remove this entire PG, recreate the missing copy, or ignore it completely? It is causing instability in the cluster.
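Before any further surgery we could still take an export of the copies that remain; this is roughly what I have in mind (sketch only, using osd 1 as the example and an arbitrary output file):
ceph pg 22.f query    # check current state and which OSDs hold copies
systemctl stop ceph-osd@1
ceph-objectstore-tool --data-path /var/lib/ceph/osd.1/ --pgid 22.f --op export --file /root/pg-22.f-osd1.export
systemctl start ceph-osd@1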
Thank you, I await comments
--
Gabriel I. Medve
Hi,
I caught up with Sage's talk on what to expect in Pacific (
https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention
of ceph-ansible at all.
Is it going to continue to be supported? We use it (and uncontainerised
packages) for all our clusters, so I'd be a bit alarmed if it was going
to go away...
Regards,
Matthew
Hi all:
ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)
I have a strange question: I just created a multisite configuration for a Ceph cluster, but I noticed that the old data in the source cluster is not synced; only new data is synced into the second zone cluster.
Is there anything I need to do to enable a full sync for the bucket, or is this a bug?
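For reference, this is roughly what I am checking on both zones (the bucket name is a placeholder):
radosgw-admin sync status
radosgw-admin bucket sync status --bucket=<bucket>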
Thanks
Hi all,
this is a follow-up on "reboot breaks OSDs converted from ceph-disk to ceph-volume simple".
I converted a number of ceph-disk OSDs to ceph-volume using "simple scan" and "simple activate". Somewhere along the way the OSDs' meta-data gets mangled, and the most prominent symptom is that the "block" symlink changes from a part-uuid target to an unstable device-name target like:
before conversion:
block -> /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7
after conversion:
block -> /dev/sdj2
This is a huge problem, as the "after conversion" device names are unstable. I now have a cluster where I cannot reboot servers because of this problem: OSDs whose devices get re-assigned on boot refuse to start with:
2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248
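For a single OSD, this is what I think a manual fix would look like (sketch only; the OSD id and part-uuid below are just the examples from above, and I assume the JSON that "simple scan" wrote under /etc/ceph/osd/ needs the same stable path):
# systemctl stop ceph-osd@241
# ln -sfn /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7 /var/lib/ceph/osd/ceph-241/block
# grep -l '/dev/sd' /etc/ceph/osd/*.json   # any raw /dev/sdX paths here presumably need the by-partuuid path too
# systemctl start ceph-osd@241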
Please help me with getting out of this mess.
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi!
After upgrading MONs and MGRs successfully, the first OSD host I upgraded on Ubuntu Bionic from 14.2.16 to 15.2.10
shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot.
RocksDB complains "Corruption: unknown WriteBatch tag".
The initial crash/corruption occurred when the automatic fsck was run and committed the changes for a lot of "zombie spanning blobs".
Tracker issue with logs: https://tracker.ceph.com/issues/50017
Anyone else encountered this error? I've "suspended" the upgrade for now :)
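In case it matters for anyone mid-upgrade: my assumption is that the automatic on-mount fsck/quick-fix is what kicks off this repair, so on the remaining hosts I am deferring it with:
ceph config set osd bluestore_fsck_quick_fix_on_mount false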
-- Jonas
We have a Ceph Octopus cluster running 15.2.6, and it's indicating a near-full OSD which I can see is not weighted equally with the rest of the OSDs. I tried the usual "ceph osd reweight osd.0 0.95" to force it down a little, but unlike on the Nautilus clusters I see no data movement after issuing the command. If I run "ceph osd tree", it shows the reweight setting, but no data movement appears to be occurring.
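For completeness, this is roughly what I am running and watching (osd.0 as the example):
# ceph osd reweight osd.0 0.95
# ceph osd df tree | grep -w 'osd\.0'   # REWEIGHT column shows 0.95, but PGS/%USE do not change
# ceph -s                               # no backfill/recovery activity reported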
Is there some new thing in Octopus I am missing? I looked through the
release notes for .7, .8 and .9 and didn't see any fixes that jumped out as
resolving a bug related to this. The Octopus cluster was deployed using
ceph-ansible and upgraded to 15.2.6. I plan to upgrade to 15.2.9 in the
coming month.
Any thoughts?
Regards,
-Brent
Existing Clusters:
Test: Octopus 15.2.5 ( all virtual on nvme )
US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways,
2 iscsi gateways
UK Production(SSD): Octopus 15.2.6 with 5 osd servers, 3 mons, 4 gateways
Hi
I'm in a bit of a panic :-(
Recently we started attempting to add a radosgw to our ceph cluster, which until now was only doing cephfs (and rbd was working as well). We were messing about with ceph-ansible, as this was how we originally installed the cluster. Anyway, it installed nautilus 14.2.18 on the radosgw, and I thought it would be good to pull the rest of the cluster up to that level as well using our tried and tested ceph upgrade script (it basically updates all ceph nodes one by one and checks whether ceph is ok again before doing the next).
After the 3rd mon/mgr was done, all PGs were unavailable :-(
Obviously the script did not continue, but ceph is also broken now...
The message, deceptively, is: HEALTH_WARN Reduced data availability: 5568 pgs inactive
That's all PGs!
As a desperate measure I tried to upgrade one ceph OSD node, but that broke as well: the osd service on that node gets an interrupt from the kernel...
The versions are now:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}
12 OSDs are down
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            Reduced data availability: 5568 pgs inactive

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
    mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

  data:
    pools:   12 pools, 5568 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5568 unknown

  progress:
    Rebalancing after osd.103 marked in
      [..............................]
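What I plan to try next, on the assumption that 0 B usage and 100% "unknown" PGs mean the mgr is simply not receiving PG stats rather than the data actually being gone:
# ceph mgr fail cephmon1    # force a mgr failover
# ceph -s                   # check whether the PG states come back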