As a sort of follow-up to my previous post: our Nautilus (14.2.16 on Ubuntu 18.04) cluster had some sort of event that caused many of the machines to have memory errors. The aftermath is that some OSDs initially hit (and continue to hit) this error, https://tracker.ceph.com/issues/48827, while others won't start for various reasons.
The OSDs that *will* start are badly behind the current epoch for the most part.
It sounds very similar to this:
We are having trouble getting things back online.
I think the path forward is to:
-set noup/nodown/noout/nobackfill and wait for the OSDs that run to come up; we were making good progress yesterday until some of the OSDs crashed with OOM errors. We are again moving forward, but understandably nervous.
-export the PGs from questionable OSDs and then rebuild the OSDs; import the PGs if necessary (very likely). Repeat until we are up.
Any suggestions for increasing speed? We are using noup/nobackfill/norebalance/pause but the epoch catchup is taking a very long time. Any tips for keeping the epoch from moving forward or speeding up the OSDs catching up? How can we estimate how long it should take?
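For estimating the catch-up time, one rough approach is to sample a lagging OSD's map epoch twice and extrapolate. A sketch (all the numbers below are hypothetical; read the real ones from the cluster):

```shell
# Read the real values from e.g.:
#   ceph osd dump | head -1          # first line shows the current cluster map epoch
#   ceph daemon osd.<id> status      # shows the newest map the OSD has caught up to
cluster_epoch=250000   # hypothetical: current cluster map epoch
osd_epoch=190000       # hypothetical: epoch the slowest OSD has reached
rate_per_min=100       # hypothetical: epochs consumed per minute, measured between two samples
echo "ETA: $(( (cluster_epoch - osd_epoch) / rate_per_min )) minutes"
# → ETA: 600 minutes
```

Sampling the same OSD a few minutes apart gives the rate; the flags you already set (noup/nobackfill/norebalance/pause) should also keep the epoch from advancing much while OSDs catch up.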
Thank you for any ideas or assistance anyone can provide.
I have Ceph 15.2.4 running in Docker. How do I configure it to use a
specific data pool? I tried putting the following line in ceph.conf, but the
change is not working:
rbd default data pool = Mydatapool
I need to configure an erasure-coded pool for use with CloudStack.
Can anyone help? Where is the ceph.conf I need to configure?
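For what it's worth, a sketch of the two usual ways to point RBD at an EC data pool (using the pool name from the mail; the image/pool names in the `rbd create` line are placeholders, and the EC pool must allow overwrites first):

```shell
# On the cluster: RBD requires overwrites on the EC pool
ceph osd pool set Mydatapool allow_ec_overwrites true

# Option 1: in the ceph.conf the client actually reads (inside the container),
# under the [client] section:
#   [client]
#   rbd default data pool = Mydatapool

# Option 2: set the data pool explicitly per image; the replicated pool
# holds the metadata, the EC pool holds the data (placeholder names):
rbd create --size 100G --data-pool Mydatapool rbdpool/vm-image-1
```

Note that `rbd default data pool` only takes effect for the client creating the image, so the conf file that matters is the one inside the container doing the `rbd create`.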
Thanks for the reply.
cephadm runs the Ceph containers automatically. How do I set privileged mode
on a Ceph container?
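For reference, this is roughly what privileged mode looks like at the plain Docker level (image name and volume path are placeholders; with cephadm-managed containers you would have to adjust the generated unit/run files rather than start the container by hand):

```shell
# Privileged + host networking, so NFSv3 can register with the host portmapper
docker run -d --privileged --net=host \
  -v /etc/ganesha:/etc/ganesha \
  my-ganesha-image   # placeholder: whatever image runs your NFS Ganesha
```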
> On 23/9/20 at 13:24, Daniel Gryniewicz wrote:
>> NFSv3 needs privileges to connect to the portmapper. Try running
>> your docker container in privileged mode, and see if that helps.
>> On 9/23/20 11:42 AM, Gabriel Medve wrote:
>>> I have Ceph 15.2.5 running in Docker. I configured NFS Ganesha
>>> with NFS version 3, but I cannot mount it.
>>> If I configure Ganesha with NFS version 4, I can mount without
>>> problems, but I need version 3.
>>> The error is: mount.nfs: Protocol not supported
>>> Can anyone help?
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Is it possible to disable the check for 'x pool(s) have no replicas
configured', so I don't have this HEALTH_WARN constantly?
Or is there some other disadvantage to keeping some empty 1x-replication pools?
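If I remember correctly, recent Nautilus releases have a mon option for exactly this warning; a sketch:

```shell
# Disable the "pool(s) have no replicas configured" health warning
ceph config set global mon_warn_on_pool_no_redundancy false
```

The only real cost of empty size-1 pools is that any data accidentally written to them has no redundancy.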
I'm facing a weird issue with one of my CEPH clusters:
OS: CentOS - 8.2.2004 (Core)
CEPH: Nautilus 14.2.11 - stable
RBD using an erasure-coded profile (k=3, m=2)
When I format one of my RBD images (client side), I get the following
kernel messages multiple times with different sector IDs:
[2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
[2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
At first I thought about a faulty disk, BUT the monitoring system is not
showing anything faulty, so I decided to run manual tests on all my OSDs to
check disk health using smartctl etc.
None of them is marked as unhealthy; they don't show any counters for faulty
sectors, reads or writes, and the wear level is at 99%.
So, the only particularity of this image is that it is an 80 TB image, but it
shouldn't be an issue, as we already have images of that size in use.
If anyone has a clue as to how I could sort this out, I'll be more than happy
to hear it.
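One way to narrow it down: since every error in the log is on a DISCARD op, formatting without issuing discards should show whether plain reads/writes are fine. A sketch, using the device name from the logs above:

```shell
# Skip the discard pass during mkfs (XFS):
mkfs.xfs -K /dev/rbd23
# or for ext4:
mkfs.ext4 -E nodiscard /dev/rbd23
```

If the format then succeeds cleanly, the problem is specific to how discards map onto the EC-backed objects rather than to the disks.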
I was doing some maintenance yesterday involving shutdown/power-up cycles of Ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers and I had no such problems with the other 11. I also observed that "ceph status" was responding very slowly.
Upon further inspection, I found out that 2 of my 3 MONs (the leader and a peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. On our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity.
This state lasted for over one hour. After the MONs settled down, the OSDs finally joined as well and everything went back to normal.
The other instance where I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to a too-large value of mon_sync_max_payload_size. This time, I'm pretty sure it was MON-client communication; see below.
Are there any settings similar to mon_sync_max_payload_size that could influence responsiveness of MONs in a similar way?
Why do I suspect it is MON-client communication? In our monitoring, I do not see the huge number of packets sent by the MONs arriving at any other Ceph daemon. They seem to be distributed over client nodes, but since we have a large number of client nodes (>550), this is covered by the background network traffic. A second clue is that I have had such extended lock-ups before and, whenever I checked, I only observed them when the leader had a large share of the client sessions.
For example, yesterday the client session count per MON was:
ceph-01: 1339 (leader)
ceph-02: 189 (peon)
ceph-03: 839 (peon)
I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.
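In case it helps others reproduce the numbers: the per-MON session counts above can be pulled from each MON's admin socket (MON id is an example):

```shell
# Run on each MON host; counts client entries in the session list
ceph daemon mon.ceph-01 sessions | grep -c client
```

Restarting the leader then redistributes sessions, as described above.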
Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Thanks for any clues!
AIT Risø Campus
Bygning 109, rum S14
Hello fellow CEPH-users,
currently we are updating our Ceph (14.2.16) cluster and making changes to
some of the nodes.
TLDR: is there a way to do a graceful shutdown of an active MDS node without
losing the caps, open files and client connections? Something like handing
over the active state, promoting a standby to active, ...?
Sadly, we ran into some difficulties when restarting MDS nodes. While we
had two active nodes and one standby, we initially thought that this would
give a nice handover when restarting the active rank ... sadly, we saw
the node going through the states replay-reconnect-rejoin-active, as nicely
visualized here.
This left some clients going into timeouts until the standby node had gone
into the active state again, most probably since the CephFS already has
some 600k folders and 3M files; from the client side the takeover took
quite a while.
So before the next MDS restart, the FS config was changed to one active and
one standby-replay node; the idea was that since the standby-replay node
follows the active one, the handover would be smoother. The active state
was reached faster, but we still noticed some hiccups on the clients
while the new active MDS was waiting for clients to reconnect (state
up:reconnect) after the failover.
The next idea was to do a manual node promotion, graceful shutdown or
something similar, where the open caps and sessions would be handed
over ... but I did not find any hint in the docs regarding this.
But this should somehow be possible (imho), since when adding a second
active MDS node (max_mds 2) and then removing it again (max_mds 1), the
rank 1 node goes into the stopping state and hands over all clients/caps to
rank 0 without interruptions for the clients.
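The max_mds dance just described, as a sketch (filesystem name is a placeholder):

```shell
ceph fs set cephfs max_mds 2   # the standby becomes active as rank 1
# wait until rank 1 is up:active, then shrink again:
ceph fs set cephfs max_mds 1   # rank 1 enters 'stopping' and hands its clients/caps to rank 0
```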
Therefore my question: how can one gracefully shut down an active rank 0
MDS node, or promote a standby node to the active state, without losing
open files/caps or client sessions?
Thanks in advance,
I'm replying to the list, as it may help others.
I have also reordered the response.
> On Mon, Jan 18, 2021 at 2:41 PM Gilles Mocellin <
> gilles.mocellin(a)nuagelibre.org> wrote:
> > Hello Cephers,
> > On a new cluster, I only have 2 RBD block images, and the Dashboard
> > doesn't manage to list them correctly.
> > I have this message :
> > Warning
> > Displaying previously cached data for pool veeam-repos.
> > Sometimes it disappears, but as soon as I reload or return to the listing
> > page, it's there.
> > What I've seen is a high CPU load due to ceph-mgr on the active
> > manager.
> > And also stack-traces like this :
> > dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to
> > retrieve data
> > I also see that, when I try to edit an image :
> > 2021-01-18T11:13:26.383+0100 7f00199ca700 0 [dashboard ERROR
> > frontend.error]
> > (https://fidcl-mrs4-sto-sds.fidcl.cloud:8443/#/block/rbd/edit/veeam-repos%252Fveeam-repo2-vol1):
> > Cannot read property 'features_name' of undefined
> > TypeError: Cannot read property 'features_name' of undefined
> > But that's perhaps just because I open an Edit window on the image and it
> > does not have the data.
> > The Edit window is empty, and I can't edit things; in particular, I want
> > to resize the image.
> > --
> > Gilles
On Thursday, 21 January 2021 at 21:56:58 CET, Ernesto Puerta wrote:
> Hey Gilles,
> If I'm not wrong, that exception (ViewCacheNoDataException) happens when
> the dashboard is unable to gather all required data from Ceph within a
> defined timeout (5 secs I think, since the UI refreshes the data every ~5
> seconds).
> It'd be great if you could provide the steps to reproduce it and some
> insights into your environment (number of RBD pools, number of RBD images,
> snapshots, etc.).
> Kind Regards,
As it is now, it always happens: on the image listing I get the warning, and
the list is not always up to date; if I create an image, I must wait a very
long time to see it.
Also, I cannot edit the 2 big images I have. Perhaps the size matters;
they are 2 images of 40 TB each.
If I create a 1 GB test image, I can edit and resize it.
But it's impossible with the big images: the window opens, but all the fields
are empty.
Also, if it can matter, the images use a data pool (EC 3+2).
I have 2 pools: a replicated one (3x) for metadata, veeam-repos, and a
data pool, veeam-repos.data (EC 3+2).
My cluster has 6 nodes, each with a 16-core AMD CPU, 128 GB RAM and 10 x 8 TB
HDDs, so 60 OSDs. We will soon be doubling everything to 12 nodes.
Usage, as the pool and image names suggest, is to mount an RBD image as an XFS
filesystem for a Veeam backup repository (krbd, because rbd-nbd failed
regularly, especially during fstrim).
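As a workaround while the dashboard edit form stays empty, the resize can be done from the CLI (image spec taken from the URL in the stack trace above; the target size is a placeholder):

```shell
rbd resize --size <new-size> veeam-repos/veeam-repo2-vol1
rbd info veeam-repos/veeam-repo2-vol1   # confirm the new size and features
```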
We are testing our S3 Ceph endpoints and we are not satisfied with their
speed. Our results are somewhere around 120-150 MB/s, depending on
smaller/bigger files. This is fine for a 1 Gbps connection, but not for
10GbE or more.
We've tried the most recent versions of the AWS CLI, s3cmd, s4cmd, s3fs
and other programs. Of course we are using multipart upload/download, which
is a precondition for parallel transfers. We also tried multi-threaded
transfer (25 or more threads) in s4cmd, but still we don't get the expected
speed.
As a proof of concept that high speed can be achieved, we have written a
small bash script which uses multipart & parallel transfer and can
saturate at least 10GbE without problems.
I would like to ask whether you know of a suitable program (and its
parameters) with which we could saturate n x 10GbE if needed?
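Not sure it reaches line rate, but the AWS CLI's own transfer parallelism is tunable, which often makes a large difference; a sketch (endpoint and bucket are placeholders):

```shell
# Raise the CLI's multipart parallelism (defaults: 10 concurrent requests, 8 MB chunks)
aws configure set default.s3.max_concurrent_requests 50
aws configure set default.s3.multipart_chunksize 64MB

# Then transfer against the RGW endpoint (placeholder URL/bucket)
aws --endpoint-url https://rgw.example.com s3 cp bigfile s3://testbucket/
```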
We are using the latest Nautilus.
The S3 gateways have much more compute power and bandwidth to the internet
than is being used right now.