As a sort of follow-up to my previous post: our Nautilus (14.2.16 on Ubuntu 18.04) cluster had some sort of event that caused many of the machines to have memory errors. The aftermath is that some OSDs initially hit (and continue to hit) this error, https://tracker.ceph.com/issues/48827, while others won't start for various reasons.
The OSDs that *will* start are badly behind the current epoch for the most part.
It sounds very similar to this:
We are having trouble getting things back online.
I think the path forward is to:
-set noup/nodown/noout/nobackfill and wait for the OSDs that run to come up; we were making good progress yesterday until some of the OSDs crashed with OOM errors. We are again moving forward, but understandably nervous.
-export the PGs from questionable OSDs and then rebuild the OSDs; import the PGs if necessary (very likely). Repeat until we are up.
Any suggestions for increasing speed? We are using noup/nobackfill/norebalance/pause but the epoch catchup is taking a very long time. Any tips for keeping the epoch from moving forward or speeding up the OSDs catching up? How can we estimate how long it should take?
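For estimating the catch-up time, one rough approach is to sample a lagging OSD's map epoch twice and extrapolate. A sketch (all the numbers below are hypothetical; read the real ones from the cluster):

```shell
# Read the real values from e.g.:
#   ceph osd dump | head -1          # first line shows the current cluster map epoch
#   ceph daemon osd.<id> status      # shows the newest map the OSD has caught up to
cluster_epoch=250000   # hypothetical: current cluster map epoch
osd_epoch=190000       # hypothetical: epoch the slowest OSD has reached
rate_per_min=100       # hypothetical: epochs consumed per minute, measured between two samples
echo "ETA: $(( (cluster_epoch - osd_epoch) / rate_per_min )) minutes"
# → ETA: 600 minutes
```

Sampling the same OSD a few minutes apart gives the rate; the flags you already set (noup/nobackfill/norebalance/pause) should also keep the epoch from advancing much while OSDs catch up.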
Thank you for any ideas or assistance anyone can provide.
I have Ceph 15.2.4 running in Docker. How do I configure it to use a
specific data pool? I tried putting the following line in ceph.conf, but the
change is not working:
rbd default data pool = Mydatapool
I need to configure an erasure-coded pool for use with CloudStack.
Can anyone help? Where is the ceph.conf I need to configure?
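For what it's worth, a sketch of the two usual ways to point RBD at an EC data pool (using the pool name from the mail; the image/pool names in the `rbd create` line are placeholders, and the EC pool must allow overwrites first):

```shell
# On the cluster: RBD requires overwrites on the EC pool
ceph osd pool set Mydatapool allow_ec_overwrites true

# Option 1: in the ceph.conf the client actually reads (inside the container),
# under the [client] section:
#   [client]
#   rbd default data pool = Mydatapool

# Option 2: set the data pool explicitly per image; the replicated pool
# holds the metadata, the EC pool holds the data (placeholder names):
rbd create --size 100G --data-pool Mydatapool rbdpool/vm-image-1
```

Note that `rbd default data pool` only takes effect for the client creating the image, so the conf file that matters is the one inside the container doing the `rbd create`.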
Thanks for the reply.
cephadm runs the Ceph containers automatically. How do I set privileged mode
on a Ceph container?
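For reference, this is roughly what privileged mode looks like at the plain Docker level (image name and volume path are placeholders; with cephadm-managed containers you would have to adjust the generated unit/run files rather than start the container by hand):

```shell
# Privileged + host networking, so NFSv3 can register with the host portmapper
docker run -d --privileged --net=host \
  -v /etc/ganesha:/etc/ganesha \
  my-ganesha-image   # placeholder: whatever image runs your NFS Ganesha
```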
> On 23/9/20 at 13:24, Daniel Gryniewicz wrote:
>> NFSv3 needs privileges to connect to the portmapper. Try running
>> your docker container in privileged mode, and see if that helps.
>> On 9/23/20 11:42 AM, Gabriel Medve wrote:
>>> I have Ceph 15.2.5 running in Docker. I configured NFS Ganesha
>>> with NFS version 3, but I cannot mount it.
>>> If I configure Ganesha with NFS version 4, I can mount without
>>> problems, but I need version 3.
>>> The error is: mount.nfs: Protocol not supported
>>> Can anyone help?
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Is it possible to disable the check for 'x pool(s) have no replicas
configured', so I don't have this HEALTH_WARN constantly?
Or is there some other disadvantage to keeping some empty 1x-replication pools?
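If I remember correctly, recent Nautilus releases have a mon option for exactly this warning; a sketch:

```shell
# Disable the "pool(s) have no replicas configured" health warning
ceph config set global mon_warn_on_pool_no_redundancy false
```

The only real cost of empty size-1 pools is that any data accidentally written to them has no redundancy.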
I'm facing a weird issue with one of my CEPH clusters:
OS: CentOS - 8.2.2004 (Core)
CEPH: Nautilus 14.2.11 - stable
RBD using an erasure-coded profile (k=3, m=2)
When I format one of my RBD images (client side), I get the following
kernel messages multiple times with different sector IDs:
[2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
[2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
At first I thought about a faulty disk, BUT the monitoring system is not
showing anything faulty, so I decided to run manual tests on all my OSDs to
check disk health using smartctl etc.
None of them is marked as unhealthy; they don't show any counters for faulty
sectors, reads or writes, and the wear level is at 99%.
So, the only particularity of this image is that it is an 80 TB image, but it
shouldn't be an issue, as we already have images of that size in use.
If anyone has a clue as to how I could sort this out, I'll be more than happy
to hear it.
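One way to narrow it down: since every error in the log is on a DISCARD op, formatting without issuing discards should show whether plain reads/writes are fine. A sketch, using the device name from the logs above:

```shell
# Skip the discard pass during mkfs (XFS):
mkfs.xfs -K /dev/rbd23
# or for ext4:
mkfs.ext4 -E nodiscard /dev/rbd23
```

If the format then succeeds cleanly, the problem is specific to how discards map onto the EC-backed objects rather than to the disks.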
I was doing some maintenance yesterday involving shutdown/power-up cycles of Ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers and I had no such problems with the other 11. I also observed that "ceph status" was responding very slowly.
Upon further inspection, I found out that 2 of my 3 MONs (the leader and a peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. On our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity.
This state lasted for over one hour. After the MONs settled down, the OSDs finally joined as well and everything went back to normal.
The other instance where I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to a too-large value of mon_sync_max_payload_size. This time, I'm pretty sure it was MON-client communication; see below.
Are there any settings similar to mon_sync_max_payload_size that could influence responsiveness of MONs in a similar way?
Why do I suspect it is MON-client communication? In our monitoring, I do not see the huge number of packets sent by the MONs arriving at any other Ceph daemon. They seem to be distributed over client nodes, but since we have a large number of client nodes (>550), this is covered by the background network traffic. A second clue is that I have had such extended lock-ups before and, whenever I checked, I only observed them when the leader had a large share of the client sessions.
For example, yesterday the client session count per MON was:
ceph-01: 1339 (leader)
ceph-02: 189 (peon)
ceph-03: 839 (peon)
I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.
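In case it helps others reproduce the numbers: the per-MON session counts above can be pulled from each MON's admin socket (MON id is an example):

```shell
# Run on each MON host; counts client entries in the session list
ceph daemon mon.ceph-01 sessions | grep -c client
```

Restarting the leader then redistributes sessions, as described above.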
Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Thanks for any clues!
AIT Risø Campus
Bygning 109, rum S14
Hello fellow CEPH-users,
currently we are updating our Ceph (14.2.16) cluster and making changes to
some of the nodes.
TLDR: is there a way to do a graceful shutdown of an active MDS node without
losing the caps, open files and client connections? Something like handing
over the active state, promoting a standby to active, ...?
Sadly, we ran into some difficulties when restarting MDS nodes. While we
had two active nodes and one standby, we initially thought that this would
give a nice handover when restarting the active rank ... sadly, we saw
the node going through the states replay-reconnect-rejoin-active, as nicely
visualized here.
This left some clients going into timeouts until the standby node had gone
into the active state again, most probably since the CephFS already has
some 600k folders and 3M files; from the client side the takeover took
quite a while.
So before the next MDS restart, the FS config was changed to one active and
one standby-replay node; the idea was that since the standby-replay node
follows the active one, the handover would be smoother. The active state
was reached faster, but we still noticed some hiccups on the clients
while the new active MDS was waiting for clients to reconnect (state
up:reconnect) after the failover.
The next idea was to do a manual node promotion, graceful shutdown or
something similar, where the open caps and sessions would be handed
over ... but I did not find any hint in the docs regarding this.
But this should somehow be possible (imho), since when adding a second
active MDS node (max_mds 2) and then removing it again (max_mds 1), the
rank 1 node goes into the stopping state and hands over all clients/caps to
rank 0 without interruptions for the clients.
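The max_mds dance just described, as a sketch (filesystem name is a placeholder):

```shell
ceph fs set cephfs max_mds 2   # the standby becomes active as rank 1
# wait until rank 1 is up:active, then shrink again:
ceph fs set cephfs max_mds 1   # rank 1 enters 'stopping' and hands its clients/caps to rank 0
```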
Therefore my question: how can one gracefully shut down an active rank 0
MDS node, or promote a standby node to the active state, without losing
open files/caps or client sessions?
Thanks in advance,
I'm replying to the list, as it may help others.
I have also reordered the response.
> On Mon, Jan 18, 2021 at 2:41 PM Gilles Mocellin <
> gilles.mocellin(a)nuagelibre.org> wrote:
> > Hello Cephers,
> > On a new cluster, I only have 2 RBD block images, and the Dashboard
> > doesn't manage to list them correctly.
> > I have this message :
> > Warning
> > Displaying previously cached data for pool veeam-repos.
> > Sometimes it disappears, but as soon as I reload or return to the listing
> > page, it's there.
> > What I've seen is a high CPU load due to ceph-mgr on the active
> > manager.
> > And also stack-traces like this :
> > dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to
> > retrieve data
> > I also see that, when I try to edit an image :
> > 2021-01-18T11:13:26.383+0100 7f00199ca700 0 [dashboard ERROR
> > frontend.error]
> > (https://fidcl-mrs4-sto-sds.fidcl.cloud:8443/#/block/rbd/edit/veeam-repos%252Fveeam-repo2-vol1):
> > Cannot read property 'features_name' of undefined
> > TypeError: Cannot read property 'features_name' of undefined
> > But that's perhaps just because I open an Edit window on the image and it
> > does not have the data.
> > The Edit window is empty, and I can't edit things; in particular, I want
> > to resize the image.
> > --
> > Gilles
On Thursday, 21 January 2021 at 21:56:58 CET, Ernesto Puerta wrote:
> Hey Gilles,
> If I'm not wrong, that exception (ViewCacheNoDataException) happens when
> the dashboard is unable to gather all required data from Ceph within a
> defined timeout (5 secs I think, since the UI refreshes the data every ~5
> seconds).
> It'd be great if you could provide the steps to reproduce it and some
> insights into your environment (number of RBD pools, number of RBD images,
> snapshots, etc.).
> Kind Regards,
As it is now, it always happens: on the image listing I get the warning, and
the list is not always up to date; if I create an image, I must wait a very
long time to see it.
Also, I cannot edit the 2 big images I have. Perhaps the size matters;
they are 2 images of 40 TB each.
If I create a 1 GB test image, I can edit and resize it.
But it's impossible with the big images: the window opens, but all the fields
are empty.
Also, if it can matter, the images use a data pool (EC 3+2).
I have 2 pools: a replicated one (3x) for metadata, veeam-repos, and a
data pool, veeam-repos.data (EC 3+2).
My cluster has 6 nodes, each with a 16-core AMD CPU, 128 GB RAM and 10 x 8 TB
HDDs, so 60 OSDs. We will soon be doubling everything to 12 nodes.
Usage, as the pool and image names suggest, is to mount an RBD image as an XFS
filesystem for a Veeam backup repository (krbd, because rbd-nbd failed
regularly, especially during fstrim).
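As a workaround while the dashboard edit form stays empty, the resize can be done from the CLI (image spec taken from the URL in the stack trace above; the target size is a placeholder):

```shell
rbd resize --size <new-size> veeam-repos/veeam-repo2-vol1
rbd info veeam-repos/veeam-repo2-vol1   # confirm the new size and features
```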
We are testing our S3 Ceph endpoints and we are not satisfied with their
speed. Our results are somewhere around 120-150 MB/s, depending on
smaller/bigger files. This is fine for a 1 Gbps connection, but not for
10GbE or more.
We've tried the most recent versions of the AWS CLI, s3cmd, s4cmd, s3fs
and other programs. Of course we are using multipart upload/download, which
is a precondition for parallel transfers. We also tried multi-threaded
transfer (25 or more threads) in s4cmd, but still we don't get the expected
speed.
As a proof of concept that high speed can be achieved, we have written a
small bash script which uses multipart & parallel transfer and can
saturate at least 10GbE without problems.
I would like to ask whether you know of a suitable program (and its
parameters) with which we could saturate n x 10GbE if needed?
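Not sure it reaches line rate, but the AWS CLI's own transfer parallelism is tunable, which often makes a large difference; a sketch (endpoint and bucket are placeholders):

```shell
# Raise the CLI's multipart parallelism (defaults: 10 concurrent requests, 8 MB chunks)
aws configure set default.s3.max_concurrent_requests 50
aws configure set default.s3.multipart_chunksize 64MB

# Then transfer against the RGW endpoint (placeholder URL/bucket)
aws --endpoint-url https://rgw.example.com s3 cp bigfile s3://testbucket/
```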
We are using the latest Nautilus.
The S3 gateways have much more compute power and bandwidth to the internet
than is being used right now.