The Nautilus manual recommends a >= 4.14 kernel for multiple active
MDSes. What are the potential issues with running the 4.4 kernel against
multiple MDSes? We are in the process of upgrading the clients, but at
times we overrun the capacity of a single MDS server.
MULTIPLE ACTIVE METADATA SERVERS
<https://docs.ceph.com/docs/nautilus/cephfs/kernel-features/#multiple-active…>
The feature has been supported since the Luminous release. It is
recommended to use Linux kernel clients >= 4.14 when there are multiple
active MDS.
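For context, we raise the active MDS count via the filesystem's max_mds
setting; a minimal sketch, assuming a filesystem named "cephfs":

ceph fs set cephfs max_mds 2   # allow a second active MDS
ceph fs set cephfs max_mds 1   # drop back to a single active MDS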
Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hi Manuel,
My replica count is 2, so 9TB x 2 = 18TB expected raw usage against the 28TB reported, hence about 10TB of unaccounted usage.
Andrei
----- Original Message -----
> From: "EDH - Manuel Rios" <mriosfer(a)easydatahost.com>
> To: "Andrei Mikhailovsky" <andrei(a)arhont.com>
> Sent: Tuesday, 28 April, 2020 23:57:20
> Subject: RE: rados buckets copy
> Is your replica x3? 9 x 3 = 27... plus some overhead, rounded...
>
> ceph df shows usage including replicas; bucket stats shows just the bucket usage, without replicas.
>
> -----Original Message-----
> From: Andrei Mikhailovsky <andrei(a)arhont.com>
> Sent: Wednesday, 29 April 2020 0:55
> To: ceph-users <ceph-users(a)ceph.io>
> Subject: [ceph-users] rados buckets copy
>
> Hello,
>
> I have a problem with the radosgw service where the actual disk usage (ceph df
> shows 28TB usage) is way more than reported by radosgw-admin bucket stats (9TB
> usage). I have tried to get to the bottom of the problem, but no one seems to be
> able to help. As a last resort I will attempt to copy the buckets, rename them,
> and remove the old buckets.
>
> What is the best way of doing this (at a high level) so that the copy
> process doesn't carry the wasted space over to the new buckets?
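>
> One approach I am considering is an S3-level copy, which only touches live
> objects and should therefore leave the orphaned data behind; a rough sketch
> with the AWS CLI (endpoint and bucket names are placeholders):
>
> aws --endpoint-url http://rgw.example.com s3 sync s3://old-bucket s3://new-bucket
> aws --endpoint-url http://rgw.example.com s3 rb s3://old-bucket --force   # remove the old bucket and its contents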
>
> Cheers
>
> Andrei
Dear all,
Two days ago I added a few disks to a ceph cluster and ran into a problem I have never seen before. The entire cluster was deployed with mimic 13.2.2 and recently upgraded to 13.2.8. This is the first time I have added OSDs under 13.2.8.
I had a few hosts that needed 1 or 2 additional OSDs, and I started with one that needed 1. The procedure was as usual:
ceph osd set norebalance
deploy additional OSD
The OSD came up and PGs started peering - so far so good. To my surprise, however, I started seeing health warnings about slow ping times:
Long heartbeat ping times on back interface seen, longest is 1171.910 msec
Long heartbeat ping times on front interface seen, longest is 1180.764 msec
After peering things looked better, and I waited until the messages were gone. This took a really long time, at least 5-10 minutes.
I went on to the next host and deployed 2 new OSDs this time. Same procedure as above, but with much worse consequences. Apparently, the ping times exceeded a timeout for a very short moment and an OSD was marked out for ca. 2 seconds. Then all hell broke loose. I got health errors with the dreaded "backfill_toofull", undersized PGs, and a large number of degraded objects. I don't know what is causing what, but I ended up with data loss just by adding 2 disks.
We have dedicated network hardware and each of the OSD hosts has 20GBit front and 40GBit back network capacity (LACP trunking). There are currently no more than 16 disks per server. The disks were added to an SSD pool. There was no traffic nor any other exceptional load on the system. I have ganglia resource monitoring on all nodes and cannot see a single curve going up. Network, CPU utilisation, load, everything below measurement accuracy. The hosts and network are quite overpowered and dimensioned to host many more OSDs (in future expansions).
I have three questions, ordered by how urgently I need an answer:
1) I need to add more disks next week and need a workaround. Will something like this help avoid the heartbeat time-out:
ceph osd set noout
ceph osd set nodown
ceph osd set norebalance
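Plus, I assume, unsetting them again once peering has settled:
ceph osd unset norebalance
ceph osd unset nodown
ceph osd unset noout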
2) The "lost" shards of the degraded objects were obviously still somewhere on the cluster. Is there any way to force the cluster to rescan the OSDs for the shards that were orphaned during the incident?
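In case it helps, this is the kind of thing I was planning to use for diagnosis (the PG id is hypothetical):

ceph pg 2.1f query           # show peering info and which OSDs were probed
ceph pg dump_stuck unclean   # list PGs that are not active+clean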
3) This smells a bit like a bug that requires attention. I was probably just lucky that I only lost 1 shard per PG. Has something similar been reported before? Is this fixed in 13.2.10? Is it something new? Are there any settings that need to be looked at? If logs need to be collected, I can do so during my next attempt. However, I cannot risk the data integrity of a production cluster and will therefore probably not run the original procedure again.
Many thanks for your help and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi all,
I am trying to set up an active-active NFS Ganesha cluster (two Ganesha daemons (v3.0) running in Docker containers). I managed to get the two daemons running using the rados_cluster backend for active-active deployment. I have the grace db within the cephfs metadata pool, in its own namespace, which keeps track of node status.
Now, I can mount the exposed filesystem over NFS (v4.1, v4.2) through both daemons. So far so good.
Testing high availability resulted in unexpected behavior, and I am not sure whether it is intentional or a configuration problem.
Problem:
If both are running, no E or N flags are set within the grace db, as I expect. Once one host goes down (or is taken down), ALL clients can neither read from nor write to the mounted filesystem, even the clients that are not connected to the dead Ganesha. In the db, I see that the dead Ganesha has state NE and the active one has E. This state is what I expect from the Ganesha documentation. Nevertheless, I would assume that the clients connected to the active daemon are not blocked. This state is not cleaned up by itself (e.g. after the grace period).
I can unblock this situation by 'lifting' the dead node with a direct db call (using the ganesha-rados-grace tool), but within an active-active deployment this is not suitable.
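For reference, the manual intervention looks roughly like this, using the pool and namespace from my config below (the nodeid is whichever node died; "b" here is just an example):

ganesha-rados-grace --pool cephfsmetadata --ns grace dump     # inspect the grace db state and flags
ganesha-rados-grace --pool cephfsmetadata --ns grace lift b   # manually lift the dead node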
The ganesha config looks like:
------------
NFS_CORE_PARAM
{
Enable_NLM = false;
Protocols = 4;
}
NFSv4
{
RecoveryBackend = rados_cluster;
Minor_Versions = 1,2;
}
RADOS_KV
{
pool = "cephfsmetadata";
nodeid = "a" ;
namespace = "grace";
UserId = "ganesha";
Ceph_Conf = "/etc/ceph/ceph.conf";
}
MDCACHE {
# Effectively disable Ganesha's own caching; libcephfs handles caching for FSAL_CEPH
Dir_Chunk = 0;
NParts = 1;
Cache_Size = 1;
}
EXPORT
{
Export_ID=101;
Protocols = 4;
Transports = TCP;
Path = PATH;
Pseudo = PSEUDO_PATH;
Access_Type = RW;
Attr_Expiration_Time = 0;
Squash = no_root_squash;
FSAL {
Name = CEPH;
User_Id = "ganesha";
Secret_Access_Key = CEPHXKEY;
}
}
LOG {
Default_Log_Level = "FULL_DEBUG";
}
------------
Does anyone have similar problems? Or, if this behavior is intentional, can you explain why that is the case?
Thank you in advance for your time and thoughts.
Kind regards,
Michael
Hi,
I just deployed a new cluster with cephadm instead of ceph-deploy. In the past, if I changed ceph.conf for tweaking, I was able to copy it to all servers and apply it. But I cannot find how to do this with the new cephadm tool.
I made a few changes to ceph.conf, but ceph is unaware of those changes. How can I apply them? I am using the Docker-based deployment.
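From what I can tell, cephadm clusters read options from the monitors' centralized config database rather than per-host ceph.conf files; a sketch of what I think the workflow should be (the option shown is just an example):

ceph config assimilate-conf -i /etc/ceph/ceph.conf   # import options from an existing ceph.conf
ceph config set osd osd_memory_target 4294967296     # set one option for all OSD daemons
ceph config dump                                     # verify what the cluster actually sees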
Thanks,
Gencer.
Hi Eric,
Would it be possible to use it with an older cluster version (e.g.
running the new radosgw-admin in a container, connecting to a cluster
on 14.2.X)?
Kind regards / Pozdrawiam,
Katarzyna Myrek
On Thu, 16 Apr 2020 at 19:58, EDH - Manuel Rios
<mriosfer(a)easydatahost.com> wrote:
>
> Hi Eric,
>
> Is there any ETA for getting those scripts backported, maybe in 14.2.10?
>
> Regards
>
> Manuel
>
> From: Eric Ivancich <ivancich(a)redhat.com>
> Sent: Thursday, 16 April 2020 19:05
> To: Katarzyna Myrek <katarzyna(a)myrek.pl>; EDH - Manuel Rios <mriosfer(a)easydatahost.com>
> CC: ceph-users(a)ceph.io
> Subject: Re: [ceph-users] RGW and the orphans
>
> There is currently a PR for an “orphans list” capability. I’m currently working on the testing side to make sure it’s part of our teuthology suite.
>
> See: https://github.com/ceph/ceph/pull/34148
>
> Eric
>
> On Apr 16, 2020, at 9:26 AM, Katarzyna Myrek <katarzyna(a)myrek.pl> wrote:
>
> Hi
>
> Thanks for the quick response.
>
> To be honest, my cluster is getting full because of that trash, and I am
> at the point where I have to do the removal manually ;/
>
> Kind regards / Pozdrawiam,
> Katarzyna Myrek
>
> On Thu, 16 Apr 2020 at 13:09, EDH - Manuel Rios <mriosfer(a)easydatahost.com> wrote:
>
> Hi,
>
> In my experience, orphans find hasn't worked for several releases now; the command should be re-coded or deprecated because it is not functional.
>
> In our case it loops over the generated shards until the RGW daemon crashes.
>
> I'm interested in this thread; in our case orphans find takes more than 24 hours before it starts looping over the shards, but it never gets past shard 0 or 1.
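>
> For reference, this is roughly the invocation we run (the pool name is from
> our setup; adjust to your zone's data pool):
>
> radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans1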
>
> The Ceph RGW devs should provide a workaround script, a new tool, or something to help maintain our RGW clusters, because with the recent bugs every RGW cluster has accumulated a ton of trash, wasting resources and money.
>
> And manual cleaning is neither trivial nor easy.
>
> Waiting for more info,
>
> Manuel
>
>
> -----Original Message-----
> From: Katarzyna Myrek <katarzyna(a)myrek.pl>
> Sent: Thursday, 16 April 2020 12:38
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] RGW and the orphans
>
> Hi
>
> Is there any new way to find and remove orphans from RGW pools on Nautilus? I have seen information that "orphans find" is now deprecated.
>
> I can see that I have tons of orphans in one of our clusters. I was wondering how to safely remove them - that is, how to make sure that they really are orphans.
> Does anyone have a good method for that?
>
> My cluster mostly has orphans from multipart uploads.
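>
> Since they are mostly from multipart uploads, one thing I am considering is
> aborting stale multipart uploads at the S3 level first; a sketch with the AWS
> CLI (endpoint, bucket, key, and upload id are placeholders):
>
> aws --endpoint-url http://rgw.example.com s3api list-multipart-uploads --bucket mybucket
> aws --endpoint-url http://rgw.example.com s3api abort-multipart-upload --bucket mybucket --key mykey --upload-id UPLOADID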
>
> Kind regards / Pozdrawiam,
> Katarzyna Myrek
Hi all,
I'm planning to upgrade one of my Ceph clusters, currently on Luminous
12.2.13 / Debian Stretch (up to date).
On this cluster, Luminous is packaged from the official Ceph repo (deb
https://download.ceph.com/debian-luminous/ stretch main).
I would like to upgrade it to Debian Buster and Nautilus using the
croit.io repository (deb https://mirror.croit.io/debian-nautilus/ buster
main).
I have already prepared the step-by-step procedure, but I just want to
verify one step regarding the upgrade of the Ceph packages.
Should I upgrade Ceph at the same time as Debian, or should I upgrade
Ceph after the Debian upgrade from Stretch to Buster?
1) In the first case:
* Replace stretch with buster in /etc/apt/sources.list
* Replace the ceph.list repo with the croit.io one
* Upgrade each node in full
2) In the second case (upgrade Debian, then Ceph - see the sketch after
this list):
* Replace stretch with buster in /etc/apt/sources.list
* Keep /etc/apt/sources.list.d/ceph.list as it is
* Upgrade and reboot the nodes
* Replace the ceph.list file with the croit.io one
* Upgrade the Ceph packages
* Restart the Ceph services (in the right order: MON -> MGR -> OSD -> MDS)
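For clarity, the repository changes in the second case would look roughly
like this on each node (paths assumed standard):

sed -i 's/stretch/buster/g' /etc/apt/sources.list
apt update && apt full-upgrade   # Debian Stretch -> Buster
echo "deb https://mirror.croit.io/debian-nautilus/ buster main" > /etc/apt/sources.list.d/ceph.list
apt update && apt full-upgrade   # Luminous -> Nautilus packages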
Thanks a lot for your advice
Regards,
Hervé
I tried to install the Ceph Octopus RPMs, and some dependent packages are
still pulled in from untrusted sources:
Total
8.2 MB/s | 70 MB 00:08
warning: /var/cache/dnf/copr:copr.fedorainfracloud.org:ktdreyer:ceph-el8-0161c520b1c9fbf7/packages/python3-cheroot-8.2.1-1.el8.noarch.rpm:
Header V3 RSA/SHA1 Signature, key ID 2a8a41ec: NOKEY
Copr repo for ceph-el8 owned by ktdreyer
2.8 kB/s | 1.0 kB 00:00
Importing GPG key 0x2A8A41EC:
Userid : "ktdreyer_ceph-el8 (None) <ktdreyer#
ceph-el8(a)copr.fedorahosted.org>"
Fingerprint: 64D3 346F FD63 5D2D B5D0 5873 1014 BDBE 2A8A 41EC
From :
https://download.copr.fedorainfracloud.org/results/ktdreyer/ceph-el8/pubkey…
Is this ok [y/N]:
I'm sorry, but I can't accept this or run Ceph Octopus until all dependencies
come from the ceph repo or the epel repo and are built/signed correctly.
This isn't our first rodeo, so why the change in the release process?
Thanks!
Hi everybody (again),
We recently had a lot of OSD crashes (more than 30 OSDs crashed). This is
now fixed, but it triggered a huge rebalance and recovery.
More or less at the same time, we noticed that ceph crash ls (or any
other ceph crash command) hangs forever and never returns.
And finally, the recovery process stops regularly (after ~1 hour), but it
can be restarted by restarting the mgr daemon (systemctl restart
ceph-mgr.target on the active manager).
There is nothing in the logs (the manager still works, the service is
up, the dashboard is accessible, but the recovery simply stops).
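For completeness, the exact restart sequence we use (assuming default
systemd unit names):

ceph mgr dump | grep active_name    # identify the active manager
systemctl restart ceph-mgr.target   # run on that host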
We also tried rebooting the managers, but it didn't solve the problem.
I guess these two problems are linked, but I'm not sure.
Does anybody have a clue?
Thanks.
F.