Hi All,
In reference to this page from the Ceph documentation:
https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of
that page it says that you can run the following commands:
~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~
This will allow `client.x` to access both `dir1` and `dir2`.
We have a use case where we need to do exactly this; HOWEVER, when we run
the second command on a Reef 18.2.2 cluster we get the following error:
`Error EINVAL: client.x already has fs capabilities that differ from
those supplied. To generate a new auth key for client.x, first remove
client.x from configuration files, execute 'ceph auth rm client.x', then
execute this command again.`
Is this something we're doing wrong, or is the doco out of date (mind you,
that's from both the "latest" and the "reef" versions of the doco),
or is something else going on?
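For what it's worth, the workaround we're considering in the meantime (an
untested sketch on our side; the mon/osd/mds cap strings below are only our
guess at what `ceph fs authorize` would normally generate for an fs called
"a") is to rewrite the client's caps in place with `ceph auth caps` instead
of removing the key:
~~~
# inspect the caps client.x currently has
ceph auth get client.x

# overwrite them with both paths listed explicitly
ceph auth caps client.x \
    mon 'allow r' \
    osd 'allow rw tag cephfs data=a' \
    mds 'allow rw path=/dir1, allow rw path=/dir2'
~~~
But we'd obviously prefer the documented `ceph fs authorize` route if it is
supposed to work.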
Thanks in advance for the help
Cheers
Dulux-Oz
Dear all
We have an HDD ceph cluster that could do with some more IOPS. One
solution we are considering is installing NVMe SSDs into the storage
nodes and using them as WAL- and/or DB devices for the Bluestore OSDs.
However, we have some questions about this and are looking for some
guidance and advice.
The first one is about the expected benefits. Before we undergo the
effort involved in the transition, we are wondering if it is even worth
it. How much of a performance boost can one expect when adding NVMe SSDs
as WAL devices to an HDD cluster? And how much faster than that does it
get with the DB also being on SSD? Are there rule-of-thumb numbers for
that? Or maybe someone has done benchmarks in the past?
The second question is of a more practical nature. Are there any
best practices on how to implement this? I was thinking we won't do one
SSD per HDD - surely an NVMe SSD is plenty fast to handle the traffic
from multiple OSDs. But what is a good ratio? Do I use one NVMe SSD per
4 HDDs? Per 6, or even 8? Also, how should I carve up the SSD - with
partitions or with LVM? Last but not least, if one SSD handles the
WAL and DB for multiple OSDs, losing that SSD means losing multiple
OSDs. How do people deal with this risk? Is it generally deemed
acceptable, or is it something people tend to mitigate, and if so, how?
Do I run multiple SSDs in RAID?
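For concreteness, this is roughly the kind of cephadm OSD service spec we
have in mind (an untested sketch on our part; the service_id, host pattern
and the 4-HDDs-per-NVMe ratio via db_slots are just assumptions):
~~~
service_type: osd
service_id: hdd-with-nvme-db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1    # the HDDs become OSD data devices
  db_devices:
    rotational: 0    # the NVMe SSDs provide the DB (and WAL) volumes
  db_slots: 4        # carve each NVMe into 4 DB slots, i.e. 4 HDDs per SSD
~~~
As far as we understand it, with no separate wal_devices the WAL would live
on the same volume as the DB, and we would apply the spec with something
like `ceph orch apply -i osd-spec.yaml`.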
I do realize that for some of these there might not be one perfect
answer that fits all use cases. I am looking for best practices and in
general just trying to avoid any obvious mistakes.
Any advice is much appreciated.
Sincerely
Niklaus Hofer
--
stepping stone AG
Wasserwerkgasse 7
CH-3011 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
niklaus.hofer(a)stepping-stone.ch
Hi,
as the documentation sends mixed signals in
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv…
"Note
Binding to IPv4 is enabled by default, so if you just add the option to
bind to IPv6 you’ll actually put yourself into dual stack mode."
and
https://docs.ceph.com/en/latest/rados/configuration/msgr2/#address-formats
"Note
The ability to bind to multiple ports has paved the way for dual-stack
IPv4 and IPv6 support. That said, dual-stack operation is not yet
supported as of Quincy v17.2.0."
just the quick questions:
Is dual-stack networking with IPv4 and IPv6 now supported or not?
From which version on is it considered stable?
Are OSDs now able to register themselves with two IP addresses in the
cluster map? MONs too?
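For context, the kind of configuration we are asking about would look
roughly like this (just a sketch based on our reading of the first page;
the networks are placeholders):
~~~
[global]
    public_network  = 192.168.0.0/24, fd00:aaaa::/64
    cluster_network = 192.168.1.0/24, fd00:bbbb::/64
    ms_bind_ipv4 = true
    ms_bind_ipv6 = true
~~~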
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hi,
I am having trouble answering this question:
Why is Ceph better than other storage solutions?
I know the usual high-level points about
- scalability,
- flexibility,
- distributed architecture,
- cost-effectiveness.
What convinces me (though it could also be held against it) is that Ceph as a product has everything I need, namely:
block storage (RBD),
file storage (CephFS),
object storage (S3, Swift),
and "plugins" to run NFS, NVMe over Fabrics, and NFS on top of object storage.
There are also many other features (mirroring, geo-replication, etc.) which are usually sold as paid add-ons in commercial solutions.
My problem is writing it down piece by piece.
I want to convince my managers that we are going in the right direction.
Why not something from Robin.io, Pure Storage, NetApp or Dell/EMC? Or, from open source, Longhorn or OpenEBS?
If you have ideas, please write them.
Thanks,
S.
Hi All,
We have a somewhat serious situation with a cephfs filesystem
(18.2.1) and 2 active MDSs (plus one standby). I tried to restart one of
the active daemons to unstick a bunch of blocked requests, and the
standby went into 'replay' for a very long time; then RAM on that MDS
server filled up, it stayed there for a while, eventually appeared to
give up, and switched over to the standby, but the cycle started
again. So I restarted that MDS, and now I'm in a situation where I see
this:
~~~
# ceph fs status
slugfs - 29 clients
======
RANK  STATE     MDS                      ACTIVITY   DNS     INOS    DIRS   CAPS
 0    replay    slugfs.pr-md-01.xdtppo              3958k   57.1k   12.2k     0
 1    resolve   slugfs.pr-md-02.sbblqq                  0       3       1     0
        POOL            TYPE      USED    AVAIL
  cephfs_metadata     metadata    997G    2948G
 cephfs_md_and_data     data         0    87.6T
    cephfs_data         data      773T     175T
STANDBY MDS
slugfs.pr-md-03.mclckv
MDS version: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
~~~
It just stays there indefinitely. All my clients are hung. I tried
restarting all MDS daemons and they just went back to this state after
coming back up.
Is there any way I can somehow escape this state of indefinite
replay/resolve?
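In case it helps to know what I've been looking at, these are the only
diagnostics I've tried so far (the daemon name is just the replaying
rank 0 from the status output above):
~~~
# ask the replaying rank what it thinks it is doing
ceph tell mds.slugfs.pr-md-01.xdtppo status

# watch the journal perf counters to see whether replay is actually advancing
ceph tell mds.slugfs.pr-md-01.xdtppo perf dump mds_log
~~~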
Thanks so much! I'm kinda nervous since none of my clients have
filesystem access at the moment...
cheers,
erich
Hello,
We are tracking PR #56805:
https://github.com/ceph/ceph/pull/56805
The resolution of this item would potentially fix a pervasive,
ongoing issue that needs daily attention in our cephfs cluster. I was
wondering whether it will be included in 18.2.3, which I *think* should be
released soon? Is there any way of knowing whether that is the case?
Thanks again,
erich
Hi,
We recently upgraded one of our clusters from Quincy 17.2.6 to Reef 18.2.1, and since then we have had 3 instances of our RGWs stopping processing requests. We have 3 hosts, each running a single RGW instance, and all 3 just seem to stop processing requests at the same time, causing our storage to become unavailable. A restart or redeploy of the RGW service brings them back OK. The cluster was originally deployed using ceph-ansible, but we have since adopted it into cephadm, which is how the upgrade was performed.
We have enabled debug logging as there was nothing out of the ordinary in normal logs and are currently sifting through them from the last crash.
We are just wondering whether it is possible to run Quincy RGWs instead of Reef ones, as we didn't have this issue prior to the upgrade.
We have 3 clusters in a multisite setup, we are holding off on upgrading the other 2 clusters due to this issue.
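If running mixed versions is an option at all, we were imagining something
along these lines (completely untested; the daemon name and image tag are
just examples - whether this is supported is really the question):
~~~
# pin a single RGW daemon back to a Quincy container image
ceph orch daemon redeploy rgw.default.host1.abcdef quay.io/ceph/ceph:v17.2.6
~~~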
Thanks
Iain
Iain Stott
OpenStack Engineer
Iain.Stott(a)thg.com