Hi,
I have Ceph 15.2.4 running in Docker. How do I configure a client to use a specific data pool? I tried putting the following in ceph.conf, but the change does not take effect:
[client.myclient]
rbd default data pool = Mydatapool
I need this in order to use an erasure-coded data pool with CloudStack.
Can anyone help? Where is the ceph.conf that I need to edit?
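For example (rbdpool below is just a placeholder for the replicated pool holding the image metadata; Mydatapool is my erasure-coded pool), I would expect something like this to have the same effect:

  # set the option in the cluster configuration database instead of ceph.conf
  ceph config set client.myclient rbd_default_data_pool Mydatapool

  # or name the data pool explicitly when creating an image
  rbd create rbdpool/test-image --size 10G --data-pool Mydatapool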
Thanks.
Hi,
Thanks for the reply.
cephadm starts the Ceph containers automatically. How do I set privileged mode on a container that cephadm manages?
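For a container started by hand, privileged mode is just a flag on docker run (a sketch only; the image name and mounts are placeholders):

  docker run -d --privileged --net=host \
      -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph \
      <nfs-ganesha image>

What I don't see is how to make cephadm pass --privileged to the containers it deploys.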
--
> On 23/9/20 at 13:24, Daniel Gryniewicz wrote:
>> NFSv3 needs privileges to connect to the portmapper. Try running
>> your docker container in privileged mode, and see if that helps.
>>
>> Daniel
>>
>> On 9/23/20 11:42 AM, Gabriel Medve wrote:
>>> Hi,
>>>
>>> I have Ceph 15.2.5 running in Docker. I configured NFS Ganesha
>>> with NFS version 3, but I cannot mount it.
>>> If I configure Ganesha with NFS version 4, I can mount it without
>>> problems, but I need version 3.
>>>
>>> The error is mount.nfs: Protocol not supported
>>>
>>> Can anyone help?
>>>
>>> Thanks.
>>>
Hi all,
I get these log messages all the time, sometimes also directly to the terminal:
kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)
The cluster is healthy, and the MDS the kernel is complaining about is indeed both configured and running as a standby-replay daemon. These messages show up at least every hour, sometimes much more frequently.
A Google search did not turn up anything useful.
Can anyone shed some light on what this message means?
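In case it matters, standby-replay is enabled here in the standard way (the file system name below is a placeholder):

  ceph fs set cephfs allow_standby_replay true

and ceph fs status lists the daemon in question as standby-replay.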
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi.
I'm new to CephFS and I have some questions about how per-MDS journals work.
In Sage's paper (OSDI '06), I read that each MDS has its own journal and lazily flushes metadata modifications to the OSD cluster.
What I'm wondering about is that some directory operations, like rename, touch multiple pieces of metadata that may live on two or more MDSs and their journals, so I think some mechanism is needed to build a transaction spanning multiple journals, like a distributed transaction protocol.
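To make the case I have in mind concrete (just an illustration; the mount point and directory names are placeholders), consider two directories pinned to different MDS ranks and a rename across them:

  setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/dirA
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/dirB
  mv /mnt/cephfs/dirA/somefile /mnt/cephfs/dirB/somefile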
Could anybody explain how per-MDS journals work for such directory operations, or recommend some references about it?
Thanks.
kyujin.
Hello,
We have a working Ceph cluster with a pair of S3 RGWs in front of it that are accessed through the domain A.B.C.D.
Now a new client is asking for access through the domain E.C.D, but to the already existing buckets. This scenario is not discussed in the docs.
Apparently, judging from the code and from trying it, RGW does not support multiple domains in the rgw_dns_name variable.
But from reading through parts of the code (I am no dev, and my C++ is 25 years rusty), I get the impression that we could perhaps just add a second pair of RGW S3 servers that serve the same buckets, only under a different domain.
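Concretely, what I imagine is a second pair of gateways pointing at the same cluster and pools, differing only in their rgw_dns_name (a sketch; the section names and port are placeholders):

  [client.rgw.gw-new1]
  rgw_frontends = beast port=8080
  rgw_dns_name = E.C.D

  [client.rgw.gw-new2]
  rgw_frontends = beast port=8080
  rgw_dns_name = E.C.D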
Am I wrong? Assuming this works, is it unintended behaviour that the Ceph team might remove down the road?
Is there another solution that I might have missed? We do not have multi-zone and there are no plans for it, and CNAMEs (rgw_resolve_cname) seem to be useful only for static sites (again, based on my limited code-reading abilities).
Thank you
Hello -
We're using librados directly for communication between our services. Some of its features are faster and better suited to our use cases than an S3 gateway.
But we do want to leverage the Elasticsearch metadata search.
It appears that the metadata search is built on the object gateway.
The question is: do objects written directly to the cluster via librados get replicated (and indexed) through the gateway, or is it only objects written through the gateway that get replicated?
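To be clear about what "written directly" means here (the pool and object names are placeholders), our services do the equivalent of:

  rados -p services-pool put report-2020-09 ./report-2020-09.bin

rather than an S3 PUT through radosgw.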
Thanks.
Cary
Bonjour,
TL;DR: Is it more advisable to work on Ceph internals to make it friendly to this particular workload, or to write something similar to EOS[0] (i.e. RocksDB + XRootD + RBD)?
This is a follow-up to two previous mails[1] sent while researching this topic. In a nutshell, the Software Heritage project[1] currently has ~750TB and 10 billion objects, 75% of which are smaller than 16KB and 50% smaller than 4KB. But these small objects only account for ~5% of the 750TB: the 25% of objects with a size > 16KB total ~700TB. The objects can be compressed by ~50%, so the 750TB only needs about 350TB of actual storage (if you're interested in the details, see [2]).
Let's say those 10 billion objects are stored in a single 4+2 erasure-coded pool, with bluestore compression enabled for objects with a size > 32KB and the smallest allocation size for bluestore set to 4KB[3]. The 750TB won't use the expected 350TB but about 30% more, i.e. ~450TB (see [4] for the maths). This space amplification happens because storing a 1-byte object uses the same space as storing a 16KB object (see [5] to repeat the experiment at home): in a 4+2 erasure-coded pool, each of the 6 chunks uses no less than 4KB, because that is the smallest allocation size for bluestore. That's 4 * 4KB = 16KB worth of data chunks (plus two 4KB parity chunks) even when all that is needed is 1 byte.
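A quick way to see this at home (a rough sketch; the pool and profile names are placeholders, and [5] has the full experiment):

  ceph osd erasure-code-profile set ec42profile k=4 m=2
  ceph osd pool create ecpool 32 32 erasure ec42profile
  printf x > one-byte
  rados -p ecpool put tiny-object one-byte
  ceph df detail    # USED for the pool grows by a full stripe, not by 1 byte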
It was suggested[6] to have two different pools: a 4+2 erasure-coded pool with compression for all objects with a size > 32KB (which are expected to compress down to 16KB), and a pool with 3 replicas for the smaller objects, to reduce space amplification to a minimum without compromising durability. A client looking for an object could make two simultaneous requests, one to each pool: it would get a 404 from one of them and the object from the other.
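In shell terms the lookup would amount to something like this (pool and object names are placeholders, and this sketch reads the pools one after the other rather than simultaneously):

  rados -p small-objects get "$OBJ" /tmp/obj 2>/dev/null || \
      rados -p big-objects get "$OBJ" /tmp/obj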
Another workaround is best described in the "Finding a needle in Haystack: Facebook’s photo storage"[9] paper and essentially boils down to using a database to store a map between an object's name and its location. That does not scale out (writing the database index is the bottleneck), but it is simple enough and is successfully implemented in EOS[0], with >200PB worth of data, and in seaweedfs[10], another promising object store based on the same idea.
Instead of working around the problem, maybe Ceph could be modified to make better use of the immutability of these objects[7], a hint that is apparently only used to figure out how best to compress an object and for checksum calculation[8]. I honestly have no clue how difficult that would be. All I know is that it is not easy, otherwise it would have been done already: there seems to be a general need for efficiently (space-wise and performance-wise) storing large quantities of objects smaller than 4KB.
Is it more advisable to:
* work on Ceph internals to make it friendly to this particular workload, or
* write another implementation of "Finding a needle in Haystack: Facebook’s photo storage"[9] based on RBD[11]?
I'm currently leaning toward working on Ceph internals, but there are pros and cons to both approaches[12]. And since all of this is still very new to me, there is also the possibility that I'm missing something. Maybe it's *super* difficult to improve Ceph in this way; I should try to figure that out sooner rather than later.
I realize it's a lot to take in, and unless you're facing the exact same problem there is very little chance you read this far :-) But if you did... I'm *really* interested to hear what you think. In any case, I'll report back to this thread once a decision has been made.
Cheers
[0] https://eos-web.web.cern.ch/eos-web/
[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AEMW6O7WVJF… and https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RHQ5ZCHJISX…
[2] https://forge.softwareheritage.org/T3054
[3] https://github.com/ceph/ceph/blob/3f5e778ad6f055296022e8edabf701b6958fb602/…
[4] https://forge.softwareheritage.org/T3052#58864
[5] https://forge.softwareheritage.org/T3052#58917
[6] https://forge.softwareheritage.org/T3052#58876
[7] https://docs.ceph.com/en/latest/rados/api/librados/#c.@3.LIBRADOS_ALLOC_HIN…
[8] https://forge.softwareheritage.org/T3055
[9] https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf
[10] https://github.com/chrislusf/seaweedfs/wiki/Components
[11] https://forge.softwareheritage.org/T3049
[12] https://forge.softwareheritage.org/T3054#58977
--
Loïc Dachary, Artisan Logiciel Libre