Hello,
I have a Ceph object cluster (12.2.11) and I am unable to figure out how to link a bucket to a new user when tenants are involved. If no tenant is specified (the default tenant), I am able to link the bucket to a new user just fine (e.g.: radosgw-admin bucket link --bucket=testbucket --uid=testuser).
As soon as I try to either move a bucket from a user in one tenant to another user in the same tenant, or move a bucket from one tenant to a user in another tenant, I get a failure:
radosgw-admin bucket link --bucket='tenant1/tenantbucket' --uid='tenant1$bentest456'
failure: (2) No such file or directory:
2020-06-17 10:20:45.466579 7f42dd6ccdc0 0 could not get bucket info for bucket=tenant1/tenantbucket
Yes, I’ve verified tenant1/tenantbucket exists.
I’ve tried searching previous posts and have been following the RH doco here to no avail: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/o…
Any help would be appreciated. Trying to solve specifically for moving between users in the same tenant, but would like to solve for other scenarios as well (moving between tenants, etc.)
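For completeness, here is what I have been checking, plus the one variation I have not been able to verify (passing --bucket-id is my guess from older list posts, so treat it as an assumption):

  # confirm the bucket entrypoint resolves under the tenant
  radosgw-admin metadata get bucket:tenant1/tenantbucket

  # take the bucket id from that output and pass it explicitly
  radosgw-admin bucket link --bucket='tenant1/tenantbucket' \
      --bucket-id=<bucket id from metadata get> \
      --uid='tenant1$bentest456'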
Thanks,
Ben
Hi all.
Is there any way I could calculate how much time it takes to add an
OSD to my cluster and get it rebalanced, or how long it takes to take
an OSD out of my cluster?
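The best I have come up with so far is a rough estimate from the recovery rate (the numbers below are illustrative, not from my cluster):

  ceph -s       # the io: section shows the current rate, e.g. "recovery: 120 MiB/s, 30 objects/s"
  ceph pg stat  # shows how many objects are still degraded/misplaced
  # estimated time ~= data still to move / observed recovery throughput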
Thanks.
Hi guys,
I can mount my cephfs via the mount command and access it without any problem.
Now I want to integrate it with autofs, which is used on our cluster.
It seems this is not a popular approach, and I found only this link:
https://drupal.star.bnl.gov/STAR/blog/mpoat/how-mount-cephfs
I followed the link but could not get it to work. I am wondering whether
this is possible at all.
We are using CentOS 7.8, and the Ceph cluster is running Nautilus 14.2.9.
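For reference, this is roughly what I tried following that link (the mount root, map key, monitor addresses, and secretfile are specific to our setup):

  # /etc/auto.master
  /cephfs  /etc/auto.ceph  --timeout=60

  # /etc/auto.ceph (kernel cephfs client)
  data  -fstype=ceph,name=admin,secretfile=/etc/ceph/admin.secret  mon1:6789,mon2:6789,mon3:6789:/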
Regards,
Derrick
Hi
I wonder if there is any (theoretical) advantage to running a separate
backend network next to the public network (through VLAN separation) over
a single interface.
I googled a lot, and while some blogs advise doing so, they do not give any
argument that supports this statement.
Any insights on this are much appreciated.
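For context, the setup I have in mind is the standard two-network split (the subnets are placeholders):

  # ceph.conf
  [global]
  public_network  = 192.168.1.0/24   # client and mon traffic
  cluster_network = 192.168.2.0/24   # replication and recovery traffic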
Thanks
Marcel
Hi Brett,
So how far apart are your buildings, and what is the network connectivity between them? I am going to assume they are close and that you have lots of bandwidth.
There are a couple of options depending on the protocol and the distance between the buildings.
You could build an EC cluster with something like 4:6, i.e. 4 data pieces and 6 parity pieces (assuming you have 5 nodes in each DC).
With this setup you can then lose an entire DC and still have access to your data, with protection. This is basically achieved by building the correct CRUSH map rules, which place half the data in one DC and the other half in the other; a sketch of such a rule is below.
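Something along these lines (the rule name/id, and the assumption that your CRUSH tree has a "datacenter" level, are mine; untested):

  rule ec_two_dc {
      id 2
      type erasure
      min_size 10
      max_size 10
      step set_chooseleaf_tries 5
      step take default
      step choose indep 2 type datacenter   # pick both DCs
      step chooseleaf indep 5 type host     # 5 of the 10 shards per DC
      step emit
  }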
You would need to think about where you would put a third monitor in this case.
The downside of this is that you could be reading data from either DC; not sure where your workloads are.
An alternative to this is to use LRC (locally repairable codes), which makes it possible to rebuild data within a DC. This helps when it comes to rebuilds but doesn't help with where data is read from.
The other option would be replication: build two separate clusters and configure S3 to replicate to the second site, or set up rsync to replicate if using CephFS. Not pretty, but an option.
Darren
From: Brett Randall <brett.randall(a)gmail.com>
Date: Wednesday, 10 June 2020 at 15:20
To: ceph-users(a)ceph.io <ceph-users(a)ceph.io>
Subject: [ceph-users] Combining erasure coding and replication?
Hi all
We are looking at setting up our first ever Ceph cluster to replace Gluster as our media asset storage and production system. The Ceph cluster will have 5 PB of usable storage. Whether we use it as object storage, or put CephFS in front of it, is still TBD.
Obviously we’re keen to protect this data well. Our current Gluster setup utilises RAID-6 on each of the nodes and then we have a single replica of each brick. The Gluster bricks are split between buildings so that the replica is guaranteed to be in another premises. By doing it this way, we guarantee that we can have a decent number of disk or node failures (even an entire building) before we lose both connectivity and data.
Our concern with Ceph is the cost of having three replicas. Storage may be cheap, but I’d rather not buy ANOTHER 5 PB for a third replica if there are ways to do this more efficiently. Site-level redundancy is important to us, so we can’t simply create an erasure-coded volume across two buildings – if we lose power to a building, the entire array would become unavailable. Likewise, we can’t simply have a single replica – our fault tolerance would drop way down on what it is right now.
Is there a way to use both erasure coding AND replication at the same time in Ceph to mimic the architecture we currently have in Gluster? I know we COULD just create RAID-6 volumes on each node and use the entire volume as a single OSD, but this is not the recommended way to use Ceph. So is there some other way?
Apologies if this is a nonsensical question, I’m still trying to wrap my head around Ceph, CRUSH maps, placement rules, volume types, etc etc!
TIA
Brett
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
Hi,
I have a question regarding Ceph CRUSH. I have been going through the crush.h
file. It says that struct crush_bucket **buckets (below) is an array of
pointers. My understanding is that this particular array of pointers is a
collection of addresses of six scalar values, namely __s32 id, __u16
type, __u8 alg, __u8 hash, __u32 weight, and __u32 size, and that the reason it
has a double pointer **buckets is because it also points to another pointer,
namely __s32 *items? Please correct me if I am wrong.
/** @ingroup API
 *
 * A crush map define a hierarchy of crush_bucket that end with leaves
 * (buckets and leaves are called items) and a set of crush_rule to
 * map an integer to items with the crush_do_rule() function.
 *
 */
struct crush_map {
        /*! An array of crush_bucket pointers of size __max_buckets__.
         * An element of the array may be NULL if the bucket was removed with
         * crush_remove_bucket(). The buckets must be added with
         * crush_add_bucket().
         * The bucket found at __buckets[i]__ must have a crush_bucket.id == -1-i.
         */
        struct crush_bucket **buckets;
        /*! An array of crush_rule pointers of size __max_rules__.
         * An element of the array may be NULL if the rule was removed (there is
         * no API to do so but there may be one in the future). The rules must be
         * added with crush_add_rule().
         */
        struct crush_rule **rules;
        __s32 max_buckets; /*!< the size of __buckets__ */
        __u32 max_rules;   /*!< the size of __rules__ */
        /*! The value of the highest item stored in the crush_map + 1
         */
        __s32 max_devices;
        /*! Backward compatibility tunable. It implements a bad solution
         * and must always be set to 0 except for backward compatibility
         * purposes
         */
        __u32 choose_local_tries;
        /*! Backward compatibility tunable. It implements a bad solution
         * and must always be set to 0 except for backward compatibility
         * purposes
         */
        __u32 choose_local_fallback_tries;
        /*! Tunable. The default value when the CHOOSE_TRIES or
         * CHOOSELEAF_TRIES steps are omitted in a rule. See the
         * documentation for crush_rule_set_step() for more
         * information
         */
        __u32 choose_total_tries;
        /*! Backward compatibility tunable. It should always be set
         * to 1 except for backward compatibility. Implemented in 2012
         * it was generalized late 2013 and is mostly unused except
         * in one border case, reason why it must be set to 1.
         *
         * Attempt chooseleaf inner descent once for firstn mode; on
         * reject retry outer descent. Note that this does *not*
         * apply to a collision: in that case we will retry as we
         * used to.
         */
        __u32 chooseleaf_descend_once;
        /*! Backward compatibility tunable. It is a fix for bad
         * mappings implemented in 2014 at
         * https://github.com/ceph/ceph/pull/1185. It should always
         * be set to 1 except for backward compatibility.
         *
         * If non-zero, feed r into chooseleaf, bit-shifted right by
         * (r-1) bits. a value of 1 is best for new clusters. for
         * legacy clusters that want to limit reshuffling, a value of
         * 3 or 4 will make the mappings line up a bit better with
         * previous mappings.
         */
        __u8 chooseleaf_vary_r;
        /*! Backward compatibility tunable. It is an improvement that
         * avoids unnecessary mapping changes, implemented at
         * https://github.com/ceph/ceph/pull/6572 and explained in
         * this post: "chooseleaf may cause some unnecessary pg
         * migrations" in October 2015
         * https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg26075.html
         * It should always be set to 1 except for backward compatibility.
         */
        __u8 chooseleaf_stable;
        /*! @cond INTERNAL */
        /* This value is calculated after decode or construction by
           the builder. It is exposed here (rather than having a
           'build CRUSH working space' function) so that callers can
           reserve a static buffer, allocate space on the stack, or
           otherwise avoid calling into the heap allocator if they
           want to. The size of the working space depends on the map,
           while the size of the scratch vector passed to the mapper
           depends on the size of the desired result set.
           Nothing stops the caller from allocating both in one swell
           foop and passing in two points, though. */
        size_t working_size;
#ifndef __KERNEL__
        /*! @endcond */
        /*! Backward compatibility tunable. It is a fix for the straw
         * scaler values for the straw algorithm which is deprecated
         * (straw2 replaces it) implemented at
         * https://github.com/ceph/ceph/pull/3057. It should always
         * be set to 1 except for backward compatibility.
         *
         */
        __u8 straw_calc_version;
        /*! @cond INTERNAL */
        /*
         * allowed bucket algs is a bitmask, here the bit positions
         * are CRUSH_BUCKET_*. note that these are *bits* and
         * CRUSH_BUCKET_* values are not, so we need to or together (1
         * << CRUSH_BUCKET_WHATEVER). The 0th bit is not used to
         * minimize confusion (bucket type values start at 1).
         */
        __u32 allowed_bucket_algs;
        __u32 *choose_tries;
#endif
        /*! @endcond */
};
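To make my reading concrete, a minimal sketch (my own code, not from the Ceph tree):

  #include "crush/crush.h"   /* struct crush_map, struct crush_bucket */

  /* buckets holds max_buckets pointers; each non-NULL entry points at
   * one whole crush_bucket -- its scalars (id, type, alg, hash, weight,
   * size) plus its own items pointer -- not at the scalars directly. */
  static __s32 first_item_of(const struct crush_map *map, __s32 i)
  {
          const struct crush_bucket *b = map->buckets[i]; /* NULL if removed */
          return (b && b->size > 0) ? b->items[0] : -1;
  }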
BR
Hi,
we had bad blocks on one OSD and, around the same time, a network switch
outage, which seems to have caused some corruption in the mon service.
# ceph -s
  cluster:
    id:     d7c5c9c7-a227-4e33-ab43-3f4aa1eb0630
    health: HEALTH_WARN
            1 daemons have recently crashed
            14097 slow ops, oldest one blocked for 56417 sec, mon.server6 has slow ops
            mon server6 is low on available space

  services:
    mon: 3 daemons, quorum server6,server3,server5 (age 15h)
    mgr: server4(active, since 3w), standbys: server6, server5
    mds: xpool:1 {0=server6=up:active} 1 up:standby
    osd: 21 osds: 21 up (since 15h), 20 in (since 16h)

  data:
    pools:   17 pools, 941 pgs
    objects: 6.80M objects, 18 TiB
    usage:   34 TiB used, 20 TiB / 54 TiB avail
    pgs:     940 active+clean
             1   active+clean+scrubbing+deep

  io:
    client: 23 MiB/s rd, 980 KiB/s wr, 30 op/s rd, 141 op/s wr
14097 slow ops, oldest one blocked for 56417 sec, mon.server6 has slow ops
The mon ops log looks like:
https://gist.github.com/poelzi/45f31f26f6a83f6406bb43553e0c237a
It seems that the mds transactions don't finish while waiting for the
mdsmap. On the mds server, there are no ops in flight, nor any errors in
the log file.
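For anyone who wants to look at the same data, a sketch of how the dump above can be pulled (the daemon name is from our cluster; ceph crash assumes the Nautilus+ crash module):

  ceph daemon mon.server6 ops   # dump the mon's in-flight ops
  ceph crash ls                 # list recently crashed daemons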
What is the proper way to repair this?
kind regards
poelzi
I have installed a simple Ceph system with two nodes (ceph100, ceph101)
using cephadm and the ceph orch host add command. I copied the SSH key to
the second host (ceph101) with ssh-copy-id -f -i /etc/ceph/ceph.pub. I can
execute the ceph -s command from the first host (ceph100), but when I
execute the command on the second host (ceph101), I get the following error.
Error initializing cluster client: ObjectNotFound('RADOS object not
found (error calling conf_read_file)')
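I suspect ceph -s needs a local ceph.conf and keyring on whichever host runs it, so I also tried copying them over (using the admin keyring here is my guess and may not be the intended cephadm way):

  # on ceph100, which has a working client setup
  ceph config generate-minimal-conf > /tmp/ceph.conf
  scp /tmp/ceph.conf ceph101:/etc/ceph/ceph.conf
  scp /etc/ceph/ceph.client.admin.keyring ceph101:/etc/ceph/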
Also, when I execute the 'ceph orch ps' command, the output seems
suspicious to me.
NAME         HOST     STATUS    REFRESHED  AGE  VERSION    IMAGE NAME  IMAGE ID   CONTAINER ID
mon.ceph101  ceph101  starting  -          -    <unknown>  <unknown>   <unknown>  <unknown>
Does anyone have any idea what the problem could be, or can anyone point
me to a good link for the Octopus cephadm installation?
Regards.
I'm happy to announce another release of the go-ceph API
bindings. This is a regular release following our every-two-months release
cadence.
https://github.com/ceph/go-ceph/releases/tag/v0.4.0
The bindings aim to play a similar role to the "pybind" python bindings in the
ceph tree but for the Go language. These API bindings require the use of cgo.
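A minimal sketch of connecting with the rados package (untested as posted; assumes ceph.conf and a keyring in the default locations):

  package main

  import (
          "fmt"

          "github.com/ceph/go-ceph/rados"
  )

  func main() {
          conn, err := rados.NewConn()
          if err != nil {
                  panic(err)
          }
          if err := conn.ReadDefaultConfigFile(); err != nil {
                  panic(err)
          }
          if err := conn.Connect(); err != nil {
                  panic(err)
          }
          defer conn.Shutdown()

          fsid, _ := conn.GetFSID()
          fmt.Println("connected to cluster", fsid)
  }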
There are already a few consumers of this library in the wild, and the
ceph-csi project is starting to make use of it.
Specific questions, comments, bugs, etc. are best directed at our GitHub
issues tracker.
---
John Mulligan
phlogistonjohn(a)asynchrono.us
jmulligan(a)redhat.com