Hello,
I have a Nautilus cluster with a CephFS volume. Grafana shows that the cephfs_data pool is almost full [1], but if I look at the pool
usage, it looks like I have plenty of space. Which metrics does Grafana use?
1. https://framapic.org/5r7J86s55x6k/jGSIsjEUPYMU.png
pool usage:
> artemis@icitsrv5:~$ ceph df detail
> RAW STORAGE:
> CLASS SIZE AVAIL USED RAW USED %RAW USED
> hdd 662 TiB 296 TiB 366 TiB 366 TiB 55.32
> TOTAL 662 TiB 296 TiB 366 TiB 366 TiB 55.32
>
> POOLS:
> POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
> .rgw.root 3 8.1 KiB 15 2.8 MiB 0 63 TiB N/A N/A 15 0 B 0 B
> default.rgw.control 4 0 B 8 0 B 0 63 TiB N/A N/A 8 0 B 0 B
> default.rgw.meta 5 26 KiB 85 16 MiB 0 63 TiB N/A N/A 85 0 B 0 B
> default.rgw.log 6 0 B 207 0 B 0 63 TiB N/A N/A 207 0 B 0 B
> cephfs_data 7 113 TiB 139.34M 186 TiB 49.47 138 TiB N/A N/A 139.34M 0 B 0 B
> cephfs_metadata 8 54 GiB 10.21M 57 GiB 0.03 63 TiB N/A N/A 10.21M 0 B 0 B
> default.rgw.buckets.data 9 122 TiB 54.57M 173 TiB 47.70 138 TiB N/A N/A 54.57M 0 B 0 B
> default.rgw.buckets.index 10 2.6 GiB 19.97k 2.6 GiB 0 63 TiB N/A N/A 19.97k 0 B 0 B
> default.rgw.buckets.non-ec 11 67 MiB 186 102 MiB 0 63 TiB N/A N/A 186 0 B 0 B
> device_health_metrics 12 1.2 MiB 145 1.2 MiB 0 63 TiB N/A N/A 145 0 B 0 B
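For comparison, a hedged way to look at the raw pool metrics the dashboard may be graphing, assuming it reads from the ceph-mgr prometheus module on its default port 9283 (metric names can vary slightly between releases):

$ curl -s http://<mgr-host>:9283/metrics | grep -E 'ceph_pool_(stored|max_avail|percent_used)'    # <mgr-host> is a placeholder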
Best,
--
Yoann Moulin
EPFL IC-IT
I just upgraded a cluster that I inherited from Jewel to Luminous and am
trying to work through the new warnings/errors.
I got the message about 3 OMAP objects being too big, all of them in the
default.rgw.buckets.index pool. I expected dynamic resharding to
kick in, but no luck after several days. I looked at
$ radosgw-admin reshard list
[2020-03-02 04:27:22.303601 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000000
2020-03-02 04:27:22.305403 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000001
2020-03-02 04:27:22.307038 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000002
2020-03-02 04:27:22.317932 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000003
2020-03-02 04:27:22.348383 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000004
2020-03-02 04:27:22.349212 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000005
2020-03-02 04:27:22.349853 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000006
2020-03-02 04:27:22.350490 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000007
2020-03-02 04:27:22.351256 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000008
2020-03-02 04:27:22.351843 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000009
2020-03-02 04:27:22.353225 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000010
2020-03-02 04:27:22.353910 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000011
2020-03-02 04:27:22.367161 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000012
2020-03-02 04:27:22.367741 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000013
2020-03-02 04:27:22.368498 7f8bb58b8e40 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000014
]
And searching on the Internet indicates that Luminous added a new "reshard"
namespace that the radosgw user needs access to. I'm not sure which pool
this namespace was added to (coming from Jewel, there are a slew of rgw
pools), and I'm not sure which radosgw user it's talking about. I can't
find a keyring for radosgw-admin, but it works, so I assume it is using
the admin keyring. The permissions are open. I even appended the namespace
option to the admin caps as follows:
[client.admin]
key = SECRET
caps mds = "allow *"
caps mon = "allow *"
caps osd = "allow * namespace=*"
But I get a new error:
$ radosgw-admin reshard list
2020-03-02 05:31:40.875642 7fdb983f7e40 0 failed reading realm info: ret
-1 (1) Operation not permitted
Any nudge in the right direction would be helpful. I manually sharded the
indexes, but I'd really like to have it done automatically from now on so I
don't have to worry about it.
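For reference, here is a hedged sketch of how the reshard log objects and the relevant caps could be inspected; the pool name assumes a default zone layout (on a Jewel-era cluster it may differ), and the key name is a placeholder:

$ rados -p default.rgw.log --namespace reshard ls      # the reshard.00000000NN log objects live in this namespace
$ rados -p default.rgw.log --all ls | grep reshard     # alternative that scans every namespace in the pool
$ ceph auth get client.rgw.<gateway-name>              # shows the osd caps the radosgw daemon key actually has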
Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hello,
On a Nautilus cluster, I'd like to move monitors from bare metal servers to VMs to prepare a migration.
I have added 3 new monitors on 3 VMs and I'd like to stop the 3 old monitor daemons. But as soon as I stop the 3rd old monitor, the cluster gets stuck
because the election of a new monitor fails.
The 3 old monitors are on 14.2.4-1xenial
The 3 new monitors are on 14.2.7-1bionic
> 2020-03-09 16:06:00.167 7fc4a3138700 1 mon.icvm0017(a)3(peon).paxos(paxos active c 20918592..20919120) lease_timeout -- calling new election
> 2020-03-09 16:06:02.143 7fc49f931700 1 mon.icvm0017@3(probing) e4 handle_auth_request failed to assign global_id
Did I miss something?
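For context, a hedged sketch of commands that show the monmap and quorum state (monitor names are placeholders):

$ ceph mon stat
$ ceph quorum_status -f json-pretty      # quorum_names shows who is actually in quorum
$ ceph mon dump                          # note: if all 6 monitors are still in the monmap, only 3 up is not a
                                         # strict majority, so quorum stays lost until the old ones are removed
                                         # with "ceph mon rm <name>"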
Attached: some logs and ceph.conf.
Thanks for your help.
Best,
--
Yoann Moulin
EPFL IC-IT
Hi, (nautilus, 14.2.8, whole cluster)
I doodled with adding a second cephfs and the project got canceled. I removed the unused cephfs with "ceph fs rm dream --yes-i-really-mean-it" and that worked as expected. I have a lingering health warning though which won't clear.
The original cephfs1 volume exists and is healthy:
[root@cephmon-03]# ceph fs ls
name: cephfs1, metadata pool: stp.cephfs_metadata, data pools: [stp.cephfs_data ]
[root@cephmon-03]# ceph mds stat
cephfs1:3 {0=cephmon-03=up:active,1=cephmon-02=up:active,2=cephmon-01=up:active}
[root@cephmon-03]# ceph health detail
HEALTH_WARN insufficient standby MDS daemons available
MDS_INSUFFICIENT_STANDBY insufficient standby MDS daemons available
have 0; want 1 more
[root@cephmon-03]#
I have not yet deleted the pools for 'dream', the second cephfs definition. (There is nothing in it.)
Before deleting the pools, is there a command to clear this warning?
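In case it is relevant, a hedged sketch of the per-filesystem setting that usually drives this warning (setting it to 0 assumes you genuinely do not want a standby MDS):

$ ceph fs get cephfs1 | grep standby_count_wanted
$ ceph fs set cephfs1 standby_count_wanted 0      # tells cephfs1 not to expect any standby MDS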
Thanks!
peter
Peter Eisch
Senior Site Reliability Engineer
T1.612.445.5135
virginpulse.com | virginpulse.com/global-challenge
Hi,
When I upgraded a cluster from Luminous to Nautilus, I followed a page on ceph.com. I need to do another cluster, and while I bookmarked the link, the page no longer exists.
https://docs.ceph.com/master/releases/nautilus/#nautilus-old-upgrade
Does anyone have an updated link, or can you point me to where I can find the recommended steps?
Thanks,
peter
Peter Eisch
Senior Site Reliability Engineer
T1.612.445.5135
virginpulse.com | virginpulse.com/global-challenge
Hi team, I'm planning to invest in hardware for a PoC and I would like your
feedback before the purchase:
The goal is to deploy a *16TB* storage cluster with *3 replicas*, and thus *3
nodes*.
System configuration: https://pcpartpicker.com/list/cfDpDx ($400 USD per
node)
Some notes about the configuration:
- 4-core processor for 4 OSD daemons
- 8GB RAM for the first 4TB of storage, which will increase to 16GB of
RAM at 16TB of storage.
- Motherboard:
- 4 x SATA 6 Gb/s (one per each OSD disk)
- 2 x PCI-E x1 Slots (1 will be used for an additional Gigabit
Ethernet)
- 1 x M.2 slot for the host OS
- RAM can increase up to 32 GB, and another SATA 6 Gb/s controller can
be added on a PCI-E x1 slot for growth up to *32TB*
As noted, the plan is to deploy nodes with *4TB* and gradually add *12TB* more as
needed; memory should also be increased to *16GB* past the *8TB* threshold.
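A back-of-envelope check of those numbers (a sketch, assuming the *16TB* target is usable capacity with size=3 replication):

16 TB usable x 3 replicas = 48 TB raw for the cluster
48 TB raw / 3 nodes = 16 TB raw per node (4 x 4TB SATA drives)
Initial deployment: 1 x 4TB per node = 12 TB raw = 4 TB usable
RAM rule of thumb: ~4 GB per BlueStore OSD (default osd_memory_target), so 16 GB per node once all 4 OSDs are in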
Questions to validate before the purchase:
1. Do the hardware components make sense for the *16TB* growth projection?
2. Is it easy to gradually add more capacity to each node (*4TB* each time
per node)?
Thanks for your support!
--
Ignacio Ocampo
I am wondering if there exists a tool, faster than "rados export", that
can copy and restore read-only pools (to/from another pool or file system).
It looks like "rados export" is very slow because it is single threaded
(as best I can tell, --workers doesn't make a difference).
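One hedged workaround sketch, assuming GNU parallel is available and the objects carry plain data only (this copies object payloads; omap entries and xattrs, which "rados export" does preserve, would be lost); src-pool and dst-pool are placeholders:

$ rados -p src-pool ls | parallel -j 8 'rados -p src-pool get {} /tmp/{#}.obj && rados -p dst-pool put {} /tmp/{#}.obj && rm -f /tmp/{#}.obj'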
Vlad
On Fri, Mar 6, 2020 at 1:06 AM M Ranga Swami Reddy <swamireddy@gmail.com>
wrote:
> Hello,
> Can we get the IOPS of any rbd image/volume?
>
> For ex: I have created volumes via OpenStack Cinder. I want to know
> the IOPS of these volumes.
>
> In general, we can get pool stats, but I haven't seen per-volume stats.
>
> Any hint here? Appreciated.
>
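(On Nautilus and later, a hedged sketch of per-image counters, assuming the rbd_support mgr module is enabled; <pool-name> is a placeholder for the Cinder pool:)

$ rbd perf image iostat <pool-name>   # live per-image IOPS and throughput
$ rbd perf image iotop <pool-name>    # top-like view across images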