March 2021 - ceph-users - lists.ceph.io

what is quickest way to generate a new key for a user?

by Marc

what is quickest way to generate a new key for a user?

3 years, 2 months

1
0
0 0

Best practices for OSD on bcache

by Norman.Kern

Hi, guys I am testing ceph on bcache devices, I found the performance is not good as expected. Does anyone have any best practices for it? Thanks.

3 years, 2 months

5
11
0 0

Metadata for LibRADOS

by Cary FitzHugh

Hello - We're trying to use native libRADOS and the only challenge we're running into is searching metadata. Using the rgw metadata sync seems to require all data to be pushed through the rgw, which is not something we're interested in setting up at the moment. Are there hooks or features of libRADOS which could be leveraged to enable syncing of metadata to an external system (elastic-search / postgres / etc)? Is there a way to listen to a stream of updates to a pool in real-time, with some guarantees I wouldn't miss things? Are there any features like this in libRADOS? Thank you

3 years, 2 months

5
7
0 0

Re: Questions RE: Ceph/CentOS/IBM

by Marc

> This is wrong. Ceph 15 runs on CentOS 7 just fine, but without the > dashboard. > I also hope that ceph is keeping support for el7 till it is eol in 2024. So I have enough time to figure out what OS to choose.

3 years, 2 months

2
2
0 0

CephFS: side effects of not using ceph-mgr volumes / subvolumes

by Sebastian Knust

Hi, Assuming a cluster (currently octopus, might upgrade to pacific once released) serving only CephFS and that only to a handful of kernel and fuse-clients (no OpenStack, CSI or similar): Are there any side effects of not using the ceph-mgr volumes module abstractions [1], namely subvolumes and subvolume groups, that I have to consider? I would still only mount subtrees of the whole (single) CephFS file system and have some clients which mount multiple disjunct subtrees. Quotas would only be set on the subtree level which I am mounting, likewise file layouts. Snapshots (via mkdir in .snap) would be used on the mounting level or one level above. Background: I don't require the abstraction features per se. Some restrictions (e.g. subvolume group snapshots not being supported) seem to me to be caused only by the abstraction layer and not the underlying CephFS. For my specific use case I require snapshots on the subvolume group layer. It therefore seems better to just forego the abstraction as a whole and work on bare CephFS. Cheers Sebastian [1] https://docs.ceph.com/en/octopus/cephfs/fs-volumes/

3 years, 2 months

2
1
0 0

cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

by Stefan Kooman

Hi, On a CentOS 7 VM with mainline kernel (5.11.2-1.el7.elrepo.x86_64 #1 SMP Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and with Ceph Octopus 15.2.9 packages installed. The MDS server is running Nautilus 14.2.16. Messenger v2 has been enabled. Poort 3300 of the monitors is reachable from the client. At mount time we get the following: > Mar 2 09:01:14 kernel: Key type ceph registered > Mar 2 09:01:14 kernel: libceph: loaded (mon/osd proto 15/24) > Mar 2 09:01:14 kernel: FS-Cache: Netfs 'ceph' registered for caching > Mar 2 09:01:14 kernel: ceph: loaded (mds proto 32) > Mar 2 09:01:14 kernel: libceph: mon4 (1)[mond addr]:6789 session established > Mar 2 09:01:14 kernel: libceph: another match of type 1 in addrvec > Mar 2 09:01:14 kernel: ceph: corrupt mdsmap > Mar 2 09:01:14 kernel: ceph: error decoding mdsmap -22 > Mar 2 09:01:14 kernel: libceph: another match of type 1 in addrvec > Mar 2 09:01:14 kernel: libceph: corrupt full osdmap (-22) epoch 98764 off 6357 (0000000027a57a75 of 00000000d3075952-00000000e307797f) > Mar 2 09:02:15 kernel: ceph: No mds server is up or the cluster is laggy The /etc/ceph/ceph.conf has been adjusted to reflect the messenger v2 changes. ms_bind_ipv6=trie, ms_bind_ipv4=false. The kernel client still seems to be use the v1 port though (although since 5.11 v2 should be supported). Has anyone seen this before? Just guessing here, but could it that the client tries to speak v2 protocol on v1 port? Thanks, Stefan

3 years, 2 months

3
14
0 0

Monitor leveldb growing without bound v14.2.16

by Lincoln Bryant

Hi list, We recently had a cluster outage over the weekend where several OSDs were inaccessible over night for several hours. When I found the cluster in the morning, the monitors' root disks (which contained both the monitor's leveldb and the Ceph logs) had completely filled. After restarting OSDs, cleaning out the monitors' logs, moving /var/lib/ceph to dedicated disks on the mons, and starting recovery (in which there was 1 unfound object that I marked lost, if that has any relevancy), the leveldb continued/continues to grow without bound. The cluster has all PGs in active+clean at this point, yet I'm accumulating what seems like approximately ~1GB/hr of new leveldb data. Two of the monitors (a, c) are in quorum, while the third (b) has been synchronizing for the last several hours, but doesn't seem to be able to catch up. Mon 'b' has been running for 4 hours now in the 'synchronizing' state. The mon's log has many messages about compacting and deleting files, yet we never exit the synchronization state. The ceph.log is also rapidly accumulating complaints that the mons are slow (not surprising, I suppose, since the levelDBs are ~100GB at this point). I've found that using monstore tool to do compaction on mons 'a' and 'c' thelps but is only a temporary fix. Soon the database inflates again and I'm back to where I started. Thoughts on how to proceed here? Some ideas I had: - Would it help to add some new monitors that use RocksDB? - Stop a monitor and dump the keys via monstoretool, just to get an idea of what's going on? - Increase mon_sync_max_payload_size to try to move data in larger chunks? - Drop down to a single monitor, and see if normal compaction triggers and stops growing unbounded? - Stop both 'a' and 'c', compact them, start them, and immediately start 'b' ? Appreciate any advice. Regards, Lincoln

3 years, 2 months

4
4
0 0

"optimal" tunables on release upgrade

by Matthew Vernon

Hi, Having been slightly caught out by tunables on my Octopus upgrade[0], can I just check that if I do ceph osd crush tunables optimal That will update the tunables on the cluster to the current "optimal" values (and move a lot of data around), but that this doesn't mean they'll change next time I upgrade the cluster or anything like that? It's not quite clear from the documentation whether the next time "optimal" tunables change that'll be applied to a cluster where I've set tunables thus, or if tunables are only ever changed by a fresh invocation of ceph osd crush tunables... [I assume the same answer applies to "default"?] Regards, Matthew [0] I foolishly thought a cluster initially installed as Jewel would have jewel tunables -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

3 years, 2 months

2
1
0 0

Octopus auto-scale causing HEALTH_WARN re object numbers

by Matthew Vernon

Hi, I've upgraded our test cluster to Octopus, and enabled the auto-scaler. It's nearly finished: PG autoscaler decreasing pool 11 PGs from 1024 to 32 (4d) [==========================..] (remaining: 3h) But I notice it looks to be making pool 11 smaller when HEALTH_WARN thinks it should be larger: root@sto-t1-1:~# ceph health detail HEALTH_WARN 1 pools have many more objects per pg than average; 9 pgs not deep-scrubbed in time [WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average pool default.rgw.buckets.data objects per pg (313153) is more than 23.4063 times cluster average (13379) ...which seems like the wrong thing for the auto-scaler to be doing. Is this a known problem? Regards, Matthew More details: ceph df: root@sto-t1-1:~# ceph df --- RAW STORAGE --- CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 993 TiB 782 TiB 210 TiB 211 TiB 21.22 TOTAL 993 TiB 782 TiB 210 TiB 211 TiB 21.22 --- POOLS --- POOL ID STORED OBJECTS USED %USED MAX AVAIL .rgw.root 2 69 KiB 4 1.4 MiB 0 220 TiB default.rgw.control 3 1.1 MiB 8 3.3 MiB 0 220 TiB default.rgw.data.root 4 115 KiB 14 3.6 MiB 0 220 TiB default.rgw.gc 5 5.3 MiB 32 23 MiB 0 220 TiB default.rgw.log 6 31 MiB 184 96 MiB 0 220 TiB default.rgw.users.uid 7 249 KiB 8 1.8 MiB 0 220 TiB default.rgw.buckets.data 11 23 GiB 10.02M 2.0 TiB 0.30 220 TiB rgwtls 13 54 KiB 3 843 KiB 0 220 TiB pilot-metrics 14 285 MiB 2.60M 476 GiB 0.07 220 TiB pilot-images 15 40 GiB 4.97k 122 GiB 0.02 220 TiB pilot-volumes 16 192 GiB 48.90k 577 GiB 0.09 220 TiB pilot-vms 17 125 GiB 33.79k 376 GiB 0.06 220 TiB default.rgw.users.keys 18 111 KiB 5 1.5 MiB 0 220 TiB default.rgw.buckets.index 19 4.0 GiB 246 12 GiB 0 220 TiB rbd 20 39 TiB 10.09M 116 TiB 14.88 220 TiB default.rgw.buckets.non-ec 21 344 KiB 1 1.0 MiB 0 220 TiB rgw-ec 22 7.0 TiB 1.93M 11 TiB 1.57 441 TiB rbd-ec 23 45 TiB 11.73M 67 TiB 9.22 441 TiB default.rgw.users.email 24 23 MiB 1 69 MiB 0 220 TiB pilot-backups 25 73 MiB 3 219 MiB 0 220 TiB device_health_metrics 26 51 MiB 186 153 MiB 0 220 TiB root@sto-t1-1:~# ceph osd pool autoscale-status POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE .rgw.root 70843 3.0 992.7T 0.0000 1.0 32 on default.rgw.control 1116k 3.0 992.7T 0.0000 1.0 32 on default.rgw.data.root 115.1k 3.0 992.7T 0.0000 1.0 32 on default.rgw.gc 5379k 3.0 992.7T 0.0000 1.0 32 on default.rgw.log 32036k 3.0 992.7T 0.0000 1.0 32 on default.rgw.users.uid 248.7k 3.0 992.7T 0.0000 1.0 32 on default.rgw.buckets.data 23894M 3.0 992.7T 0.0001 1.0 32 on rgwtls 55760 3.0 992.7T 0.0000 1.0 32 on pilot-metrics 285.3M 3.0 992.7T 0.0000 1.0 32 on pilot-images 41471M 3.0 992.7T 0.0001 1.0 32 on pilot-volumes 192.3G 3.0 992.7T 0.0006 1.0 32 on pilot-vms 124.6G 3.0 992.7T 0.0004 1.0 32 on default.rgw.users.keys 111.1k 3.0 992.7T 0.0000 1.0 32 on default.rgw.buckets.index 4090M 3.0 992.7T 0.0000 1.0 32 on rbd 39430G 3.0 992.7T 0.1164 1.0 1024 on default.rgw.buckets.non-ec 344.3k 3.0 992.7T 0.0000 1.0 32 on rgw-ec 7175G 1.5 992.7T 0.0106 1.0 64 on rbd-ec 45806G 1.5 992.7T 0.0676 1.0 1024 on default.rgw.users.email 23530k 3.0 992.7T 0.0000 1.0 32 on pilot-backups 74699k 3.0 992.7T 0.0000 1.0 32 on device_health_metrics 52128k 3.0 992.7T 0.0000 1.0 32 on -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

3 years, 2 months

1
1
0 0

bug in latest cephadm bootstrap: got an unexpected keyword argument 'verbose_on_failure'

by Philip Brown

Seems like someone is not testing cephadm on centos 7.9 Just tried installing cephadm from the repo, and ran cephadm bootstrap --mon-ip=xxx it blew up, with ceph TypeError: __init__() got an unexpected keyword argument 'verbose_on_failure' just after the firewall section. I happen to have a test cluser from a few months ago, and compared the code. Some added, in line 2348, " out, err, ret = call([self.cmd, '--permanent', '--query-port', tcp_port], verbose_on_failure=False)" this made the init fail, on my centos 7.9 system, freshly installed and updated today. # cephadm version ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable) Simply commenting out that line makes it complete the cluster init like I remember. -- Philip Brown| Sr. Linux System Administrator | Medata, Inc. 5 Peters Canyon Rd Suite 250 Irvine CA 92606 Office 714.918.1310| Fax 714.918.1325 pbrown(a)medata.com| www.medata.com

3 years, 2 months

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2021