Hello -
We're trying to use native libRADOS and the only challenge we're running
into is searching metadata.
Using the rgw metadata sync seems to require all data to be pushed through
the rgw, which is not something we're interested in setting up at the
moment.
Are there hooks or features of libRADOS which could be leveraged to enable
syncing of metadata to an external system (elastic-search / postgres / etc)?
Is there a way to listen to a stream of updates to a pool in real time,
with some guarantee that I wouldn't miss anything?
Are there any features like this in libRADOS?
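As far as I can tell, the closest built-in hook is the per-object
watch/notify mechanism. It isn't a pool-wide change feed, but as a rough
sketch with the rados CLI (pool and object names below are just placeholders
I made up):

rados -p testpool put change-log /dev/null      # create a scratch object to watch
rados -p testpool watch change-log              # terminal 1: block and print incoming notifies
rados -p testpool notify change-log 'updated'   # terminal 2: a writer sends a notify after each change

The same calls exist programmatically in librados, but writers have to
cooperate by sending the notify, so I don't think it gives the
"never miss an update" guarantee we're after.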
Thank you
> This is wrong. Ceph 15 runs on CentOS 7 just fine, but without the
> dashboard.
>
I also hope that Ceph keeps supporting EL7 until it is EOL in 2024, so I have enough time to figure out which OS to choose.
Hi,
Assuming a cluster (currently octopus, might upgrade to pacific once
released) serving only CephFS and that only to a handful of kernel and
fuse-clients (no OpenStack, CSI or similar): Are there any side effects
of not using the ceph-mgr volumes module abstractions [1], namely
subvolumes and subvolume groups, that I have to consider?
I would still only mount subtrees of the whole (single) CephFS file
system and have some clients which mount multiple disjoint subtrees.
Quotas would only be set at the subtree level that I am mounting, and
likewise file layouts. Snapshots (via mkdir in .snap) would be used at
the mounting level or one level above.
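Concretely, I would be doing this with the plain CephFS mechanisms, roughly
like the following (the path, pool name, and values are only placeholders):

setfattr -n ceph.quota.max_bytes -v 1099511627776 /mnt/cephfs/project-a   # 1 TiB quota on the mounted subtree
setfattr -n ceph.dir.layout.pool -v cephfs_data2 /mnt/cephfs/project-a    # file layout for new files below that level
mkdir /mnt/cephfs/project-a/.snap/snap-2021-03-02                         # snapshot at the mount level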
Background: I don't require the abstraction features per se. Some
restrictions (e.g. subvolume group snapshots not being supported) seem
to me to be caused only by the abstraction layer and not the underlying
CephFS. For my specific use case I require snapshots on the subvolume
group layer. It therefore seems better to just forgo the abstraction as
a whole and work on bare CephFS.
Cheers
Sebastian
[1] https://docs.ceph.com/en/octopus/cephfs/fs-volumes/
Hi,
We have a CentOS 7 VM with a mainline kernel (5.11.2-1.el7.elrepo.x86_64 #1 SMP
Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and with
Ceph Octopus 15.2.9 packages installed. The MDS server is running
Nautilus 14.2.16. Messenger v2 has been enabled. Port 3300 of the
monitors is reachable from the client. At mount time we get the following:
> Mar 2 09:01:14 kernel: Key type ceph registered
> Mar 2 09:01:14 kernel: libceph: loaded (mon/osd proto 15/24)
> Mar 2 09:01:14 kernel: FS-Cache: Netfs 'ceph' registered for caching
> Mar 2 09:01:14 kernel: ceph: loaded (mds proto 32)
> Mar 2 09:01:14 kernel: libceph: mon4 (1)[mond addr]:6789 session established
> Mar 2 09:01:14 kernel: libceph: another match of type 1 in addrvec
> Mar 2 09:01:14 kernel: ceph: corrupt mdsmap
> Mar 2 09:01:14 kernel: ceph: error decoding mdsmap -22
> Mar 2 09:01:14 kernel: libceph: another match of type 1 in addrvec
> Mar 2 09:01:14 kernel: libceph: corrupt full osdmap (-22) epoch 98764 off 6357 (0000000027a57a75 of 00000000d3075952-00000000e307797f)
> Mar 2 09:02:15 kernel: ceph: No mds server is up or the cluster is laggy
The /etc/ceph/ceph.conf has been adjusted to reflect the messenger v2
changes: ms_bind_ipv6=true, ms_bind_ipv4=false. The kernel client still
seems to use the v1 port though (although v2 should be supported since
kernel 5.11).
Has anyone seen this before? Just guessing here, but could it be that the
client tries to speak the v2 protocol on the v1 port?
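For reference, my understanding of the 5.11 client is that it keeps using
the legacy v1 protocol unless v2 is requested explicitly via the new ms_mode
mount option together with the v2 port, i.e. something like (client name and
secretfile are placeholders):

mount -t ceph [mon addr]:3300:/ /mnt/cephfs -o name=cephfs-client,secretfile=/etc/ceph/client.secret,ms_mode=prefer-crc

but I may well be misreading the changelog.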
Thanks,
Stefan
Hi list,
We recently had a cluster outage over the weekend where several OSDs were inaccessible over night for several hours. When I found the cluster in the morning, the monitors' root disks (which contained both the monitor's leveldb and the Ceph logs) had completely filled.
After restarting OSDs, cleaning out the monitors' logs, moving /var/lib/ceph to dedicated disks on the mons, and starting recovery (in which there was 1 unfound object that I marked lost, if that has any relevancy), the leveldb has continued to grow without bound. The cluster has all PGs in active+clean at this point, yet I'm accumulating roughly 1 GB/hr of new leveldb data.
Two of the monitors (a, c) are in quorum, while the third (b) has been synchronizing for the last several hours, but doesn't seem to be able to catch up. Mon 'b' has been running for 4 hours now in the 'synchronizing' state. The mon's log has many messages about compacting and deleting files, yet we never exit the synchronization state.
The ceph.log is also rapidly accumulating complaints that the mons are slow (not surprising, I suppose, since the levelDBs are ~100GB at this point).
I've found that using the monstore tool to compact mons 'a' and 'c' helps, but it is only a temporary fix. Soon the database inflates again and I'm back to where I started.
Thoughts on how to proceed here? Some ideas I had:
- Would it help to add some new monitors that use RocksDB?
- Stop a monitor and dump the keys via monstoretool, just to get an idea of what's going on?
- Increase mon_sync_max_payload_size to try to move data in larger chunks?
- Drop down to a single monitor, and see if normal compaction triggers and the store stops growing unbounded?
- Stop both 'a' and 'c', compact them, start them, and immediately start 'b' ?
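For the compaction and key-dump ideas, this is roughly what I have in mind
(the monstore-tool commands need the mon stopped; paths and mon names are
from our setup, and the payload size below is just a guess on my part):

ceph-monstore-tool /var/lib/ceph/mon/ceph-a compact                    # offline compaction of the store
ceph-monstore-tool /var/lib/ceph/mon/ceph-b dump-keys | awk '{print $1}' | sort | uniq -c   # rough breakdown of what is filling the store
ceph tell mon.* injectargs '--mon_sync_max_payload_size 16777216'      # try 16 MB sync chunks instead of the default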
Appreciate any advice.
Regards,
Lincoln
Hi,
Having been slightly caught out by tunables on my Octopus upgrade[0],
can I just check that if I do
ceph osd crush tunables optimal
that will update the tunables on the cluster to the current "optimal"
values (and move a lot of data around), but that this doesn't mean
they'll change next time I upgrade the cluster or anything like that?
It's not quite clear from the documentation whether, the next time the
"optimal" tunables change, they will be applied to a cluster where I've
set tunables this way, or whether tunables are only ever changed by a
fresh invocation of ceph osd crush tunables...
[I assume the same answer applies to "default"?]
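For reference, I was planning to sanity-check the before/after state with

ceph osd crush show-tunables

which, as far as I understand, prints the tunable values the cluster is
currently using.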
Regards,
Matthew
[0] I foolishly thought a cluster initially installed as Jewel would
have jewel tunables
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Seems like someone is not testing cephadm on CentOS 7.9.
Just tried installing cephadm from the repo, and ran
cephadm bootstrap --mon-ip=xxx
It blew up with
ceph TypeError: __init__() got an unexpected keyword argument 'verbose_on_failure'
just after the firewall section.
I happen to have a test cluster from a few months ago, and compared the code.
Someone added, in line 2348,
" out, err, ret = call([self.cmd, '--permanent', '--query-port', tcp_port], verbose_on_failure=False)"
This made the init fail on my CentOS 7.9 system, freshly installed and updated today.
# cephadm version
ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)
Simply commenting out that line makes it complete the cluster init like I remember.
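For anyone hitting the same thing, my workaround amounted to the following
(line number as in the 15.2.9 script quoted above, and /usr/sbin/cephadm is
where the package put it on my box, so double-check both before running):

sed -i '2348 s/^/# /' /usr/sbin/cephadm    # comment out the --query-port call that passes verbose_on_failure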
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com