Hello everyone,
Given that BlueStore has been the default and more widely used object
store for quite some time, we would like to understand whether we can
consider deprecating FileStore in our next release, Quincy, and removing
it in the R release. There is also a proposal [0] to add a health
warning to report FileStore OSDs.
We discussed this topic in the Ceph Month session today [1] and there
were no objections from anybody on the call. I wanted to reach out to
the list to check if there are any concerns about this or any users
who will be impacted by this decision.
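If you would like to check whether any of your OSDs are still on
FileStore, something like the following should show it:

$ ceph osd count-metadata osd_objectstore
$ ceph osd metadata 0 | grep osd_objectstore    # per-OSD check, e.g. for osd.0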
Thanks,
Neha
[0] https://github.com/ceph/ceph/pull/39440
[1] https://pad.ceph.com/p/ceph-month-june-2021
Hi,
In the discussion after the Ceph Month talks yesterday, there was a bit
of chat about cephadm / containers / packages. IIRC, Sage observed that
a common reason in the recent user survey for not using cephadm was that
it only worked on containerised deployments. I think he then went on to
say that he hadn't heard any compelling reasons why not to use
containers, and suggested that resistance was essentially a user
education question[0].
I'd like to suggest, briefly, that:
* containerised deployments are more complex to manage, and this is not
simply a matter of familiarity
* reducing the complexity of systems makes admins' lives easier
* the trade-off of the pros and cons of containers vs packages is not
obvious, and will depend on deployment needs
* Ceph users will benefit from both approaches being supported into the
future
We make extensive use of containers at Sanger, particularly for
scientific workflows, and also for bundling some web apps (e.g.
Grafana). We've also looked at a number of container runtimes (Docker,
singularity, charliecloud). They do have advantages - it's easy to
distribute a complex userland in a way that will run on (almost) any
target distribution; rapid "cloud" deployment; some separation (via
namespaces) of network/users/processes.
For what I think of as a 'boring' Ceph deploy (i.e. install on a set of
dedicated hardware and then run for a long time), I'm not sure any of
these benefits are particularly relevant and/or compelling - Ceph
upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud
Archive) provide .debs of a couple of different Ceph releases per Ubuntu
LTS - meaning we can easily separate out OS upgrade from Ceph upgrade.
And upgrading the Ceph packages _doesn't_ restart the daemons[1],
meaning that we maintain control over restart order during an upgrade.
And while we might briefly install packages from a PPA or similar to
test a bugfix, we roll those (test-)cluster-wide, rather than trying to
run a mixed set of versions on a single cluster - and I understand this
single-version approach is best practice.
Deployment via containers does bring complexity; some examples we've
found at Sanger (not all Ceph-related, which we run from packages):
* you now have 2 process supervision points - dockerd and systemd
* docker updates (via distribution unattended-upgrades) have an
unfortunate habit of rudely restarting everything
* docker squats on a chunk of RFC 1918 space (and telling it not to can
be a bore - see the daemon.json sketch after this list), which coincides
with our internal network...
* there is more friction if you need to look inside containers
(particularly if you have a lot running on a host and are trying to find
out what's going on)
* you typically need to be root to build docker containers (unlike packages)
* we already have package deployment infrastructure (which we'll need
regardless of deployment choice)
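On the RFC 1918 point, the workaround is to give dockerd its own ranges
up front; roughly this shape of /etc/docker/daemon.json (the ranges below
are just placeholders - pick ones that don't clash with your network):

    {
      "bip": "10.200.0.1/24",
      "default-address-pools": [
        { "base": "10.201.0.0/16", "size": 24 }
      ]
    }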
We also currently use systemd overrides to tweak some of the Ceph units
(e.g. to do some network sanity checks before bringing up an OSD), and
have some tools to pair OSD / journal / LVM / disk device up; I think
these would be more fiddly in a containerised deployment. I'd accept
that fixing these might just be a SMOP[2] on our part.
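For the curious, the overrides are nothing fancy - roughly this shape
(the ExecStartPre script below is our own and purely illustrative):

    # /etc/systemd/system/ceph-osd@.service.d/override.conf
    # (created via: systemctl edit ceph-osd@.service)
    [Unit]
    Wants=network-online.target
    After=network-online.target

    [Service]
    # local script that checks the public/cluster networks look sane
    # before the OSD is allowed to start
    ExecStartPre=/usr/local/sbin/ceph-network-sanity-check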
Now none of this is show-stopping, and I am most definitely not saying
"don't ship containers". But I think there is added complexity to your
deployment from going the containers route, and that is not simply a
"learn how to use containers" learning curve. I do think it is
reasonable for an admin to want to reduce the complexity of what they're
dealing with - after all, much of my job is trying to automate or
simplify the management of complex systems!
I can see from a software maintainer's point of view that just building
one container and shipping it everywhere is easier than building
packages for a number of different distributions (one of my other hats
is a Debian developer, and I have a bunch of machinery for doing this
sort of thing). But it would be a bit unfortunate if the general thrust
of "let's make Ceph easier to set up and manage" was somewhat derailed
with "you must use containers, even if they make your life harder".
I'm not going to criticise anyone who decides to use a container-based
deployment (and I'm sure there are plenty of setups where it's an
obvious win), but if I were advising someone who wanted to set up and
use a 'boring' Ceph cluster for the medium term, I'd still advise on
using packages. I don't think this makes me a luddite :)
Regards, and apologies for the wall of text,
Matthew
[0] I think that's a fair summary!
[1] This hasn't always been true...
[2] Simple (sic.) Matter of Programming
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Today I visited Ceph's official site and found that the link to the
`resources` page seems to be missing.
https://ceph.io/en/
In addition, this page no longer exists.
https://ceph.io/resources/
Could you tell me where they were moved?
Thanks,
Satoru
Hello.
I'm looking for the proper way to set up NIC bonding for Ceph.
I was using the bonding driver with the default settings:
ad_select=stable (the default) and hash algorithm layer2.
With these settings the OSDs use only one port, and LACP only helps for
traffic to different nodes, because the layer2 hash algorithm hashes on
MAC addresses.
I've changed ad_select to bandwidth and both NICs are in use now, but the
layer2 hash still prevents using both NICs between any two nodes (again
because layer2 hashes only on MAC).
People advise using layer2+3 for best performance, but it makes no
difference for the OSDs, because the MAC and IP are the same for every
connection between two hosts.
I've tried layer3+4 to split traffic by TCP port instead of MAC and it
works, but I don't know what the side effects will be, and my switch is
layer2 only.
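For reference, my current bond config looks roughly like this (interface
names and addresses are just examples; option names may differ per
distro):

    auto bond0
    iface bond0 inet static
        address 10.0.0.11/24
        bond-slaves enp1s0f0 enp1s0f1
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate fast
        bond-xmit-hash-policy layer3+4
        # ad_select=bandwidth is set as a module option, e.g.
        # "options bonding ad_select=bandwidth" in /etc/modprobe.d/bonding.conf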
With "iperf -Parallel 2" now i can reach 19gbit on 2x 10gbit nics.
I think there is no way to use both nic without parallel Usage with
different ports. If I use same port and too many process I can only use one
port at a time.
What settings are you using?
What is the best for ceph?
Hello,
I am setting up user quotas and I would like to enable the check-on-raw
setting for my users' quotas. I can't find any documentation on how to
change this setting anywhere in the Ceph docs. Do any of you know how to
change it? Possibly using radosgw-admin?
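For context, the standard quota commands I am using look like this (the
uid and size are just examples):

$ radosgw-admin quota set --quota-scope=user --uid=testuser --max-size=107374182400
$ radosgw-admin quota enable --quota-scope=user --uid=testuser
$ radosgw-admin user info --uid=testuser

The "user_quota" block in the user info output appears to include a
"check_on_raw" field, but I can't see a flag to change it.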
Thanks in advance!
Jared Jacob
Dear All,
I have deployed the latest Ceph Pacific release in my lab and started to check out the new "stable" NFS Ganesha features. First of all, I'm a bit confused about which method to actually use to deploy the NFS cluster:
cephadm or ceph nfs cluster create?
I used "nfs cluster create" for now and noticed a minor problem in the docs.
https://docs.ceph.com/en/latest/cephfs/fs-nfs-exports/#cephfs-nfs
The command is stated as:
$ ceph nfs cluster create <clusterid> [<placement>] [--ingress --virtual-ip <ip>]
whereas it actually needs a type (cephfs) to be specified:
nfs cluster create <type> <clusterid> [<placement>] : Create an NFS Cluster
Also I can't manage to use the --ingress --virtual-ip parameter. Every time I try to use it I get this:
[root@cephboot~]# ceph nfs cluster create cephfs ec9e031a-cd10-11eb-a3c3-005056b7db1f --ingress --virtual-ip 192.168.9.199
Invalid command: Unexpected argument '--ingress'
nfs cluster create <type> <clusterid> [<placement>] : Create an NFS Cluster
Error EINVAL: invalid command
So I just deployed an NFS cluster without a VIP. Maybe I'm missing something?
What about this note in the docs:
>> From Pacific, the nfs mgr module must be enabled prior to use. <<
I can't find any info on how to enable it. Maybe it is already enabled by default?
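I would guess enabling it would go through the generic mgr module
mechanism, i.e. something like:

[root@cephboot ~]# ceph mgr module ls | grep nfs
[root@cephboot ~]# ceph mgr module enable nfs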
ceph nfs cluster create cephfs ec9e031a-cd10-11eb-a3c3-005056b7db1f "cephnode01"
This seems to be working fine. I managed to connect a CentOS 7 VM and I can access the NFS export just fine. Great stuff.
For testing I tried to attach the same NFS export to a standalone ESXi 6.5 server. This also works, but its disk space is shown as 0 bytes.
I'm not sure if this is supported or if I'm missing something. I could not find any clear info in the docs, only some Reddit posts where users mentioned that they were able to use it with VMware.
Thanks and Best Regards,
Oliver
Hello Folks,
We are running the Ceph Octopus 15.2.13 release and would like to use the disk
prediction module. The issues we have faced so far are:
1. The Ceph documentation does not mention installing
`ceph-mgr-diskprediction-local.noarch`.
2. Even after installing the needed package and restarting the mgr, the
module does not show up in the cluster. A detailed log is here:
gist:b687798ea97ef13e36d466f2d7b1470a
<https://gist.github.com/juztas/b687798ea97ef13e36d466f2d7b1470a>. `ceph -s`
shows [1].
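For context, my understanding is that the relevant steps after installing
the package are roughly:

# systemctl restart ceph-mgr.target
# ceph mgr module enable diskprediction_local
# ceph mgr module ls | grep diskprediction

but as [1] shows, the mgr does not come back as active after the restart.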
Are you aware of this issue and are there any workarounds?
Thanks!
[1]
# ceph -s
  cluster:
    id:     12d9d70a-e993-464c-a6f8-4f674db35136
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 3 daemons, quorum ceph-mon-cms-1,ceph-mon-cms-2,ceph-mon-cms-3 (age 2d)
    mgr: no daemons active (since 11m)
    mds: cephfs:1 {0=ceph-mds-cms-1=up:active} 1 up:standby

# ceph health detail
HEALTH_WARN no active mgr
[WRN] MGR_DOWN: no active mgr
> but our experience so
> far has been a big improvement over the complexity of managing package
> dependencies across even just a handful of distros
Do you have some charts or docs that show this complexity problem? I have trouble understanding it.
This is very likely because my understanding of Ceph internals is limited. Take my view of the OSD daemon, for instance: it works with logical volumes for writing/reading data, and then there is OSD<->OSD/mon/mgr communication. What dependency hell is to be expected there?
> (Podman has been
> the only real culprit here, tbh, but I give them a partial pass as the
> tool is relatively new.)
Is it not better for the sake of stability, security and future support to choose something with a proven record?
Hi,
on a large cluster with ~1600 OSDs, 60 servers and 16+3 erasure coded
pools, the recovery after an OSD failure (HDD) is quite slow. Typical
values are around 4GB/s with 125 ops/s and 32MB object sizes, so recovery
takes 6-8 hours, during which time the PGs are degraded. I tried to speed
it up with
it up with
osd advanced osd_max_backfills 32
osd advanced osd_recovery_max_active 10
osd advanced osd_recovery_op_priority 63
osd advanced osd_recovery_sleep_hdd 0.000000
which at least kept the IOPS at a constant level. The recovery does
not seem to be CPU or memory bound. Is there any way to speed it up?
While testing the recovery on replicated pools, it reached 50GB/s.
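For reference, the settings above were applied at runtime, e.g.:

$ ceph config set osd osd_max_backfills 32
$ ceph config set osd osd_recovery_max_active 10
$ ceph config set osd osd_recovery_op_priority 63
$ ceph config set osd osd_recovery_sleep_hdd 0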
In contrast, replacing the failed drive with a new one and re-adding the
OSD is quite fast, with 1GB/s recovery rate of misplaced pgs, or
~120MB/s average HDD write speed, which is not very far from HDD throughput.
Regards,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-425-7074
-------------------------------------------------------------
Hello List,
all of a sudden I can no longer mount a specific RBD device:
root@proxmox-backup:~# rbd map backup-proxmox/cluster5 -k
/etc/ceph/ceph.client.admin.keyring
/dev/rbd0
root@proxmox-backup:~# mount /dev/rbd0 /mnt/backup-cluster5/
(the mount just hangs and never times out)
Any idea how to debug that mount? tcpdump does show some active traffic.
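I guess the obvious places to look would be the kernel log, the image
watchers and the kernel client's in-flight requests, e.g.:

root@proxmox-backup:~# dmesg | tail
root@proxmox-backup:~# rbd status backup-proxmox/cluster5
root@proxmox-backup:~# cat /sys/kernel/debug/ceph/*/osdc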
Cheers,
Michael