Hi everyone,
I realized today that I didn't tell everyone about the cephadm conversion
on the lab cluster, so it seemed like a good time for an overall status
update.
~3 weeks ago I converted the lab cluster to use cephadm. This helped
shake out a number of issues with the upgrade process, and I also ran into
some exciting snags. The main issue was that cephadm only works with
bluestore OSDs, and I had forgotten that there were lots of old filestore
OSDs still in the cluster. To get around this, I ended up dist-upgrading
several hosts from xenial to bionic so that the host packages could be
installed (this all happened mid-upgrade and an upgrade bug was preventing
the OSDs from peering). Once things had upgraded and stabilized, I
removed most of the old mira nodes from the cluster (the ones that had
all or mostly filestore OSDs) and rebalanced. Then I finished the cephadm
conversion.
Current status:
- everything is cephadm and container-based
- all OSDs are bluestore
- there are 2 remaining mira nodes in the cluster
- most of the hosts are still running xenial.
One of the nice things about cephadm is that there are few OS
dependencies--we just need podman or docker, python3, and LVM. So it's
/mostly/ fine that these machines are running xenial. But it's not ideal.
- We still have (and want) ceph-common on the host so that the ceph CLI
works. But we don't build octopus for xenial, so the nautilus packages
are still installed. There are a few new CLI-side changes in octopus, so
we should fix this at some point. I think this just means we should
dist-upgrade these machines to bionic. (A quick way to see the version
mismatch is sketched just after this list.)
- There are still two mira nodes in the cluster that we may want to remove at
some point...
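As mentioned above, a quick way to see the nautilus/octopus mismatch is to
compare the installed CLI with what the cluster is actually running (both
are standard commands; exact output will vary):

  ceph --version    # version of the ceph-common package on the host
  ceph versions     # versions reported by the running daemons

The first should still say nautilus on the xenial hosts while the second
reports octopus.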
A few other changes:
- We have a few SSD-based OSDs, but the crush rules were still spreading
data across all devices. I updated the data pools to use HDDs only, and the
cephfs metadata pool is now on SSDs only. This should have sped things up
a bit!
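For anyone curious, device-class-based rules are set up with something like
the following (rule and pool names here are illustrative, not necessarily
exactly what's on the lab cluster):

  # create crush rules restricted to a device class
  ceph osd crush rule create-replicated replicated_hdd default host hdd
  ceph osd crush rule create-replicated replicated_ssd default host ssd

  # point the pools at the new rules
  ceph osd pool set cephfs_data crush_rule replicated_hdd
  ceph osd pool set cephfs_metadata crush_rule replicated_ssd

Changing a pool's crush rule triggers a rebalance, so some data movement is
expected.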
Cephadm stuff...
For a crash course on cephadm, see the new docs at
https://docs.ceph.com/docs/master/cephadm/
The main thing is that all of the daemons are running in containers. If
you're on the host and want to stop/start things, it's
systemctl stop/start ceph-$fsid@$name
For example,
systemctl restart ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.reesi002
(The nice thing is that tab completion works for the unit name.)
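If you don't remember the exact daemon names on a host, you can list the
ceph units first (the fsid below is this cluster's, from the example above):

  systemctl list-units 'ceph-*'
  systemctl status ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.reesi002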
There is also a ceph.target that will stop or start *all* ceph daemons,
either for the cluster (ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a.target)
or all clusters (ceph.target).
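For example, using this cluster's fsid:

  # stop/start every ceph daemon for this cluster only
  systemctl stop ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a.target
  systemctl start ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a.target

  # or every ceph daemon for every cluster on the host
  systemctl stop ceph.target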
The cluster is configured to log to files like traditional ceph
deployments. The only difference is that logs are in /var/log/ceph/$fsid.
Again, tab completion is your friend (esp when you remember that the fsid
for this cluster starts with "2").
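For example, to follow the mon log on reesi002 (the exact file name may
differ slightly):

  tail -f /var/log/ceph/28f7427e-5558-4ffd-ae1a-51ec3042759a/ceph-mon.reesi002.log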
I've been upgrading this cluster regularly (almost every day, if
not more) using the cephadm automated upgrades. That command is
just
ceph orch upgrade start --image quay.io/ceph-ci/ceph:octopus
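You can check on a running upgrade with

  ceph orch upgrade status

and (if I remember right) progress also shows up in the regular 'ceph -s'
output.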
Cephadm can automatically decide where to deploy daemons based on specs
you provide about placement, count, etc. You can view that with
root@reesi002:~# ceph orch ls
NAME          RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                                 IMAGE ID
alertmanager  1/1      9m ago     2d   count:1    docker.io/prom/alertmanager:latest         0881eb8f169f
crash         9/9      9m ago     2d   *          quay.io/ceph-ci/ceph:octopus               a94ff4985406
grafana       1/1      67s ago    2d   count:1    docker.io/pcuzner/ceph-grafana-el8:latest  f77afcf0bcf6
mds.cephfs    4/4      9m ago     2d   label:mds  quay.io/ceph-ci/ceph:octopus               d97834ddee42
mgr           2/2      7m ago     2d   count:2    quay.io/ceph-ci/ceph:octopus               a94ff4985406
mon           5/5      9m ago     2d   count:5    quay.io/ceph-ci/ceph:octopus               a94ff4985406
prometheus    1/1      9m ago     2d   count:1    docker.io/prom/prometheus:latest           e935122ab143
You can see the actual daemons with 'ceph orch ps'.
You can see that the mds.cephfs service (cephfs == the fs name) is tied to
label 'mds'. You can see host labels with
root@reesi002:~# ceph orch host ls
HOST      ADDR      LABELS   STATUS
mira055   mira055
mira060   mira060
mira093   mira093
reesi001  reesi001  mon mds
reesi002  reesi002  mon mds
reesi003  reesi003  mon mds
reesi004  reesi004  mon mgr
reesi005  reesi005  mon mgr
reesi006  reesi006  mgr mds
(There are other labels set there that aren't getting used at the moment.)
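If we want to move a service around later, it should just be a matter of
adjusting labels and reapplying the placement spec, roughly like this
(illustrative only; see the cephadm docs for the exact spec syntax):

  # add the 'mds' label to another host
  ceph orch host label add mira055 mds

  # (re)apply the mds service bound to that label
  ceph orch apply mds cephfs --placement=label:mds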
TODOs...
- dist-upgrade everything to bionic.
- Upgrade the host packages. This is most easily done with
./cephadm add-repo --release octopus
./cephadm install cephadm ceph-common
which will install the packaged cephadm (so it's in the path and doesn't
have to be curled manually; the manual fetch is shown after this list for
reference) and ceph-common (which has all the important
CLI commands). We should probably uninstall the other ceph
packages.
- We aren't deploying the full monitoring (prometheus, etc.) stack via
cephadm because some of those components are already installed
(node-exporter I think?) and I'm not sure how that was done or the right
way to remove them (or use them as is). Also at the moment there are a
few bugs in the config files for alertmanager and grafana that cephadm is
generating.
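(For reference, the manual fetch mentioned in the TODO above looks something
like this, per the cephadm docs:

  curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
  chmod +x cephadm

so having it packaged is definitely nicer.)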
sage
If you keep your teuthology checkout in ~/src, this will clean up
checkouts older than 90 days.
find ~/src -mindepth 1 -maxdepth 1 -mtime +90 -exec rm -rvf {} \;
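If you want to see what it would delete before running it for real, the same
find with -print instead of -exec is a safe dry run:

  find ~/src -mindepth 1 -maxdepth 1 -mtime +90 -print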
--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway
Hey all,
I've gotten a couple of requests in the past 24 hours asking how to "lock"
the new dev machines: https://wiki.sepia.ceph.com/doku.php?id=hardware:vossi
These systems aren't in paddles so `teuthology-lock` isn't going to work
here. Is that something you all want?
My understanding is that, historically, the rex and senta hosts have been
shared machines where there is a chance devs can step on each other's toes. I
get the desire to have exclusive use of a machine, but I don't want to
have to be the one to police machine-hogging.
--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway