Hi, thanks for all the suggestions.
Right now we are proceeding step by step, and it works: going to bionic/nautilus first, and continuing from there as
Josh noted.
We ran into a problem, which I'll post about separately.
Best, Götz
On 03.08.2023 at 15:44, Beaman, Joshua
<Joshua_Beaman(a)comcast.com> wrote:
We went through this exercise, though our starting point was ubuntu 16.04 / nautilus. We
reduced our double builds as follows:
1. Rebuild each monitor host on 18.04/bionic and rejoin, still on nautilus
2. Upgrade all mons, mgrs (and optionally rgws) to pacific
3. Convert each mon, mgr, and rgw to cephadm and enable the orchestrator
4. Rebuild each mon, mgr, and rgw on 20.04/focal and rejoin the pacific cluster
5. Drain and rebuild each osd host on focal and pacific
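The cephadm conversion and osd-host drain described above might look roughly like this; the hostnames and daemon names are hypothetical, and the commands follow the standard cephadm adoption workflow, so check them against your versions before running anything:

```shell
# On each control host, adopt the existing legacy (package-installed)
# daemons into cephadm. "ceph-mon1" is a hypothetical hostname.
cephadm adopt --style legacy --name mon.ceph-mon1
cephadm adopt --style legacy --name mgr.ceph-mon1

# Once the mgrs are adopted, enable the orchestrator backend.
ceph mgr module enable cephadm
ceph orch set backend cephadm

# Later, drain an osd host before rebuilding it on focal,
# and watch the removal progress.
ceph orch host drain ceph-osd1
ceph orch osd rm status
```

These commands need a live pacific cluster, so treat this as a sketch of the order of operations rather than a script to paste in.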
This has the advantage of only having to drain and rebuild the OSD hosts once. Double
building the control cluster hosts isn’t so bad, and orchestrator makes all of the ceph
parts easy once it’s enabled.
The biggest challenge we ran into was
https://tracker.ceph.com/issues/51652, because we
still had a lot of filestore osds. It's frustrating, but we managed to get through it
without much client interruption on a dozen prod clusters, most of which were 38 osd hosts
and 912 total osds each. One thing that helped was to set all of the old osds' primary-affinity to something <1 before
beginning the osd host rebuilds. That way, when the new pacific (or octopus) osds join the cluster,
they are automatically favored as primary for their pgs. If a heartbeat timeout storm starts to get
out of control, start by setting nodown and noout; the flapping osds are the worst. Then figure out
which osds are the culprits and restart them.
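The primary-affinity trick and the flags can be sketched with the standard ceph CLI; the osd id range here is hypothetical, so substitute your own ids:

```shell
# Lower primary-affinity on every pre-existing osd (ids 0-911 are
# hypothetical) so that newly added osds are preferred as primary.
for id in $(seq 0 911); do
    ceph osd primary-affinity "$id" 0.5
done

# If heartbeat timeouts start cascading, freeze the osd map first
# so flapping osds stop churning peering.
ceph osd set nodown
ceph osd set noout

# ...then identify and restart the misbehaving osds, and finally
# clear the flags again.
ceph osd unset nodown
ceph osd unset noout
```

Again a sketch against a live cluster, not a turnkey script; primary-affinity accepts any value in [0,1], and 0.5 is just an example below 1.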
Hopefully your nautilus osds are all bluestore and you won't have this problem. We put
up with it because the filestore-to-bluestore conversion was one of the most important
parts of this upgrade for us.
Best of luck, whatever route you take.
Regards,
Josh Beaman