There shouldn’t really be too much of a noticeable performance hit.
Some good documentation here.
The general feeling is that we're stuck on luminous and that it's destructive to upgrade to anything else.
I refuse to believe that is true.
At least if we upgraded everything to 12.2.3 we'd have the 'balancer' stuff that came with I think 12.2.2...
Upgrades are definitely not destructive; however, they also aren’t trivial.
You can upgrade 2 releases at a time, but the distros those packages are built for may vary from release to release.
For example, to get to Quincy from Luminous, you should be able to step from Luminous (12) to Nautilus (14), then to Pacific (16), and on to Quincy (17).
However, your Luminous install may be on Ubuntu 14.04 or 16.04, from which you can move straight to Nautilus.
To get to Pacific, you’ll first need to move to Ubuntu 18.04 (which Nautilus also supports), and then upgrade to Pacific.
If you then want to move to Quincy, you’ll need to upgrade to Ubuntu 20.04 first, and then upgrade to Quincy on 20.04.
This probably sounds daunting, and it is certainly non-trivial, but it is definitely doable if you take things in small steps, and it should be possible with no downtime if planned out.
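To make the stepping above a bit more concrete, here is a rough Python sketch that writes the path down as data and checks that no Ceph hop skips more than two releases. The names (RELEASE_NUMBER, PLAN, check_plan) are made up for illustration, and the plan only transcribes the steps described above, assuming a start on Luminous with Ubuntu 16.04; please verify package availability for your distro before relying on it.

```python
# Major version numbers of the Ceph releases mentioned above.
RELEASE_NUMBER = {"luminous": 12, "nautilus": 14, "pacific": 16, "quincy": 17}

# Ordered steps as described in this email (assumed start: Luminous on Ubuntu 16.04).
# Each step is either a Ceph upgrade or an OS upgrade.
PLAN = [
    ("ceph", "nautilus"),
    ("os", "18.04"),
    ("ceph", "pacific"),
    ("os", "20.04"),
    ("ceph", "quincy"),
]


def check_plan(start_release, plan, max_jump=2):
    """Return human-readable steps; raise if a Ceph hop skips too many releases."""
    current = RELEASE_NUMBER[start_release]
    steps = []
    for kind, target in plan:
        if kind == "ceph":
            nxt = RELEASE_NUMBER[target]
            jump = nxt - current
            if not 0 < jump <= max_jump:
                raise ValueError(
                    f"cannot go from release {current} to {target} ({nxt}) in one hop"
                )
            steps.append(f"upgrade Ceph to {target} ({nxt})")
            current = nxt
        else:
            steps.append(f"upgrade Ubuntu to {target}")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(check_plan("luminous", PLAN), start=1):
        print(number, step)
```

Trying to jump straight from Luminous to Pacific with this sketch raises a ValueError, which is the "2 releases at a time" rule in action.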
Also, there seems to be a belief that bluestore is an 'all-or-nothing' proposition
Yet I see that you can have a mixture of both in your deployments
You can mix filestore and bluestore OSDs in your cluster, however —
[…] and that it's impossible to migrate from filestore to bluestore.
[…] and it's indeed possible to migrate from filestore to bluestore.
If you have filestore OSDs, the only way to migrate them to bluestore is to destroy each OSD and recreate it as bluestore; see here.
This can be a time-consuming process if you drain an OSD, let it backfill off, blow it away, recreate it, and then bring data back.
It can also prove IO-expensive if your Ceph cluster is already IO saturated, because all of the backfill IO lands on top of the client IO.
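As a rough illustration of that drain/destroy/recreate cycle, the Python sketch below just builds the list of shell commands for one OSD so you can review them before running anything. The function name and the /dev/sdX device are hypothetical, and the commands are as I recall them from the Ceph BlueStore migration docs (assuming a systemd host with ceph-volume available), so please check them against the docs for your release.

```python
from typing import List


def bluestore_migration_commands(osd_id: int, device: str) -> List[str]:
    """Build (but do not run) the per-OSD filestore-to-bluestore steps."""
    return [
        # Mark the OSD out so its data backfills to the rest of the cluster.
        f"ceph osd out {osd_id}",
        # Wait until the cluster reports the OSD can be removed without data loss.
        f"while ! ceph osd safe-to-destroy {osd_id}; do sleep 60; done",
        # Stop the daemon and unmount the old filestore data directory.
        f"systemctl stop ceph-osd@{osd_id}",
        f"umount /var/lib/ceph/osd/ceph-{osd_id}",
        # Wipe the device, then mark the OSD destroyed while keeping its ID.
        f"ceph-volume lvm zap {device}",
        f"ceph osd destroy {osd_id} --yes-i-really-mean-it",
        # Recreate the OSD as bluestore, reusing the same ID.
        f"ceph-volume lvm create --bluestore --data {device} --osd-id {osd_id}",
    ]


if __name__ == "__main__":
    # Hypothetical OSD id and device; print the plan for review only.
    for command in bluestore_migration_commands(7, "/dev/sdX"):
        print(command)
```

Doing this one OSD (or one failure domain) at a time keeps the backfill load bounded, which matters if the cluster is already short on IO.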
TL;DR -- there is a *lot* of fear of touching this thing because nobody is truly an 'expert' in it atm.
But not touching it is why we've gotten ourselves into a situation with broken stuff and horrendous performance.
Given how critical (and brittle) this infrastructure sounds for your org, it might be best to pull in some experts, and I think most on the mailing list would likely recommend Croit as a good place to start outside of any existing support contracts.
Hope that’s helpful,
Reed