I think a few other things could help here: `ceph osd df tree` will show usage and weights laid out against the hierarchy across the different CRUSH domains.
And if you're doing erasure-coded pools, or anything other than 3x replication, `ceph osd crush rule dump` may provide some further context alongside the tree output.
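
In case it's useful, a minimal sketch of those checks (nothing here is specific to your cluster):

    # usage and weights laid out against the CRUSH hierarchy
    ceph osd df tree

    # dump the CRUSH rules so placement can be compared against that tree
    ceph osd crush rule dump

    # pool-level detail (size, EC profile, crush_rule) to tie the two together
    ceph osd pool ls detail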

Also, the cluster is running Luminous (12), which went EOL three years ago tomorrow.
So there are likely a good number of under-the-hood improvements all around to be gained by moving forward from Luminous.
Though I would say take care of the scrub errors prior to doing any major upgrades, and check your upgrade path (you can only upgrade two releases at a time, whether you still have filestore OSDs, etc.).
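
Roughly something along these lines for the scrub errors and the upgrade prerequisites (the PG ID below is just a placeholder):

    # list the inconsistent PGs behind the scrub errors
    ceph health detail

    # inspect one of them, then repair it once you understand the damage
    rados list-inconsistent-obj <pgid> --format=json-pretty
    ceph pg repair <pgid>

    # confirm what the daemons are actually running before planning the jump
    ceph versions

    # check whether any OSDs are still on filestore
    ceph osd metadata | grep '"osd_objectstore"' | sort | uniq -c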

-Reed

On Feb 28, 2023, at 11:12 AM, Dave Ingram <dave@adaptable.sh> wrote:

There is a lot of variability in drive sizes - two different sets of admins added
disks sized between 6TB and 16TB, and I suspect this and imbalanced
weighting are to blame.

CEPH OSD DF (not going to paste that all in here): https://pastebin.com/CNW5RKWx

What else am I missing in terms of what to share with you all?

Thanks all,
-Dave