Hi,
We've let our Ceph pool (Octopus) get into a bad state; it's around 90%
full:
# ceph health
HEALTH_ERR 1/4 mons down, quorum
angussyd-kvm01,angussyd-kvm02,angussyd-kvm03; 3 backfillfull osd(s); 1 full
osd(s); 14 nearfull osd(s); Low space hindering backfill (add storage if
this doesn't resolve itself): 580 pgs backfill_toofull; Degraded data
redundancy: 1860769/9916650 objects degraded (18.764%), 597 pgs degraded,
580 pgs undersized; 323 pgs not deep-scrubbed in time; 189 pgs not scrubbed
in time; Full OSDs blocking recovery: 17 pgs recovery_toofull; 4 pool(s)
full; 1 pools have too many placement groups
At this point, even trying to run "rbd rm" or "rbd du" seems to time
out.
(I am, however, able to run "rbd ls -l", which shows me the rbd image
sizes - I assume those are provisioned sizes, before taking thin
provisioning into account.)
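For reference, here's roughly what I'm running ("<pool>" and "<image>"
below are placeholders, not our real names):

# rbd du <pool>/<image>      <- hangs / times out
# rbd rm <pool>/<image>      <- hangs / times out
# rbd ls -l <pool>           <- works; the SIZE column shows the
                                provisioned size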
Is there any way to rescue this pool, or at least some way to
force-delete some of the large images?
Regards,
Victor