Dear Michael,
I was also wondering whether deleting the broken pool could clean up everything. The
difficulty is that, while migrating a pool to new devices is easy via a crush rule change,
migrating data between pools is not so easy, in particular if you can't afford
downtime.
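For reference, moving a pool to new devices via a crush rule could look roughly like the
sketch below. It assumes a replicated pool, a crush root called "default", "host" as the
failure domain and a custom device class for the new OSDs. The pool, rule, class and OSD
names are placeholders, not values from this cluster. An EC pool would need a rule created
from an erasure-code profile instead. The script only prints the commands unless DRY_RUN=0
is set.

```shell
#!/bin/sh
# Placeholders, not taken from this thread.
POOL=cephfs_data_new     # hypothetical pool to move
RULE=replicated_newdev   # hypothetical crush rule name
CLASS=newdev             # hypothetical custom device class

# Dry run by default: print each command instead of executing it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

move_pool_to_class() {
    # Tag the given OSDs with the custom device class.
    for osd in "$@"; do
        run ceph osd crush rm-device-class "$osd"
        run ceph osd crush set-device-class "$CLASS" "$osd"
    done
    # Create a rule restricted to that class and point the pool at it.
    run ceph osd crush rule create-replicated "$RULE" default host "$CLASS"
    run ceph osd pool set "$POOL" crush_rule "$RULE"
}

move_pool_to_class osd.100 osd.101   # hypothetical OSD ids
```

Note that a rule without a device class restriction still selects OSDs of every class, so
the rule of any pool that should stay off the new OSDs would need a class restriction too.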
If you can afford some downtime, it might be possible to migrate quickly by creating a
new pool and using the pool copy command to migrate the data (rados cppool ...). It's
important that the FS is shut down (no MDS active) during this copy process. After the copy,
one could either rename the pools so that the copy matches the fs data pool name, or change
the data pool at the top-level directory. You might need to set some pool metadata by
hand, notably the fs tag.
Having said that, I have no idea how a ceph fs reacts if presented with a replacement data
pool. Although I don't believe that the metadata contains the pool IDs, I cannot exclude
that complication. The copy pool variant should be tested with an isolated FS first.
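An untested sketch of the copy pool variant, using the rename option, is given below. The
fs name, pool names and PG count are placeholders. "ceph fs fail" and "ceph fs set ...
joinable true" are assumed to be available (recent releases). Whether the fs accepts the
renamed copy is exactly the open question above, so this is for an isolated test FS only.
The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders, not taken from this thread.
FS=cephfs             # hypothetical file system name
OLD=cephfs_data       # hypothetical name of the broken data pool
NEW=cephfs_data_new   # hypothetical name of the replacement pool

# Dry run by default: print each command instead of executing it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

migrate_by_cppool() {
    run ceph osd pool create "$NEW" 64            # PG count is a placeholder
    run ceph fs fail "$FS"                        # no MDS active during the copy
    run rados cppool "$OLD" "$NEW"
    # Swap the names so that the copy carries the fs data pool name.
    run ceph osd pool rename "$OLD" "${OLD}_broken"
    run ceph osd pool rename "$NEW" "$OLD"
    # Pool metadata set by hand: application and the fs tag.
    run ceph osd pool application enable "$OLD" cephfs
    run ceph osd pool application set "$OLD" cephfs data "$FS"
    run ceph fs set "$FS" joinable true           # let the MDS come back
}

migrate_by_cppool
```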
The other option is what you describe: create a new data pool, place the fs root on
this pool and copy every file onto itself. This should also do the trick. However, with
this method you will not be able to get rid of the broken pool. After the copy, you could,
however, reduce the number of PGs to below the unhealthy one and the broken PG(s) might
get deleted cleanly. Then you still have a surplus pool, but at least all PGs are clean.
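A rough sketch of this second variant follows. The fs name, pool names, mount point and PG
numbers are placeholders. It assumes that no subdirectory carries its own layout, that no
file is open during the rewrite and that there are no hard links. Existing files keep the
pool they were written to, which is why every file has to be copied and renamed back. The
script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders, not taken from this thread.
FS=cephfs             # hypothetical file system name
OLD=cephfs_data       # hypothetical name of the broken data pool
NEW=cephfs_data_new   # hypothetical name of the new data pool
MNT=/mnt/cephfs       # hypothetical mount point of the fs root

# Dry run by default: print each command instead of executing it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Copy one file onto itself: write a new copy, then rename it over the original.
rewrite_file() {
    run cp -a "$1" "$1.migrate.tmp" && run mv "$1.migrate.tmp" "$1"
}

migrate_by_rewrite() {
    run ceph osd pool create "$NEW" 64             # PG count is a placeholder
    run ceph fs add_data_pool "$FS" "$NEW"
    # New files below the root are written to the new pool from here on.
    run setfattr -n ceph.dir.layout.pool -v "$NEW" "$MNT"
    find "$MNT" -type f | while IFS= read -r f; do rewrite_file "$f"; done
    # Afterwards, shrink the old pool below the index of the broken PG (placeholder value).
    run ceph osd pool set "$OLD" pg_num 8
}

if [ -d "$MNT" ]; then migrate_by_rewrite; fi
```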
I hope one of these will work. Please post your experience here.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Michael Thomas <wart(a)caltech.edu>
Sent: 22 November 2020 18:29:16
To: Frank Schilder; ceph-users(a)ceph.io
Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects
On 10/23/20 3:07 AM, Frank Schilder wrote:
Hi Michael.
I still don't see any traffic to the pool,
though I'm also unsure how much traffic is to be expected.
Probably not much. If ceph df shows that the pool contains some objects, I guess
that's sorted.
That osdmaptool crashes indicates that your cluster runs with corrupted internal data. I
tested your crush map and you should get complete PGs for the fs data pool. That you
don't, and that osdmaptool crashes, points to a corruption of internal data. I'm
afraid this is the point where you need support from ceph developers and should file a
tracker report (https://tracker.ceph.com/projects/ceph/issues). A short description of the
origin of the situation, with the osdmaptool output and a link to this thread, should be
sufficient. Please post a link to the ticket here.
https://tracker.ceph.com/issues/48059
In parallel, you should probably open a new thread
focussed on the osd map corruption. Maybe there are low-level commands to repair it.
Will do.
You should hold off on cleaning up the unfound
objects until this is resolved. Not sure about adding further storage either. To me, this
sounds quite serious.
Another approach that I'm considering is to create a new pool using the
same set of OSDs, adding it to the set of cephfs data pools, and
migrating the data from the "broken" pool to the new pool.
I have some additional unused storage that I could add to this new pool,
if I can figure out the right crush rules to make sure they don't get
used for the "broken" pool too.
--Mike