New subject: Serious cluster issue - Incomplete PGs

9 Jan 2023

Thanks for the insight Eugen.

Here's what basically happened:

- Upgrade from Nautilus to Quincy via migration to new cluster on temp
hardware;
- Data from Nautilus migrated successfully to older / lab-type equipment
running Quincy;
- Nautilus Hardware rebuilt for Quincy, data migrated back;
- As data was migrating, we set the older nodes to maintenance mode and
started to drain them;
- After several days, many OSDs were showing as spinning in "deleting"
status on the portal and were marked OUT;
- At this point we made the incorrect assumption that those OSDs were no
longer required and proceeded to remove those nodes / OSDs.

I understand incomplete PGs are basically lost, and that it's likely a
lengthy task to attempt to salvage data.

Backups will be challenging.   I honestly didn't anticipate this kind of
failure with Ceph to be possible. We've been using it for several years now
and were encouraged by the orchestrator and performance improvements in the
17 code branch.

The fact is that the incomplete PGs with object counts > 0 hold about
644 GB of data that's tied up in this mess.   There are other incomplete
PGs with objects = 0, which I understand can be manually marked as
complete.   The cluster has a data usage of 61 TiB.   Of this I can
categorize about 14 TB as critical data and 40 TB as data of medium /
high importance.
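
As a sanity check on the 644 GB figure, the OBJECTS and BYTES columns of the
seven incomplete PGs in the `ceph pg ls incomplete` output quoted further
down in this thread can be summed. This is only a sketch: the numbers are
copied from that output, and the variable names are made up for illustration.

```python
# (objects, bytes) per incomplete PG, copied from the quoted
# `ceph pg ls incomplete` output later in this thread.
incomplete_pgs = {
    "2.35": (23199, 95980273664),
    "2.53": (22821, 94401175552),
    "2.9f": (22858, 94555983872),
    "2.be": (22870, 94429110272),
    "2.e4": (22953, 94912278528),
    "17.78": (20169, 84517834400),
    "17.d8": (20328, 85196053130),
}

total_objects = sum(objs for objs, _ in incomplete_pgs.values())
total_bytes = sum(size for _, size in incomplete_pgs.values())

print(f"{len(incomplete_pgs)} PGs, {total_objects} objects")
print(f"{total_bytes / 1e9:.1f} GB ({total_bytes / 2**30:.1f} GiB)")
```

The total comes to roughly 644 GB in decimal units, or about 600 GiB.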

There's 14 TB in RBD images on an EC pool that would be critical. There
are other images, but they are of lower importance at this point.

There's also a CephFS file system of about 20 TB, likewise of lower
importance.

Question - Can you kindly point me to procedures for:

- Identifying the pools / images / files that are affected by incomplete
PGs;
- Extracting and reconstructing data for RBD images (these images are XFS
formatted filesystems);
- Extracting and reconstructing data for CephFS files not affected by
incomplete PGs.
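
For the first item, one possible approach (a sketch, not an established
procedure) is to enumerate the RADOS object names behind an RBD image and ask
the cluster which PG each name maps to. It assumes format-2 images, whose
data objects are named `<block_name_prefix>.<object number as 16 hex digits>`,
with the prefix and object size taken from `rbd info`. The prefix
`rbd_data.abc123` is hypothetical, and the `pgid` field read from
`ceph osd map ... --format json` is an assumption to verify on your release.
For images with an EC data pool, the data objects live in that data pool.

```python
import json
import subprocess


def rbd_object_names(block_name_prefix, image_size, object_size=4 * 1024 * 1024):
    """Yield the data object names that could back a format-2 RBD image.

    Sparse images do not necessarily have every object; this only lists
    the names that may exist for the given image size.
    """
    count = -(-image_size // object_size)  # ceiling division
    for objno in range(count):
        yield f"{block_name_prefix}.{objno:016x}"


def pg_of_object(pool, obj):
    """Return the PG an object name maps to (requires a working `ceph` CLI).

    Assumption: the JSON output of `ceph osd map` contains a "pgid" field.
    """
    out = subprocess.run(
        ["ceph", "osd", "map", pool, obj, "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["pgid"]


def affected_objects(pool, names, incomplete_pgids):
    """Return the object names whose PG is in the set of incomplete PG ids."""
    return [n for n in names if pg_of_object(pool, n) in incomplete_pgids]


# Hypothetical prefix; a 9 MiB image with 4 MiB objects spans 3 objects.
names = list(rbd_object_names("rbd_data.abc123", 9 * 1024 * 1024))
print(names)
```

Calling the CLI once per object is slow for multi-TB images, so this is
only practical as a starting point or for spot checks.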

Much appreciated.

------------------------------

Date: Mon, 09 Jan 2023 10:12:49 +0000
From: Eugen Block <eblock(a)nde.ag>
Subject: [ceph-users] Re: Serious cluster issue - Incomplete PGs
To: ceph-users(a)ceph.io
Message-ID:
        <20230109101249.Horde.hAHCWQijFMYLNdX8a2YQDVV(a)webmail.nde.ag>
Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes

Hi,

can you clarify what exactly you did to get into this situation? What
about the undersized PGs, any chance to bring those OSDs back online?
Regarding the incomplete PGs I'm not sure there's much you can do if
the OSDs are lost. To me it reads like you may have
destroyed/recreated more OSDs than you should have. Just recreating
OSDs with the same IDs is not sufficient if you destroyed too many
chunks. Each OSD only contains a chunk of the PG due to the erasure
coding. I'm afraid those objects are lost and you would have to
restore from backup. To get the cluster into a healthy state again
there are a couple of threads, e.g. [1], but recovering the lost chunks
from Ceph will probably not work.
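
To illustrate the point about chunks: with a k+m erasure-coding profile, a PG
can be reconstructed only while at least k of its k+m chunks survive. The
listed PGs have six OSDs in their up sets, so k+m = 6, but the actual split
is not stated in this thread. The 4+2 profile below is purely an assumption
for the sake of the example.

```python
def pg_recoverable(k, m, surviving_chunks):
    """Return True if a k+m erasure-coded PG still has enough chunks.

    Any k distinct chunks are enough to rebuild the data, so up to m
    chunks may be lost; one more than that and the data is gone.
    """
    if not 0 <= surviving_chunks <= k + m:
        raise ValueError("surviving_chunks must be between 0 and k + m")
    return surviving_chunks >= k


# Assumed profile: k=4 data chunks, m=2 coding chunks (not confirmed here).
K, M = 4, 2
for lost in range(4):
    status = "recoverable" if pg_recoverable(K, M, K + M - lost) else "lost"
    print(f"{lost} chunk(s) destroyed: {status}")
```

Under that assumption, destroying the OSDs holding three chunks of the same
PG is already enough to leave it incomplete.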

Regards,
Eugen

[1] https://www.mail-archive.com/ceph-users@ceph.io/msg14757.html

Quoting Deep Dish <deeepdish(a)gmail.com>:

...
 Hello.   I really screwed up my ceph cluster.   Hoping to get data off it
 so I can rebuild it.

 In summary, too many changes too quickly caused the cluster to develop
 incomplete PGs.  Some PGs were reporting that OSDs were to be probed.
 I've created those OSD IDs (empty), however this wouldn't clear the
 incompletes.   The incompletes are part of EC pools.  Running 17.2.5.

 This is the overall state:

   cluster:
     id:     49057622-69fc-11ed-b46e-d5acdedaae33
     health: HEALTH_WARN
             Failed to apply 1 service(s): osd.dashboard-admin-1669078094056
...
             1 hosts fail cephadm check
             cephadm background work is paused
             Reduced data availability: 28 pgs inactive, 28 pgs incomplete
             Degraded data redundancy: 55 pgs undersized
             2 slow ops, oldest one blocked for 4449 sec, daemons [osd.25,osd.50,osd.51] have slow ops.

 These are the incomplete PGs that HAVE DATA (objects > 0) [ via ceph
 pg ls incomplete ]:

 2.35   23199  0  0  0  95980273664  0  0  2477  incomplete           10s  2104'46277  28260:686871   [44,4,37,3,40,32]p44    [44,4,37,3,40,32]p44  2023-01-03T03:54:47.821280+0000  2022-12-29T18:53:09.287203+0000   14  queued for deep scrub
 2.53   22821  0  0  0  94401175552  0  0  2745  remapped+incomplete  10s  2104'45845  28260:565267   [60,48,52,65,67,7]p60   [60]p60               2023-01-03T10:18:13.388383+0000  2023-01-03T10:18:13.388383+0000  408  queued for scrub
 2.9f   22858  0  0  0  94555983872  0  0  2736  remapped+incomplete  10s  2104'45636  28260:759872   [56,59,3,57,5,32]p56    [56]p56               2023-01-03T10:55:49.848693+0000  2023-01-03T10:55:49.848693+0000  376  queued for scrub
 2.be   22870  0  0  0  94429110272  0  0  2661  remapped+incomplete  10s  2104'45561  28260:813759   [41,31,37,9,7,69]p41    [41]p41               2023-01-03T14:02:15.790077+0000  2023-01-03T14:02:15.790077+0000  360  queued for scrub
 2.e4   22953  0  0  0  94912278528  0  0  2648  remapped+incomplete  20m  2104'46048  28259:732896   [37,46,33,4,48,49]p37   [37]p37               2023-01-02T18:38:46.268723+0000  2022-12-29T18:05:47.431468+0000   18  queued for deep scrub
 17.78  20169  0  0  0  84517834400  0  0  2198  remapped+incomplete  10s  3735'53405  28260:1243673  [4,37,2,36,66,0]p4      [41]p41               2023-01-03T14:21:41.563424+0000  2023-01-03T14:21:41.563424+0000  348  queued for scrub
 17.d8  20328  0  0  0  85196053130  0  0  1852  remapped+incomplete  10s  3735'54458  28260:1309564  [38,65,61,37,58,39]p38  [53]p53               2023-01-02T18:32:35.371071+0000  2022-12-28T19:08:29.492244+0000   21  queued for deep scrub

 At present I'm unable to reliably access my data due to the incomplete PGs
 above.  I'll post whatever outputs are requested (won't post now as it can
 be rather verbose).  Is there hope?
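
Separating incomplete PGs that hold objects from empty ones can also be
scripted from the JSON form of the listing (`ceph pg ls incomplete -f json`).
The sketch below assumes that output is either a list of PG entries or an
object with a `pg_stats` list, and that each entry carries `pgid`, `state`
and `stat_sum.num_objects`. Those field names are an assumption to check
against the release in use, and the sample data is made up.

```python
import json


def split_incomplete(pg_ls_json):
    """Split incomplete PGs into (with_data, empty) lists of PG ids."""
    data = json.loads(pg_ls_json)
    # Assumed layouts: {"pg_stats": [...]} or a bare list of PG entries.
    stats = data["pg_stats"] if isinstance(data, dict) else data
    with_data, empty = [], []
    for pg in stats:
        if "incomplete" not in pg["state"]:
            continue  # ignore PGs in any other state
        if pg["stat_sum"]["num_objects"] > 0:
            with_data.append(pg["pgid"])
        else:
            empty.append(pg["pgid"])
    return with_data, empty


# Made-up sample in the assumed layout, not real cluster output.
sample = json.dumps({"pg_stats": [
    {"pgid": "2.35", "state": "incomplete",
     "stat_sum": {"num_objects": 23199}},
    {"pgid": "9.1", "state": "remapped+incomplete",
     "stat_sum": {"num_objects": 0}},
    {"pgid": "3.7", "state": "active+clean",
     "stat_sum": {"num_objects": 10}},
]})

with_data, empty = split_incomplete(sample)
print("incomplete with data:", with_data)
print("incomplete and empty:", empty)
```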