PGs down - ceph-users

13 Dec 2020

I could use some input from more experienced folks…

First time seeing this behavior. I've been running ceph in production
(replicated) since 2016 or earlier.

This, however, is a small 3-node cluster for testing EC. Crush map rules
should sustain the loss of an entire node.
Here's the EC rule:

rule cephfs425 { id 6 type erasure min_size 3 max_size 6 step
set_chooseleaf_tries 40 step set_choose_tries 400 step take default step
choose indep 3 type host step choose indep 2 type osd step emit }

I had actual hardware failure on one node. Interestingly, this appears to
have resulted in data loss. OSDs began to crash in a cascade on other nodes
(i.e., nodes with no known hardware failure). Not a low RAM problem.

I could use some pointers about how to get the down PGs back up — I *think*
there are enough EC shards, even disregarding the OSDs that crash on start.

nautilus 14.2.15

 ceph osd tree
ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       54.75960 root default
-10       16.81067     host sumia
  1   hdd  5.57719         osd.1       up  1.00000 1.00000
  5   hdd  5.58469         osd.5       up  1.00000 1.00000
  6   hdd  5.64879         osd.6       up  1.00000 1.00000
 -7       16.73048     host sumib
  0   hdd  5.57899         osd.0       up  1.00000 1.00000
  2   hdd  5.56549         osd.2       up  1.00000 1.00000
  3   hdd  5.58600         osd.3       up  1.00000 1.00000
 -3       21.21844     host tower1
  4   hdd  3.71680         osd.4       up        0 1.00000
  7   hdd  1.84799         osd.7       up  1.00000 1.00000
  8   hdd  3.71680         osd.8       up  1.00000 1.00000
  9   hdd  1.84929         osd.9       up  1.00000 1.00000
 10   hdd  2.72899         osd.10      up  1.00000 1.00000
 11   hdd  3.71989         osd.11    down        0 1.00000
 12   hdd  3.63869         osd.12    down        0 1.00000

  cluster:
    id:     d0b4c175-02ba-4a64-8040-eb163002cba6
    health: HEALTH_ERR
            1 MDSs report slow requests
            4/4239345 objects unfound (0.000%)
            Too many repaired reads on 3 OSDs
            Reduced data availability: 7 pgs inactive, 7 pgs down
            Possible data damage: 4 pgs recovery_unfound
            Degraded data redundancy: 95807/24738783 objects degraded
(0.387%), 4 pgs degraded, 3 pgs undersized
            7 pgs not deep-scrubbed in time
            7 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum sumib,tower1,sumia (age 4d)
    mgr: sumib(active, since 7d), standbys: sumia, tower1
    mds: cephfs:1 {0=sumib=up:active} 2 up:standby
    osd: 13 osds: 11 up (since 3d), 10 in (since 4d); 3 remapped pgs

  data:
    pools:   5 pools, 256 pgs
    objects: 4.24M objects, 15 TiB
    usage:   24 TiB used, 24 TiB / 47 TiB avail
    pgs:     2.734% pgs not active
             95807/24738783 objects degraded (0.387%)
             47910/24738783 objects misplaced (0.194%)
             4/4239345 objects unfound (0.000%)
             245 active+clean
             7   down
             3   active+recovery_unfound+undersized+degraded+remapped
             1   active+recovery_unfound+degraded+repair

  progress:
    Rebalancing after osd.12 marked out
      [============================..]
    Rebalancing after osd.4 marked out
      [=============================.]

An snipped from an example down pg:
    "up": [
        3,
        2,
        5,
        1,
        8,
        9
    ],
    "acting": [
        3,
        2,
        5,
        1,
        8,
        9
    ],
<snip>
         ],
            "blocked": "peering is blocked due to down osds",
            "down_osds_we_would_probe": [
                11,
                12
            ],
            "peering_blocked_by": [
                {
                    "osd": 11,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let
us proceed"
                },
                {
                    "osd": 12,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let
us proceed"
                }
            ]
        },
        {

Oddly, these OSDs possibly did NOT experience hardware failure. However,
they won't start -- see pastebin for ceph-osd.11.log

https://pastebin.com/6U6sQJuJ

HEALTH_ERR 1 MDSs report slow requests; 4/4239345 objects unfound (0.000%);
Too many repaired reads on 3 OSDs; Reduced data availability
: 7 pgs inactive, 7 pgs down; Possible data damage: 4 pgs recovery_unfound;
Degraded data redundancy: 95807/24738783 objects degraded (0
.387%), 4 pgs degraded, 3 pgs undersized; 7 pgs not deep-scrubbed in time;
7 pgs not scrubbed in time
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdssumib(mds.0): 42 slow requests are blocked > 30 secs
OBJECT_UNFOUND 4/4239345 objects unfound (0.000%)
    pg 19.5 has 1 unfound objects
    pg 15.2f has 1 unfound objects
    pg 15.41 has 1 unfound objects
    pg 15.58 has 1 unfound objects
OSD_TOO_MANY_REPAIRS Too many repaired reads on 3 OSDs
    osd.9 had 9664 reads repaired
    osd.7 had 9665 reads repaired
    osd.4 had 12 reads repaired
PG_AVAILABILITY Reduced data availability: 7 pgs inactive, 7 pgs down
    pg 15.10 is down, acting [3,2,5,1,8,9]
    pg 15.1e is down, acting [5,1,9,8,2,3]
    pg 15.40 is down, acting [7,10,1,5,3,2]
    pg 15.4a is down, acting [0,3,5,6,9,10]
    pg 15.6a is down, acting [3,2,6,1,10,8]
    pg 15.71 is down, acting [3,2,1,6,8,10]
    pg 15.76 is down, acting [2,0,6,5,10,9]
PG_DAMAGED Possible data damage: 4 pgs recovery_unfound
    pg 15.2f is active+recovery_unfound+undersized+degraded+remapped,
acting [5,1,0,3,2147483647,7], 1 unfound
    pg 15.41 is active+recovery_unfound+undersized+degraded+remapped,
acting [5,1,0,3,2147483647,2147483647], 1 unfound
    pg 15.58 is active+recovery_unfound+undersized+degraded+remapped,
acting [10,2147483647,2,3,1,5], 1 unfound
    pg 19.5 is active+recovery_unfound+degraded+repair, acting
[3,2,5,1,8,10], 1 unfound
PG_DEGRADED Degraded data redundancy: 95807/24738783 objects degraded
(0.387%), 4 pgs degraded, 3 pgs undersized
    pg 15.2f is stuck undersized for 635305.932075, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[5,1,0,3,2147483647,7]
    pg 15.41 is stuck undersized for 364298.836902, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[5,1,0,3,2147483647,2147483647]
    pg 15.58 is stuck undersized for 384461.110229, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[10,2147483647,2,3,1,5]
    pg 19.5 is active+recovery_unfound+degraded+repair, acting
[3,2,5,1,8,10], 1 unfound
PG_NOT_DEEP_SCRUBBED 7 pgs not deep-scrubbed in time
    pg 15.76 not deep-scrubbed since 2020-10-21 14:30:03.935228
    pg 15.71 not deep-scrubbed since 2020-10-21 12:20:46.235792
    pg 15.6a not deep-scrubbed since 2020-10-21 07:52:33.914083
    pg 15.10 not deep-scrubbed since 2020-10-22 03:24:40.465367
    pg 15.1e not deep-scrubbed since 2020-10-22 10:37:36.169959
    pg 15.40 not deep-scrubbed since 2020-10-23 05:33:35.208748
    pg 15.4a not deep-scrubbed since 2020-10-22 05:14:06.981035
PG_NOT_SCRUBBED 7 pgs not scrubbed in time
    pg 15.76 not scrubbed since 2020-10-24 08:12:40.090831
    pg 15.71 not scrubbed since 2020-10-25 05:22:40.573572
    pg 15.6a not scrubbed since 2020-10-24 15:03:09.189964
    pg 15.10 not scrubbed since 2020-10-24 16:25:08.826981
    pg 15.1e not scrubbed since 2020-10-24 16:05:03.080127
    pg 15.40 not scrubbed since 2020-10-24 11:58:04.290488
    pg 15.4a not scrubbed since 2020-10-24 11:32:44.573551
-- 
Jeremy Austin
jhaustin(a)gmail.com