Hi cephists,
We have a 10-node cluster running Nautilus 14.2.9.
All objects are on an EC pool. We have the mgr balancer plugin in upmap mode
doing its rebalancing:
```
health: HEALTH_OK
pgs:
  1985 active+clean
  190  active+remapped+backfilling
  65   active+remapped+backfill_wait
io:
  client:   0 B/s wr, 0 op/s rd, 0 op/s wr
  recovery: 770 MiB/s, 463 objects/s
```
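For reference, the balancer was enabled along these lines (standard Nautilus
balancer commands; exact invocation from memory, so treat it as a sketch):
```
ceph balancer mode upmap    # move PGs via pg-upmap-items entries instead of reweighting
ceph balancer on            # let the mgr compute and apply plans automatically
ceph balancer status        # reports the active mode and whether a plan is executing
```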
We restarted osd.0 on one of our OSD nodes, and this was the status
immediately afterwards:
```
health: HEALTH_WARN
1 osds down
Degraded data redundancy: 4531479/531067647 objects degraded (0.853%), 109 pgs degraded
```
Then the OSD became UP again:
```
health: HEALTH_WARN
Degraded data redundancy: 4963207/531067545 objects degraded (0.935%), 120 pgs degraded
```
And after a minute or so it settled on:
```
health: HEALTH_WARN
Degraded data redundancy: 295515/531067347 objects degraded (0.056%), 10 pgs degraded, 10 pgs undersized
```
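If the per-PG detail would help, this is what we would pull it with (standard
Nautilus commands):
```
ceph health detail     # names the degraded/undersized PGs behind the HEALTH_WARN
ceph pg ls degraded    # per-PG state plus up/acting sets and object counts
ceph pg ls undersized
```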
The upmap balancer was running during the osd.0 restart; the restart itself
was successful, without any issues.
This left us wondering: how could a simple OSD restart cause
degraded PGs? Could this be related to the upmap balancer running?
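A related question: should we have guarded the restart with cluster flags? We
are thinking of something along these lines (a hypothetical sequence, assuming
the stock ceph-osd systemd unit; we have not verified it avoids the degraded
objects):
```
ceph osd set noout              # don't mark the OSD out while it is down
ceph osd set norebalance        # pause rebalancing of remapped PGs
systemctl restart ceph-osd@0    # restart the daemon on its host
# wait for the OSD to rejoin and PGs to settle, then:
ceph osd unset norebalance
ceph osd unset noout
```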
Thanks!
--
Vyteni