Can you post the output of these commands:
ceph osd pool ls detail
ceph osd tree
ceph osd crush rule dump
-----Original Message-----
From: Frank Schilder <frans(a)dtu.dk>
Sent: Monday, August 3, 2020 9:19 AM
To: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] Re: Ceph does not recover from OSD restart
After moving the newly added OSDs out of the CRUSH tree and back in again (commands sketched after the status below), I get exactly what I want to see:
cluster:
id: e4ece518-f2cb-4708-b00f-b6bf511e91d9
health: HEALTH_WARN
norebalance,norecover flag(s) set
53030026/1492404361 objects misplaced (3.553%)
1 pools nearfull
services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
mgr: ceph-01(active), standbys: ceph-03, ceph-02
mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
osd: 297 osds: 272 up, 272 in; 307 remapped pgs
flags norebalance,norecover
data:
pools: 11 pools, 3215 pgs
objects: 177.3 M objects, 489 TiB
usage: 696 TiB used, 1.2 PiB / 1.9 PiB avail
pgs: 53030026/1492404361 objects misplaced (3.553%)
2902 active+clean
299 active+remapped+backfill_wait
8 active+remapped+backfilling
5 active+clean+scrubbing+deep
1 active+clean+snaptrim
io:
client: 69 MiB/s rd, 117 MiB/s wr, 399 op/s rd, 856 op/s wr
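For anyone wanting to reproduce this: "moving the OSDs out of the CRUSH tree and back in again" can be done with the standard CRUSH commands, roughly along the lines below. OSD id, weight and host name are placeholders, not the actual values used here:

ceph osd set norebalance                   # pause data movement while editing the tree
ceph osd crush remove osd.N                # take the OSD out of the CRUSH tree
ceph osd crush add osd.N W host=ceph-XX    # put it back at its weight and location
ceph osd unset norebalance                 # resume rebalancing when ready

(ceph osd crush move osd.N host=ceph-XX is the one-step alternative for relocating an item without removing it first.)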
Why does a cluster with remapped PGs not survive OSD restarts without losing track of objects?
Why is it not finding the objects by itself?
A power outage affecting 3 hosts would halt everything for no reason until manual intervention.
How can I avoid this problem?
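For completeness: the norebalance/norecover flags visible in the status above are cluster-wide flags that are set and cleared manually, e.g.:

ceph osd set norebalance      # stop backfill of misplaced objects
ceph osd set norecover        # stop recovery of degraded objects
ceph osd unset norecover      # re-enable recovery
ceph osd unset norebalance    # re-enable backfill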
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans(a)dtu.dk>
Sent: 03 August 2020 15:03:05
To: ceph-users
Subject: [ceph-users] Ceph does not recover from OSD restart
Dear cephers,
I have a serious issue with degraded objects after an OSD restart. The cluster was in a state of rebalancing after adding disks to each host. Before the restart I had "X/Y objects misplaced"; apart from that, health was OK. I then restarted all OSDs of one host (a plain service restart, sketched after the status below) and the cluster does not recover from that:
cluster:
id: xxx
health: HEALTH_ERR
45813194/1492348700 objects misplaced (3.070%)
Degraded data redundancy: 6798138/1492348700 objects degraded (0.456%), 85 pgs
degraded, 86 pgs undersized
Degraded data redundancy (low space): 17 pgs backfill_toofull
1 pools nearfull
services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
mgr: ceph-01(active), standbys: ceph-03, ceph-02
mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
osd: 297 osds: 272 up, 272 in; 307 remapped pgs
data:
pools: 11 pools, 3215 pgs
objects: 177.3 M objects, 489 TiB
usage: 696 TiB used, 1.2 PiB / 1.9 PiB avail
pgs: 6798138/1492348700 objects degraded (0.456%)
45813194/1492348700 objects misplaced (3.070%)
2903 active+clean
209 active+remapped+backfill_wait
73 active+undersized+degraded+remapped+backfill_wait
9 active+remapped+backfill_wait+backfill_toofull
8 active+undersized+degraded+remapped+backfill_wait+backfill_toofull
4 active+undersized+degraded+remapped+backfilling
3 active+remapped+backfilling
3 active+clean+scrubbing+deep
1 active+clean+scrubbing
1 active+undersized+remapped+backfilling
1 active+clean+snaptrim
io:
client: 47 MiB/s rd, 61 MiB/s wr, 732 op/s rd, 792 op/s wr
recovery: 195 MiB/s, 48 objects/s
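For reference, the restart itself was nothing special; just a host-level restart of the OSD services on that one host, something like the following (host and OSD ids are placeholders):

systemctl restart ceph-osd.target    # on the affected host: restart all OSD daemons
systemctl restart ceph-osd@NNN       # or restart a single OSD daemon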
After restarting there should only be a small number of degraded objects, namely the ones that received writes during the OSD restart. What I see, however, is that the cluster seems to have lost track of a huge number of objects: the 0.456% degraded correspond to 1-2 days' worth of I/O. I have done reboots before and saw only a few thousand objects degraded at most. The output of ceph health detail shows a lot of lines like these:
[root@gnosis ~]# ceph health detail
HEALTH_ERR 45804316/1492356704 objects misplaced (3.069%); Degraded data redundancy: 6792562/1492356704 objects degraded (0.455%), 85 pgs degraded, 86 pgs undersized; Degraded data redundancy (low space): 17 pgs backfill_toofull; 1 pools nearfull
OBJECT_MISPLACED 45804316/1492356704 objects misplaced (3.069%)
PG_DEGRADED Degraded data redundancy: 6792562/1492356704 objects degraded (0.455%), 85 pgs degraded, 86 pgs undersized
pg 11.9 is stuck undersized for 815.188981, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[60,148,2147483647,263,76,230,87,169]
[...]
pg 11.48 is active+undersized+degraded+remapped+backfill_wait, acting
[159,60,180,263,237,3,2147483647,72]
pg 11.4a is stuck undersized for 851.162862, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[182,233,87,228,2,180,63,2147483647]
[...]
pg 11.22e is stuck undersized for 851.162402, current state
active+undersized+degraded+remapped+backfill_wait+backfill_toofull, last acting
[234,183,239,2147483647,170,229,1,86]
PG_DEGRADED_FULL Degraded data redundancy (low space): 17 pgs backfill_toofull
pg 11.24 is active+undersized+degraded+remapped+backfill_wait+backfill_toofull, acting
[230,259,2147483647,1,144,159,233,146]
[...]
pg 11.1d9 is active+remapped+backfill_wait+backfill_toofull, acting
[84,259,183,170,85,234,233,2]
pg 11.225 is active+undersized+degraded+remapped+backfill_wait+backfill_toofull,
acting [236,183,1,2147483647,2147483647,169,229,230]
pg 11.22e is active+undersized+degraded+remapped+backfill_wait+backfill_toofull,
acting [234,183,239,2147483647,170,229,1,86]
POOL_NEAR_FULL 1 pools nearfull
pool 'sr-rbd-data-one-hdd' has 164 TiB (max 200 TiB)
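Side note: the nearfull warning above appears to come from the pool quota (164 TiB used of a 200 TiB max). For reference, the quota can be inspected and, if appropriate, raised with the standard quota commands:

ceph osd pool get-quota sr-rbd-data-one-hdd                      # show max_objects / max_bytes
ceph df detail                                                   # per-pool usage
ceph osd pool set-quota sr-rbd-data-one-hdd max_bytes <bytes>    # raise the byte quota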
It looks like a lot of PGs are not receiving their complete CRUSH placement, as if peering is incomplete. This is a serious issue: it looks like the cluster would suffer a total storage outage if just 2 more hosts reboot, without actually having lost any storage. The pool in question is a 6+2 EC pool.
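For anyone digging into this: the 2147483647 entries in the acting sets above are CRUSH's "none" placeholder (2^31-1), i.e. no OSD is currently acting for that EC shard. The mapping and peering state of an individual PG can be inspected with, e.g.:

ceph pg map 11.9                 # current up and acting sets of the PG
ceph pg 11.9 query               # detailed peering and recovery state
ceph pg dump_stuck undersized    # list all PGs stuck in the undersized state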
What is going on here? Why are the PG maps not restored to their values from before the OSD reboot? The degraded PGs should receive the missing OSD IDs; everything is up exactly as it was before the reboot.
Thanks for your help and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io