Hi all
I have a Ceph cluster (Nautilus 14.2.11) with 3 Ceph nodes.
A crash happened and all 3 Ceph nodes went down.
One PG turned "active+clean+inconsistent", so I tried to repair it. After the
repair, the PG in question now shows "active+clean+inconsistent+failed_repair",
and I cannot bring the cluster back to "active+clean".
How do I rescue the cluster? Or is this a false positive?
Here are the details:
All three Ceph nodes run ceph-mon, ceph-mgr, ceph-osd and ceph-mds.
1. ceph -s
   health: HEALTH_ERR
           3 scrub errors
           Possible data damage: 1 pg inconsistent
   pgs:    191 active+clean
           1   active+clean+inconsistent
2. ceph health detail
   HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
   OSD_SCRUB_ERRORS 3 scrub errors
   PG_DAMAGED Possible data damage: 1 pg inconsistent
       pg 3.b is active+clean+inconsistent, acting [0,1,2]
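For reference, osd.0 should be the primary for this PG, since it is listed
first in the acting set; if it helps, I can confirm the mapping with:

   ceph pg map 3.b

which should print the up and acting sets for pg 3.b.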
3. rados list-inconsistent-pg rbd
   []
4. ceph pg deep-scrub 3.b
5. ceph pg repair 3.b
6. ceph health detail
   HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
   OSD_SCRUB_ERRORS 3 scrub errors
   PG_DAMAGED Possible data damage: 1 pg inconsistent
       pg 3.b is active+clean+inconsistent+failed_repair, acting [0,1,2]
7. rados list-inconsistent-obj 3.b --format=json-pretty
   {
       "epoch": 4769,
       "inconsistents": []
   }
8. ceph pg 3.b list_unfound
   {
       "num_missing": 0,
       "num_unfound": 0,
       "objects": [],
       "more": false
   }
Appreciate your help.
Thanks,
Sagara