Hi Sagara,
It looks like you have one copy on a new version and two on an old version. Can you add
the information about which OSD each version resides on?
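In case it helps, the per-OSD versions can be pulled out of the JSON from `ceph pg 3.b query`. A minimal sketch; the `acting`/`peer_info`/`last_update` field names are assumptions based on typical pg query output, so check them against what your cluster actually prints:

```python
import json

# Fragment shaped like `ceph pg <pgid> query --format=json` output;
# the field names here are assumptions, verify against real output.
sample = json.loads("""
{
  "acting": [0, 1, 2],
  "info": {"stats": {"version": "4825'2264303"}},
  "peer_info": [
    {"peer": "1", "last_update": "4825'2264301"},
    {"peer": "2", "last_update": "4825'2264301"}
  ]
}
""")

def versions_by_osd(q):
    """Map OSD id -> last PG version that OSD reported."""
    out = {}
    # The primary's own version sits under info/stats.
    primary = q["acting"][0]
    out[primary] = q["info"]["stats"]["version"]
    # Each peer entry carries its own last_update.
    for p in q["peer_info"]:
        out[int(p["peer"])] = p["last_update"]
    return out

print(versions_by_osd(sample))
```

With the sample above this prints one OSD on 4825'2264303 and two on 4825'2264301, which is the kind of mapping asked for here.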
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Sagara Wijetunga <sagarawmw(a)yahoo.com>
Sent: 02 November 2020 10:10:02
To: ceph-users(a)ceph.io; Frank Schilder
Subject: Re: [ceph-users] Re: How to recover from
active+clean+inconsistent+failed_repair?
Hi Frank
I'm not sure if my hypothesis can be correct. Ceph
sends an acknowledgement of a write only after all copies are on disk. In other words, if PGs
end up on different versions after a power outage, one always needs to roll back. Since
you have two healthy OSDs in the PG and the PG is active (successfully peered), it might
just be a broken disk and read/write errors. I would focus on that.
I tried to revert the PG as follows:
# ceph pg 3.b query | grep version
"last_user_version": 2263481,
"version": "4825'2264303",
"last_user_version": 2263481,
"version": "4825'2264301",
"last_user_version": 2263481,
"version": "4825'2264301",
# ceph pg 3.b list_unfound
{
"num_missing": 0,
"num_unfound": 0,
"objects": [],
"more": false
}
# ceph pg 3.b mark_unfound_lost revert
pg has no unfound objects
# ceph pg 3.b revert
Invalid command: revert not in query
pg <pgid> query : show details of a specific pg
Error EINVAL: invalid command
How to revert/rollback a PG?
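(For reference: as far as I know there is no whole-PG revert; `mark_unfound_lost revert` only acts on unfound objects, and this PG has none. When a single replica diverges, the usual route is to remove the bad copy and let Ceph repair from the healthy ones. A sketch, with placeholder OSD id and object name; double-check against the documentation before running anything:)

```
# List the inconsistent objects in the PG
rados list-inconsistent-obj 3.b --format=json-pretty

# Stop the OSD holding the bad copy, then remove that copy
# with ceph-objectstore-tool (<id> and <object> are placeholders)
systemctl stop ceph-osd@<id>
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --pgid 3.b <object> remove
systemctl start ceph-osd@<id>

# Ask Ceph to repair the PG from the remaining healthy copies
ceph pg repair 3.b
```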
Another question: do you have write caches enabled
(disk cache and controller cache)? This is known to cause problems on power outages and
also degraded performance with Ceph. You should check and disable any caches if necessary.
No. The HDDs are directly connected to the motherboard.
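(Even with directly attached drives, the disk's own volatile write cache can be on. A quick way to check, assuming `hdparm` is installed and `/dev/sdX` stands for the data disk:)

```
# Show the drive's current write-cache setting
hdparm -W /dev/sdX

# Disable the volatile write cache if it is enabled
hdparm -W0 /dev/sdX
```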
Thank you
Sagara