Historically I have often, but not always, found that removing/destroying the affected
OSD would clear the inconsistent PG. At one point the logged message was clear about who
reported the error and which OSD was the perp, but a later release broke that. I'm not
sure what recent releases say, since with Luminous I rarely saw these errors. Perhaps HDD
behavior is more conducive to them.
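For reference, the removal path I mean looks like this on Luminous or later (just a
sketch; the OSD id 12 and the device name are placeholders):

    # Take the OSD out so its PGs backfill elsewhere
    ceph osd out 12
    # Once the cluster is back to active+clean, stop the daemon
    systemctl stop ceph-osd@12
    # Purge removes it from the CRUSH map, OSD map, and auth in one step
    ceph osd purge 12 --yes-i-really-mean-it
    # Redeploy on the same or a replacement device
    ceph-volume lvm create --data /dev/sdX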
Depending on your device, unrecovered read errors may not warrant replacement; they often
represent routine slipped/reallocated blocks. In such cases rewriting the data is
sufficient. With older releases, redeploying the OSD (or surgically excising the affected
data) would suffice.
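To illustrate the rewrite path (the pg id 2.4 is a made-up example):

    # Identify the inconsistent PG and the shard with the read_error
    ceph health detail
    rados list-inconsistent-obj 2.4 --format=json-pretty
    # Repair rewrites the bad copy from a healthy replica
    ceph pg repair 2.4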
With Nautilus I'm told that ceph-osd (or BlueStore?) will rewrite the data automagically
and the OSD will not need to be reprovisioned. It would still be a good idea to keep an
eye on escalating reallocation rates and dwindling spares, i.e. the percentage of spare
blocks remaining. One SSD manufacturer told me that once remaining spares drop to 13%,
performance is degraded by roughly 10% and the drive should be considered about to fail.
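If you want to watch for that, smartctl shows the relevant counters (device names are
placeholders; the exact attribute names vary by vendor):

    # SATA: attribute 5 (Reallocated_Sector_Ct) plus vendor spare/wear attributes
    smartctl -A /dev/sdX
    # NVMe: "Available Spare" and "Percentage Used" in the health log
    smartctl -a /dev/nvme0

And if the automatic rewrite I mentioned is the scrub auto-repair behavior (an assumption
on my part), the knob would be:

    # Let scrub repair inconsistencies it finds on its own
    ceph config set osd osd_scrub_auto_repair true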
I've seen both an HDD model and an SSD model with design/firmware flaws that were tickled
by specific Ceph access patterns, so if you experience a pandemic of these errors there
may be more to it.
> On May 23, 2020, at 3:18 AM, Massimo Sgaravatto <massimo.sgaravatto(a)gmail.com>
wrote:
>
> When I see this problem usually:
>
> - I run pg repair
> - I remove the OSD from the cluster
> - I replace the disk
> - I recreate the OSD on the new disk
>
> Cheers, Massimo
>
>> On Wed, May 20, 2020 at 9:41 PM Peter Lewis <plewis(a)kdinfotech.com> wrote:
>>
>> Hello,
>>
>> I came across a section of the documentation that I don't quite
>> understand. In the section about inconsistent PGs it says if one of the
>> shards listed in `rados list-inconsistent-obj` has a read_error the disk is
>> probably bad.
>>
>> Quote from documentation:
>>
>>
https://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/…
>> `If read_error is listed in the errors attribute of a shard, the
>> inconsistency is likely due to disk errors. You might want to check your
>> disk used by that OSD.`
>>
>> I determined that the disk is bad by looking at the output of smartctl. I
>> would think that replacing the disk by removing the OSD from the cluster
>> and allowing the cluster to recover would fix this inconsistency error
>> without having to run `ceph pg repair`.
>>
>> Can I just replace the OSD and the inconsistency will be resolved by the
>> recovery? Or would it be better to run `ceph pg repair` and then replace
>> the OSD associated with that bad disk?
>>
>> Thanks!