When I see this problem, I usually:
- run `ceph pg repair` on the inconsistent PG
- remove the OSD from the cluster
- replace the disk
- recreate the OSD on the new disk
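
For reference, the four steps above map roughly onto the commands below. This is only a sketch that prints a plan for review and runs nothing. It assumes a non-containerized cluster with OSDs deployed through `ceph-volume` and managed by systemd; the function name `replacement_plan`, the PG id `2.1f`, the OSD id `7` and the device `/dev/sdX` are placeholders, not values from this thread. A cephadm or Rook deployment would use different tooling.

```python
import shlex


def replacement_plan(pgid, osd_id, new_device):
    """Return the commands for the steps above, in order, as strings.

    Nothing is executed: the caller reviews the plan and runs it by hand.
    """
    steps = [
        # 1. Ask the primary OSD to repair the inconsistent PG.
        ["ceph", "pg", "repair", pgid],
        # 2. Mark the OSD out so its data is re-replicated elsewhere.
        ["ceph", "osd", "out", str(osd_id)],
        #    Re-run this check until it reports the OSD is safe to destroy.
        ["ceph", "osd", "safe-to-destroy", f"osd.{osd_id}"],
        #    Stop the daemon, then drop the OSD from the CRUSH map, auth and OSD map.
        ["systemctl", "stop", f"ceph-osd@{osd_id}"],
        ["ceph", "osd", "purge", str(osd_id), "--yes-i-really-mean-it"],
        # 3. (Physically swap the disk here.)
        # 4. Create a fresh OSD on the replacement device.
        ["ceph-volume", "lvm", "create", "--data", new_device],
    ]
    return [shlex.join(cmd) for cmd in steps]


if __name__ == "__main__":
    for line in replacement_plan("2.1f", 7, "/dev/sdX"):
        print(line)
```

Printing the plan instead of executing it is deliberate: the wait between marking the OSD out and purging it depends on how long recovery takes, so it is better done by hand.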
Cheers, Massimo
On Wed, May 20, 2020 at 9:41 PM Peter Lewis <plewis(a)kdinfotech.com> wrote:
Hello,
I came across a section of the documentation that I don't quite
understand. The section about inconsistent PGs says that if one of the
shards listed by `rados list-inconsistent-obj` has a read_error, the disk is
probably bad.
Quote from documentation:
https://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/…
`If read_error is listed in the errors attribute of a shard, the
inconsistency is likely due to disk errors. You might want to check your
disk used by that OSD.`
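
As an illustration of the check the documentation describes, here is a small sketch that scans the JSON printed by `rados list-inconsistent-obj <pgid> --format=json` and reports which OSDs hold a shard with `read_error`. The helper name `osds_with_read_error` and the sample report are made up for the example; the sample is hand-written in the shape of that output, trimmed to the fields used here, and is not real cluster data.

```python
import json


def osds_with_read_error(report):
    """Return the sorted OSD ids whose shard lists 'read_error' in its errors."""
    bad = set()
    for obj in report.get("inconsistents", []):
        for shard in obj.get("shards", []):
            if "read_error" in shard.get("errors", []):
                bad.add(shard["osd"])
    return sorted(bad)


# Hand-written sample, not real output.
sample = json.loads("""
{
  "epoch": 1234,
  "inconsistents": [
    {
      "object": {"name": "obj-a"},
      "errors": [],
      "union_shard_errors": ["read_error"],
      "shards": [
        {"osd": 3, "primary": false, "errors": ["read_error"]},
        {"osd": 5, "primary": true, "errors": []}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    print(osds_with_read_error(sample))  # prints [3]
```

With real output, the report could be loaded with `json.load` from a file holding the command's output instead of the inline sample.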
I determined that the disk is bad by looking at the output of smartctl. I
would think that replacing the disk by removing the OSD from the cluster
and allowing the cluster to recover would fix this inconsistency error
without having to run `ceph pg repair`.
Can I just replace the OSD and let recovery resolve the inconsistency?
Or would it be better to run `ceph pg repair` first and then replace
the OSD associated with that bad disk?
Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io