Disks are expected to fail, and every once in a while I'll lose one, so
that was expected and didn't come as any surprise to me. Are you
suggesting failed drives almost always stay down and out?
On Thu, Sep 5, 2019 at 11:13 AM Ashley Merrick <singapore(a)amerrick.co.uk>
wrote:
I would suggest checking the logs and seeing the exact reason it's being
marked out.
If the disk is being hit hard and there are heavy I/O delays, then Ceph may
see that as a delayed reply outside of the set window and mark it as out.
There are some variables that can be changed to give an OSD more time to
reply to a heartbeat, but I would definitely suggest checking the OSD log
at the time of the disk being marked out to see exactly what's going on.
The last thing you want to do is just patch over an actual issue if there
is one.
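For reference, a rough sketch of what I'd look at (paths, the <id>
placeholder, and the grace value are illustrative; check your release's
defaults):

    # follow the OSD's own log around the time it was marked down/out
    less /var/log/ceph/ceph-osd.<id>.log
    # or, on systemd hosts:
    journalctl -u ceph-osd@<id> --since "2019-09-05 10:00"

    # the cluster log shows which peers reported the OSD and why
    ceph log last 100

    # give OSDs longer to answer heartbeats before peers report them down
    # (default is 20 seconds; 60 here is just an example)
    ceph config set osd osd_heartbeat_grace 60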
---- On Fri, 06 Sep 2019 02:11:06 +0800 * solarflow99(a)gmail.com
<solarflow99(a)gmail.com> * wrote ----
No, I mean Ceph sees it as a failure and marks it out for a while
On Thu, Sep 5, 2019 at 11:00 AM Ashley Merrick <singapore(a)amerrick.co.uk>
wrote:
Is your HD actually failing and vanishing from the OS and then coming back
shortly?
Or do you just mean your OSD is crashing and then restarting itself
shortly later?
---- On Fri, 06 Sep 2019 01:55:25 +0800 * solarflow99(a)gmail.com
<solarflow99(a)gmail.com> * wrote ----
One of the things I've come to notice is that when HDD drives fail, they often
recover in a short time and get added back to the cluster. This causes the
data to rebalance back and forth, and if I set the noout flag I get a
health warning. Is there a better way to avoid this?
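For reference, the flag and warning in question, plus a per-OSD variant
(the add-noout/rm-noout form assumes Mimic or later, and osd.12 is just an
example id):

    # cluster-wide: stop marking down OSDs out, but this raises a HEALTH_WARN
    ceph osd set noout
    ceph osd unset noout

    # per-OSD alternative, so only the flapping disk is excluded
    ceph osd add-noout osd.12
    ceph osd rm-noout osd.12

    # or raise how long a down OSD may stay down before the monitors mark it out
    # (default is 600 seconds)
    ceph config set mon mon_osd_down_out_interval 1800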