On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy
<zoltan@linux.vnet.ibm.com> wrote:
> The 2^31-1 in there seems to indicate an overflow somewhere - the way
> we were able to figure out where exactly was to query the PG and
> compare the "up" and "acting" sets - only _one_ of them had the 2^31-1
> number in place of the correct OSD number. We restarted that OSD and
> the PG started doing its job and recovered.
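A quick way to repeat that check across the whole cluster is to scan
"ceph pg dump --format=json" for the placeholder value. A minimal
sketch in Python; where pg_stats sits in the JSON is an assumption
(it moved under "pg_map" between releases, so both layouts are tried):

#!/usr/bin/env python3
# Sketch: flag every PG whose "up" or "acting" set contains the
# 2^31-1 placeholder (2147483647) that marks an unmapped slot.
import json
import subprocess

NONE_OSD = 2**31 - 1  # printed where no OSD could be mapped

raw = subprocess.check_output(["ceph", "pg", "dump", "--format=json"])
dump = json.loads(raw)
# pg_stats is under "pg_map" on newer releases, top-level on older
# ones (assumption about the JSON layout) -- try both.
stats = dump.get("pg_map", dump).get("pg_stats", [])

for pg in stats:
    for field in ("up", "acting"):
        if NONE_OSD in pg.get(field, []):
            print(f"{pg['pgid']}: {field} = {pg[field]}")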
No, this value is intentional (it shows up as 'None' in higher-level
tools): it means no mapping could be found for that slot. Check your
CRUSH map and CRUSH rule.
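One quick way to test the rule itself is to feed the live map to
crushtool and ask it for bad mappings. A sketch; RULE_ID and the 4+2
width are assumptions for this pool (look the rule up with "ceph osd
pool get <pool> crush_rule"):

#!/usr/bin/env python3
# Sketch: dump the live CRUSH map, then let crushtool report inputs
# the rule fails to map to a full 4+2 = 6 OSD set; those short
# mappings are exactly where the 2^31-1 placeholder shows up.
import subprocess

RULE_ID = "1"   # assumption: the EC pool's CRUSH rule id
NUM_REP = "6"   # k+m for a 4+2 EC profile

subprocess.check_call(["ceph", "osd", "getcrushmap", "-o", "/tmp/crushmap"])
subprocess.check_call([
    "crushtool", "-i", "/tmp/crushmap", "--test",
    "--rule", RULE_ID, "--num-rep", NUM_REP,
    "--show-bad-mappings",  # prints only mappings that came up short
])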
Paul
>
> The issue seems to go back to at least 2015:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001661.html
> however, no solution...
>
> I'm more concerned about the cluster not being able to recover (it's a
> 4+2 EC pool across 12 hosts - plenty of room
> to heal) than about the weird print-out.
>
> The VMs that wanted to access data in any of the affected PGs of
> course died.
>
> Are we missing some settings to let the cluster self-heal even for EC
> pools? First EC pool in production :)
>
> Cheers,
> Zoltan