On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy
<zoltan@linux.vnet.ibm.com> wrote:
> The 2^31-1 in there seems to indicate an overflow somewhere - the way
> we were able to figure out where exactly was to query the PG and
> compare the "up" and "acting" sets - only _one_ of them had the 2^31-1
> number in place of the correct OSD number. We restarted that OSD and
> the PG started doing its job and recovered.
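A quick way to repeat that check across the whole cluster is to scan
"ceph pg dump --format=json" for the placeholder value. A minimal
sketch in Python; where pg_stats sits in the JSON is an assumption
(it moved under "pg_map" between releases, so both layouts are tried):

#!/usr/bin/env python3
# Sketch: flag every PG whose "up" or "acting" set contains the
# 2^31-1 placeholder (2147483647) that marks an unmapped slot.
import json
import subprocess

NONE_OSD = 2**31 - 1  # printed where no OSD could be mapped

raw = subprocess.check_output(["ceph", "pg", "dump", "--format=json"])
dump = json.loads(raw)
# pg_stats is under "pg_map" on newer releases, top-level on older
# ones (assumption about the JSON layout) -- try both.
stats = dump.get("pg_map", dump).get("pg_stats", [])

for pg in stats:
    for field in ("up", "acting"):
        if NONE_OSD in pg.get(field, []):
            print(f"{pg['pgid']}: {field} = {pg[field]}")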
No, this value is intentional (it shows up as 'None' in higher-level
tools): it means no mapping could be found for that slot. Check your
CRUSH map and CRUSH rule.
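One quick way to test the rule itself is to feed the live map to
crushtool and ask it for bad mappings. A sketch; RULE_ID and the 4+2
width are assumptions for this pool (look the rule up with "ceph osd
pool get <pool> crush_rule"):

#!/usr/bin/env python3
# Sketch: dump the live CRUSH map, then let crushtool report inputs
# the rule fails to map to a full 4+2 = 6 OSD set; those short
# mappings are exactly where the 2^31-1 placeholder shows up.
import subprocess

RULE_ID = "1"   # assumption: the EC pool's CRUSH rule id
NUM_REP = "6"   # k+m for a 4+2 EC profile

subprocess.check_call(["ceph", "osd", "getcrushmap", "-o", "/tmp/crushmap"])
subprocess.check_call([
    "crushtool", "-i", "/tmp/crushmap", "--test",
    "--rule", RULE_ID, "--num-rep", NUM_REP,
    "--show-bad-mappings",  # prints only mappings that came up short
])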
Paul
>
> The issue seems to go back to at least 2015:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001661.html
> however, no solution...
>
> I'm more concerned about the cluster not being able to recover (it's a
> 4+2 EC pool across 12 hosts - plenty of room
> to heal) than about the weird print-out.
>
> The VMs that wanted to access data in any of the affected PGs of
> course died.
>
> Are we missing some settings to let the cluster self-heal even for EC
> pools? First EC pool in production :)
>
> Cheers,
> Zoltan