[ceph-users] Re: OSD state<Start>: transitioning to Stray

9 Dec 2019

An OSD that is down does not take part in recovery or backfill, so
faster recovery or backfill will not bring down OSDs back up.
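
Before tuning anything, it helps to see exactly which OSDs are marked down. The following is a minimal sketch, not part of the original exchange: it assumes the JSON layout printed by `ceph osd tree --format json` (a "nodes" list whose OSD entries carry "type" and "status" fields). The helper name `down_osds` and the sample data are made up for illustration.

```python
import json


def down_osds(tree):
    """Return the names of all OSDs whose status is "down", ordered by OSD id."""
    osds = [
        node
        for node in tree.get("nodes", [])
        if node.get("type") == "osd" and node.get("status") == "down"
    ]
    return [node["name"] for node in sorted(osds, key=lambda n: n["id"])]


# Invented sample in the assumed shape of `ceph osd tree --format json`.
SAMPLE_TREE = json.loads("""
{
  "nodes": [
    {"id": -2, "name": "node1", "type": "host", "children": [203, 374, 412]},
    {"id": 412, "name": "osd.412", "type": "osd", "status": "up"},
    {"id": 374, "name": "osd.374", "type": "osd", "status": "down"},
    {"id": 203, "name": "osd.203", "type": "osd", "status": "up"}
  ]
}
""")

if __name__ == "__main__":
    print(down_osds(SAMPLE_TREE))
```

On a real cluster the input would come from the command output instead of the sample, for example by piping it into `json.load(sys.stdin)`.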

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Dec 9, 2019 at 1:42 PM Thomas Schneider <74cmonty(a)gmail.com> wrote:

  Hi,

 I think I can speed up the recovery / backfill.

 What is the recommended setting for
 osd_max_backfills
 osd_recovery_max_active
 ?

 THX

 On 09.12.2019 at 13:36, Paul Emmerich wrote:
  This message is expected.

 But your current situation is a great example of why having a separate
 cluster network is a bad idea in most situations.
 The first thing I'd do in this scenario is to get rid of the cluster
 network and see if that helps.
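
 To check whether a node still has a cluster network configured, a small sketch along these lines may help. It is not part of the original reply and assumes the option is set in the [global] section of a ceph.conf file rather than in the monitor config database. The function name and the sample addresses are invented for illustration.

```python
import configparser


def cluster_network(conf_text):
    """Return the cluster network from [global], or None if it is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return None
    for key, value in parser.items("global"):
        # ceph.conf accepts spaces, dashes and underscores in option names.
        if key.replace(" ", "_").replace("-", "_") == "cluster_network":
            return value
    return None


# Invented sample configuration using documentation address ranges.
SAMPLE_CONF = """
[global]
fsid = 00000000-0000-0000-0000-000000000000
public network = 192.0.2.0/24
cluster network = 198.51.100.0/24
"""

if __name__ == "__main__":
    print(cluster_network(SAMPLE_CONF))
```

 If the function returns None for every node's configuration, no separate cluster network is defined there and all traffic uses the public network.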

 Paul


 On Mon, Dec 9, 2019 at 11:22 AM Thomas Schneider <74cmonty(a)gmail.com> wrote:

     Hi,
     I had a failure on 2 of 7 OSD nodes.
     This caused a server reboot, and unfortunately the cluster network
     failed to come up.

     As a result, many OSDs were marked down.

     I decided to stop all services (OSD, MGR, MON) and to start them
     sequentially.

     Now multiple OSDs are marked as down although their services are
     running.
     None of these down OSDs belongs to the 2 nodes that failed.

     In the OSD logs I can see multiple entries like this:
     2019-12-09 11:13:10.378 7f9a372fb700  1 osd.374 pg_epoch: 493189
     pg[11.1992( v 457986'92619 (303558'88266,457986'92619]
     local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724
     les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1
     lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown
     NOTIFY
     mbc={}] state<Start>: transitioning to Stray
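
     To see at a glance which OSDs and placement groups log this transition, the wrapped entry can be parsed with a short script. This is a sketch only: the regular expression is fitted to the single entry quoted above, not to a documented log format, and the function name is made up. The `r=` field is read here as the OSD's role in the PG, where -1 is commonly taken to mean the OSD is not in the acting set; treat that reading as an assumption.

```python
import re

# Pattern fitted to the one log entry quoted in this thread (an assumption,
# not a documented format).
TRANSITION_RE = re.compile(
    r"osd\.(?P<osd>\d+) pg_epoch: (?P<epoch>\d+) "
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\("
    r".* r=(?P<role>-?\d+) "
    r".*state<(?P<state>[^>]+)>: transitioning to (?P<target>\w+)"
)


def parse_transition(entry):
    """Extract OSD id, epoch, PG id, role and state change from a log entry."""
    flat = " ".join(entry.split())  # undo the line wrapping added by email
    match = TRANSITION_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("osd", "epoch", "role"):
        fields[key] = int(fields[key])
    return fields


# The entry exactly as it is wrapped in the message above.
ENTRY = """2019-12-09 11:13:10.378 7f9a372fb700  1 osd.374 pg_epoch: 493189
pg[11.1992( v 457986'92619 (303558'88266,457986'92619]
local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724
les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1
lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown
NOTIFY
mbc={}] state<Start>: transitioning to Stray"""

if __name__ == "__main__":
    print(parse_transition(ENTRY))
```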

     I tried restarting the affected OSDs without success, meaning the
     relevant OSDs are still marked as down.

     Is there a procedure to overcome this issue, i.e. to get all OSDs up?

     THX
