Yes Patrick,
In the process of killing the MDS we are also *killing the Monitor along with
the OSD, Mgr, and RGW*: we are powering off/rebooting the complete node (with
the MDS, Mon, RGW, OSD, and Mgr daemons).
Cluster: 2 nodes with MDS|Mon|RGW|OSD each, and a third node with 1 Mon.
Note: when I stop only the MDS service, it takes 4-7 seconds for the
standby MDS node to activate and resume.
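For reference, the detection delays in this kind of test are largely governed by a few Ceph timeouts. Below is a sketch of the relevant options with their Octopus-era defaults (illustrative only, not a tuning recommendation; the values on a given cluster should be checked with `ceph config get`):

```ini
# ceph.conf fragment -- defaults shown for illustration, assuming Octopus
[mds]
mds_beacon_interval = 4   ; how often each MDS sends a beacon to the mons (seconds)
mds_beacon_grace = 15     ; mons mark an MDS laggy/failed after this long without a beacon

[mon]
mon_lease = 5             ; mon lease length; loss of a mon is only noticed once leases expire
```

When the whole node dies, the mons must first re-establish quorum (an election) before they can mark the MDS failed and promote the standby, which is one plausible reason the node-poweroff case takes far longer than stopping the MDS service alone.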
Thanks for your inputs.
Best Regards,
Lokendra
On Mon, May 3, 2021 at 8:50 PM Patrick Donnelly <pdonnell(a)redhat.com> wrote:
On Mon, May 3, 2021 at 6:36 AM Lokendra Rathour
<lokendrarathour(a)gmail.com> wrote:
Hi Team,
I was setting up the ceph cluster with
- Node details: 3 Mon, 2 MDS, 2 Mgr, 2 RGW
- Deployment type: Active-Standby
- Testing mode: failover of the MDS node
- Setup: Octopus (15.2.7)
- OS: CentOS 8.3
- Hardware: HP
- RAM: 128 GB on each node
- OSD: 2 (1 TB each)
- Operation: normal I/O with a mkdir every 1 second.
*Test Case: Power off the active MDS node so that failover happens*
*Observation:*
We have observed that whenever an active MDS node goes down, it takes
around *40 seconds* to activate the standby MDS node.
On further checking the logs of the newly active MDS node after handover, we
see the delay break down as follows:
1. A 10-second delay, after which the Mon calls for a new monitor election:
   [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 calling
   monitor election
In the process of killing the active MDS, are you also killing a monitor?
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D