Btw: after the storm has passed, I highly suggest you consider using jumbo frames.
It works like a charm.
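A minimal sketch of what that involves, assuming the cluster network sits on an interface like eth1 (a placeholder name) and that every node and switch port along the path is raised to the same MTU:

ip link set dev eth1 mtu 9000    # on every node, on the cluster-facing interface
ip -o link show eth1             # confirm the new MTU took effect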
[]'s
Arthur (aKa Guilherme Geronimo)
On 04/09/2019 15:50, Guilherme Geronimo wrote:
I see that you have many inactive PGs, probably because of the 6 OSDs
that are OUT+DOWN.
Problems with "flapping" OSDs I usually solve by (see the sketch below):
* setting the NOUP flag
* restarting the "fragile" OSDs
* checking their logs to confirm everything is OK
* removing the NOUP flag
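A rough sketch of that first recipe, assuming systemd-managed OSDs (replace <id> with the number of the flapping OSD and run the restart on the host that owns it):

ceph osd set noup                  # keep restarted OSDs from being marked up yet
systemctl restart ceph-osd@<id>    # on the OSD's host
journalctl -u ceph-osd@<id> -f     # watch the log until it settles
ceph osd unset noup                # let the OSDs come back up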
Another solution (sketched below as well) is:
* setting the NOIN and NOUP flags
* taking the fragile OSD out
* restarting the "fragile" OSDs
* checking their logs to confirm everything is OK
* removing the NOUP flag
* grabbing a coffee and waiting until all the data has drained
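Roughly the second recipe, under the same assumptions (<id> again being the fragile OSD):

ceph osd set noin
ceph osd set noup
ceph osd out <id>                  # stop placing new data on the fragile OSD
systemctl restart ceph-osd@<id>    # on its host
ceph osd unset noup
ceph -s                            # repeat until recovery/backfill finishes, then enjoy the coffee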
[]'s
Arthur (aKa Guilherme Geronimo)
On 04/09/2019 15:32, Bryan Stillwell wrote:
> We are not using jumbo frames anywhere on this cluster (all mtu
> 1500). The cluster was originally built in October of 2016 and has
> the following history:
>
> 2016-10-04: Created with Hammer (0.94.3)
> 2017-05-03: Upgraded to Hammer (0.94.10)
> 2017-10-09: Upgraded to Jewel (10.2.9)
> 2017-11-02: Upgraded to Jewel (10.2.10)
> 2018-04-30: Upgraded to Luminous (12.2.5)
> 2018-09-05: Upgraded to Luminous (12.2.8)
> 2019-04-05: Upgraded to Luminous (12.2.11)
> 2019-04-18: Upgraded to Luminous (12.2.12)
> 2019-07-26: Upgraded to Nautilus (14.2.2)
>
> It wasn't until after the Nautilus upgrade when this problem started
> showing up.
>
> Here's the output you requested:
>
> [root@a2mon002 ~]# ceph -s
>   cluster:
>     id:     XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
>     health: HEALTH_ERR
>             nodown,norebalance,noscrub,nodeep-scrub flag(s) set
>             1 nearfull osd(s)
>             19 pool(s) nearfull
>             1 scrub errors
>             Reduced data availability: 6014 pgs inactive, 3 pgs down, 5958 pgs peering, 83 pgs stale
>             Possible data damage: 1 pg inconsistent
>             Degraded data redundancy: 1601/81648846 objects degraded (0.002%), 4 pgs degraded, 5 pgs undersized
>             1048 slow requests are blocked > 32 sec
>
>   services:
>     mon: 3 daemons, quorum a2mon002,a2mon003,a2mon004 (age 17m)
>     mgr: a2mon004(active, since 53m), standbys: a2mon003, a2mon002
>     mds: cephfs:2 {0=a2mon004=up:active(laggy or crashed),1=a2mon003=up:active(laggy or crashed)} 1 up:standby
>     osd: 143 osds: 141 up, 137 in; 486 remapped pgs
>          flags nodown,norebalance,noscrub,nodeep-scrub
>
>   data:
>     pools:   20 pools, 6288 pgs
>     objects: 27.22M objects, 98 TiB
>     usage:   308 TiB used, 114 TiB / 422 TiB avail
>     pgs:     0.048% pgs unknown
>              95.611% pgs not active
>              1601/81648846 objects degraded (0.002%)
>              53012/81648846 objects misplaced (0.065%)
>              5379 peering
>              495  remapped+peering
>              269  active+clean
>              75   stale+peering
>              46   activating
>              7    stale+remapped+peering
>              3    unknown
>              3    active+undersized+degraded
>              3    down
>              2    activating+remapped
>              1    activating+undersized
>              1    active+clean+scrubbing
>              1    remapped+inconsistent+peering
>              1    activating+undersized+degraded
>              1    stale+activating
>              1    creating+peering
>
> [root@a2mon002 ~]# ceph versions
> {
>     "mon": {
>         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
>     },
>     "mgr": {
>         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
>     },
>     "osd": {
>         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 141
>     },
>     "mds": {
>         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 1
>     },
>     "overall": {
>         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 148
>     }
> }
>
>
> We had seen the slow peering shortly after the Nautilus upgrade, but
> it eventually recovered. We then started filling the cluster up to
> test another Nautilus bug (https://tracker.ceph.com/issues/41255),
> but then a disk started to die (which caused the inconsistent PG).
> When we marked it out we ran into this peering problem again, but it
> seems much worse this time.
>
> Thanks,
> Bryan
>
>> On Sep 4, 2019, at 11:55 AM, Guilherme Geronimo <guilherme.geronimo(a)gmail.com> wrote:
>>
>> Hey Bryan,
>>
>> I suppose all nodes are using jumbo frames (MTU 9000), right?
>> I would suggest checking the OSD->MON communication (a quick sketch below).
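>> Something like this from an OSD host is usually enough to spot an MTU or connectivity problem (<mon-host> is a placeholder; 3300/6789 are the standard msgr2/msgr1 mon ports):
>>
>> ping -c 3 -M do -s 8972 <mon-host>   # full-size jumbo frame, fragmentation not allowed
>> ping -c 3 -M do -s 1472 <mon-host>   # same test for a 1500-byte MTU path
>> nc -zv <mon-host> 3300               # msgr2
>> nc -zv <mon-host> 6789               # msgr1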
>>
>> Can you send us the output of these commands?
>> * ceph -s
>> * ceph versions
>>
>> []'s
>> Arthur (aKa Guilherme Geronimo)
>>
>> On 04/09/2019 14:18, Bryan Stillwell wrote:
>>> Our test cluster is seeing a problem where peering is going
>>> incredibly slow shortly after upgrading it to Nautilus (14.2.2)
>>> from Luminous (12.2.12).
>>>
>>> From what I can tell it seems to be caused by "wait for new map"
>>> taking a long time. When looking at dump_historic_slow_ops on
>>> pretty much any OSD I see stuff like this:
>>>
>>> # ceph daemon osd.112 dump_historic_slow_ops
>>> [...snip...]
>>> {
>>>     "description": "osd_pg_create(e180614 287.4b:177739 287.75:177739 287.1c3:177739 287.1cf:177739 287.1e1:177739 287.2dd:177739 287.2fc:177739 287.342:177739 287.382:177739)",
>>>     "initiated_at": "2019-09-03 15:12:41.366514",
>>>     "age": 4800.8847047119998,
>>>     "duration": 4780.0579745630002,
>>>     "type_data": {
>>>         "flag_point": "started",
>>>         "events": [
>>>             {
>>>                 "time": "2019-09-03 15:12:41.366514",
>>>                 "event": "initiated"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:12:41.366514",
>>>                 "event": "header_read"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:12:41.366501",
>>>                 "event": "throttled"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:12:41.366547",
>>>                 "event": "all_read"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:39:03.379456",
>>>                 "event": "dispatched"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:39:03.379477",
>>>                 "event": "wait for new map"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:39:03.522376",
>>>                 "event": "wait for new map"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:53:55.912499",
>>>                 "event": "wait for new map"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 15:59:37.909063",
>>>                 "event": "wait for new map"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 16:00:43.356023",
>>>                 "event": "wait for new map"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 16:20:50.575498",
>>>                 "event": "wait for new map"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 16:31:48.689415",
>>>                 "event": "started"
>>>             },
>>>             {
>>>                 "time": "2019-09-03 16:32:21.424489",
>>>                 "event": "done"
>>>             }
>>>         ]
>>>     }
>>>
>>> It always seems to be in osd_pg_create() with multiple "wait for
>>> new map" messages before it finally does something. What could be
>>> causing it to take so long to get the OSD map? The mons don't appear
>>> to be overloaded in any way.
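>>> A quick way to compare where an OSD sits versus the mons on osdmap epochs (using osd.112 from above purely as an example) would be something like:
>>>
>>> ceph osd dump | head -1        # current epoch according to the mons
>>> ceph daemon osd.112 status     # oldest_map / newest_map this OSD has caught up to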
>>>
>>> Thanks,
>>> Bryan
>