[ceph-users] Re: Slow peering caused by "wait for new map"

4 Sep 2019

I see that you have many inactive PGs, probably because of the 6 OSDs that are OUT+DOWN.

Problems with "flapping" OSDs I usually solve by:

* setting the NOUP flag
* restarting the "fragile" OSDs
* checking their logs to see if everything is OK
* removing the NOUP flag
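The four steps above could be scripted roughly as follows. This is a minimal sketch, not the poster's own tooling. It makes a few assumptions:

* OSDs are systemd-managed, with units named `ceph-osd@<id>`.
* Logs are at the default `/var/log/ceph/ceph-osd.<id>.log`.
* The script runs on the host that owns those OSDs.

The helper names `run` and `restart_flapping_osds` are hypothetical. By default (`DRY_RUN=1`) the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the NOUP-based restart procedure for flapping OSDs.
# Assumptions: systemd units named ceph-osd@<id>, default log path,
# executed on the OSD host. DRY_RUN=1 (default) only prints commands.

DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restart_flapping_osds <id>...: hold the OSDs down with noup, restart
# them, show the tail of each log, then let them be marked up again.
restart_flapping_osds() {
    run ceph osd set noup
    for id in "$@"; do
        run systemctl restart "ceph-osd@$id"
    done
    for id in "$@"; do
        run tail -n 50 "/var/log/ceph/ceph-osd.$id.log"
    done
    # Only remove the flag after a human has looked at the logs.
    if [ "$DRY_RUN" != 1 ]; then
        printf 'Logs look OK? Press Enter to unset noup (Ctrl-C aborts). '
        read -r _
    fi
    run ceph osd unset noup
}
```

For example, `DRY_RUN=0 restart_flapping_osds 12 37` (hypothetical OSD ids) would run the commands for real. It pauses before removing the flag.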

Another solution is:
* setting the NOIN and NOUP flags
* taking the fragile OSDs out
* restarting the "fragile" OSDs
* checking their logs to see if everything is OK
* removing the NOUP flag
* grabbing a coffee and waiting until all the data has drained
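A similar sketch of this drain variant uses the same assumptions: systemd unit names, the default log path, and dry-run by default. Two details are ours rather than the poster's:

* The drain is detected by polling `ceph osd safe-to-destroy`, which should exit non-zero while PGs still depend on the OSD.
* The 60-second poll interval is arbitrary.

As in the steps above, `noin` is left set at the end. The function name `drain_fragile_osds` is hypothetical.

```shell
#!/bin/sh
# Sketch of the NOIN+NOUP drain procedure.
# Assumptions: systemd units named ceph-osd@<id>, default log path,
# executed on the OSD host. DRY_RUN=1 (default) only prints commands.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# drain_fragile_osds <id>...
drain_fragile_osds() {
    run ceph osd set noin        # restarted OSDs will not be marked in
    run ceph osd set noup        # ...and will not be marked up yet
    run ceph osd out "$@"        # take the fragile OSDs out
    for id in "$@"; do
        run systemctl restart "ceph-osd@$id"
        run tail -n 50 "/var/log/ceph/ceph-osd.$id.log"
    done
    run ceph osd unset noup      # noin stays set, so they remain out
    # Wait until no PGs depend on these OSDs any more (our choice of check).
    for id in "$@"; do
        until run ceph osd safe-to-destroy "$id"; do
            sleep 60
        done
    done
}
```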

[]'s
Arthur (aKa Guilherme Geronimo)

On 04/09/2019 15:32, Bryan Stillwell wrote:
...
 We are not using jumbo frames anywhere on this cluster (all MTU 1500).
 The cluster was originally built in October 2016 and has the following history:

 2016-10-04: Created with Hammer (0.94.3)
 2017-05-03: Upgraded to Hammer (0.94.10)
 2017-10-09: Upgraded to Jewel (10.2.9)
 2017-11-02: Upgraded to Jewel (10.2.10)
 2018-04-30: Upgraded to Luminous (12.2.5)
 2018-09-05: Upgraded to Luminous (12.2.8)
 2019-04-05: Upgraded to Luminous (12.2.11)
 2019-04-18: Upgraded to Luminous (12.2.12)
 2019-07-26: Upgraded to Nautilus (14.2.2)

 It wasn't until after the Nautilus upgrade that this problem started showing up.

 Here's the output you requested:

 [root@a2mon002 ~]# ceph -s
   cluster:
     id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
     health: HEALTH_ERR
             nodown,norebalance,noscrub,nodeep-scrub flag(s) set
             1 nearfull osd(s)
             19 pool(s) nearfull
             1 scrub errors
             Reduced data availability: 6014 pgs inactive, 3 pgs down, 5958 pgs peering, 83 pgs stale
             Possible data damage: 1 pg inconsistent
             Degraded data redundancy: 1601/81648846 objects degraded (0.002%), 4 pgs degraded, 5 pgs undersized
             1048 slow requests are blocked > 32 sec

   services:
     mon: 3 daemons, quorum a2mon002,a2mon003,a2mon004 (age 17m)
     mgr: a2mon004(active, since 53m), standbys: a2mon003, a2mon002
     mds: cephfs:2 {0=a2mon004=up:active(laggy or crashed),1=a2mon003=up:active(laggy or crashed)} 1 up:standby
     osd: 143 osds: 141 up, 137 in; 486 remapped pgs
          flags nodown,norebalance,noscrub,nodeep-scrub

   data:
     pools:   20 pools, 6288 pgs
     objects: 27.22M objects, 98 TiB
     usage:   308 TiB used, 114 TiB / 422 TiB avail
     pgs:     0.048% pgs unknown
              95.611% pgs not active
              1601/81648846 objects degraded (0.002%)
              53012/81648846 objects misplaced (0.065%)
              5379 peering
              495  remapped+peering
              269  active+clean
              75   stale+peering
              46   activating
              7    stale+remapped+peering
              3    unknown
              3    active+undersized+degraded
              3    down
              2    activating+remapped
              1    activating+undersized
              1    active+clean+scrubbing
              1    remapped+inconsistent+peering
              1    activating+undersized+degraded
              1    stale+activating
              1    creating+peering
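The headline percentages in the `pgs:` section can be sanity-checked by re-adding the per-state counts. A small sketch (the function name is hypothetical) reads the `<count> <state>` lines on stdin. It counts a PG as active only when `active` is one of its `+`-separated states, so `activating` does not count. It also leaves `unknown` PGs out of the "not active" figure, which is an assumption based on how the numbers above appear to work out.

```shell
#!/bin/sh
# pg_state_summary: tally "<count> <state>" lines from the pgs: section of
# `ceph -s` (on stdin) and recompute the unknown / not-active percentages.
pg_state_summary() {
    awk '
    $1 ~ /^[0-9]+$/ && NF == 2 {
        total += $1
        n = split($2, part, "+")
        is_active = 0
        for (i = 1; i <= n; i++) if (part[i] == "active") is_active = 1
        if (is_active)             active  += $1
        else if ($2 == "unknown")  unknown += $1
    }
    END {
        if (total == 0) { print "no PG state lines found"; exit 1 }
        not_active = total - active - unknown
        printf "total=%d active=%d unknown=%d not_active=%d\n", total, active, unknown, not_active
        printf "unknown: %.3f%%  not active: %.3f%%\n", 100 * unknown / total, 100 * not_active / total
    }'
}
```

Fed the sixteen state lines above, it should report `total=6288 active=273 unknown=3 not_active=6012`, i.e. 0.048% unknown and 95.611% not active, the same figures `ceph -s` printed. Only 273 of the 6288 PGs were active at that moment.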

 [root@a2mon002 ~]# ceph versions
 {
     "mon": {
         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
     },
     "mgr": {
         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
     },
     "osd": {
         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 141
     },
     "mds": {
         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 1
     },
     "overall": {
         "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 148
     }
 }

 We had seen the slow peering shortly after the Nautilus upgrade, but 
 it eventually recovered.  We then started filling the cluster up to 
 test another Nautilus bug (https://tracker.ceph.com/issues/41255), but 
 then a disk started to die (which caused the inconsistent PG).  When 
 we marked it out we ran into this peering problem again, but it seems 
 much worse this time.

 Thanks,
 Bryan

 On Sep 4, 2019, at 11:55 AM, Guilherme Geronimo <guilherme.geronimo(a)gmail.com> wrote:

 Hey Bryan,

 I suppose all nodes are using jumbo frames (MTU 9000), right?
 I would suggest checking OSD->MON communication.
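Pings that are not allowed to fragment can confirm whether the path really carries the expected MTU. For IPv4 the largest such ICMP payload is the MTU minus 28 bytes (20-byte IP header plus 8-byte ICMP header), e.g. 1500 - 28 = 1472 or 9000 - 28 = 8972. A minimal sketch, assuming Linux iputils `ping` (where `-M do` forbids fragmentation); the function names are hypothetical.

```shell
#!/bin/sh
# icmp_payload_for_mtu <mtu>: largest unfragmented IPv4 ICMP echo payload.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))   # 20-byte IPv4 header + 8-byte ICMP header
}

# check_path_mtu <host> [mtu]: send 3 non-fragmentable pings of the
# maximum size for the given MTU (default 1500). Assumes Linux iputils ping.
check_path_mtu() {
    host=$1
    mtu=${2:-1500}
    size=$(icmp_payload_for_mtu "$mtu")
    ping -c 3 -M do -s "$size" "$host"
}
```

For example, run `check_path_mtu a2mon002 1500` from an OSD host. If these pings fail while smaller ones succeed, some hop on the path has a smaller MTU than expected.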

 Can you send us the output of these commands?
 * ceph -s
 * ceph versions

 []'s
 Arthur (aKa Guilherme Geronimo)

 On 04/09/2019 14:18, Bryan Stillwell wrote:
 Our test cluster is seeing a problem where peering is going incredibly slowly shortly after upgrading it to Nautilus (14.2.2) from Luminous (12.2.12).

 From what I can tell it seems to be caused by "wait for new map" 
 taking a long time.  When looking at dump_historic_slow_ops on 
 pretty much any OSD I see stuff like this:

 # ceph daemon osd.112 dump_historic_slow_ops
 [...snip...]
        {
            "description": "osd_pg_create(e180614 287.4b:177739 287.75:177739 287.1c3:177739 287.1cf:177739 287.1e1:177739 287.2dd:177739 287.2fc:177739 287.342:177739 287.382:177739)",
            "initiated_at": "2019-09-03 15:12:41.366514",
            "age": 4800.8847047119998,
            "duration": 4780.0579745630002,
            "type_data": {
                "flag_point": "started",
                "events": [
                    {
                        "time": "2019-09-03 15:12:41.366514",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-09-03 15:12:41.366514",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-09-03 15:12:41.366501",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-09-03 15:12:41.366547",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-09-03 15:39:03.379456",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-09-03 15:39:03.379477",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-09-03 15:39:03.522376",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-09-03 15:53:55.912499",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-09-03 15:59:37.909063",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-09-03 16:00:43.356023",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-09-03 16:20:50.575498",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-09-03 16:31:48.689415",
                        "event": "started"
                    },
                    {
                        "time": "2019-09-03 16:32:21.424489",
                        "event": "done"
                    }
                ]
            }
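Subtracting consecutive event timestamps shows where the op's time went. A rough awk sketch (the function name is hypothetical) reads pretty-printed `dump_historic_slow_ops` JSON like the output above on stdin. It assumes one `"time"`/`"event"` key per line and that all events of an op fall on the same day. The timestamps are not strictly ordered (`throttled` precedes `header_read` here), so tiny negative deltas can appear.

```shell
#!/bin/sh
# event_deltas: print each op description, then every event with the
# seconds elapsed since the previous event of the same op.
# Assumes one "time"/"event" key per line and same-day timestamps.
event_deltas() {
    awk '
    /"description":/ {
        split($0, q, "\"")
        print q[4]                  # op description, e.g. osd_pg_create(...)
        have_prev = 0               # restart the timeline for each op
        next
    }
    /"time":/ {
        split($0, q, "\"")          # q[4] = "2019-09-03 15:39:03.379456"
        split(q[4], dt, " ")        # dt[2] = "15:39:03.379456"
        split(dt[2], hms, ":")
        t = hms[1] * 3600 + hms[2] * 60 + hms[3]
        next
    }
    /"event":/ {
        split($0, q, "\"")          # q[4] = event name
        printf "%10.3f  %s\n", (have_prev ? t - prev : 0), q[4]
        prev = t
        have_prev = 1
    }'
}
```

Usage would be along the lines of `ceph daemon osd.112 dump_historic_slow_ops | event_deltas`. Applied to the events above, it shows about 1582 s between `all_read` and `dispatched`, before the first "wait for new map" is logged.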

 It always seems to be in osd_pg_create() with multiple "wait for new map" messages before it finally does something. What could be causing it to take so long to get the OSD map? The mons don't appear to be overloaded in any way.
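One way to test whether an OSD is simply behind on maps is to compare two epochs:

* the OSD's `newest_map`, from its admin socket via `ceph daemon osd.<id> status`;
* the cluster's current osdmap epoch, from `ceph osd dump -f json`.

A rough sketch follows; the helper names are hypothetical. It parses the JSON with grep rather than jq, which assumes the first matching key is the one wanted. `ceph daemon` only works on the host where that OSD runs.

```shell
#!/bin/sh
# json_int <key>: first integer value of "<key>" in JSON on stdin
# (grep-based shortcut instead of a real JSON parser).
json_int() {
    grep -o "\"$1\": *[0-9][0-9]*" | head -n 1 | grep -o '[0-9][0-9]*$'
}

# osd_map_lag <osd-id>: how many osdmap epochs the OSD is behind the cluster.
osd_map_lag() {
    cluster=$(ceph osd dump -f json | json_int epoch)
    osd=$(ceph daemon "osd.$1" status | json_int newest_map)
    echo "osd.$1 newest_map=$osd cluster_epoch=$cluster lag=$((cluster - osd))"
}
```

For example, run `osd_map_lag 112` on the host of osd.112. A lag that stays large or keeps growing while ops sit in "wait for new map" would suggest the OSD is not receiving or processing maps fast enough. A lag near zero points elsewhere.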

 Thanks,
 Bryan
 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io 
 <mailto:ceph-users@ceph.io>
 To unsubscribe send an email to ceph-users-leave(a)ceph.io 
 <mailto:ceph-users-leave@ceph.io> 