2019-07-26: Upgraded to Nautilus (14.2.2)

This problem didn't start showing up until after the Nautilus upgrade.

Here's the output you requested:

[root@a2mon002 ~]# ceph -s
  cluster:
    id:     XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    health: HEALTH_ERR
            nodown,norebalance,noscrub,nodeep-scrub flag(s) set
            1 nearfull osd(s)
            19 pool(s) nearfull
            1 scrub errors
            Reduced data availability: 6014 pgs inactive, 3 pgs down, 5958 pgs peering, 83 pgs stale
            Possible data damage: 1 pg inconsistent
            Degraded data redundancy: 1601/81648846 objects degraded (0.002%), 4 pgs degraded, 5 pgs undersized
            1048 slow requests are blocked > 32 sec

  services:
    mon: 3 daemons, quorum a2mon002,a2mon003,a2mon004 (age 17m)
    mgr: a2mon004(active, since 53m), standbys: a2mon003, a2mon002
    mds: cephfs:2 {0=a2mon004=up:active(laggy or crashed),1=a2mon003=up:active(laggy or crashed)} 1 up:standby
    osd: 143 osds: 141 up, 137 in; 486 remapped pgs
         flags nodown,norebalance,noscrub,nodeep-scrub

  data:
    pools:   20 pools, 6288 pgs
    objects: 27.22M objects, 98 TiB
    usage:   308 TiB used, 114 TiB / 422 TiB avail
    pgs:     0.048% pgs unknown
             95.611% pgs not active
             1601/81648846 objects degraded (0.002%)
             53012/81648846 objects misplaced (0.065%)
             5379 peering
             495  remapped+peering
             269  active+clean
             75   stale+peering
             46   activating
             7    stale+remapped+peering
             3    unknown
             3    active+undersized+degraded
             3    down
             2    activating+remapped
             1    activating+undersized
             1    active+clean+scrubbing
             1    remapped+inconsistent+peering
             1    activating+undersized+degraded
             1    stale+activating
             1    creating+peering

[root@a2mon002 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 141
    },
    "mds": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 148
    }
}
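
For anyone reading along, the PG percentages in the `ceph -s` output above can be reproduced from the per-state counts. The sketch below is only an illustration: the counts are copied from the output, the `summarize` helper is a made-up name, and it assumes "not active" means every PG that has neither the `active` flag nor the `unknown` state (which is what the numbers here are consistent with).

```python
# PG state counts copied from the `ceph -s` output in this message.
PG_STATES = {
    "peering": 5379,
    "remapped+peering": 495,
    "active+clean": 269,
    "stale+peering": 75,
    "activating": 46,
    "stale+remapped+peering": 7,
    "unknown": 3,
    "active+undersized+degraded": 3,
    "down": 3,
    "activating+remapped": 2,
    "activating+undersized": 1,
    "active+clean+scrubbing": 1,
    "remapped+inconsistent+peering": 1,
    "activating+undersized+degraded": 1,
    "stale+activating": 1,
    "creating+peering": 1,
}


def summarize(pg_states):
    """Tally total, active, unknown and not-active PGs (hypothetical helper)."""
    total = sum(pg_states.values())
    # A PG is treated as active only if "active" is one of its "+"-separated
    # flags; "activating" is a different flag and does not count.
    active = sum(n for state, n in pg_states.items()
                 if "active" in state.split("+"))
    unknown = pg_states.get("unknown", 0)
    # Assumption: "not active" excludes PGs whose state is unknown.
    not_active = total - active - unknown
    return {
        "total": total,
        "active": active,
        "unknown_pct": 100.0 * unknown / total,
        "not_active_pct": 100.0 * not_active / total,
    }


if __name__ == "__main__":
    for key, value in summarize(PG_STATES).items():
        print(key, value)
```

With these counts only 273 of the 6288 PGs carry the `active` flag, which is where the 95.611% figure comes from.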

We had seen the slow peering shortly after the Nautilus upgrade, but it eventually recovered. We then started filling the cluster up to test another Nautilus bug (https://tracker.ceph.com/issues/41255), but then a disk started to die (which caused the inconsistent PG). When we marked it out we ran into this peering problem again, but it seems much worse this time.

Thanks,

Bryan

On Sep 4, 2019, at 11:55 AM, Guilherme Geronimo <guilherme.geronimo@gmail.com> wrote:

Hey Bryan,

I suppose all nodes are using jumbo frames (MTU 9000), right?
I would suggest checking OSD->MON communication.

Can you send us the output of these commands?
* ceph -s
* ceph versions
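
As a rough aid for the MTU question, here is a minimal sketch that lists the configured MTU of each interface on a Linux host by reading `/sys/class/net/<iface>/mtu`. The function names and the 9000 threshold are illustrative assumptions, and a matching MTU on every host does not prove that the switches in between pass jumbo frames.

```python
import os


def read_mtus(sysfs_net="/sys/class/net"):
    """Return {interface: mtu} read from the Linux sysfs network directory."""
    mtus = {}
    for iface in sorted(os.listdir(sysfs_net)):
        mtu_path = os.path.join(sysfs_net, iface, "mtu")
        try:
            with open(mtu_path) as f:
                mtus[iface] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip entries that are not interfaces or have no readable MTU.
            continue
    return mtus


def below_expected(mtus, expected=9000):
    """Return the interfaces whose MTU is smaller than the expected value."""
    return {iface: mtu for iface, mtu in mtus.items() if mtu < expected}


if __name__ == "__main__":
    for iface, mtu in below_expected(read_mtus()).items():
        print(f"{iface}: mtu {mtu} is below 9000")
```

Running it on each OSD and MON node shows which interfaces are not configured for jumbo frames. For an end-to-end check, a non-fragmenting ping with an 8972-byte payload (9000 minus the 20-byte IP and 8-byte ICMP headers) between nodes is the usual test.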

[]'s
Arthur (aKa Guilherme Geronimo)

On 04/09/2019 14:18, Bryan Stillwell wrote:

Our test cluster is seeing a problem where peering is going incredibly slowly shortly after we upgraded it from Luminous (12.2.12) to Nautilus (14.2.2).

From what I can tell it seems to be caused by "wait for new map" taking a long time. When looking at dump_historic_slow_ops on pretty much any OSD I see stuff like this:

# ceph daemon osd.112 dump_historic_slow_ops
[...snip...]
       {
           "description": "osd_pg_create(e180614 287.4b:177739 287.75:177739 287.1c3:177739 287.1cf:177739 287.1e1:177739 287.2dd:177739 287.2fc:177739 287.342:177739 287.382:177739)",
           "initiated_at": "2019-09-03 15:12:41.366514",
           "age": 4800.8847047119998,
           "duration": 4780.0579745630002,
           "type_data": {
               "flag_point": "started",
               "events": [
                   {
                       "time": "2019-09-03 15:12:41.366514",
                       "event": "initiated"
                   },
                   {
                       "time": "2019-09-03 15:12:41.366514",
                       "event": "header_read"
                   },
                   {
                       "time": "2019-09-03 15:12:41.366501",
                       "event": "throttled"
                   },
                   {
                       "time": "2019-09-03 15:12:41.366547",
                       "event": "all_read"
                   },
                   {
                       "time": "2019-09-03 15:39:03.379456",
                       "event": "dispatched"
                   },
                   {
                       "time": "2019-09-03 15:39:03.379477",
                       "event": "wait for new map"
                   },
                   {
                       "time": "2019-09-03 15:39:03.522376",
                       "event": "wait for new map"
                   },
                   {
                       "time": "2019-09-03 15:53:55.912499",
                       "event": "wait for new map"
                   },
                   {
                       "time": "2019-09-03 15:59:37.909063",
                       "event": "wait for new map"
                   },
                   {
                       "time": "2019-09-03 16:00:43.356023",
                       "event": "wait for new map"
                   },
                   {
                       "time": "2019-09-03 16:20:50.575498",
                       "event": "wait for new map"
                   },
                   {
                       "time": "2019-09-03 16:31:48.689415",
                       "event": "started"
                   },
                   {
                       "time": "2019-09-03 16:32:21.424489",
                       "event": "done"
                   }
               ]
           }

It always seems to be in osd_pg_create() with multiple "wait for new map" messages before it finally does something. What could be causing it to take so long to get the OSD map? The mons don't appear to be overloaded in any way.
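
To see where the time goes in an op like the one above, the gaps between consecutive events can be computed from the timestamps. This is a minimal sketch, not a Ceph tool: it assumes only the per-op layout shown above (`type_data` -> `events`, each with `time` and `event`), the helper names are made up, and the sample is a subset of the events from the dump.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# A subset of the events from the osd.112 slow op shown above.
SAMPLE_OP = {
    "type_data": {
        "events": [
            {"time": "2019-09-03 15:12:41.366547", "event": "all_read"},
            {"time": "2019-09-03 15:39:03.379456", "event": "dispatched"},
            {"time": "2019-09-03 15:39:03.379477", "event": "wait for new map"},
            {"time": "2019-09-03 16:31:48.689415", "event": "started"},
            {"time": "2019-09-03 16:32:21.424489", "event": "done"},
        ]
    }
}


def event_gaps(op):
    """Return (previous_event, event, seconds) for each consecutive pair."""
    stamps = [(e["event"], datetime.strptime(e["time"], TIME_FORMAT))
              for e in op["type_data"]["events"]]
    return [(prev_name, name, (t - prev_t).total_seconds())
            for (prev_name, prev_t), (name, t) in zip(stamps, stamps[1:])]


def seconds_waiting_for_map(op):
    """Seconds from the first 'wait for new map' event to 'started'."""
    first_wait = None
    for e in op["type_data"]["events"]:
        t = datetime.strptime(e["time"], TIME_FORMAT)
        if e["event"] == "wait for new map" and first_wait is None:
            first_wait = t
        elif e["event"] == "started" and first_wait is not None:
            return (t - first_wait).total_seconds()
    return None  # the op never waited for a map, or never started


if __name__ == "__main__":
    for prev_name, name, secs in event_gaps(SAMPLE_OP):
        print(f"{prev_name} -> {name}: {secs:.3f}s")
    print("waiting for map:", seconds_waiting_for_map(SAMPLE_OP))
```

For this op, about 26 minutes pass between all_read and dispatched, and roughly another 53 minutes between the first "wait for new map" and "started".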

Thanks,
Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io
