Hello,
what you see is a stack trace, so the OSD is hitting an unexpected
state (otherwise there would be an error handler).
The crash happens when the OSD tries to read from a pipe while
processing a heartbeat. To me it sounds like a networking issue.
I see the other OSDs on that host are healthy, so I would bet there is
an issue with the TCP port that this particular OSD daemon uses.
Try 'netstat -tulpen' or similar to check. Is there a firewall between
the hosts that might cut something off?
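A quick check could look like the sketch below (the IP and port 6805 are
just examples taken from the log further down; adjust them to the ports
your OSD actually binds):

```shell
# Which TCP ports are the ceph-osd daemons listening on?
netstat -tulpen | grep ceph-osd

# Can a peer host actually reach one of those ports?
nc -zv 10.1.201.201 6805

# Any firewall rules in between that could drop heartbeat traffic?
iptables -L -n -v | less
```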
rgds,
j.
On 02.02.20 04:08, Makito Nishimiya wrote:
Hi.
This is the cluster information.
-------------- /var/log/ceph/ceph.osd.1.log ---------------------------
2020-02-01 03:47:20.635504 7f86f4e40700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
2020-02-01 03:47:20.635521 7f86f4f41700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
2020-02-01 03:47:20.747876 7f86f3b2d700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
2020-02-01 03:47:20.747903 7f86f3c2e700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
2020-02-01 03:47:21.152436 7f86e2a2e700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
2020-02-01 03:47:21.152441 7f86e2a2e700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f86fe35e700' had suicide timed out after 150
2020-02-01 03:47:21.157963 7f86e2a2e700 -1 common/HeartbeatMap.cc: In
function 'bool ceph::HeartbeatMap::_check(const
ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f86e2a2e700
time 2020-02-01 03:47:21.152463
common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7f875259f425]
2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
const*, long)+0x2e1) [0x7f87524dede1]
3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7f87524df63e]
4: (OSD::handle_osd_ping(MOSDPing*)+0x93f) [0x7f8751f141df]
5: (OSD::heartbeat_dispatch(Message*)+0x3cb) [0x7f8751f1540b]
6: (DispatchQueue::fast_dispatch(Message*)+0x76) [0x7f875265f9d6]
7: (Pipe::reader()+0x1dff) [0x7f875269c68f]
8: (Pipe::Reader::entry()+0xd) [0x7f87526a41ad]
9: (()+0x7e25) [0x7f87502b8e25]
10: (clone()+0x6d) [0x7f874e94234d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
-10000> 2020-02-01 03:45:29.491110 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.202.203:6805/5717 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512a200 con
0x7f8774eb6e80
-9999> 2020-02-01 03:45:29.491109 7f86e8245700 1 --
10.1.201.201:0/3892596 <== osd.9 10.1.201.202:6811/3006646 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781383400 con 0x7f8774eaf480
-9998> 2020-02-01 03:45:29.491123 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.201.203:6803/5220 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512bc00 con
0x7f8774eae100
-9997> 2020-02-01 03:45:29.491137 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.202.203:6803/5220 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512aa00 con
0x7f8774eae280
-9996> 2020-02-01 03:45:29.491150 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.201.203:6811/15624 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512ac00 con
0x7f8774eae700
-9995> 2020-02-01 03:45:29.491148 7f86ec1a4700 1 --
10.1.201.201:0/3892596 <== osd.8 10.1.202.202:6812/2131331 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781380200 con 0x7f8774eb5200
-9994> 2020-02-01 03:45:29.491163 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.202.203:6811/15624 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512be00 con
0x7f8774eae880
-9993> 2020-02-01 03:45:29.491176 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.201.203:6801/4089 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128a00 con
0x7f8774eaf780
-9992> 2020-02-01 03:45:29.491169 7f86e8447700 1 --
10.1.201.201:0/3892596 <== osd.9 10.1.202.202:6811/3006646 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781380400 con 0x7f8774eaf600
-9991> 2020-02-01 03:45:29.491167 7f86e975a700 1 --
10.1.201.201:0/3892596 <== osd.10 10.1.201.202:6801/3005449 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781380000 con 0x7f8774eb7c00
-9990> 2020-02-01 03:45:29.491192 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.202.203:6801/4089 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128c00 con
0x7f8774eaf900
-9989> 2020-02-01 03:45:29.491206 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.201.203:6813/17285 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775451e00 con
0x7f8774ee3180
-9988> 2020-02-01 03:45:29.491184 7f86e9255700 1 --
10.1.201.201:0/3892596 <== osd.11 10.1.201.202:6809/2004102 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781385800 con 0x7f8774eade00
-9987> 2020-02-01 03:45:29.491202 7f86e9558700 1 --
10.1.201.201:0/3892596 <== osd.10 10.1.202.202:6801/3005449 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781385400 con 0x7f8774eb7d80
-9986> 2020-02-01 03:45:29.491233 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.202.203:6813/17285 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877544fe00 con
0x7f8774ee3300
-9985> 2020-02-01 03:45:29.491238 7f86e9053700 1 --
10.1.201.201:0/3892596 <== osd.11 10.1.202.202:6813/2004102 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781383800 con 0x7f8774eadf80
-9984> 2020-02-01 03:45:29.491234 7f86e874a700 1 --
10.1.201.201:0/3892596 <== osd.12 10.1.201.202:6803/4953 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f8781383a00 con 0x7f8774eae400
-9983> 2020-02-01 03:45:29.491247 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.201.204:6807/13468 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576000 con
0x7f8774eb5e00
-9982> 2020-02-01 03:45:29.491258 7f86ecbae700 1 --
10.1.201.201:0/3892596 <== osd.13 10.1.201.202:6805/2390794 12 ====
osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
47+0+0 (4038678352 0 0) 0x7f87813a6400 con 0x7f8774ee1980
-9981> 2020-02-01 03:45:29.491266 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.202.204:6807/13468 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576e00 con
0x7f8774eb5f80
-9980> 2020-02-01 03:45:29.491291 7f86f9253700 1 --
10.1.201.201:0/3892596 --> 10.1.201.204:6809/410784 -- osd_ping(ping
e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8774ffe400 con
0x7f8774eac900
---------------- ceph -s
------------------------------------------------------
cluster 57d2e172-0306-4f8a-946c-a4ae1e95b26b
health HEALTH_WARN
7 pgs backfill_wait
11 pgs backfilling
9 pgs degraded
4 pgs recovery_wait
21 pgs stuck unclean
6 pgs undersized
100 requests are blocked > 32 sec
recovery 649692/75245727 objects degraded (0.863%)
recovery 1027694/75245727 objects misplaced (1.366%)
recovery 1/24998284 unfound (0.000%)
pool default.rgw.buckets.data has many more objects per pg
than average (too few pgs?)
monmap e1: 3 mons at
{cephmon01=10.1.202.199:6789/0,cephmon02=10.1.202.198:6789/0,cephmon03=10.1.202.197:6789/0}
election epoch 562, quorum 0,1,2
cephmon03,cephmon02,cephmon01
osdmap e127248: 42 osds: 37 up, 37 in; 20 remapped pgs
flags sortbitwise,require_jewel_osds
pgmap v92257695: 1568 pgs, 14 pools, 18271 GB data, 24412 kobjects
55189 GB used, 45363 GB / 100553 GB avail
649692/75245727 objects degraded (0.863%)
1027694/75245727 objects misplaced (1.366%)
1/24998284 unfound (0.000%)
1540 active+clean
8 active+remapped+backfilling
5 active+remapped+wait_backfill
3 active+clean+scrubbing
3 active+clean+scrubbing+deep
3 active+undersized+degraded+remapped+backfilling
2 active+undersized+degraded+remapped+wait_backfill
2 active+recovery_wait+degraded
1 active+recovery_wait+undersized+degraded+remapped
1 active+recovery_wait+degraded+remapped
recovery io 239 MB/s, 187 objects/s
client io 575 kB/s wr, 0 op/s rd, 37 op/s wr
---------------- ceph osd tree
------------------------------------------------------
[root@ceph01 ceph]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 108.19864 root default
-2 19.09381 host ceph01
0 2.72769 osd.0 up 1.00000 1.00000
1 2.72769 osd.1 down 0 1.00000 <-- down now
2 2.72769 osd.2 up 1.00000 1.00000
5 2.72769 osd.5 up 1.00000 1.00000
6 2.72768 osd.6 up 1.00000 1.00000
3 2.72768 osd.3 up 1.00000 1.00000
4 2.72769 osd.4 up 1.00000 1.00000
-3 19.09383 host ceph02
8 2.72769 osd.8 up 1.00000 1.00000
9 2.72769 osd.9 up 1.00000 1.00000
10 2.72769 osd.10 up 1.00000 1.00000
12 2.72769 osd.12 up 1.00000 1.00000
11 2.72769 osd.11 up 1.00000 1.00000
7 2.72768 osd.7 up 1.00000 1.00000
13 2.72769 osd.13 up 1.00000 1.00000
-4 16.36626 host ceph03
14 2.72769 osd.14 up 1.00000 1.00000
16 2.72769 osd.16 up 1.00000 1.00000
17 2.72769 osd.17 up 1.00000 1.00000
19 2.72769 osd.19 up 1.00000 1.00000
15 1.81850 osd.15 up 1.00000 1.00000
18 1.81850 osd.18 up 1.00000 1.00000
20 1.81850 osd.20 up 1.00000 1.00000
-5 15.45706 host ceph04
23 2.72769 osd.23 up 1.00000 1.00000
24 2.72769 osd.24 up 1.00000 1.00000
27 2.72769 osd.27 down 0 1.00000 <-- down more than 3 months ago
21 1.81850 osd.21 up 1.00000 1.00000
22 1.81850 osd.22 up 1.00000 1.00000
25 1.81850 osd.25 up 1.00000 1.00000
26 1.81850 osd.26 up 1.00000 1.00000
-6 19.09384 host ceph05
28 2.72769 osd.28 up 1.00000 1.00000
29 2.72769 osd.29 up 1.00000 1.00000
30 2.72769 osd.30 up 1.00000 1.00000
31 2.72769 osd.31 down 0 1.00000 <-- down more than 3 months ago
32 2.72769 osd.32 up 1.00000 1.00000
34 2.72769 osd.34 up 1.00000 1.00000
33 2.72769 osd.33 down 0 1.00000 <-- down more than 3 months ago
-7 19.09384 host ceph06
35 2.72769 osd.35 up 1.00000 1.00000
36 2.72769 osd.36 up 1.00000 1.00000
37 2.72769 osd.37 up 1.00000 1.00000
39 2.72769 osd.39 up 1.00000 1.00000
40 2.72769 osd.40 up 1.00000 1.00000
41 2.72769 osd.41 up 1.00000 1.00000
38 2.72769 osd.38 down 0 1.00000 <-- down more than 3 months ago
------------------------------
On 2020/02/02 11:20, Makito Nishimiya wrote:
Servers: 6 (7 OSDs each), 42 OSDs in total
OS: CentOS 7
Ceph: 10.2.5
Hi, everyone
The cluster is used for VM image storage and object storage,
and I have a bucket which holds more than 20 million objects.
Now I have a problem: the cluster blocks operations.
Suddenly the cluster blocked operations, and the VMs could not read
their disks. After a few hours, osd.1 went down.
There are no disk failure messages in dmesg,
and no errors in smartctl -a /dev/sde.
I tried to bring osd.1 back up, but it goes down again soon.
Just after restarting osd.1, the VMs can access their disks again.
But osd.1 always uses 100% CPU, so the cluster marks osd.1 down and
the OSD dies by suicide timeout.
I found that the osdmap epoch of osd.1 differs from the other OSDs'.
So I think that is why osd.1 died.
Questions:
(1) Why does the epoch of osd.1 differ from the other OSDs' epochs?
I checked every OSD's oldest_map and newest_map with "ceph daemon
osd.X status".
All OSDs' epochs are the same number except osd.1's.
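To compare the epochs across all OSDs on one host in a single pass,
something like this sketch could be used (the admin-socket path is the
default on CentOS 7; adjust it if your deployment differs):

```shell
# Print oldest_map/newest_map for every OSD admin socket on this host
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== ${sock} =="
    ceph daemon "${sock}" status | grep -E '"(oldest|newest)_map"'
done
```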
(2) Why does osd.1 use 100% CPU?
After the cluster marked osd.1 down, osd.1 stays busy.
When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1
doesn't answer.
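Since "ceph tell" goes through the cluster, it hangs when the daemon no
longer answers on the network; the local admin socket may still respond.
A sketch, assuming the default socket path on the OSD's host:

```shell
# Raise messenger debugging via the local admin socket instead of "ceph tell"
ceph daemon /var/run/ceph/ceph-osd.1.asok config set debug_ms 5/1

# See which threads of the OSD process are burning CPU
# (if several OSDs run on this host, substitute osd.1's actual PID)
top -H -p "$(pidof ceph-osd | awk '{print $1}')"
```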
Thank you.
--
Andreas John
net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 |