Manuel,

Well, this is a bit different from the tickets I shared, but it still looks like slow DB access.

80+ seconds of submit/commit latency is TOO HIGH; this could definitely cause suicide timeouts.

Have you had a chance to inspect disk utilization?


Introducing an NVMe drive for WAL/DB might be a good idea. I can see up to 20 GB allocated for META per OSD, so all of them would fit comfortably on a 480 GB NVMe drive.

Having a single drive isn't ideal from the performance and failure-domain points of view, though. I'd prefer 4-6 OSDs per drive.
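The sizing argument above can be checked with a little arithmetic. The sketch below is only illustrative: the META values are copied from the `ceph osd df tree` output quoted further down in this thread, the 20 GiB per-OSD budget mirrors the "up to 20 GB" estimate, and the function names are made up for this example. It deliberately ignores WAL space and any growth of the DB over time.

```python
# META column (GiB) per OSD, copied from the `ceph osd df tree` output in this thread.
META_GIB = {
    "osd.5": 15, "osd.6": 17, "osd.11": 19, "osd.45": 18,
    "osd.46": 17, "osd.47": 17, "osd.55": 17, "osd.70": 19,
    "osd.71": 16, "osd.78": 16, "osd.79": 17, "osd.91": 17,
}

GIB = 2 ** 30  # bytes in one GiB


def nvme_capacity_gib(size_gb):
    """Convert a marketed drive size in decimal GB to GiB."""
    return size_gb * 10 ** 9 / GIB


def summarize(meta_gib, nvme_gb, per_osd_budget_gib=20):
    """Compare current and budgeted META usage against one NVMe drive."""
    total = sum(meta_gib.values())
    budget = per_osd_budget_gib * len(meta_gib)  # assumed worst case per OSD
    capacity = nvme_capacity_gib(nvme_gb)
    return {
        "osds": len(meta_gib),
        "total_meta_gib": total,
        "budget_gib": budget,
        "capacity_gib": round(capacity, 2),
        "fits": budget <= capacity,
    }


report = summarize(META_GIB, nvme_gb=480)
print(report)
```

With these numbers the twelve OSDs currently hold 205 GiB of META in total and 240 GiB at the assumed budget, against roughly 447 GiB of usable space, so capacity is not the constraint; the single-drive failure domain is.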


As a workaround you might also try disabling deep scrub.
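For reference, deep scrub is normally toggled cluster-wide through the `nodeep-scrub` OSD flag. The small helper below is a sketch that only builds the command line and prints it; the function names are hypothetical, and nothing is executed against a cluster.

```python
import shlex
from typing import List


def deep_scrub_flag_cmd(disable: bool) -> List[str]:
    """Build the ceph CLI call that sets or clears the nodeep-scrub flag."""
    action = "set" if disable else "unset"
    return ["ceph", "osd", action, "nodeep-scrub"]


def render(cmd: List[str]) -> str:
    """Return a shell-safe string for review before running it by hand."""
    return shlex.join(cmd)


disable_cmd = render(deep_scrub_flag_cmd(True))
enable_cmd = render(deep_scrub_flag_cmd(False))
print(disable_cmd)
print(enable_cmd)
```

Remember to clear the flag again once the latency problem is understood, since the deep scrub backlog will otherwise keep growing.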


Thanks,

Igor

On 8/7/2019 2:59 PM, EDH - Manuel Rios Fernandez wrote:

Hi Igor

 

Yes, we have everything on the same device:

 

[root@CEPH-MON01 ~]# ceph osd df tree
ID  CLASS   WEIGHT    REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS TYPE NAME
 31         130.96783        - 131 TiB 114 TiB 114 TiB  14 MiB  204 GiB  17 TiB 86.88 1.03   -            host CEPH008
  5 archive  10.91399  0.80002  11 TiB 7.9 TiB 7.9 TiB 2.6 MiB   15 GiB 3.0 TiB 72.65 0.86 181     up         osd.5
  6 archive  10.91399  1.00000  11 TiB 9.4 TiB 9.3 TiB 5.8 MiB   17 GiB 1.6 TiB 85.76 1.01 222     up         osd.6
 11 archive  10.91399  1.00000  11 TiB  10 TiB  10 TiB  48 KiB   19 GiB 838 GiB 92.50 1.09 251     up         osd.11
 45 archive  10.91399  1.00000  11 TiB  10 TiB  10 TiB 148 KiB   18 GiB 678 GiB 93.94 1.11 248     up         osd.45
 46 archive  10.91399  1.00000  11 TiB 9.6 TiB 9.5 TiB 4.7 MiB   17 GiB 1.4 TiB 87.52 1.04 235     up         osd.46
 47 archive  10.91399  1.00000  11 TiB 8.8 TiB 8.8 TiB  68 KiB   17 GiB 2.1 TiB 80.43 0.95 211     up         osd.47
 55 archive  10.91399  1.00000  11 TiB 9.9 TiB 9.9 TiB 132 KiB   17 GiB 1.0 TiB 90.74 1.07 243     up         osd.55
 70 archive  10.91399  1.00000  11 TiB  10 TiB  10 TiB  44 KiB   19 GiB 864 GiB 92.27 1.09 236     up         osd.70
 71 archive  10.91399  1.00000  11 TiB 9.2 TiB 9.2 TiB  28 KiB   16 GiB 1.7 TiB 84.19 1.00 228     up         osd.71
 78 archive  10.91399  1.00000  11 TiB 8.9 TiB 8.9 TiB 182 KiB   16 GiB 2.0 TiB 81.87 0.97 215     up         osd.78
 79 archive  10.91399  1.00000  11 TiB  10 TiB  10 TiB 152 KiB   17 GiB 958 GiB 91.43 1.08 238     up         osd.79
 91 archive  10.91399  1.00000  11 TiB 9.7 TiB 9.7 TiB  92 KiB   17 GiB 1.2 TiB 89.22 1.06 232     up         osd.91

 

The disks are 12 TB HGST drives used for archive purposes.

 

On the same OSD we see some BlueStore submit/commit latency entries in the log:

 

2019-08-07 06:57:33.681 7f059b06e700  0 bluestore(/var/lib/ceph/osd/ceph-46) queue_transactions slow operation observed for l_bluestore_submit_lat, latency = 11.163s

2019-08-07 06:57:33.703 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46) _txc_committed_kv slow operation observed for l_bluestore_commit_lat, latency = 11.1858s, txc = 0x55e9e3ea2c00

2019-08-07 09:14:00.620 7f059d072700  0 bluestore(/var/lib/ceph/osd/ceph-46) queue_transactions slow operation observed for l_bluestore_submit_lat, latency = 7.23777s

2019-08-07 09:14:00.650 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46) _txc_committed_kv slow operation observed for l_bluestore_commit_lat, latency = 7.26778s, txc = 0x55eaafbf6600

2019-08-07 09:19:08.242 7f059e875700  0 bluestore(/var/lib/ceph/osd/ceph-46) queue_transactions slow operation observed for l_bluestore_submit_lat, latency = 81.8293s

2019-08-07 09:19:08.291 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46) _txc_committed_kv slow operation observed for l_bluestore_commit_lat, latency = 81.8609s, txc = 0x55ea05ee6000

2019-08-07 09:19:08.467 7f059b06e700  0 bluestore(/var/lib/ceph/osd/ceph-46) queue_transactions slow operation observed for l_bluestore_submit_lat, latency = 87.7795s

2019-08-07 09:19:08.481 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46) _txc_committed_kv slow operation observed for l_bluestore_commit_lat, latency = 87.7928s, txc = 0x55eaa7a40600

 

Maybe moving OMAP + META from all OSDs to one 480 GB NVMe per node would help in this situation, but I'm not sure.

 

Manuel

 

 

 

 

From: Igor Fedotov <ifedotov@suse.de>
Sent: Wednesday, August 7, 2019 13:10
To: EDH - Manuel Rios Fernandez <mriosfer@easydatahost.com>; 'Ceph Users' <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] 14.2.2 - OSD Crash

 

Hi Manuel,

As Brad pointed out, timeouts and suicides are consequences of some other issue with the OSDs rather than the root cause.

I recall at least two recent relevant tickets:

https://tracker.ceph.com/issues/36482

https://tracker.ceph.com/issues/40741 (see last comments)

Both had massive and slow reads from RocksDB, which caused timeouts.

The visible symptom in both cases was unexpectedly high read I/O from the underlying disks (main and/or DB).

You can use iotop to inspect this.

These were made worse by having a significant part of the DB on spinning disks due to spillover. So I'm wondering what your layout looks like in this respect:

which drives back the troublesome OSDs, is there any spillover to the slow device, and how large is it?

Also, could you please inspect your OSD logs for lines containing the substring "slow operation observed", and share them if there are any?
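One way to pull those lines out of a large OSD log is a short script like the one below. It is a sketch only: the regular expression is modeled on the BlueStore log lines posted elsewhere in this thread (the three sample lines are copied from there), and the function name and threshold parameter are made up for this example.

```python
import re

# Matches e.g. "... slow operation observed for l_bluestore_submit_lat, latency = 11.163s"
SLOW_OP = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"slow operation observed for (?P<counter>\w+), latency = (?P<latency>[\d.]+)s"
)

# Sample lines copied from the osd.46 log quoted in this thread.
SAMPLE = [
    "2019-08-07 06:57:33.681 7f059b06e700  0 bluestore(/var/lib/ceph/osd/ceph-46) "
    "queue_transactions slow operation observed for l_bluestore_submit_lat, latency = 11.163s",
    "2019-08-07 06:57:33.703 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46) "
    "_txc_committed_kv slow operation observed for l_bluestore_commit_lat, "
    "latency = 11.1858s, txc = 0x55e9e3ea2c00",
    "2019-08-07 09:19:08.242 7f059e875700  0 bluestore(/var/lib/ceph/osd/ceph-46) "
    "queue_transactions slow operation observed for l_bluestore_submit_lat, latency = 81.8293s",
]


def find_slow_ops(lines, threshold_s=0.0):
    """Return (timestamp, counter, latency_seconds) for each matching line."""
    hits = []
    for line in lines:
        match = SLOW_OP.search(line)
        if match is None:
            continue
        latency = float(match.group("latency"))
        if latency >= threshold_s:
            hits.append((match.group("ts"), match.group("counter"), latency))
    return hits


all_hits = find_slow_ops(SAMPLE)
worst = find_slow_ops(SAMPLE, threshold_s=60)
for ts, counter, latency in all_hits:
    print(f"{ts}  {counter:<24} {latency:8.3f}s")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the log path depends on the deployment, so it is left out here.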

 

Hope this helps.

Thanks,

Igor

 

 

On 8/7/2019 2:16 AM, EDH - Manuel Rios Fernandez wrote:

Hi

 

We have a pair of OSDs located in one node that have been crashing randomly since 14.2.2.

 

OS version: CentOS 7.6

 

There are a ton of log lines before the crash, which was unexpected:

 

--

-3045> 2019-08-07 00:39:32.013 7fe9a4996700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3044> 2019-08-07 00:39:32.013 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3043> 2019-08-07 00:39:32.033 7fe9a4195700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3042> 2019-08-07 00:39:32.033 7fe9a4996700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

--

-----

 

Several hundred lines of:

-164> 2019-08-07 00:47:36.628 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -163> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -162> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

-----

 

   -78> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: tick

   -77> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:21.756453)

   -76> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: tick

   -75> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:31.756604)

   -74> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: tick

   -73> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:41.756788)

   -72> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: tick

   -71> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:51.756982)

   -70> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: tick

   -69> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:01.757206)

   -68> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: tick

   -67> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:11.757364)

   -66> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: tick

   -65> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:21.757535)

   -64> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

   -63> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150

   -62> 2019-08-07 00:51:52.948 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1721180160 mapped: 4297818112 old cache_size: 1994018210 new cache size: 1992784572

   -61> 2019-08-07 00:51:52.948 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1992784572 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654593191 data_alloc: 452984832 data_used: 455929856

   -60> 2019-08-07 00:51:57.923 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1994110827 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -59> 2019-08-07 00:51:57.973 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1994110827 new cache size: 1994442069

   -58> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: tick

   -57> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:31.757684)

   -56> 2019-08-07 00:52:02.933 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1995765747 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -55> 2019-08-07 00:52:02.983 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1995765747 new cache size: 1996096345

   -54> 2019-08-07 00:52:07.943 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1997417449 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -53> 2019-08-07 00:52:07.993 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1997417449 new cache size: 1997747404

   -52> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: tick

   -51> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:41.757855)

   -50> 2019-08-07 00:52:12.952 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1999065941 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -49> 2019-08-07 00:52:13.002 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1999065941 new cache size: 1999395254

   -48> 2019-08-07 00:52:17.962 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2000711226 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -47> 2019-08-07 00:52:18.012 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2000711226 new cache size: 2001039899

   -46> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: tick

   -45> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:51.758043)

   -44> 2019-08-07 00:52:22.971 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2002353314 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -43> 2019-08-07 00:52:23.022 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2002353314 new cache size: 2002681348

   -42> 2019-08-07 00:52:27.982 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2003992210 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -41> 2019-08-07 00:52:28.031 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2003992210 new cache size: 2004319607

   -40> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: tick

   -39> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:01.758219)

   -38> 2019-08-07 00:52:32.991 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2005627920 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -37> 2019-08-07 00:52:33.041 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2005627920 new cache size: 2005954680

   -36> 2019-08-07 00:52:38.001 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2007260450 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -35> 2019-08-07 00:52:38.051 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2007260450 new cache size: 2007586575

   -34> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: tick

   -33> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:11.758447)

   -32> 2019-08-07 00:52:43.011 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2008889806 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -31> 2019-08-07 00:52:43.061 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2008889806 new cache size: 2009215297

   -30> 2019-08-07 00:52:48.021 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2010515995 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -29> 2019-08-07 00:52:48.071 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2010515995 new cache size: 2010840853

   -28> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: tick

   -27> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:21.758631)

   -26> 2019-08-07 00:52:53.031 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2012139023 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -25> 2019-08-07 00:52:53.081 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2012139023 new cache size: 2012463250

   -24> 2019-08-07 00:52:58.042 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2013758896 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -23> 2019-08-07 00:52:58.092 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2013758896 new cache size: 2014082492

   -22> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: tick

   -21> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:31.758799)

   -20> 2019-08-07 00:53:03.052 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2015375620 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -19> 2019-08-07 00:53:03.102 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2015375620 new cache size: 2015698587

   -18> 2019-08-07 00:53:08.062 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2016989201 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -17> 2019-08-07 00:53:08.112 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2016989201 new cache size: 2017311541

   -16> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: tick

   -15> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:41.759013)

   -14> 2019-08-07 00:53:13.071 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2018599645 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -13> 2019-08-07 00:53:13.121 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2018599645 new cache size: 2018921358

   -12> 2019-08-07 00:53:18.081 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2020206960 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -11> 2019-08-07 00:53:18.130 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2020206960 new cache size: 2020528048

   -10> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: tick

    -9> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:51.759214)

    -8> 2019-08-07 00:53:23.090 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2021811150 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944

    -7> 2019-08-07 00:53:23.140 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2021811150 new cache size: 2022131613

    -6> 2019-08-07 00:53:28.100 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2023412220 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944

    -5> 2019-08-07 00:53:28.150 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2023412220 new cache size: 2023732060

    -4> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: tick

    -3> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:53:01.759334)

    -2> 2019-08-07 00:53:33.110 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2025010178 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944

    -1> 2019-08-07 00:53:33.160 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2025010178 new cache size: 2025329397

     0> 2019-08-07 00:53:37.655 7fe987e49700 -1 *** Caught signal (Aborted) **

in thread 7fe987e49700 thread_name:tp_osd_tp

 

ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)

1: (()+0xf5d0) [0x7fe9a7cba5d0]

2: (pthread_kill()+0x31) [0x7fe9a7cb79d1]

3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x466) [0x55fef8748176]

4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x7b) [0x55fef874878b]

5: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xa9e) [0x55fef86085de]

6: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x55fef81cd7ff]

7: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x55fef8118298]

8: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x202) [0x55fef81767c2]

9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x55fef83eb490]

10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0x55fef816aef4]

11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55fef8769ce3]

12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fef876cd80]

13: (()+0x7dd5) [0x7fe9a7cb2dd5]

14: (clone()+0x6d) [0x7fe9a6b7202d]

 

About server load: 

 

[root@CEPH008 ~]# top

top - 00:57:30 up 186 days, 22 min,  1 user,  load average: 11.65, 13.42, 13.51

Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie

%Cpu(s):  2.3 us,  1.2 sy,  0.0 ni, 74.1 id, 22.4 wa,  0.0 hi,  0.1 si,  0.0 st

KiB Mem : 65737480 total,   431824 free, 49046608 used, 16259048 buff/cache

KiB Swap: 29241340 total, 19406504 free,  9834836 used. 15917556 avail Mem

 

 

The server is currently running deep scrubs, which we had turned off for the last two weeks due to a node eviction and a new node installation.

 

 

 


