Hi Kalle,
We are not using EC. The cluster is 15.2.5; it was upgraded from Mimic in
July. What is odd is that the pg_log count reported by a dump is much lower
than what we see in the osd mempool stats.
Regards,
Rob
On Thu, Nov 26, 2020 at 12:11 AM Kalle Happonen <kalle.happonen(a)csc.fi>
wrote:
Hi Robert,
This sounds very much like a big problem we had 2 weeks back.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EWPPEMPAJQT…
Are you running EC? Which version are you running? It would fit our
narrative if you use EC and recently updated to 14.2.11 or later.
For some reason this memory use started growing a day after we updated to
14.2.13. Another case I read about was on 14.2.11, I think. We don't know
whether the pg_logs hadn't really been used before, or whether each entry
just grew much larger after the update for some reason. We don't see this
in our replicated pools.
We significantly reduced the pg_log length from the default 3000 down to
500. If your cluster is still up and the pgs are healthy, this should be
doable online.
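On Nautilus and later, the online reduction can be sketched via the central
config database (osd_max_pg_log_entries / osd_min_pg_log_entries are the
standard option names; check the defaults for your release before copying
the values):

```shell
# Cap pg_log length at 500 entries cluster-wide (the value from this
# thread); logs are trimmed as new writes arrive on each PG.
ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 500

# Verify an OSD picked up the change:
ceph config get osd.0 osd_max_pg_log_entries
```

Note that PGs seeing no writes won't trim until new log entries come in.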
Sadly the memory usage was more than we could sustain, and OSD processes
started getting OOM-killed. We had to trim the logs offline, which
unfortunately affected our production.
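For reference, the offline trim can be sketched with ceph-objectstore-tool;
the OSD id, data path, and pgid below are placeholders, and the OSD must be
stopped before running it:

```shell
# Stop the OSD, trim one PG's log on disk, then restart the OSD.
systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --pgid 7.3 --op trim-pg-log
systemctl start ceph-osd@12
```

This has to be repeated per PG (and per OSD), which is why it is so
disruptive.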
Cheers,
Kalle
----- Original Message -----
From: "Robert Brooks" <robert.brooks(a)riskiq.net>
To: "ceph-users" <ceph-users(a)ceph.io>
Sent: Wednesday, 25 November, 2020 20:23:05
Subject: [ceph-users] high memory usage in osd_pglog
We are seeing very high osd_pglog usage in the mempools for ceph osds. For
example...
"mempool": {
    "bloom_filter_bytes": 0,
    "bloom_filter_items": 0,
    "bluestore_alloc_bytes": 41857200,
    "bluestore_alloc_items": 523215,
    "bluestore_cache_data_bytes": 50876416,
    "bluestore_cache_data_items": 1326,
    "bluestore_cache_onode_bytes": 6814080,
    "bluestore_cache_onode_items": 13104,
    "bluestore_cache_other_bytes": 57793850,
    "bluestore_cache_other_items": 2599669,
    "bluestore_fsck_bytes": 0,
    "bluestore_fsck_items": 0,
    "bluestore_txc_bytes": 29904,
    "bluestore_txc_items": 42,
    "bluestore_writing_deferred_bytes": 733191,
    "bluestore_writing_deferred_items": 96,
    "bluestore_writing_bytes": 0,
    "bluestore_writing_items": 0,
    "bluefs_bytes": 101400,
    "bluefs_items": 1885,
    "buffer_anon_bytes": 21505818,
    "buffer_anon_items": 14949,
    "buffer_meta_bytes": 1161512,
    "buffer_meta_items": 13199,
    "osd_bytes": 1962920,
    "osd_items": 167,
    "osd_mapbl_bytes": 825079,
    "osd_mapbl_items": 17,
    "osd_pglog_bytes": 14099381936,
    "osd_pglog_items": 134285429,
    "osdmap_bytes": 734616,
    "osdmap_items": 26508,
    "osdmap_mapping_bytes": 0,
    "osdmap_mapping_items": 0,
    "pgmap_bytes": 0,
    "pgmap_items": 0,
    "mds_co_bytes": 0,
    "mds_co_items": 0,
    "unittest_1_bytes": 0,
    "unittest_1_items": 0,
    "unittest_2_bytes": 0,
    "unittest_2_items": 0
},
Roughly 14G is used for pg_logs on this one OSD. The cluster has 106 OSDs
and 2432 placement groups.
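A quick sanity check on those numbers (a sketch; the awk one-liners just do
the arithmetic, and the 3x replication factor below is an assumption, not
something stated in the thread):

```shell
# Average bytes per pg_log entry on this OSD: ~105, which is a plausible
# per-entry size -- the entry *count* is the anomaly, not the entry size.
echo 14099381936 134285429 | awk '{printf "%.0f\n", $1 / $2}'

# Expected entries per OSD with the default 3000-entry pg_log:
# 2432 PGs x 3 replicas / 106 OSDs x 3000 entries (3x replication assumed)
echo 2432 3 106 3000 | awk '{printf "%.0f\n", $1 * $2 / $3 * $4}'
```

That is on the order of 2e5 expected entries against the 1.3e8 reported in
the mempool, so either the accounting or the trimming looks off.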
The per-pg log counts are much lower than 134285429 entries. The top counts
(entries, pgid) are...
1486 1.41c
883 7.3
834 7.f
683 7.13
669 7.a
623 7.5
565 7.8
560 7.1c
546 7.16
544 7.19
Summing across all placement groups gives 21594 pg logs.
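For comparison with the mempool figures, one way to sum pg_log entry counts
across all PGs (a sketch; the position of the LOG column in "ceph pg dump
pgs" output varies by release, so $10 here is an assumption -- check the
header line first):

```shell
# Skip the header row, sum the LOG column, print the total.
ceph pg dump pgs 2>/dev/null | awk 'NR > 1 { sum += $10 } END { print sum }'
```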
Overall the performance of the cluster is poor, OSD memory usage is high
(20-30G resident), and with a moderate workload we are seeing iowait on the
OSD hosts. The memory allocated to caches appears to be low, I believe
because osd_pglog is taking most of the available memory.
Regards,
Rob
--
*******************************************************************
This message was sent from RiskIQ, and is intended only for the designated
recipient(s). It may contain confidential or proprietary information and
may be subject to confidentiality protections. If you are not a designated
recipient, you may not review, copy or distribute this message. If you
receive this in error, please notify the sender by reply e-mail and delete
this message. Thank you.
*******************************************************************
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io