Hi Mark
Thank you! This is 14.2.8, on Ubuntu Bionic. Some hosts run kernel
4.15, some 5.3, but that does not seem to make a difference here.
Transparent Huge Pages are not in use, according to
grep -i AnonHugePages /proc/meminfo
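For completeness, the THP mode itself can be checked (and, if needed, set
to madvise) via sysfs; these are the standard paths, shown here only as a
generic example:
cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled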
Workload is a mix of OpenStack volumes (replicated) and RGW on EC 8+3.
EC pool with 1024 PGs, 900M objects.
Around 500 HDD OSDs (4 and 8 TB) and 30 SSD OSDs (2 TB). The maximum
number of PGs per OSD is only 123. The HDD OSDs have their DB on SSD,
but unfortunately a bit less than 30 GB each. I have seen 200 GB and
more of slow_bytes; compressing the DB seems to help a lot.
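In case someone wants to check spillover on their own OSDs, the bluefs
perf counters show DB vs. slow device usage. A minimal sketch (the jq
path is an assumption based on our 14.2.x output; osd.0 is just a
placeholder):
ceph daemon osd.0 perf dump | jq '.bluefs | {db_used_bytes, slow_used_bytes}'
ceph health detail should also report BLUEFS_SPILLOVER when the DB
overflows onto the slow device.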
No BlueStore compression.
I had a look at the related thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/JQ72K5LK3Y…
Today I saw a correlation that may match your thoughts. During 1 hour
with a high number of write IOPS (not throughput) on the EC pool,
available memory increased drastically.
Cheers
Harry
On 20.05.20 15:15, Mark Nelson wrote:
> Hi Harald,
>
>
> Thanks! You can see from the perf dump that the target bytes are a
> little below 4 GB, but the mapped bytes are around 7 GB. The priority
> cache manager has reacted by setting "cache_bytes" to 128 MB, which is
> the global minimum, with each cache getting 64 MB (the per-cache
> minimum). In other words, the priority cache manager has told all of
> the caches to shrink to their smallest possible values, so it's doing
> the right thing. The next question is why buffer_anon is so huge.
> Looking at the mempool stats, there are not that many items but still
> a lot of memory used; on average the items in buffer_anon are ~150 KB.
> It can't be just buffer_anon though: you've got several gigabytes of
> mapped memory in use beyond that, and around 4 GB of unmapped memory
> that tcmalloc should be freeing on every iteration of the priority
> cache manager.
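>
> (For anyone following along: 4584503367 bytes / 29012 items is roughly
> 158 KB per item. A quick way to get that ratio, assuming the
> dump_mempools JSON layout of 14.2.x and using osd.0 as a placeholder:
> ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.buffer_anon | .bytes / .items'
> )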
>
>
> So, next questions: what version of Ceph is this, and do you have
> transparent huge pages enabled? We automatically disable THP now, but
> if you are running an older version you might want to disable it (or
> at least set it to madvise) manually. Also, what kind of workload is
> hitting the OSDs? If you can reliably make the memory grow, you could
> run a heap profile while the workload is going on and see where the
> memory is being used.
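>
> A rough sketch of that, using the tcmalloc heap profiler built into
> ceph tell (osd.0 is a placeholder; the dump file name and location are
> assumptions, they normally land next to the OSD log):
> ceph tell osd.0 heap start_profiler
> # ... run the workload for a while ...
> ceph tell osd.0 heap dump
> ceph tell osd.0 heap stop_profiler
> google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap
> (google-pprof comes with google-perftools; the exact binary and profile
> paths may differ on your systems.)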
>
>
> Mark
>
>
> On 5/20/20 7:36 AM, Harald Staub wrote:
>> Hi Mark
>>
>> Thank you for your explanations! Some numbers from this example OSD below.
>>
>> Cheers
>> Harry
>>
>> From dump mempools:
>>
>> "buffer_anon": {
>> "items": 29012,
>> "bytes": 4584503367
>> },
>>
>> From perf dump:
>>
>> "prioritycache": {
>> "target_bytes": 3758096384,
>> "mapped_bytes": 7146692608,
>> "unmapped_bytes": 3825983488,
>> "heap_bytes": 10972676096,
>> "cache_bytes": 134217728
>> },
>> "prioritycache:data": {
>> "pri0_bytes": 0,
>> "pri1_bytes": 0,
>> "pri2_bytes": 0,
>> "pri3_bytes": 0,
>> "pri4_bytes": 0,
>> "pri5_bytes": 0,
>> "pri6_bytes": 0,
>> "pri7_bytes": 0,
>> "pri8_bytes": 0,
>> "pri9_bytes": 0,
>> "pri10_bytes": 0,
>> "pri11_bytes": 0,
>> "reserved_bytes": 67108864,
>> "committed_bytes": 67108864
>> },
>> "prioritycache:kv": {
>> "pri0_bytes": 0,
>> "pri1_bytes": 0,
>> "pri2_bytes": 0,
>> "pri3_bytes": 0,
>> "pri4_bytes": 0,
>> "pri5_bytes": 0,
>> "pri6_bytes": 0,
>> "pri7_bytes": 0,
>> "pri8_bytes": 0,
>> "pri9_bytes": 0,
>> "pri10_bytes": 0,
>> "pri11_bytes": 0,
>> "reserved_bytes": 67108864,
>> "committed_bytes": 67108864
>> },
>> "prioritycache:meta": {
>> "pri0_bytes": 0,
>> "pri1_bytes": 0,
>> "pri2_bytes": 0,
>> "pri3_bytes": 0,
>> "pri4_bytes": 0,
>> "pri5_bytes": 0,
>> "pri6_bytes": 0,
>> "pri7_bytes": 0,
>> "pri8_bytes": 0,
>> "pri9_bytes": 0,
>> "pri10_bytes": 0,
>> "pri11_bytes": 0,
>> "reserved_bytes": 67108864,
>> "committed_bytes": 67108864
>> },
>>
>> On 20.05.20 14:05, Mark Nelson wrote:
>>> Hi Harald,
>>>
>>>
>>> Any idea what the priority_cache_manager perf counters show? (You
>>> can also enable debug osd / debug priority_cache_manager.) The OSD
>>> memory autotuning works by shrinking the bluestore and rocksdb caches
>>> toward some target value to try to keep the mapped memory of the
>>> process below the osd_memory_target. In some cases it's possible that
>>> something other than the caches is using the memory (usually pglog),
>>> or that there's a lot of pinned data in the cache that for some
>>> reason can't be evicted. Knowing the cache tuning stats might help
>>> tell whether it's trying to shrink the caches and can't for some
>>> reason, or whether something else is going on.
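>>>
>>> (A minimal way to pull those counters, with osd.0 as a placeholder;
>>> the section names match the perf dump quoted earlier in this thread:
>>> ceph daemon osd.0 perf dump | jq '.prioritycache, ."prioritycache:kv", ."prioritycache:data", ."prioritycache:meta"'
>>> )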
>>>
>>>
>>> Thanks,
>>>
>>> Mark
>>>
>>>
>>>
>>> On 5/20/20 6:10 AM, Harald Staub wrote:
>>>> As a follow-up to our recent memory problems with OSDs (with high
>>>> pglog values:
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LJPJZPBSQRJ…
>>>> ), we also see high buffer_anon values. E.g. more than 4 GB, with
>>>> "osd memory target" set to 3 GB. Is there a way to restrict it?
>>>>
>>>> As it is called "anon", I guess that it would first be necessary to
>>>> find out what exactly is behind this?
>>>>
>>>> Well, maybe it is just as Wido said: with lots of small objects,
>>>> there will be several problems.
>>>>
>>>> Cheers
>>>> Harry