Hi Mark
Thank you! This is 14.2.8, on Ubuntu Bionic. Some hosts run kernel
4.15, some 5.3, but that does not seem to make a difference here.
Transparent Huge Pages are not in use, according to
grep -i AnonHugePages /proc/meminfo
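For completeness, the THP mode itself can be checked (and, if needed, set
to madvise) via sysfs; these are the standard paths, shown here only as a
generic example:
cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled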
Workload is a mix of OpenStack volumes (replicated) and RGW on EC 8+3.
EC pool with 1024 PGs, 900M objects.
Around 500 HDD OSDs (4 and 8 TB) and 30 SSD OSDs (2 TB). The maximum
number of PGs per OSD is only 123. The HDD OSDs have their DB on SSD,
but unfortunately a bit less than 30 GB each. I have seen 200 GB and
more of slow_bytes; compressing the DB seems to help a lot.
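In case someone wants to check spillover on their own OSDs, the bluefs
perf counters show DB vs. slow device usage. A minimal sketch (the jq
path is an assumption based on our 14.2.x output; osd.0 is just a
placeholder):
ceph daemon osd.0 perf dump | jq '.bluefs | {db_used_bytes, slow_used_bytes}'
ceph health detail should also report BLUEFS_SPILLOVER when the DB
overflows onto the slow device.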
No BlueStore compression.
I had a look at the related thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/JQ72K5LK3Y…
Today I saw a correlation that may match your thoughts. During 1 hour
with a high number of write IOPS (not throughput) on the EC pool,
available memory increased drastically.
Cheers
Harry
On 20.05.20 15:15, Mark Nelson wrote:
> Hi Harald,
>
>
> Thanks! You can see from the perf dump that the target bytes are a
> little below 4 GB, but the mapped bytes are around 7 GB. The priority
> cache manager has reacted by setting "cache_bytes" to 128 MB, which is
> the global minimum, with each cache getting 64 MB (the per-cache
> minimum). In other words, the priority cache manager has told all of
> the caches to shrink to their smallest possible values, so it's doing
> the right thing. The next question is why buffer_anon is so huge.
> Looking at the mempool stats, there are not that many items but still
> a lot of memory used; on average the items in buffer_anon are ~150 KB.
> It can't be just buffer_anon though: you've got several gigabytes of
> mapped memory in use beyond that, and around 4 GB of unmapped memory
> that tcmalloc should be freeing on every iteration of the priority
> cache manager.
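>
> (For anyone following along: 4584503367 bytes / 29012 items is roughly
> 158 KB per item. A quick way to get that ratio, assuming the
> dump_mempools JSON layout of 14.2.x and using osd.0 as a placeholder:
> ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.buffer_anon | .bytes / .items'
> )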
>
>
> So, next questions: what version of Ceph is this, and do you have
> transparent huge pages enabled? We automatically disable THP now, but
> if you are running an older version you might want to disable it (or
> at least set it to madvise) manually. Also, what kind of workload is
> hitting the OSDs? If you can reliably make the memory grow, you could
> run a heap profile while the workload is going on and see where the
> memory is being used.
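>
> A rough sketch of that, using the tcmalloc heap profiler built into
> ceph tell (osd.0 is a placeholder; the dump file name and location are
> assumptions, they normally land next to the OSD log):
> ceph tell osd.0 heap start_profiler
> # ... run the workload for a while ...
> ceph tell osd.0 heap dump
> ceph tell osd.0 heap stop_profiler
> google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap
> (google-pprof comes with google-perftools; the exact binary and profile
> paths may differ on your systems.)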
>
>
> Mark
>
>
> On 5/20/20 7:36 AM, Harald Staub wrote:
>> Hi Mark
>>
>> Thank you for your explanations! Some numbers from this example OSD below.
>>
>> Cheers
>> Harry
>>
>> From dump mempools:
>>
>> "buffer_anon": {
>> "items": 29012,
>> "bytes": 4584503367
>> },
>>
>> From perf dump:
>>
>> "prioritycache": {
>> "target_bytes": 3758096384,
>> "mapped_bytes": 7146692608,
>> "unmapped_bytes": 3825983488,
>> "heap_bytes": 10972676096,
>> "cache_bytes": 134217728
>> },
>> "prioritycache:data": {
>> "pri0_bytes": 0,
>> "pri1_bytes": 0,
>> "pri2_bytes": 0,
>> "pri3_bytes": 0,
>> "pri4_bytes": 0,
>> "pri5_bytes": 0,
>> "pri6_bytes": 0,
>> "pri7_bytes": 0,
>> "pri8_bytes": 0,
>> "pri9_bytes": 0,
>> "pri10_bytes": 0,
>> "pri11_bytes": 0,
>> "reserved_bytes": 67108864,
>> "committed_bytes": 67108864
>> },
>> "prioritycache:kv": {
>> "pri0_bytes": 0,
>> "pri1_bytes": 0,
>> "pri2_bytes": 0,
>> "pri3_bytes": 0,
>> "pri4_bytes": 0,
>> "pri5_bytes": 0,
>> "pri6_bytes": 0,
>> "pri7_bytes": 0,
>> "pri8_bytes": 0,
>> "pri9_bytes": 0,
>> "pri10_bytes": 0,
>> "pri11_bytes": 0,
>> "reserved_bytes": 67108864,
>> "committed_bytes": 67108864
>> },
>> "prioritycache:meta": {
>> "pri0_bytes": 0,
>> "pri1_bytes": 0,
>> "pri2_bytes": 0,
>> "pri3_bytes": 0,
>> "pri4_bytes": 0,
>> "pri5_bytes": 0,
>> "pri6_bytes": 0,
>> "pri7_bytes": 0,
>> "pri8_bytes": 0,
>> "pri9_bytes": 0,
>> "pri10_bytes": 0,
>> "pri11_bytes": 0,
>> "reserved_bytes": 67108864,
>> "committed_bytes": 67108864
>> },
>>
>> On 20.05.20 14:05, Mark Nelson wrote:
>>> Hi Harald,
>>>
>>>
>>> Any idea what the priority_cache_manager perf counters show? (You
>>> can also enable debug osd / debug priority_cache_manager.) The OSD
>>> memory autotuning works by shrinking the bluestore and rocksdb caches
>>> toward some target value to try to keep the mapped memory of the
>>> process below the osd_memory_target. In some cases it's possible that
>>> something other than the caches is using the memory (usually pglog),
>>> or that there's a lot of pinned data in the cache that for some
>>> reason can't be evicted. Knowing the cache tuning stats might help
>>> tell whether it's trying to shrink the caches and can't for some
>>> reason, or whether something else is going on.
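>>>
>>> (A minimal way to pull those counters, with osd.0 as a placeholder;
>>> the section names match the perf dump quoted earlier in this thread:
>>> ceph daemon osd.0 perf dump | jq '.prioritycache, ."prioritycache:kv", ."prioritycache:data", ."prioritycache:meta"'
>>> )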
>>>
>>>
>>> Thanks,
>>>
>>> Mark
>>>
>>>
>>>
>>> On 5/20/20 6:10 AM, Harald Staub wrote:
>>>> As a follow-up to our recent memory problems with OSDs (with high
>>>> pglog values:
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LJPJZPBSQRJ…
>>>> ), we also see high buffer_anon values. E.g. more than 4 GB, with
>>>> "osd memory target" set to 3 GB. Is there a way to restrict it?
>>>>
>>>> As it is called "anon", I guess that it would first be necessary to
>>>> find out what exactly is behind this?
>>>>
>>>> Well, maybe it is just as Wido said: with lots of small objects,
>>>> there will be several problems.
>>>>
>>>> Cheers
>>>> Harry