Hi Adam
No problem, I really appreciate your input :)
The memory stats returned are as follows:
"memory_available_kb": 19858056,
"memory_free_kb": 277480,
"memory_total_kb": 32827840,
On Thu, Apr 4, 2024 at 10:14 PM Adam King <adking(a)redhat.com> wrote:
Sorry to keep asking for more info, but can I also get what
`cephadm gather-facts` on that host returns for "memory_total_kb"?
Might end up creating a unit test out of this case if we have a
calculation bug here.
On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted <mads2a(a)gmail.com> wrote:
> Sorry for the double send; I forgot to hit reply-all so it would appear on
> the page.
>
> Hi Adam
>
> If we multiply by 0.7 and work through the previous example from that
> number, we would still arrive at roughly 2.5 GB for each OSD, and the host
> in question is trying to set it to less than 500 MB.
> I have attached a list of the processes running on the host. As you can
> see, the OSDs are currently taking up the most memory by far, each using at
> least 5x the proposed value.
> root@my-ceph01:/# ceph orch ps | grep my-ceph01
> crash.my-ceph01              my-ceph01               running (3w)  7m ago  13M  9052k      -  17.2.6
> grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago  13M  95.6M      -  8.3.5
> mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago  10M   485M      -  17.2.6
> mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago  12M  26.9M      -  17.2.6
> mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago  12M  26.2M      -  17.2.6
> mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago  13M   907M      -  17.2.6
> mon.my-ceph01                my-ceph01               running (3w)  7m ago  13M   503M  2048M  17.2.6
> node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago  13M  20.4M      -  1.5.0
> osd.3                        my-ceph01               running (3w)  7m ago  11M  2595M  4096M  17.2.6
> osd.5                        my-ceph01               running (3w)  7m ago  11M  2494M  4096M  17.2.6
> osd.6                        my-ceph01               running (3w)  7m ago  11M  2698M  4096M  17.2.6
> osd.9                        my-ceph01               running (3w)  7m ago  11M  3364M  4096M  17.2.6
> prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago  13M   164M      -  2.42.0
>
> On Thu, Mar 28, 2024 at 2:13 AM Adam King <adking(a)redhat.com> wrote:
>
>> I missed a step in the calculation. The total_memory_kb I mentioned
>> earlier is also multiplied by the value of the
>> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
>> all the daemons. That value defaults to 0.7. That might explain it seeming
>> like it's getting a value lower than expected. Beyond that, I think I'd
>> need a list of the daemon types and counts on that host to try and work
>> through what it's doing.
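>>
>> (If it helps, the current ratio can be checked with `ceph config get mgr
>> mgr/cephadm/autotune_memory_target_ratio` and adjusted with `ceph config
>> set mgr mgr/cephadm/autotune_memory_target_ratio <ratio>`.)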
>>
>> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted <mads2a(a)gmail.com> wrote:
>>
>>> Hi Adam.
>>>
>>> So doing the calculations with what you are stating here, I arrive at a
>>> total of roughly 13.3 GB for all the listed processes except the OSDs,
>>> leaving well in excess of 4 GB for each OSD.
>>> Besides the mon daemon, which I can tell on my host has a limit of 2 GB,
>>> none of the other daemons seem to have a limit set according to ceph orch
>>> ps. Then again, they are nowhere near the values stated in min_size_by_type
>>> that you list.
>>> Obviously yes, I could disable the autotuning, but that would leave me
>>> none the wiser as to why this exact host is trying to do this.
>>>
>>>
>>>
>>> On Tue, Mar 26, 2024 at 10:20 PM Adam King <adking(a)redhat.com> wrote:
>>>
>>>> For context, the value the autotune goes with takes the value from
>>>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
>>>> subtracts from that per daemon on the host according to
>>>>
>>>> min_size_by_type = {
>>>> 'mds': 4096 * 1048576,
>>>> 'mgr': 4096 * 1048576,
>>>> 'mon': 1024 * 1048576,
>>>> 'crash': 128 * 1048576,
>>>> 'keepalived': 128 * 1048576,
>>>> 'haproxy': 128 * 1048576,
>>>> 'nvmeof': 4096 * 1048576,
>>>> }
>>>> default_size = 1024 * 1048576
>>>>
>>>> what's left is then divided by the number of OSDs on the host to
>>>> arrive at the value. I'll also add, since it seems to be an issue on
>>>> this particular host, if you add the "_no_autotune_memory" label to
>>>> the host, it will stop trying to do this on that host.
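>>>>
>>>> For example, something like:
>>>>
>>>>     ceph orch host label add my-ceph01 _no_autotune_memory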
>>>>
>>>> On Mon, Mar 25, 2024 at 6:32 PM <mads2a(a)gmail.com> wrote:
>>>>
>>>>> I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04
>>>>> hosts in it, each with 4 OSDs attached. The first 2 servers, hosting the
>>>>> mgrs, have 32 GB of RAM each, and the remaining have 24 GB.
>>>>> For some reason I am unable to identify, the first host in the
>>>>> cluster appears to constantly be trying to set the osd_memory_target
>>>>> variable to roughly half of the calculated minimum for the cluster;
>>>>> I see the following spamming the logs constantly:
>>>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error
>>>>> parsing value: Value '480485376' is below minimum 939524096
>>>>> Default is set to 4294967296.
>>>>> I did double-check, and osd_memory_base (805306368) +
>>>>> osd_memory_cache_min (134217728) adds up to the minimum exactly.
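>>>>> (805306368 + 134217728 = 939524096, matching the minimum in the error.)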
>>>>> osd_memory_target_autotune is currently enabled, but I cannot for the
>>>>> life of me figure out how it is arriving at 480485376 as a value for that
>>>>> particular host, which even has the most RAM. Neither the cluster nor the
>>>>> host is approaching max memory utilization, so it's not like there are
>>>>> processes competing for resources.
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io