Hi Adam. I just tried to extend the host's memory to 48GB, and it stopped
throwing the error and set it to 3.something GB instead.
Thank you so much for your time and explanations.
On Tue, Apr 9, 2024 at 9:30 PM Adam King <adking(a)redhat.com> wrote:
The same experiment with the mds daemons pulling 4GB
instead of the 16GB,
and me fixing the starting total memory (I accidentally used the
memory_available_kb instead of memory_total_kb the first time) gives us
DEBUG cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
Total memory: 23530995712
Daemons: [<DaemonDescription>(crash.a), <DaemonDescription>(grafana.a), <DaemonDescription>(mds.a), <DaemonDescription>(mds.b), <DaemonDescription>(mds.c), <DaemonDescription>(mgr.a), <DaemonDescription>(mon.a), <DaemonDescription>(node-exporter.a), <DaemonDescription>(osd.1), <DaemonDescription>(osd.2), <DaemonDescription>(osd.3), <DaemonDescription>(osd.4), <DaemonDescription>(prometheus.a)]
DEBUG cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
DEBUG cephadm.autotune:autotune.py:52 new total: 23396777984
DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
DEBUG cephadm.autotune:autotune.py:52 new total: 22323036160
DEBUG cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG cephadm.autotune:autotune.py:42 new total: 18028068864
DEBUG cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG cephadm.autotune:autotune.py:42 new total: 13733101568
DEBUG cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG cephadm.autotune:autotune.py:42 new total: 9438134272
DEBUG cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
DEBUG cephadm.autotune:autotune.py:52 new total: 5143166976
DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
DEBUG cephadm.autotune:autotune.py:52 new total: 4069425152
DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
DEBUG cephadm.autotune:autotune.py:52 new total: 2995683328
DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
DEBUG cephadm.autotune:autotune.py:52 new total: 1921941504
DEBUG cephadm.autotune:autotune.py:66 Final total is 1921941504 to be split among 4 OSDs
DEBUG cephadm.autotune:autotune.py:68 Result is 480485376 per OSD
My understanding is, given the starting memory_total_kb of 32827840, we get
33615708160 total bytes. We multiply that by the 0.7 autotune ratio to get
23530995712 bytes to be split among the daemons (something like 23-24 GB).
Then the mgr and the three mds daemons each get 4GB; grafana, the mon,
node-exporter, and prometheus each take 1GB; and the crash daemon gets 128MB.
That leaves just under 2GB to split among the 4 OSDs. That's how we arrive at
that "480485376" number per OSD from the original error message you posted.
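If you want to double-check the arithmetic, here's a minimal sketch in Python that just reproduces the same subtractions as the log above (not the actual cephadm code, only the numbers it logs):

GiB = 1024 ** 3
total = int(32827840 * 1024 * 0.7)   # memory_total_kb -> bytes, times the 0.7 ratio = 23530995712
# per-daemon allocations matching the log: crash 128MiB; grafana/mon/node-exporter/prometheus 1GiB each;
# the three mds daemons and the mgr 4GiB each
for size in [128 * 1024 ** 2, GiB, 4 * GiB, 4 * GiB, 4 * GiB, 4 * GiB, GiB, GiB, GiB]:
    total -= size
print(total)       # 1921941504
print(total // 4)  # 480485376 per OSD, the value in the error below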
Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
value: Value '480485376' is below minimum 939524096
As that value is well below the minimum (it's only about half a GB), it
reports that error when trying to set it.
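(That minimum matches osd_memory_base + osd_memory_cache_min, as you double-checked in your original post: 805306368 + 134217728 = 939524096, and 480485376 is only about half of that.)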
On Tue, Apr 9, 2024 at 12:58 PM Mads Aasted <mads2a(a)gmail.com> wrote:
> Hi Adam
>
> It seems like the mds_cache_memory_limit, both set globally through cephadm
> and on the host's mds daemons, is approx. 4GB:
> root@my-ceph01:/# ceph config get mds mds_cache_memory_limit
> 4294967296
> The same if I query the individual mds daemons running on my-ceph01, or any
> of the other mds daemons on the other hosts.
>
> On Tue, Apr 9, 2024 at 6:14 PM Mads Aasted <mads2a(a)gmail.com> wrote:
>
>> Hi Adam
>>
>> Let me just finish tucking in a devilish tyke here and I'll get to it
>> first thing.
>>
>> On Tue, Apr 9, 2024 at 6:09 PM Adam King <adking(a)redhat.com> wrote:
>>
>>> I did end up writing a unit test to see what we calculated here, as well
>>> as adding a bunch of debug logging (haven't created a PR yet, but probably
>>> will). The total memory was set to (19858056 * 1024 * 0.7) (total memory
>>> in bytes * the autotune target ratio) = 14234254540. What ended up getting
>>> logged was (ignore the daemon ids for the daemons, they don't affect
>>> anything. Only the types matter)
>>>
>>> DEBUG cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
>>> Total memory: 14234254540
>>> Daemons: [<DaemonDescription>(crash.a), <DaemonDescription>(grafana.a), <DaemonDescription>(mds.a), <DaemonDescription>(mds.b), <DaemonDescription>(mds.c), <DaemonDescription>(mgr.a), <DaemonDescription>(mon.a), <DaemonDescription>(node-exporter.a), <DaemonDescription>(osd.1), <DaemonDescription>(osd.2), <DaemonDescription>(osd.3), <DaemonDescription>(osd.4), <DaemonDescription>(prometheus.a)]
>>> DEBUG cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
>>> DEBUG cephadm.autotune:autotune.py:52 new total: 14100036812
>>> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
>>> DEBUG cephadm.autotune:autotune.py:52 new total: 13026294988
>>> DEBUG cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG cephadm.autotune:autotune.py:42 new total: -4153574196
>>> DEBUG cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG cephadm.autotune:autotune.py:42 new total: -21333443380
>>> DEBUG cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG cephadm.autotune:autotune.py:42 new total: -38513312564
>>> DEBUG cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
>>> DEBUG cephadm.autotune:autotune.py:52 new total: -42808279860
>>> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
>>> DEBUG cephadm.autotune:autotune.py:52 new total: -43882021684
>>> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
>>> DEBUG cephadm.autotune:autotune.py:52 new total: -44955763508
>>> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
>>> DEBUG cephadm.autotune:autotune.py:52 new total: -46029505332
>>>
>>> It looks like it was taking pretty much all the memory away for the mds
>>> daemons. The amount, however, is taken from the "mds_cache_memory_limit"
>>> setting for each mds daemon. The number it was defaulting to for the test
>>> is quite large. I guess I'd need to know what that comes out to for the
>>> mds daemons in your cluster to get a full picture. Also, you can see the
>>> total go well into the negatives here. When that happens cephadm just
>>> tries to remove the osd_memory_target config settings for the OSDs on the
>>> host, but given the error message from your initial post, it must be
>>> getting some positive value when actually running on your system.
>>>
>>> On Fri, Apr 5, 2024 at 2:21 AM Mads Aasted <mads2a(a)gmail.com> wrote:
>>>
>>>> Hi Adam
>>>> No problem, i really appreciate your input :)
>>>> The memory stats returned are as follows
>>>> "memory_available_kb": 19858056,
>>>> "memory_free_kb": 277480,
>>>> "memory_total_kb": 32827840,
>>>>
>>>> On Thu, Apr 4, 2024 at 10:14 PM Adam King <adking(a)redhat.com> wrote:
>>>>
>>>>> Sorry to keep asking for more info, but can I also get what `cephadm
>>>>> gather-facts` on that host returns for "memory_total_kb". Might end up
>>>>> creating a unit test out of this case if we have a calculation bug here.
>>>>>
>>>>> On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted <mads2a(a)gmail.com> wrote:
>>>>>
>>>>>> Sorry for the double send, forgot to hit reply all so it would
>>>>>> appear on the page.
>>>>>>
>>>>>> Hi Adam
>>>>>>
>>>>>> If we multiply by 0.7 and work through the previous example from that
>>>>>> number, we would still arrive at roughly 2.5GB for each OSD. And the
>>>>>> host in question is trying to set it to less than 500MB.
>>>>>> I have attached a list of the processes running on the host. Currently
>>>>>> you can even see that the OSDs are taking up the most memory by far,
>>>>>> each at least 5x the proposed target.
>>>>>> root@my-ceph01:/# ceph orch ps | grep my-ceph01
>>>>>> crash.my-ceph01              my-ceph01               running (3w)  7m ago  13M  9052k      -  17.2.6
>>>>>> grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago  13M  95.6M      -  8.3.5
>>>>>> mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago  10M   485M      -  17.2.6
>>>>>> mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago  12M  26.9M      -  17.2.6
>>>>>> mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago  12M  26.2M      -  17.2.6
>>>>>> mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago  13M   907M      -  17.2.6
>>>>>> mon.my-ceph01                my-ceph01               running (3w)  7m ago  13M   503M  2048M  17.2.6
>>>>>> node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago  13M  20.4M      -  1.5.0
>>>>>> osd.3                        my-ceph01               running (3w)  7m ago  11M  2595M  4096M  17.2.6
>>>>>> osd.5                        my-ceph01               running (3w)  7m ago  11M  2494M  4096M  17.2.6
>>>>>> osd.6                        my-ceph01               running (3w)  7m ago  11M  2698M  4096M  17.2.6
>>>>>> osd.9                        my-ceph01               running (3w)  7m ago  11M  3364M  4096M  17.2.6
>>>>>> prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago  13M   164M      -  2.42.0
>>>>>>
>>>>>> On Thu, Mar 28, 2024 at 2:13 AM Adam King <adking(a)redhat.com> wrote:
>>>>>>
>>>>>>> I missed a step in the calculation. The total_memory_kb I mentioned
>>>>>>> earlier is also multiplied by the value of the
>>>>>>> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions
>>>>>>> for all the daemons. That value defaults to 0.7. That might explain it
>>>>>>> seeming like it's getting a value lower than expected. Beyond that, I
>>>>>>> think I'd need a list of the daemon types and count on that host to
>>>>>>> try and work through what it's doing.
>>>>>>>
>>>>>>> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted <mads2a(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Adam.
>>>>>>>>
>>>>>>>> So doing the calculations with what you are stating here, I arrive at
>>>>>>>> a total of roughly 13.3GB for all the listed processes except the
>>>>>>>> OSDs, leaving well in excess of 4GB for each OSD.
>>>>>>>> Besides the mon daemon, which I can tell on my host has a limit of
>>>>>>>> 2GB, none of the other daemons seem to have a limit set according to
>>>>>>>> ceph orch ps. Then again, they are nowhere near the values stated in
>>>>>>>> the min_size_by_type that you list.
>>>>>>>> Obviously yes, I could disable the autotuning, but that would leave
>>>>>>>> me none the wiser as to why this exact host is trying to do this.
>>>>>>>>
>>>>>>>> On Tue, Mar 26, 2024 at 10:20 PM Adam King <adking(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>>> For context, the value the autotune goes with takes the value from
>>>>>>>>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and
>>>>>>>>> then subtracts from that per daemon on the host according to
>>>>>>>>>
>>>>>>>>> min_size_by_type = {
>>>>>>>>>     'mds': 4096 * 1048576,
>>>>>>>>>     'mgr': 4096 * 1048576,
>>>>>>>>>     'mon': 1024 * 1048576,
>>>>>>>>>     'crash': 128 * 1048576,
>>>>>>>>>     'keepalived': 128 * 1048576,
>>>>>>>>>     'haproxy': 128 * 1048576,
>>>>>>>>>     'nvmeof': 4096 * 1048576,
>>>>>>>>> }
>>>>>>>>> default_size = 1024 * 1048576
>>>>>>>>>
>>>>>>>>> what's left is then divided by the number of OSDs on the host to
>>>>>>>>> arrive at the value. I'll also add, since it seems to be an issue on
>>>>>>>>> this particular host, that if you add the "_no_autotune_memory"
>>>>>>>>> label to the host, it will stop trying to do this on that host.
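>>>>>>>>> (That would be something like `ceph orch host label add my-ceph01
>>>>>>>>> _no_autotune_memory`, using the host name from your output.)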
>>>>>>>>>
>>>>>>>>> On Mon, Mar 25, 2024 at 6:32 PM <mads2a(a)gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04
>>>>>>>>>> hosts in it, each with 4 OSDs attached. The first 2 servers hosting
>>>>>>>>>> mgrs have 32GB of RAM each, and the remaining have 24GB.
>>>>>>>>>> For some reason I am unable to identify, the first host in the
>>>>>>>>>> cluster appears to constantly be trying to set the osd_memory_target
>>>>>>>>>> variable to roughly half of what the calculated minimum is for the
>>>>>>>>>> cluster; I see the following spamming the logs constantly:
>>>>>>>>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error
>>>>>>>>>> parsing value: Value '480485376' is below minimum 939524096
>>>>>>>>>> The default is set to 4294967296.
>>>>>>>>>> I did double-check and osd_memory_base (805306368) +
>>>>>>>>>> osd_memory_cache_min (134217728) adds up to that minimum exactly.
>>>>>>>>>> osd_memory_target_autotune is currently enabled. But I cannot for
>>>>>>>>>> the life of me figure out how it is arriving at 480485376 as a value
>>>>>>>>>> for that particular host, which even has the most RAM. Neither the
>>>>>>>>>> cluster nor the host is even approaching max utilization on memory,
>>>>>>>>>> so it's not like there are processes competing for resources.