Hi Adam
It seems like mds_cache_memory_limit, both the value set globally through cephadm and
the values on the hosts' mds daemons, comes out to approximately 4 GB:
root@my-ceph01:/# ceph config get mds mds_cache_memory_limit
4294967296
I get the same value if I query the individual mds daemons running on my-ceph01, or
any of the other mds daemons on the other hosts.
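
Out of curiosity I also plugged those numbers into the same walk your debug log below
shows, and it lands exactly on the 480485376 from my original error message. A rough
back-of-the-envelope sketch in Python, assuming the autotuner starts from
memory_total_kb, charges each of the three mds daemons its mds_cache_memory_limit,
and uses the min/default sizes for the rest:

total = int(32827840 * 1024 * 0.7)   # memory_total_kb * autotune ratio = 23530995712
total -= 134217728                   # crash (128 MiB minimum)
total -= 1073741824                  # grafana (1 GiB default)
total -= 3 * 4294967296              # three mds daemons at mds_cache_memory_limit
total -= 4294967296                  # mgr (4 GiB minimum)
total -= 3 * 1073741824              # mon, node-exporter, prometheus (1 GiB each)
print(total // 4)                    # four OSDs on this host -> 480485376

So if that guess is right, the three mds daemons on this host are what eat most of
the budget.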
On Tue, Apr 9, 2024 at 6:14 PM Mads Aasted <mads2a(a)gmail.com> wrote:
Hi Adam
Let me just finish tucking in a devilish tyke here and I'll get to it first
thing.
On Tue, Apr 9, 2024 at 6:09 PM Adam King <adking(a)redhat.com> wrote:
> I did end up writing a unit test to see what we calculated here, as well
> as adding a bunch of debug logging (haven't created a PR yet, but probably
> will). The total memory was set to 19858056 * 1024 * 0.7 (total memory
> in bytes * the autotune target ratio) = 14234254540. What ended up getting
> logged was the following (ignore the daemon IDs; they don't affect
> anything, only the types matter):
>
> DEBUG cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
> Total memory: 14234254540
> Daemons: [<DaemonDescription>(crash.a), <DaemonDescription>(grafana.a),
>          <DaemonDescription>(mds.a), <DaemonDescription>(mds.b),
>          <DaemonDescription>(mds.c), <DaemonDescription>(mgr.a),
>          <DaemonDescription>(mon.a), <DaemonDescription>(node-exporter.a),
>          <DaemonDescription>(osd.1), <DaemonDescription>(osd.2),
>          <DaemonDescription>(osd.3), <DaemonDescription>(osd.4),
>          <DaemonDescription>(prometheus.a)]
> DEBUG cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
> DEBUG cephadm.autotune:autotune.py:52 new total: 14100036812
> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
> DEBUG cephadm.autotune:autotune.py:52 new total: 13026294988
> DEBUG cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
> DEBUG cephadm.autotune:autotune.py:42 new total: -4153574196
> DEBUG cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
> DEBUG cephadm.autotune:autotune.py:42 new total: -21333443380
> DEBUG cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
> DEBUG cephadm.autotune:autotune.py:42 new total: -38513312564
> DEBUG cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
> DEBUG cephadm.autotune:autotune.py:52 new total: -42808279860
> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
> DEBUG cephadm.autotune:autotune.py:52 new total: -43882021684
> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
> DEBUG cephadm.autotune:autotune.py:52 new total: -44955763508
> DEBUG cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
> DEBUG cephadm.autotune:autotune.py:52 new total: -46029505332
>
> It looks like it was taking pretty much all the memory away for the mds
> daemons. The amount, however, is taken from the "mds_cache_memory_limit"
> setting for each mds daemon. The number it was defaulting to for the test
> is quite large. I guess I'd need to know what that comes out to for the mds
> daemons in your cluster to get a full picture. Also, you can see the total
> go well into the negatives here. When that happens, cephadm just tries to
> remove the osd_memory_target config settings for the OSDs on the host, but
> given the error message from your initial post, it must be getting some
> positive value when actually running on your system.
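>
> Just to make the imbalance concrete: the three mds charges alone exceed the
> whole budget. A quick sanity check in Python (nothing cephadm-specific):
>
> budget = 19858056 * 1024 * 0.7   # the test's total memory * ratio = 14234254540
> mds_total = 3 * 17179869184      # three mds daemons at the test's default cache limit
> print(mds_total > budget)        # True, so the running total has to go negative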
>
> On Fri, Apr 5, 2024 at 2:21 AM Mads Aasted <mads2a(a)gmail.com> wrote:
>
>> Hi Adam
>> No problem, I really appreciate your input :)
>> The memory stats returned are as follows:
>> "memory_available_kb": 19858056,
>> "memory_free_kb": 277480,
>> "memory_total_kb": 32827840,
>>
>> On Thu, Apr 4, 2024 at 10:14 PM Adam King <adking(a)redhat.com> wrote:
>>
>>> Sorry to keep asking for more info, but can I also get what `cephadm
>>> gather-facts` on that host returns for "memory_total_kb"? I might end up
>>> creating a unit test out of this case if we have a calculation bug here.
>>>
>>> On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted <mads2a(a)gmail.com> wrote:
>>>
>>>> Sorry for the double send; I forgot to hit reply-all so it would appear
>>>> on the page.
>>>>
>>>> Hi Adam
>>>>
>>>> If we multiply by 0.7 and work through the previous example from that
>>>> number, we would still arrive at roughly 2.5 GB for each OSD, yet the host
>>>> in question is trying to set it to less than 500 MB.
>>>> I have attached a list of the processes running on the host. Currently
>>>> you can see that the OSDs are taking up the most memory by far, each at
>>>> least 5x the proposed minimum.
>>>> root@my-ceph01:/# ceph orch ps | grep my-ceph01
>>>> crash.my-ceph01              my-ceph01               running (3w)  7m ago  13M  9052k  -      17.2.6
>>>> grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago  13M  95.6M  -      8.3.5
>>>> mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago  10M  485M   -      17.2.6
>>>> mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago  12M  26.9M  -      17.2.6
>>>> mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago  12M  26.2M  -      17.2.6
>>>> mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago  13M  907M   -      17.2.6
>>>> mon.my-ceph01                my-ceph01               running (3w)  7m ago  13M  503M   2048M  17.2.6
>>>> node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago  13M  20.4M  -      1.5.0
>>>> osd.3                        my-ceph01               running (3w)  7m ago  11M  2595M  4096M  17.2.6
>>>> osd.5                        my-ceph01               running (3w)  7m ago  11M  2494M  4096M  17.2.6
>>>> osd.6                        my-ceph01               running (3w)  7m ago  11M  2698M  4096M  17.2.6
>>>> osd.9                        my-ceph01               running (3w)  7m ago  11M  3364M  4096M  17.2.6
>>>> prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago  13M  164M   -      2.42.0
>>>>
>>>> On Thu, Mar 28, 2024 at 2:13 AM Adam King <adking(a)redhat.com> wrote:
>>>>
>>>>> I missed a step in the calculation. The total_memory_kb I mentioned
>>>>> earlier is also multiplied by the value of
>>>>> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions
>>>>> for all the daemons. That value defaults to 0.7, which might explain it
>>>>> seeming like it's getting a lower value than expected. Beyond that, I'd
>>>>> need a list of the daemon types and counts on that host to try and work
>>>>> through what it's doing.
>>>>>
>>>>> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted <mads2a(a)gmail.com> wrote:
>>>>>
>>>>>> Hi Adam.
>>>>>>
>>>>>> So doing the calculations with what you are stating here, I arrive at
>>>>>> a total sum of roughly 13.3 GB for all the listed processes except the
>>>>>> OSDs, leaving well in excess of 4 GB for each OSD.
>>>>>> Besides the mon daemon, which I can tell on my host has a limit of
>>>>>> 2 GB, none of the other daemons seem to have a limit set according to
>>>>>> ceph orch ps. Then again, they are nowhere near the values stated in
>>>>>> the min_size_by_type that you list.
>>>>>> Obviously, yes, I could disable the autotuning, but that would leave
>>>>>> me none the wiser as to why this exact host is trying to do this.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 26, 2024 at 10:20 PM Adam King <adking(a)redhat.com> wrote:
>>>>>>
>>>>>>> For context, the value the autotune goes with takes the value from
>>>>>>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and
>>>>>>> then subtracts from that per daemon on the host according to
>>>>>>>
>>>>>>> min_size_by_type = {
>>>>>>> 'mds': 4096 * 1048576,
>>>>>>> 'mgr': 4096 * 1048576,
>>>>>>> 'mon': 1024 * 1048576,
>>>>>>> 'crash': 128 * 1048576,
>>>>>>> 'keepalived': 128 * 1048576,
>>>>>>> 'haproxy': 128 * 1048576,
>>>>>>> 'nvmeof': 4096 * 1048576,
>>>>>>> }
>>>>>>> default_size = 1024 * 1048576
>>>>>>>
>>>>>>> What's left is then divided by the number of OSDs on the host to
>>>>>>> arrive at the value. I'll also add, since it seems to be an issue on
>>>>>>> this particular host: if you add the "_no_autotune_memory" label to
>>>>>>> the host, it will stop trying to do this on that host.
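>>>>>>>
>>>>>>> Putting that together, a minimal sketch of the logic (a hypothetical
>>>>>>> helper, not the actual cephadm code; in the real code each mds daemon
>>>>>>> is charged its mds_cache_memory_limit rather than the flat minimum
>>>>>>> above):
>>>>>>>
>>>>>>> def autotuned_osd_memory_target(total_kb, daemon_types, ratio=0.7):
>>>>>>>     # total memory * mgr/cephadm/autotune_memory_target_ratio
>>>>>>>     total = int(total_kb * 1024 * ratio)
>>>>>>>     for t in daemon_types:
>>>>>>>         total -= min_size_by_type.get(t, default_size)
>>>>>>>     n_osds = daemon_types.count('osd')
>>>>>>>     # when this comes out non-positive, cephadm removes the per-OSD
>>>>>>>     # osd_memory_target config setting instead of setting one
>>>>>>>     return total // n_osds if n_osds and total > 0 else None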
>>>>>>>
>>>>>>> On Mon, Mar 25, 2024 at 6:32 PM <mads2a(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> I have a virtual ceph cluster running 17.2.6 with 4 Ubuntu 22.04
>>>>>>>> hosts in it, each with 4 OSDs attached. The first 2 servers, which
>>>>>>>> host the mgrs, have 32 GB of RAM each, and the remaining have 24 GB.
>>>>>>>> For some reason I am unable to identify, the first host in the
>>>>>>>> cluster appears to constantly be trying to set the osd_memory_target
>>>>>>>> variable to roughly half of the calculated minimum for the cluster.
>>>>>>>> I see the following spamming the logs constantly:
>>>>>>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error
>>>>>>>> parsing value: Value '480485376' is below minimum 939524096
>>>>>>>> The default is set to 4294967296.
>>>>>>>> I did double-check, and osd_memory_base (805306368) +
>>>>>>>> osd_memory_cache_min (134217728) adds up to that minimum exactly.
>>>>>>>> osd_memory_target_autotune is currently enabled, but I cannot for
>>>>>>>> the life of me figure out how it is arriving at 480485376 as a value
>>>>>>>> for that particular host, which even has the most RAM. Neither the
>>>>>>>> cluster nor the host is approaching max memory utilization, so it's
>>>>>>>> not like there are processes competing for resources.
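>>>>>>>>
>>>>>>>> A quick check of that arithmetic (the minimum really is just those
>>>>>>>> two values combined):
>>>>>>>>
>>>>>>>> print(805306368 + 134217728)   # osd_memory_base + osd_memory_cache_min = 939524096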