On 2020-03-26T16:31:29, Blaine Gardner <BlGardner(a)suse.com> wrote:
Hi Blaine,
thanks for bringing this up.
> Advice I got from Joao: In the case of Ceph monitors, they are more
> likely to be experiencing memory over-use during recovery scenarios,
> and killing mons during this due to exceeding a limit may make the
> problem much worse. The best practice I have here is to only set
> memory requests for Ceph mons, ideally 4GB.
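In Rook terms, that advice would look roughly like the following stanza in the CephCluster CR. The 4Gi figure is the value suggested above; treat the rest as an illustrative sketch, not a tested manifest:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  resources:
    # Request memory for mons but set no limit, so Kubernetes
    # reserves them ~4Gi without OOM-killing them during recovery.
    mon:
      requests:
        memory: "4Gi"
```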
> In the case of OSDs, things are a little more complex. OSDs will read
> the POD_MEMORY_REQUEST and POD_MEMORY_LIMIT environment variables,
> which are set by Rook inside Kubernetes pods, and OSDs will tune
> their memory usage to meet this.
And that's great (though at that point, we probably ought to prevent
the manual setting of the memory target, since it's based on this
external setting?).
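For reference, environment variables like these are typically injected via the Kubernetes downward API's resourceFieldRef mechanism; a sketch of what effectively ends up in the OSD container spec (the exact Rook-generated spec may differ):

```yaml
env:
  # Expose the container's own memory request/limit to the Ceph
  # process, which can use them to derive osd_memory_target.
  - name: POD_MEMORY_REQUEST
    valueFrom:
      resourceFieldRef:
        resource: requests.memory
  - name: POD_MEMORY_LIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.memory
```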
> [...] the risks of setting (or not setting) Pod Memory Limits on
> OSDs, knowing that if the limit is set too low or if the OSDs begin
> to leak memory, they will be terminated and restarted by Kubernetes?
>
> - One risk I can imagine is that if OSDs are all started at nearly
>   the same time and experience similar loads, they might be likely to
>   leak memory at similar rates and be killed by Kubernetes at about
>   the same time. Stampeding herds of OSD memory leaks followed by
>   memory-limit terminations might occur, which could ripple out and
>   cause other OSDs to become unstable.
> - Not setting a limit might mean that OSDs leak memory and cause OOM
>   situations for other daemons, or for the Kubernetes kubelet if the
>   system settings don't guarantee the kubelet some amount of
>   resources.
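The kubelet-starvation risk in that last point is usually mitigated with node-level reservations rather than pod limits; a sketch via KubeletConfiguration (the values are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Reserve memory/CPU for OS daemons and for the kubelet itself, so a
# leaking pod triggers eviction before it can starve the node agents.
systemReserved:
  memory: "1Gi"
  cpu: "500m"
kubeReserved:
  memory: "1Gi"
  cpu: "500m"
evictionHard:
  memory.available: "500Mi"
```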
I think the risk of accidentally killing OSDs on a falsely triggered
threshold is too high; the impact can be that other pods fail (we're
providing storage to them, after all).
The same can be said for anything that's in the IO path.
Warning and alerting, yes. But unless the memory leaks are *really*
severe (and that's hard to quantify: 150%, 200% of the expected max?),
keeping the storage stack in service is probably still sensible. The
ripple effect of killing these daemons is massive.
The kernel OOM killer will still go after processes that run completely
amok if the total capacity of the system is exceeded.
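An alert along those lines (warn a human when a Ceph pod's working set passes ~150% of its request, instead of letting a limit kill it) could be sketched as a PrometheusRule. The metric names assume cAdvisor and kube-state-metrics are being scraped; the 1.5 factor is the kind of threshold discussed above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ceph-memory-overuse
spec:
  groups:
    - name: ceph-memory
      rules:
        - alert: CephDaemonMemoryOveruse
          # Working set above 150% of the declared request for 10
          # minutes: alert, rather than terminate the daemon.
          expr: |
            container_memory_working_set_bytes{namespace="rook-ceph"}
              > 1.5 * kube_pod_container_resource_requests{namespace="rook-ceph",resource="memory"}
          for: 10m
          labels:
            severity: warning
```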
> Is it good to kill daemons if they exceed a limit in order to prevent
> memory leaks from affecting the rest of the system? MDS? RGW? MGR?
> NFS-Ganesha?
It might not matter as much with the mgr, but a misconfigured memory
limit repeatedly killing that one is likely painful too.
Killing an MDS/NFS instance might stop client systems from being able to
flush their dirty buffers, making the overall IO/memory situation worse.
> If anyone has knowledgeable recommendations about any daemons, I'd
> love your input. Please reply-all so that I get replies straight to
> my inbox.
Based on experience with HA stacks, I'd be very, very careful with
killing storage-path components. At the very least, those limits, be
they timeouts or resource caps, need to be extremely generous, because
of the ripple effects.

Typically, the storage system is given minimum resource guarantees and
the other workloads are limited so they don't interfere, not vice
versa.
Again, warning/alerting is sensible.
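One way to express "storage gets guarantees, other workloads get limits" in this setup: give the Ceph daemons requests plus a high PriorityClass, and put limits or quotas on the consumers instead. A sketch (the class name here is made up for illustration; Rook's CephCluster CR does accept per-daemon priorityClassNames):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: storage-critical   # hypothetical name
value: 1000000
description: "Ceph daemons are preempted/evicted last."
---
# Then referenced from the CephCluster spec, e.g.:
#   priorityClassNames:
#     mon: storage-critical
#     osd: storage-critical
```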
Regards,
Lars
--
SUSE Software Solutions Germany GmbH, MD: Felix Imendörffer, HRB 36809 (AG Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli
Zbinden)