Hey all,
We’ve been running some benchmarks against Ceph, which we deployed in Kubernetes using
the Rook operator. Everything seemed to scale linearly up to a point, after which a single
OSD started receiving much higher CPU load than the other OSDs (nearly 100% saturation).
After some investigation we noticed a ton of pubsub traffic in an strace of the RGW pods,
like so:
[pid 22561] sendmsg(77, {msg_name(0)=NULL,
msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"...,
73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314},
{"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}],
msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …>
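In case it helps anyone reproduce this: the object name visible in the sendmsg payload can
be mapped to a placement group (and therefore a single OSD) with ceph osd map. The pool
name below is just the stock RGW log pool, which is an assumption on my part; substitute
wherever these objects actually live in your cluster:

    # map the pubsub object from the strace above to a PG/OSD
    ceph osd map default.rgw.log pubsub.user.ceph-user-wwITOk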
I’ve checked the other OSDs and only this single OSD receives these messages, so I suspect
it’s creating a bottleneck. Does anyone have any idea why these are being generated, or how
to stop them? The pubsub sync module doesn’t appear to be enabled, and our benchmark is
only doing simple gets/puts/deletes.
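For reference, here is roughly how I checked that pubsub isn’t configured (the daemon name
in the last command is hypothetical, substitute your actual RGW instance):

    # a pubsub zone would show tier_type "pubsub" in the zonegroup's zone list
    radosgw-admin zonegroup get | grep -i tier_type
    radosgw-admin zone get
    # and the pubsub REST API would have to be listed in rgw_enable_apis
    ceph config get client.rgw.my-store rgw_enable_apis

None of those show anything pubsub-related on our cluster.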
We’re running Ceph 14.2.5 (Nautilus).
Thank you!