Hey all,
We’ve been running some benchmarks against Ceph, which we deployed using the Rook operator in Kubernetes. Everything seemed to scale linearly up to a point, after which a single OSD receives much higher CPU load than the other OSDs (nearly 100% saturation). While investigating, we noticed a ton of pubsub traffic in strace output from the RGW pods, like so:
[pid 22561] sendmsg(77, {msg_name(0)=NULL, msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"..., 73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314}, {"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …>
I’ve checked the other OSDs, and only a single OSD receives these messages, so I suspect it’s creating a bottleneck. Does anyone have an idea why these are being generated, or how to stop them? The pubsub sync module doesn’t appear to be enabled, and our benchmark is only doing simple GETs/PUTs/DELETEs.
We’re running Ceph 14.2.5 (Nautilus).
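In case it helps, this is roughly how we checked for the sync module and for the hot object’s placement. The default.rgw.log pool name below is a guess based on the default RGW pool layout; the object name comes from the strace output above:

    # tier_type should be empty for every zone if the pubsub sync module is not in use
    radosgw-admin zonegroup get | grep tier_type

    # map one of the object names seen in strace to its placement group and OSDs
    ceph osd map default.rgw.log pubsub.user.ceph-user-wwITOk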
Thank you!
Hi everyone,
Currently, our client application and Ceph cluster run in the primary datacenter. We’re planning to deploy Ceph in a secondary datacenter for DR. The secondary datacenter is in standby mode; if something goes wrong with the primary datacenter, the secondary will take over.
One way this could work is to add hosts from the secondary datacenter to the existing Ceph cluster in the primary datacenter. However, this would add latency to client requests, since clients in the primary datacenter might connect to OSD hosts in the secondary datacenter.
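As a rough sketch, the stretched layout we have in mind would look something like this at the CRUSH level (the bucket, host, and rule names here are made up):

    # create a datacenter bucket per site and arrange hosts under them
    ceph osd crush add-bucket dc1 datacenter
    ceph osd crush add-bucket dc2 datacenter
    ceph osd crush move dc1 root=default
    ceph osd crush move dc2 root=default
    ceph osd crush move host-a datacenter=dc1
    ceph osd crush move host-b datacenter=dc2

    # replicated rule that places each copy in a different datacenter
    ceph osd crush rule create-replicated dc-rule default datacenter

That keeps a copy per site, which gives the DR guarantee, but it is also exactly what introduces the cross-site latency on writes, since a write is only acknowledged once all replicas have it.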
Are there any special configurations in Ceph that fulfill this requirement?
I truly appreciate any comments!
Nghia.
Hi,
For previous Ceph version upgrades, we've used the rolling_update
playbook from ceph-ansible - for example, the stable-3.0 branch supports
both Jewel and Luminous, so we used it to migrate our clusters from
Jewel to Luminous.
As I understand it, upgrading directly from Luminous to Nautilus is a
supported operation, but there is no ceph-ansible release that supports
both versions: stable-4.0 supports Nautilus and no other releases.
Is the expected process to use stable-4.0 for the upgrade, or do we have
to do the upgrade by hand and only then update our version of ceph-ansible?
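For reference, the invocation we'd expect to run on stable-4.0 looks
something like this (the inventory path is ours; ireallymeanit is the
playbook's own confirmation variable):

    # set the target release in group_vars/all.yml first:
    #   ceph_stable_release: nautilus
    ansible-playbook -i inventory infrastructure-playbooks/rolling_update.yml -e ireallymeanit=yes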
Thanks,
Matthew