@Haomai,
Does HAVE_IBV_EXP still work with any RNIC in the current Ceph repository?
@Nasution:
I have never used the options below:
ms_async_rdma_roce_ver = 0  # RoCEv1; all nodes are on the same network. Should I use RoCEv2?
ms_async_rdma_local_gid = fe80:0000:0000:0000:****:****:****:****  # should I use the 0000:0000:0000:0000:0000:****:****:**** one?
To use RDMA, you may need:
1) set "ulimit -l" to unlimited
2) For an RNIC with SRQ support:
a. the configuration below should work:
ms_async_rdma_device_name = mlx5_bond_0
ms_cluster_type = async+rdma
ms_public_type = async+posix
b. If you need to choose between RoCEv1 and RoCEv2, configure
"ms_async_rdma_gid_idx".
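To see which GID index corresponds to which RoCE version, you can inspect sysfs. The sketch below is not from the thread; it assumes the gid_attrs layout exposed by the Linux ib_core driver, and the device/port names are just the ones used in this thread:

```shell
#!/bin/sh
# Sketch: print each GID index together with its RoCE type, so you can pick
# a value for ms_async_rdma_gid_idx. Assumes the sysfs layout of the Linux
# ib_core driver (gid_attrs/types); adjust device and port to your RNIC.
list_gid_types() {
  # $1 = path to a port's gid_attrs/types directory
  for t in "$1"/*; do
    [ -e "$t" ] || continue
    # The file name is the GID index; the contents read "IB/RoCE v1" or "RoCE v2".
    printf '%s: %s\n' "${t##*/}" "$(cat "$t")"
  done
}

list_gid_types "/sys/class/infiniband/mlx5_bond_0/ports/1/gid_attrs/types"
```

Pick the index whose type matches the RoCE version you want and set ms_async_rdma_gid_idx to it.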
Reference:
https://github.com/ceph/ceph/pull/31517/commits/b971cff51a9179c02f85a27cc19…
From: Lazuardi Nasution <mrxlazuardin(a)gmail.com>
Sent: Thursday, September 10, 2020 12:23 AM
To: Liu, Changcheng <changcheng.liu(a)intel.com>
Subject: Ceph with RDMA
Hi,
I'm reading your post regarding Ceph with RDMA. Have you solved your problem? I'm
trying the same approach, but I'm facing a problem: some OSDs go down automatically
not long after they come up, due to no heartbeat reply, even on a freshly installed
cluster. I'm using the following RDMA-related configuration.
[global]
.......
ms_async_rdma_device_name = mlx5_bond_0
ms_cluster_type = async+rdma
ms_public_type = async+posix
# rbd does not support rdma
ms_async_rdma_polling_us = 0
ms_async_rdma_roce_ver = 0  # RoCEv1; all nodes are on the same network. Should I use RoCEv2?
ms_async_rdma_local_gid = fe80:0000:0000:0000:****:****:****:****  # should I use the 0000:0000:0000:0000:0000:****:****:**** one?
[mgr]
ms_type = async+posix
I have put "LimitMEMLOCK" in the OSD systemd unit file (because the OSD is the only
daemon that failed to start without it). Would you mind sharing your working
configuration of Ceph with RDMA? Am I missing something?
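For reference, the LimitMEMLOCK change mentioned above can be done as a systemd drop-in rather than by editing the unit file directly. This is a sketch, not the poster's exact setup; the unit name assumes the stock ceph-osd@.service template, and the function is parameterised on the drop-in directory so it can be exercised without root:

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that raises the locked-memory limit for
# OSDs, which RDMA needs for memory registration. Assumes the standard
# ceph-osd@.service unit name.
write_memlock_dropin() {
  # $1 = drop-in directory, e.g. /etc/systemd/system/ceph-osd@.service.d
  mkdir -p "$1"
  cat > "$1/memlock.conf" <<'EOF'
[Service]
LimitMEMLOCK=infinity
EOF
}

# On a real node (as root), then run `systemctl daemon-reload` and restart OSDs:
# write_memlock_dropin /etc/systemd/system/ceph-osd@.service.d
```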
Best regards,