1. The current problem is that it is still sending data over Ethernet instead of IB.
  2. [global]
    fsid=xxxx
    mon_initial_members = node1, node2, node3
    mon_host = xxx.xx.xxx.ab,xxx.xx.xxx.ac, xxx.xx.xxx.ad
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    public_network = xxx.xx.xxx.0/24
    cluster_network = xx.xxx.0.0/16
    ms_cluster_type = async+rdma
    ms_type = async+rdma
    ms_public_type = async+posix
    [mgr]
    ms_type = async+posix
  3. The Ceph cluster is deployed using ceph-deploy. Once it is up, all of the daemons are turned off, the RDMA cluster config is distributed to the nodes, and once that is complete the daemons are turned back on. The ulimit is set to unlimited. LimitMEMLOCK=infinity is set on ceph-disk@.service, ceph-mds@.service, ceph-mon@.service, ceph-osd@.service and ceph-radosgw@.service, as well as PrivateDevices=no on ceph-mds@.service, ceph-mon@.service and ceph-radosgw@.service. The Ethernet MTU is set to 1000.
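
One way to confirm that LimitMEMLOCK=infinity actually reached the running daemons is to read /proc/<pid>/limits for each of them. The following is only a minimal sketch in Python, assuming Linux; the function names are illustrative and are not part of Ceph or ceph-deploy.

```python
from pathlib import Path

ROW_NAME = "Max locked memory"


def parse_max_locked_memory(limits_text):
    """Return (soft, hard) from the 'Max locked memory' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith(ROW_NAME):
            # Row layout: limit name, soft limit, hard limit, units.
            fields = line[len(ROW_NAME):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max locked memory' row found")


def locked_memory_for_pid(pid):
    """Read the locked-memory limits of a running process (Linux only)."""
    return parse_max_locked_memory(Path(f"/proc/{pid}/limits").read_text())
```

Calling locked_memory_for_pid() with the PID of a ceph-osd or ceph-mon process should return "unlimited" for both values if the systemd setting was picked up after the restart.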

From: Liu, Changcheng <changcheng.liu@intel.com>
Sent: 30 October 2019 12:24
To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) <gabryel.mason-williams@diamond.ac.uk>
Cc: dev@ceph.io <dev@ceph.io>
Subject: Re: RMDA Bug?
 
1. What problem do you hit when using RDMA in 14.2.4? Does any log show the error?
2. What's your ceph.conf?
3. How do you deploy the Ceph cluster? RDMA needs to lock some memory, so
some system configuration has to be changed to meet this requirement.

On 11:21 Wed 30 Oct, Gabryel Mason-Williams wrote:
> Liu, Changcheng wrote:
> > On 07:31 Mon 28 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > >     I am using ceph version 12.2.8
> > >     (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable).
> > > 
> > >     I have not checked the master branch do you think this is an issue in
> > >     luminous that has been removed in later versions?
> >
> > I haven't hit problem on master branch. Ceph/RDMA changed a lot
> > from luminous to master branch.
> >
> > Is below configuration really needed in luminous/ceph.conf?
> > >     ms_async_rdma_local_gid = xxxx
> >
> > On master branch, this parameter is not needed at all.
> > B.R.
> > Changcheng
>
> Thanks, the issue of the OSDs falling over seems to have gone away after updating to Nautilus 14.2.4. However, I am still unable to get it to communicate properly over RDMA, even after removing ms_async_rdma_local_gid.

 
