Hi Williams,
Besides using the same port (both the public and cluster networks use RDMA)
for the RDMA messenger, I also tried a TCP messenger for the public network
and an RDMA messenger for the cluster network. No serious problems happened.
Ceph was built from source, based on master commit 8cb1f6bd (Wed Nov 6
18:43:41 2019 -0500).
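In case it helps, the hybrid setup is roughly the below (only a sketch; the
RDMA device name is just an example and the subnets are placeholders, use
your own):
    [global]
    ms_type = async+rdma
    ms_public_type = async+posix
    ms_cluster_type = async+rdma
    ms_async_rdma_device_name = irdma1
    public_network = <your public subnet>
    cluster_network = <your cluster subnet>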
I don't have enough nodes to check your problem.
BTW, in "ceph-users Digest, Vol 82, Issue 27", there's the below item:
2. Re: mgr daemons becoming unresponsive (Gregory Farnum)
However, I haven't hit the mgr problem on my side.
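If the mgr is the main blocker on your side, one workaround, which also
appears in your Oct 30 ceph.conf quoted below, is to keep only the mgr on
the posix messenger:
    [mgr]
    ms_type = async+posix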
B.R.
Changcheng
On 08:53 Mon 11 Nov, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> @Changcheng
>
> Sorry for the late reply as well.
>
> I followed your setup and I have an issue where the MGR cannot connect
> to the cluster and RDMA does not work; I believe the MGR is not
> supported on RDMA.
>
> Thank you for your time but I believe we may be hitting a dead end with
> this approach as we seem to get different results.
>
> Kind regards
>
> Gabryel Mason-Williams
> __________________________________________________________________
>
> From: Liu, Changcheng <changcheng.liu(a)intel.com>
> Sent: 01 November 2019 06:24
> To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI)
> <gabryel.mason-williams(a)diamond.ac.uk>
> Cc: dev(a)ceph.io <dev(a)ceph.io>
> Subject: Re: RMDA Bug?
>
> @Williams,
> Sorry for the late reply. I'm busy working on getting Ceph/RDMA
> performance data these days.
> I'm using an Intel RDMA NIC with a small cluster; there's no serious
> issue happening.
> For a Mellanox NIC, there's no problem with your ceph.conf from my
> perspective.
> Below are the nodes I used to deploy the cluster:
> 1. server0: 172.16.1.4, /dev/nvme0n1, /dev/nvme1n1
> 2. server1: 172.16.1.2, /dev/nvme0n1, /dev/nvme1n1
>
> Below are my deploy steps:
> [admin@server0 deploy]$ ceph-deploy new server0 --fsid 24280750-d4f7-4d4f-89e4-f95b8fab87ff
> [admin@server0 deploy]$ #change ceph.conf as below:
> [admin@server0 deploy]$ cat ceph.conf
> [global]
> cluster = ceph
> fsid = 24280750-d4f7-4d4f-89e4-f95b8fab87ff
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd pool default size = 2
> osd pool default min size = 2
> osd pool default pg num = 64
> osd pool default pgp num = 128
>
> osd pool default crush rule = 0
> osd crush chooseleaf type = 1
>
> mon_allow_pool_delete=true
> osd_pool_default_pg_autoscale_mode=on
>
> ms_type = async+rdma
> ;----changcheng: change device to your dev name----------
> ms_async_rdma_device_name = irdma1
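> ;----hint: `ibv_devices` (from rdma-core) lists the RDMA device names visible on this host; the value above must match one of them----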
> ;----changcheng: ignore below parameters with Mellanox NIC--------
> ;ms_async_rdma_support_srq = false
>
> mon_initial_members = server0
> mon_host = 172.16.1.4
>
> [mon.rdmarhel0]
> host = server0
> mon addr = 172.16.1.4
> [admin@server0 deploy]$ ceph-deploy mon create-initial
> [admin@server0 deploy]$ ceph-deploy admin server0 server1
> [admin@server0 deploy]$ ceph-deploy mgr create server0
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme0n1 server0
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme1n1 server0
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme0n1 server1
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme1n1 server1
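> After the OSDs are created, one way to confirm that data really flows
> over RDMA is to dump the RDMA worker counters on an OSD and watch
> tx_bytes/rx_bytes grow, e.g.:
> [admin@server0 deploy]$ sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1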
> B.R.
> Changcheng
> On 08:27 Thu 31 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > 1. When not defining a public and cluster network, the OSD and MGR
> > nodes do not get recognised
> >
> > sudo ceph -s
> >
> >   cluster:
> >     id:     820f1573-bc4a-4ee0-b702-80ba5ac13c25
> >     health: HEALTH_WARN
> >             3 osds down
> >             3 hosts (3 osds) down
> >             1 root (3 osds) down
> >             no active mgr
> >             too few PGs per OSD (21 < min 30)
> >
> >   services:
> >     mon: 3 daemons, quorum cs04r-sc-com99-05,cs04r-sc-com99-07,cs04r-sc-com99-08 (age 5m)
> >     mgr: no daemons active (since 4m)
> >     osd: 3 osds: 0 up (since 9m), 3 in (since 9m)
> >
> >   data:
> >     pools:   1 pools, 64 pgs
> >     objects: 0 objects, 0 B
> >     usage:   3.0 GiB used, 114 GiB / 117 GiB avail
> >     pgs:     44 stale+active+clean
> >              20 active+clean
> >
> > This is an issue with ms_type being async+rdma, as the daemons are
> > running:
> >
> > sudo systemctl status ceph-osd.target
> >
> > ● ceph-osd.target - ceph target allowing to start/stop all
> >   ceph-osd@.service instances at once
> >   Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor preset: enabled)
> >   Active: active since Thu 2019-10-31 08:13:42 GMT; 8min ago
> >
> > sudo systemctl status ceph-mgr.target
> >
> > ● ceph-mgr.target - ceph target allowing to start/stop all
> >   ceph-mgr@.service instances at once
> >   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
> >   Active: active since Thu 2019-10-31 08:13:33 GMT; 11min ago
> >
> > With the config being
> >
> > [global]
> >
> > fsid = 820f1573-bc4a-4ee0-b702-80ba5ac13c25
> >
> > mon_initial_members = node1, node2, node3
> >
> > mon_host = xxx.xx.xxx.aa,xxx.xx.xxx.ac, xxx.xx.xxx.ad
> >
> > auth_cluster_required = cephx
> >
> > auth_service_required = cephx
> >
> > auth_client_required = cephx
> >
> > ms_type = async+rdma
> >
> > ms_async_rdma_device_name = mlx4_0
> >
> __________________________________________________________________
> >
> > From: Liu, Changcheng <changcheng.liu(a)intel.com>
> > Sent: 31 October 2019 01:09
> > To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI)
> > <gabryel.mason-williams(a)diamond.ac.uk>
> > Cc: dev(a)ceph.io <dev(a)ceph.io>
> > Subject: Re: RMDA Bug?
> >
> > > 2) I'll confirm with my colleague whether the cluster network is
> > > really used in 14.2.4. We also hit a similar problem these days
> > > even when using the TCP async messenger.
> > [Changcheng]:
> >   1) The problem should already be solved in 14.2.4. We hit the
> >      problem in 14.2.1.
> >   2) I'll try to verify your problem when I have time (I'm working
> >      on other affairs). There should be no problem when unifying
> >      both public/cluster networks on the RDMA device.
> > On 23:22 Wed 30 Oct, Liu, Changcheng wrote:
> > > I'm working on the master branch and deployed a two-node cluster.
> > > Data is transferring over RDMA.
> > > [admin@server0 ~]$ sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1
> > > {
> > > "AsyncMessenger::RDMAWorker-1": {
> > > "tx_no_mem": 0,
> > > "tx_parital_mem": 0,
> > > "tx_failed_post": 0,
> > > "tx_chunks": 26966,
> > > "tx_bytes": 52789637,
> > > "rx_chunks": 26916,
> > > "rx_bytes": 52812278,
> > > "pending_sent_conns": 0
> > > }
> > > }
> > >
> > > The only difference is that I don't differentiate public/cluster
> > > network in my cluster.
> > > You can try to make all public/cluster network use RDMA.
> > > Note:
> > > 1) If both public/cluster use RDMA, we can't differentiate them in
> > >    different subnetworks. This is a feature limitation; I'm planning
> > >    to solve it in the future.
> > > 2) I'll confirm with my colleague whether the cluster network is
> > >    really used in 14.2.4. We also hit a similar problem these days
> > >    even when using the TCP async messenger.
> > >
> > > Below is my cluster's ceph configuration.
> > > I also attach the systemd patch used on my side.
> > > [admin@server0 ~]$ cat /etc/ceph/ceph.conf
> > > [global]
> > > cluster = ceph
> > > fsid = 24280750-d4f7-4d4f-89e4-f95b8fab87ff
> > > auth_cluster_required = cephx
> > > auth_service_required = cephx
> > > auth_client_required = cephx
> > >
> > > osd pool default size = 2
> > > osd pool default min size = 2
> > > osd pool default pg num = 64
> > > osd pool default pgp num = 128
> > >
> > > osd pool default crush rule = 0
> > > osd crush chooseleaf type = 1
> > >
> > > mon_allow_pool_delete=true
> > > osd_pool_default_pg_autoscale_mode=off
> > >
> > > ms_type = async+rdma
> > > ms_async_rdma_device_name = mlx5_0
> > >
> > > mon_initial_members = server0
> > > mon_host = 172.16.1.4
> > >
> > > [mon.rdmarhel0]
> > > host = server0
> > > mon addr = 172.16.1.4
> > > [admin@server0 ~]$
> > >
> > > B.R.
> > > Changcheng
> > >
> > > On 13:07 Wed 30 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > > > 1. The current problem is that it is still sending data over the
> > > >    ethernet instead of ib.
> > > > 2. [global]
> > > > fsid=xxxx
> > > > mon_initial_members = node1, node2, node3
> > > > mon_host = xxx.xx.xxx.ab,xxx.xx.xxx.ac, xxx.xx.xxx.ad
> > > > auth_cluster_required = cephx
> > > > auth_service_required = cephx
> > > > auth_client_required = cephx
> > > > public_network = xxx.xx.xxx.0/24
> > > > cluster_network = xx.xxx.0.0/16
> > > > ms_cluster_type = async+rdma
> > > > ms_type = async+rdma
> > > > ms_public_type = async+posix
> > > > [mgr]
> > > > ms_type = async+posix
> > > > 3. The ceph cluster is deployed using ceph-deploy. Once it is up,
> > > >    all of the daemons are turned off, the RDMA cluster config is
> > > >    sent around, and once that is complete the daemons are turned
> > > >    back on. The ulimit is set to unlimited; LimitMEMLOCK=infinity
> > > >    is set on ceph-disk@.service, ceph-mds@.service,
> > > >    ceph-mon@.service, ceph-osd@.service and ceph-radosgw@.service,
> > > >    as well as PrivateDevices=no on ceph-mds@.service,
> > > >    ceph-mon@.service and ceph-radosgw@.service. The ethernet mtu
> > > >    is set to 1000.
> > > >
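> > > (Side note: the same unit changes can also be made without editing
> > > the shipped unit files by using a systemd drop-in; a minimal sketch,
> > > with ceph-osd taken as the example unit:
> > >     # /etc/systemd/system/ceph-osd@.service.d/rdma.conf
> > >     [Service]
> > >     LimitMEMLOCK=infinity
> > >     PrivateDevices=no
> > > then run `systemctl daemon-reload` and restart the daemons.)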
> > __________________________________________________________________
> > > >
> > > > From: Liu, Changcheng <changcheng.liu(a)intel.com>
> > > > Sent: 30 October 2019 12:24
> > > > To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI)
> > > > <gabryel.mason-williams(a)diamond.ac.uk>
> > > > Cc: dev(a)ceph.io <dev(a)ceph.io>
> > > > Subject: Re: RMDA Bug?
> > > >
> > > > 1. What problem do you hit when using RDMA in 14.2.4? Does any
> > > >    log show the error?
> > > > 2. What's your ceph.conf?
> > > > 3. How do you deploy the ceph cluster? RDMA needs to lock some
> > > >    memory, so some system configuration must be changed to meet
> > > >    this requirement.
> > > > On 11:21 Wed 30 Oct, Gabryel Mason-Williams wrote:
> > > > > Liu, Changcheng wrote:
> > > > > > On 07:31 Mon 28 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > > > > > > I am using ceph version 12.2.8
> > > > > > > (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable).
> > > > > > >
> > > > > > > I have not checked the master branch. Do you think this is an
> > > > > > > issue in luminous that has been removed in later versions?
> > > > > > I haven't hit the problem on the master branch. Ceph/RDMA changed
> > > > > > a lot from luminous to master.
> > > > > >
> > > > > > Is the below configuration really needed in luminous/ceph.conf?
> > > > > > > ms_async_rdma_local_gid = xxxx
> > > > > > On the master branch, this parameter is not needed at all.
> > > > > > B.R.
> > > > > > Changcheng
> > > > > > >
> > > >
> > __________________________________________________________________
> > > > >
> > > > > Thanks, the issue of the OSDs falling over seems to have gone
> > > > > away after updating to Nautilus 14.2.4. However, I am still
> > > > > unable to get it to properly communicate over RDMA even after
> > > > > removing ms_async_rdma_local_gid.
> > > > > _______________________________________________
> > > > > Dev mailing list -- dev(a)ceph.io
> > > > > To unsubscribe send an email to dev-leave(a)ceph.io
> > > >
> > > >
> > >
> > > > _______________________________________________
> > > > Dev mailing list -- dev(a)ceph.io
> > > > To unsubscribe send an email to dev-leave(a)ceph.io
> > >
> > > From 40fa0d7096364b410e8242c46967029fb949876a Mon Sep 17 00:00:00 2001
> > > From: Changcheng Liu <changcheng.liu(a)aliyun.com>
> > > Date: Tue, 23 Jul 2019 18:50:57 +0800
> > > Subject: [PATCH] rdma systemd: grant access to /dev and unlimit mem
> > >
> > > Signed-off-by: Changcheng Liu <changcheng.liu(a)aliyun.com>
> > >
> > > diff --git a/systemd/ceph-fuse@.service.in b/systemd/ceph-fuse@.service.in
> > > index d603042b12..ff2e9072f6 100644
> > > --- a/systemd/ceph-fuse@.service.in
> > > +++ b/systemd/ceph-fuse@.service.in
> > > @@ -12,6 +12,7 @@ ExecStart=/usr/bin/ceph-fuse -f --cluster ${CLUSTER} %I
> > > LockPersonality=true
> > > MemoryDenyWriteExecute=true
> > > NoNewPrivileges=true
> > > +LimitMEMLOCK=infinity
> > > # ceph-fuse requires access to /dev fuse device
> > > PrivateDevices=no
> > > ProtectControlGroups=true
> > > diff --git a/systemd/ceph-mds@.service.in b/systemd/ceph-mds@.service.in
> > > index 39a2e63105..0e58dfeeea 100644
> > > --- a/systemd/ceph-mds@.service.in
> > > +++ b/systemd/ceph-mds@.service.in
> > > @@ -14,7 +14,8 @@ ExecReload=/bin/kill -HUP $MAINPID
> > > LockPersonality=true
> > > MemoryDenyWriteExecute=true
> > > NoNewPrivileges=true
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > > ProtectControlGroups=true
> > > ProtectHome=true
> > > ProtectKernelModules=true
> > > diff --git a/systemd/ceph-mgr@.service.in b/systemd/ceph-mgr@.service.in
> > > index c98f6378b9..682c7ecef3 100644
> > > --- a/systemd/ceph-mgr@.service.in
> > > +++ b/systemd/ceph-mgr@.service.in
> > > @@ -18,7 +18,8 @@ LockPersonality=true
> > > MemoryDenyWriteExecute=false
> > >
> > > NoNewPrivileges=true
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > > ProtectControlGroups=true
> > > ProtectHome=true
> > > ProtectKernelModules=true
> > > diff --git a/systemd/ceph-mon@.service.in b/systemd/ceph-mon@.service.in
> > > index c95fcabb26..51854fad96 100644
> > > --- a/systemd/ceph-mon@.service.in
> > > +++ b/systemd/ceph-mon@.service.in
> > > @@ -21,7 +21,8 @@ LockPersonality=true
> > > MemoryDenyWriteExecute=true
> > > # Need NewPrivileges via `sudo smartctl`
> > > NoNewPrivileges=false
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > > ProtectControlGroups=true
> > > ProtectHome=true
> > > ProtectKernelModules=true
> > > diff --git a/systemd/ceph-osd@.service.in b/systemd/ceph-osd@.service.in
> > > index 1b5c9c82b8..06c20d7c83 100644
> > > --- a/systemd/ceph-osd@.service.in
> > > +++ b/systemd/ceph-osd@.service.in
> > > @@ -16,6 +16,8 @@ LockPersonality=true
> > > MemoryDenyWriteExecute=true
> > > # Need NewPrivileges via `sudo smartctl`
> > > NoNewPrivileges=false
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > > ProtectControlGroups=true
> > > ProtectHome=true
> > > ProtectKernelModules=true
> > > diff --git a/systemd/ceph-radosgw@.service.in b/systemd/ceph-radosgw@.service.in
> > > index 7e3ddf6c04..fe1a6b9159 100644
> > > --- a/systemd/ceph-radosgw@.service.in
> > > +++ b/systemd/ceph-radosgw@.service.in
> > > @@ -13,7 +13,8 @@ ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name client.%i --setuser ce
> > > LockPersonality=true
> > > MemoryDenyWriteExecute=true
> > > NoNewPrivileges=true
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > > ProtectControlGroups=true
> > > ProtectHome=true
> > > ProtectKernelModules=true
> > > diff --git a/systemd/ceph-volume@.service b/systemd/ceph-volume@.service
> > > index c21002cecb..e2d1f67b85 100644
> > > --- a/systemd/ceph-volume@.service
> > > +++ b/systemd/ceph-volume@.service
> > > @@ -9,6 +9,7 @@ KillMode=none
> > > Environment=CEPH_VOLUME_TIMEOUT=10000
> > > ExecStart=/bin/sh -c 'timeout $CEPH_VOLUME_TIMEOUT /usr/sbin/ceph-volume-systemd %i'
> > > TimeoutSec=0
> > > +LimitMEMLOCK=infinity
> > >
> > > [Install]
> > > WantedBy=multi-user.target
> > > --
> > > 2.17.1
> > >
> > > _______________________________________________
> > > Dev mailing list -- dev(a)ceph.io
> > > To unsubscribe send an email to dev-leave(a)ceph.io
> >
> >