Send ceph-users mailing list submissions to
ceph-users(a)ceph.io
To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
ceph-users-request(a)ceph.io
You can reach the person managing the list at
ceph-users-owner(a)ceph.io
When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."
Today's Topics:
1. Re: Replace ceph osd in a container (Sasha Litvak)
2. Re: Fwd: large concurrent rbd operations block for over 15 mins!
(Mark Nelson)
3. Re: rgw multisite failover (Ed Fisher)
----------------------------------------------------------------------
Date: Tue, 22 Oct 2019 08:52:54 -0500
From: Sasha Litvak <alexander.v.litvak(a)gmail.com>
Subject: [ceph-users] Re: Replace ceph osd in a container
To: Frank Schilder <frans(a)dtu.dk>
Cc: ceph-users <ceph-users(a)ceph.io>
Message-ID: <CALi_L49RxWcBx_ZivRHHWgYg8Ea_UrH-0YKGMY4+b20KhXu6UQ(a)mail.gmail.com>
Frank,
Thank you for your suggestion. It sounds very promising. I will
definitely try it.
Best,
On Tue, Oct 22, 2019, 2:44 AM Frank Schilder <frans(a)dtu.dk> wrote:
> I am suspecting that mon or mgr have no access to /dev or /var/lib while osd containers do.
> Cluster configured originally by ceph-ansible (nautilus 14.2.2)
They don't, because they don't need to.
> The question is if I want to replace all disks on a single node, and I have 6 nodes with pools
> replication 3, is it safe to restart mgr mounting /dev and /var/lib/ceph volumes (not configured right now).
Restarting mons is safe in the sense that data will not get lost. However, access might get lost temporarily.
The question is, how many mons do you have? If you have only 1 or 2, it will mean downtime. If you can bear the downtime, it doesn't matter. If you have at least 3, you can restart one after the other.
However, I would not do that. Having to restart a mon container every time some minor container config changes for reasons that have nothing to do with a mon sounds like calling for trouble.
I also use containers and would recommend a different approach. I created an additional type of container (ceph-adm) that I use for all admin tasks. It's the same image, and the entry point simply executes a sleep infinity. In this container I make all relevant hardware visible. You might also want to expose /var/run/ceph to be able to use admin sockets without hassle. This way, I separated admin operations from actual storage daemons and can modify and restart the admin container as I like.
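For illustration, a minimal sketch of how such a ceph-adm container might be launched with podman; the image name and the exact set of mounts are assumptions and must match your deployment:

    # hypothetical admin container: same image as the daemons, entry point
    # replaced by "sleep infinity", hardware and admin sockets made visible
    podman run -d --name ceph-adm \
        --privileged \
        -v /dev:/dev \
        -v /var/lib/ceph:/var/lib/ceph \
        -v /var/run/ceph:/var/run/ceph \
        -v /etc/ceph:/etc/ceph \
        --entrypoint sleep \
        docker.io/ceph/daemon:latest-nautilus infinity

    # admin tasks then run inside it, e.g.
    podman exec -it ceph-adm ceph-volume inventory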
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: ceph-users <ceph-users-bounces(a)lists.ceph.com> on behalf of Alex
Litvak <alexander.v.litvak(a)gmail.com>
Sent: 22 October 2019 08:04
To: ceph-users(a)lists.ceph.com
Subject: [ceph-users] Replace ceph osd in a container
Hello cephers,
So I am having trouble with a new hardware system with strange OSD behavior, and I want to replace a disk with a brand new one to test the theory.
I run all daemons in containers, and on one of the nodes I have mon, mgr, and 6 osds. So, following
https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
I stopped the container with osd.23, waited until it was down and out, ran the safe-to-destroy loop, and then destroyed the osd, all using the monitor from the container on this node. All good.
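For reference, the safe-to-destroy loop from the cited docs is essentially the following (run here via the mon container, with osd id 23 as in this case):

    # wait until the OSD can be removed without reducing data durability
    while ! ceph osd safe-to-destroy osd.23 ; do sleep 10 ; done
    # destroy keeps the OSD id intact so the replacement disk can reuse it
    ceph osd destroy 23 --yes-i-really-mean-it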
Then I swapped the SSDs and started running the additional steps (from step 3) using the same mon container. I have no ceph packages installed on the bare metal box. It looks like the mon container doesn't see the disk:
podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap /dev/sdh
stderr: lsblk: /dev/sdh: not a block device
stderr: error: /dev/sdh: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID]
[--osd-fsid OSD_FSID]
[DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: /dev/sdh
Error: exit status 2
root@storage2n2-la:~# ls -l /dev/sd
sda sdc sdd sde sdf sdg sdg1 sdg2 sdg5 sdh
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap sdh
stderr: lsblk: sdh: not a block device
stderr: error: sdh: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID]
[--osd-fsid OSD_FSID]
[DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: sdh
Error: exit status 2
I execute lsblk and it sees device sdh:
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la lsblk
lsblk: dm-1: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-3: failed to get device path
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdf 8:80 0 1.8T 0 disk
sdd 8:48 0 1.8T 0 disk
sdg 8:96 0 223.5G 0 disk
|-sdg5 8:101 0 223G 0 part
|-sdg1 8:97 487M 0 part
`-sdg2 8:98 1K 0 part
sde 8:64 0 1.8T 0 disk
sdc 8:32 0 3.5T 0 disk
sda 8:0 0 3.5T 0 disk
sdh 8:112 0 3.5T 0 disk
So I use a fellow osd container (osd.5) on the same node and run all of
the operations (zap and prepare) successfully.
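Concretely, that amounts to something like the following; the container name is hypothetical, and the steps are the ones from the cited replacement docs:

    # zap the new disk and prepare it under the old OSD id, reusing an
    # osd container that can actually see /dev
    podman exec -it ceph-osd-5 ceph-volume lvm zap /dev/sdh
    podman exec -it ceph-osd-5 ceph-volume lvm prepare --osd-id 23 --data /dev/sdh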
I am suspecting that mon or mgr have no access to /dev or /var/lib while osd containers do. Cluster configured originally by ceph-ansible (nautilus 14.2.2).
The question is: if I want to replace all disks on a single node, and I have 6 nodes with pools at replication 3, is it safe to restart mgr mounting /dev and /var/lib/ceph volumes (not configured right now)?
I cannot use other osd containers on the same box because my controller reverts from raid to non-raid mode with all disks lost, not just a single one. So I need to replace all 6 osds to get them running back in containers, and the only things that will remain operational on the node are the mon and mgr containers.
I prefer not to install a full cluster or client on the bare metal node if possible.
Thank you for your help,
------------------------------
Date: Tue, 22 Oct 2019 08:59:21 -0500
From: Mark Nelson <mnelson(a)redhat.com>
Subject: [ceph-users] Re: Fwd: large concurrent rbd operations block for over 15 mins!
To: ceph-users(a)ceph.io
Message-ID: <362e3930-8c30-d3e0-d0b0-30187c8551e4(a)redhat.com>
Out of curiosity, when you chose EC over replication how did you weigh
IOPS vs space amplification in your decision making process? I'm
wondering if we should prioritize EC latency vs other tasks in future
tuning efforts (it's always a tradeoff deciding what to focus on).
Thanks,
Mark
On 10/22/19 2:35 AM, Frank Schilder wrote:
> Getting decent RBD performance is not a trivial exercise. While at first glance 61 SSDs for 245 clients sounds more or less OK, it does come down to a bit more than that.
>
> The first thing is how to get SSD performance out of SSDs with ceph. This post will provide very good clues and might already point out the bottleneck: https://yourcmc.ru/wiki/index.php?title=Ceph_performance . Do you have good enterprise SSDs?
>
> Next thing to look at: what kind of data pool, replicated or erasure coded? If erasure coded, has the profile been benchmarked? There are very poor choices. Good ones are 4+m and 8+m: 4+m gives better IOPs, 8+m better throughput. m>=2.
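As a concrete sketch, a 4+2 profile of the shape recommended here could be created along the following lines; the pool name, PG counts, and failure domain are placeholders:

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create rbd-data 128 128 erasure ec42
    # RBD on an EC pool requires overwrites to be enabled
    ceph osd pool set rbd-data allow_ec_overwrites true
    # the EC pool then serves as a data pool behind a replicated metadata pool
    rbd create --size 50G --data-pool rbd-data rbd/testimg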
> More complications: do you need to deploy more than one OSD per SSD to boost performance? This is indicated by the iodepth required in an fio benchmark to get full IOPs. Good SSDs already deliver spec performance with 1 OSD; more common ones require 2-4 OSDs per disk. Are you using ceph-volume already? Its default is 2 OSDs per SSD (batch mode).
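The fio iodepth probe referred to here is typically a destructive run against the raw device, roughly like the following (parameters illustrative); if the drive only reaches its rated IOPs at iodepth well above 1, provisioning several OSDs per device can help:

    fio --name=iodepth-probe --filename=/dev/sdh --direct=1 \
        --ioengine=libaio --rw=randwrite --bs=4k \
        --iodepth=1 --runtime=60 --time_based

    # ceph-volume batch mode, e.g. two OSDs per SSD:
    ceph-volume lvm batch --osds-per-device 2 /dev/sdh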
> To give a baseline: after extensive testing and working through all the required tuning steps, I could run about 250 VMs on a 6+2 EC data pool on 33 enterprise SAS SSDs with 1 OSD per disk, each VM getting 50 IOPs write performance. This is probably what you would like to see as well.
>
> If you use a replicated data pool, this should be relatively easy. With an EC data pool, it is a bit of a battle.
>
> Good luck,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ________________________________________
> From: ceph-users <ceph-users-bounces(a)lists.ceph.com> on behalf of Void Star Nill <void.star.nill(a)gmail.com>
> Sent: 22 October 2019 03:00
> To: ceph-users
> Subject: [ceph-users] Fwd: large concurrent rbd operations block for over 15 mins!
>
> Apparently the graph is too big, so my last post is stuck. Resending without the graph.
>
> Thanks
>
> ---------- Forwarded message ---------
> From: Void Star Nill <void.star.nill(a)gmail.com>
> Date: Mon, Oct 21, 2019 at 4:41 PM
> Subject: large concurrent rbd operations block for over 15 mins!
> To: ceph-users <ceph-users(a)lists.ceph.com>
> Hello,
>
> I have been running some benchmark tests with a mid-size cluster and I am seeing some issues. I wanted to know if this is a bug or something that can be tuned. I appreciate any help on this.
>
> - I have a 15-node Ceph cluster, with 3 monitors and 12 data nodes with a total of 61 OSDs on SSDs, running the 14.2.4 nautilus (stable) version. Each node has a 100G link.
> - I have 245 client machines from which I am triggering rbd operations. Each client has a 25G link.
> - rbd operations include: creating an RBD image of 50G size with the layering feature, mapping the image to the client machine, formatting the device in ext4 format, mounting it, running dd to write to the full disk, and cleaning up (unmount, unmap and remove).
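A rough per-client sketch of that sequence; the pool name and mount point are assumptions:

    img=bench-$(hostname)
    rbd create --size 50G --image-feature layering testpool/${img}
    dev=$(rbd map testpool/${img})
    mkfs.ext4 -q ${dev}
    mount ${dev} /mnt/rbdtest
    # fill the disk; dd stops with ENOSPC when the image is full
    dd if=/dev/zero of=/mnt/rbdtest/fill bs=4M oflag=direct
    umount /mnt/rbdtest
    rbd unmap ${dev}
    rbd rm testpool/${img}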
> If I run these RBD operations concurrently on a small number of machines (say 16-20), they run very well and I see good throughput. All image operations (except for dd) take less than 2 seconds.
>
> However, when I scale it up to 245 clients, each running these operations concurrently, I see a lot of operations getting hung for a long time and the overall throughput reduces drastically.
>
> For example, some of the format operations take over 10-15 mins!!!
>
> Note that all operations do complete - so it's most likely not a deadlock kind of situation.
>
> I don't see any errors in ceph.log on the monitor nodes. However, the clients do report "hung_task_timeout" in dmesg logs.
>
> As you can see in the below image, half the format operations complete in less than a second, while the other half take over 10 mins (the y axis is in seconds):
> [11117.113618] INFO: task umount:9902 blocked for more than 120 seconds.
> [11117.113677]       Tainted: G OE 4.15.0-51-generic #55~16.04.1-Ubuntu
> [11117.113731] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [11117.113787] umount          D    0  9902   9901 0x00000000
> [11117.113793] Call Trace:
> [11117.113804]  __schedule+0x3d6/0x8b0
> [11117.113810]  ? _raw_spin_unlock_bh+0x1e/0x20
> [11117.113814]  schedule+0x36/0x80
> [11117.113821]  wb_wait_for_completion+0x64/0x90
> [11117.113828]  ? wait_woken+0x80/0x80
> [11117.113831]  __writeback_inodes_sb_nr+0x8e/0xb0
> [11117.113835]  writeback_inodes_sb+0x27/0x30
> [11117.113840]  __sync_filesystem+0x51/0x60
> [11117.113844]  sync_filesystem+0x26/0x40
> [11117.113850]  generic_shutdown_super+0x27/0x120
> [11117.113854]  kill_block_super+0x2c/0x80
> [11117.113858]  deactivate_locked_super+0x48/0x80
> [11117.113862]  deactivate_super+0x5a/0x60
> [11117.113866]  cleanup_mnt+0x3f/0x80
> [11117.113868]  __cleanup_mnt+0x12/0x20
> [11117.113874]  task_work_run+0x8a/0xb0
> [11117.113881]  exit_to_usermode_loop+0xc4/0xd0
> [11117.113885]  do_syscall_64+0x100/0x130
> [11117.113887]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [11117.113891] RIP: 0033:0x7f0094384487
> [11117.113893] RSP: 002b:00007fff4199efc8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [11117.113897] RAX: 0000000000000000 RBX: 0000000000944030 RCX: 00007f0094384487
> [11117.113899] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000944210
> [11117.113900] RBP: 0000000000944210 R08: 0000000000000000 R09: 0000000000000014
> [11117.113902] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f009488d83c
> [11117.113903] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fff4199f250
------------------------------
Date: Tue, 22 Oct 2019 11:10:38 -0500
From: Ed Fisher <ed(a)debacle.org>
Subject: [ceph-users] Re: rgw multisite failover
To: Frank R <frankaritchie(a)gmail.com>
Cc: ceph-users <ceph-users(a)ceph.com>
Message-ID: <084F4293-88CC-456A-B8A4-2E36ACA24B65(a)debacle.org>
On Oct 18, 2019, at 10:40 PM, Frank R <frankaritchie(a)gmail.com> wrote:

> I am looking to change an RGW multisite deployment so that the secondary will become master. This is meant to be a permanent change.
>
> Per:
> https://docs.ceph.com/docs/mimic/radosgw/multisite/
>
> I need to:
>
> 1. Stop RGW daemons on the current master end.
>
> On a secondary RGW node:
> 2. radosgw-admin zone modify --rgw-zone={zone-name} --master --default
> 3. radosgw-admin period update --commit
> 4. systemctl restart ceph-radosgw(a)rgw.`hostname -s`
>
> Since I want the former master to be secondary permanently, do I need to do anything after restarting the RGW daemons on the old master end?

Before you restart the RGW daemons on the old master, you want to make sure you pull the current realm from the new master. Beyond that, there should be no changes needed.
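A sketch of that pull step, run on the old master before restarting its daemons; the URL and keys are placeholders for the new master zone's endpoint and system user:

    radosgw-admin realm pull --url=http://new-master-rgw:8080 \
        --access-key=<system-access-key> --secret=<system-secret>
    radosgw-admin period pull --url=http://new-master-rgw:8080 \
        --access-key=<system-access-key> --secret=<system-secret>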
------------------------------
Subject: Digest Footer
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
------------------------------
End of ceph-users Digest, Vol 81, Issue 56
******************************************