Hi Venky,
"peer_bootstrap import" is working fine now. The issue was that port 3300 (the mon msgr v2 port) was blocked by a firewall.
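In case anyone else hits the same hang: a quick connectivity check from the host running the cephfs-mirror daemon would have caught it. A minimal sketch (the mon address is whatever is listed under mon_host in the peer's bootstrap token; the firewalld line is just an example and depends on the distro):

# verify the remote cluster's mon ports are reachable from the mirror host
nc -zv <remote-mon-ip> 3300    # msgr v2
nc -zv <remote-mon-ip> 6789    # msgr v1
# example only: open the v2 port on a firewalld-based host
firewall-cmd --permanent --add-port=3300/tcp && firewall-cmd --reload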
Thank you for your help.
Regards,
Anantha
From: Adiga, Anantha
Sent: Monday, August 7, 2023 1:29 PM
To: Venky Shankar <vshankar@redhat.com>; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Hi Venky,
Could this version mismatch be the reason that the peer_bootstrap import is hanging? How do I upgrade
cephfs-mirror to Quincy?
root@fl31ca104ja0201:/# cephfs-mirror --version
ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)
root@fl31ca104ja0201:/# ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
root@fl31ca104ja0201:/#
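If the cephfs-mirror daemon is managed by cephadm (an assumption on my part), a sketch of checking and upgrading it might look like:

# see which image/version the cephfs-mirror daemon is currently running
ceph orch ps | grep cephfs-mirror
# let the orchestrator move everything, including cephfs-mirror, to 17.2.6
ceph orch upgrade start --ceph-version 17.2.6
ceph orch upgrade status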
Thank you,
Anantha
From: Adiga, Anantha
Sent: Monday, August 7, 2023 11:21 AM
To: 'Venky Shankar' <vshankar@redhat.com>; 'ceph-users@ceph.io' <ceph-users@ceph.io>
Subject: RE: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Hi Venky,
I tried on another secondary Quincy cluster and it is the same problem: the peer_bootstrap
import command hangs.
root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs
eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ==
...
...the command does not complete; it just waits here.
^C to exit.
Thereafter some commands do not complete…
root@fl31ca104ja0201:/# ceph -s
cluster:
id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
health: HEALTH_OK
services:
mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 2d)
mgr: fl31ca104ja0201.kkoono(active, since 3d), standbys: fl31ca104ja0202, fl31ca104ja0203
mds: 1/1 daemons up, 2 standby
osd: 44 osds: 44 up (since 2d), 44 in (since 5w)
cephfs-mirror: 1 daemon active (1 hosts)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 25 pools, 769 pgs
objects: 614.40k objects, 1.9 TiB
usage: 2.9 TiB used, 292 TiB / 295 TiB avail
pgs: 769 active+clean
io:
client: 32 KiB/s rd, 0 B/s wr, 33 op/s rd, 1 op/s wr
root@fl31ca104ja0201:/#
root@fl31ca104ja0201:/# ceph fs status cephfs
This command also waits…
I have attached the mgr log.
root@fl31ca104ja0201:/# ceph service status
{
    "cephfs-mirror": {
        "5306346": {
            "status_stamp": "2023-08-07T17:35:56.884907+0000",
            "last_beacon": "2023-08-07T17:45:01.903540+0000",
            "status": {
                "status_json": "{\"1\":{\"name\":\"cephfs\",\"directory_count\":0,\"peers\":{}}}"
            }
        }
    }
}
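The status_json above shows directory_count 0 and an empty peers map, i.e. no peer has been registered yet. To cross-check what the mirroring module has configured, something like the following should work (the daemon status syntax varies slightly between releases):

# peers configured for the filesystem on the primary
ceph fs snapshot mirror peer_list cephfs
# the mirroring module's view of the mirror daemon(s)
ceph fs snapshot mirror daemon status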
Quincy secondary cluster
root@a001s008-zz14l47008:/# ceph mgr module enable mirroring
root@a001s008-zz14l47008:/# ceph fs authorize cephfs client.mirror_remote / rwps
[client.mirror_remote]
key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
root@a001s008-zz14l47008:/# ceph auth get client.mirror_remote
[client.mirror_remote]
key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
caps mds = "allow rwps fsname=cephfs"
caps mon = "allow r fsname=cephfs"
caps osd = "allow rw tag cephfs data=cephfs"
root@a001s008-zz14l47008:/#
root@a001s008-zz14l47008:/# ceph fs snapshot mirror peer_bootstrap create cephfs
client.mirror_remote shgR-site
{"token":
"eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ=="}
root@a001s008-zz14l47008:/#
Thank you,
Anantha
From: Adiga, Anantha
Sent: Friday, August 4, 2023 11:55 AM
To: Venky Shankar <vshankar@redhat.com>; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Hi Venky,
Thank you so much for the guidance. Attached is the mgr log.
Note: the 4th node in the primary cluster has smaller-capacity drives; the other 3 nodes
have larger-capacity drives.
32 ssd 6.98630 1.00000 7.0 TiB 44 GiB 44 GiB 183 KiB 148 MiB 6.9 TiB 0.62 0.64 40 up osd.32
-7 76.84927 - 77 TiB 652 GiB 648 GiB 20 MiB 3.0 GiB 76 TiB 0.83 0.86 - host fl31ca104ja0203
1 ssd 6.98630 1.00000 7.0 TiB 73 GiB 73 GiB 8.0 MiB 333 MiB 6.9 TiB 1.02 1.06 54 up osd.1
4 ssd 6.98630 1.00000 7.0 TiB 77 GiB 77 GiB 1.1 MiB 174 MiB 6.9 TiB 1.07 1.11 55 up osd.4
7 ssd 6.98630 1.00000 7.0 TiB 47 GiB 47 GiB 140 KiB 288 MiB 6.9 TiB 0.66 0.68 51 up osd.7
10 ssd 6.98630 1.00000 7.0 TiB 75 GiB 75 GiB 299 KiB 278 MiB 6.9 TiB 1.05 1.09 44 up osd.10
13 ssd 6.98630 1.00000 7.0 TiB 94 GiB 94 GiB 1018 KiB 291 MiB 6.9 TiB 1.31 1.36 72 up osd.13
16 ssd 6.98630 1.00000 7.0 TiB 31 GiB 31 GiB 163 KiB 267 MiB 7.0 TiB 0.43 0.45 49 up osd.16
19 ssd 6.98630 1.00000 7.0 TiB 14 GiB 14 GiB 756 KiB 333 MiB 7.0 TiB 0.20 0.21 50 up osd.19
22 ssd 6.98630 1.00000 7.0 TiB 105 GiB 104 GiB 1.3 MiB 313 MiB 6.9 TiB 1.46 1.51 48 up osd.22
25 ssd 6.98630 1.00000 7.0 TiB 17 GiB 16 GiB 257 KiB 272 MiB 7.0 TiB 0.23 0.24 45 up osd.25
28 ssd 6.98630 1.00000 7.0 TiB 72 GiB 72 GiB 6.1 MiB 180 MiB 6.9 TiB 1.01 1.05 43 up osd.28
31 ssd 6.98630 1.00000 7.0 TiB 47 GiB 46 GiB 592 KiB 358 MiB 6.9 TiB 0.65 0.68 56 up osd.31
-9 64.04089 - 64 TiB 728 GiB 726 GiB 17 MiB 1.8 GiB 63 TiB 1.11 1.15 - host fl31ca104ja0302
33 ssd 5.82190 1.00000 5.8 TiB 65 GiB 65 GiB 245 KiB 144 MiB 5.8 TiB 1.09 1.13 47 up osd.33
34 ssd 5.82190 1.00000 5.8 TiB 14 GiB 14 GiB 815 KiB 83 MiB 5.8 TiB 0.24 0.25 55 up osd.34
35 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 224 KiB 213 MiB 5.7 TiB 1.30 1.34 44 up osd.35
36 ssd 5.82190 1.00000 5.8 TiB 117 GiB 117 GiB 8.5 MiB 284 MiB 5.7 TiB 1.96 2.03 52 up osd.36
37 ssd 5.82190 1.00000 5.8 TiB 58 GiB 58 GiB 501 KiB 132 MiB 5.8 TiB 0.98 1.01 40 up osd.37
38 ssd 5.82190 1.00000 5.8 TiB 123 GiB 123 GiB 691 KiB 266 MiB 5.7 TiB 2.07 2.14 73 up osd.38
39 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 609 KiB 193 MiB 5.7 TiB 1.30 1.34 62 up osd.39
40 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 262 KiB 148 MiB 5.7 TiB 1.29 1.34 55 up osd.40
41 ssd 5.82190 1.00000 5.8 TiB 44 GiB 44 GiB 4.4 MiB 140 MiB 5.8 TiB 0.75 0.77 44 up osd.41
42 ssd 5.82190 1.00000 5.8 TiB 45 GiB 45 GiB 886 KiB 135 MiB 5.8 TiB 0.75 0.78 47 up osd.42
43 ssd 5.82190 1.00000 5.8 TiB 28 GiB 28 GiB 187 KiB 104 MiB 5.8 TiB 0.48 0.49 58 up osd.43
[Also: yesterday I had two cephfs-mirror daemons running, one on fl31ca104ja0201 and one on fl31ca104ja0302.
The cephfs-mirror on fl31ca104ja0201 was stopped. When the import token was run on
fl31ca104ja0302, that cephfs-mirror's log was active. Just in case it is useful, I have attached
that log (cfsmirror-container.log) as well.]
How can I list the token on the target cluster after running the peer_bootstrap create
command?
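In the meantime, the token itself is just base64-encoded JSON, so a token that was already generated can at least be inspected locally. A sketch, with <token> as a placeholder for the string returned by peer_bootstrap create:

echo '<token>' | base64 -d | python3 -m json.tool
# shows the fsid, filesystem, user, site_name, key and mon_host of the peer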
Here is today’s status with your suggestion:
There is only one cephfs-mirror daemon running now; it is on the fl31ca104ja0201 node.
root@fl31ca104ja0201:/# ceph -s
cluster:
id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
health: HEALTH_OK
services:
mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 7m)
mgr: fl31ca104ja0201.kkoono(active, since 13m), standbys: fl31ca104ja0202, fl31ca104ja0203
mds: 1/1 daemons up, 2 standby
osd: 44 osds: 44 up (since 7m), 44 in (since 4w)
cephfs-mirror: 1 daemon active (1 hosts)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 25 pools, 769 pgs
objects: 614.40k objects, 1.9 TiB
usage: 2.8 TiB used, 292 TiB / 295 TiB avail
pgs: 769 active+clean
io:
client: 32 MiB/s rd, 0 B/s wr, 57 op/s rd, 1 op/s wr
root@fl31ca104ja0201:/#
root@fl31ca104ja0201:/#
root@fl31ca104ja0201:/# ceph tell mgr.fl31ca104ja0201.kkoono config set debug_mgr 20
{
"success": ""
}
root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs
eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
^CInterrupted
I hit Ctrl-C after 15 min. Once the command is run, the health status goes to HEALTH_WARN.
root@fl31ca104ja0201:/# ceph -s
cluster:
id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
health: HEALTH_WARN
6 slow ops, oldest one blocked for 1095 sec, mon.fl31ca104ja0203 has slow ops
services:
mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 30m)
mgr: fl31ca104ja0201.kkoono(active, since 35m), standbys: fl31ca104ja0202, fl31ca104ja0203
mds: 1/1 daemons up, 2 standby
osd: 44 osds: 44 up (since 29m), 44 in (since 4w)
cephfs-mirror: 1 daemon active (1 hosts)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 25 pools, 769 pgs
objects: 614.40k objects, 1.9 TiB
usage: 2.8 TiB used, 292 TiB / 295 TiB avail
pgs: 769 active+clean
io:
client: 67 KiB/s rd, 0 B/s wr, 68 op/s rd, 21 op/s wr
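For reference, a way to dig further into the slow-ops warning might be the following (assuming access to the admin socket of the mon named in the warning, inside its container on a cephadm deployment):

ceph health detail
# on the host running mon.fl31ca104ja0203, dump the ops currently in flight
ceph daemon mon.fl31ca104ja0203 ops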
-----Original Message-----
From: Venky Shankar <vshankar@redhat.com>
Sent: Thursday, August 3, 2023 11:03 PM
To: Adiga, Anantha <anantha.adiga@intel.com>
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Hi Anantha,
On Fri, Aug 4, 2023 at 2:27 AM Adiga, Anantha <anantha.adiga@intel.com> wrote:
Hi,
Could you please provide guidance on how to diagnose this issue:
In this case there are two Ceph clusters in different locations: cluster A with 4 nodes and
cluster B with 3 nodes. Both are already running RGW multi-site, with A as the master.
CephFS snapshot mirroring is being configured on the clusters, with cluster A as the
primary and cluster B as the peer. The bootstrap import step on the primary node hangs.
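For reference, the sequence being followed is roughly the one from the cephfs-mirroring documentation (site name and token shown as placeholders):

# on both clusters
ceph mgr module enable mirroring
# on the primary (cluster A)
ceph fs snapshot mirror enable cephfs
# on the peer (cluster B): create the remote user and a bootstrap token
ceph fs authorize cephfs client.mirror_remote / rwps
ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote <site-name>
# back on the primary: import the token -- this is the step that hangs
ceph fs snapshot mirror peer_bootstrap import cephfs <token>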
On the target cluster:
---------------------------
"version": "16.2.5",
"release": "pacific",
"release_type": "stable"
root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site
{"token":
"eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJma
Wxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiw
gInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd
1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTU
uNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6M
zMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="}
Seems fine up to here.
root@cr21meg16ba0101:/var/run/ceph#
On the source cluster:
----------------------------
"version": "17.2.6",
"release": "quincy",
"release_type": "stable"
root@fl31ca104ja0201:/# ceph -s
cluster:
id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
health: HEALTH_OK
services:
mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m)
mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202
mds: 1/1 daemons up, 2 standby
osd: 44 osds: 44 up (since 111m), 44 in (since 4w)
cephfs-mirror: 1 daemon active (1 hosts)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 25 pools, 769 pgs
objects: 614.40k objects, 1.9 TiB
usage: 2.8 TiB used, 292 TiB / 295 TiB avail
pgs: 769 active+clean
root@fl31ca104ja0302:/# ceph mgr module enable mirroring
module 'mirroring' is already enabled
root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs
eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaW
xlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwg
InNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1
h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUu
NzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6Mz
MwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
Going by your description, I'm guessing this is the command that hangs? If that's
the case, set `debug_mgr=20`, repeat the token import step and share the ceph-mgr log.
Also note that you can check the mirror daemon status as detailed in
https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#mirror-daemon-status
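For reference, the daemon status checks from that page look roughly like the following (the asok name is whatever sits under /var/run/ceph on the mirror daemon host; the filesystem id and peer UUID can be discovered via the socket's help command):

ceph --admin-daemon /var/run/ceph/<cephfs-mirror-asok> help
ceph --admin-daemon /var/run/ceph/<cephfs-mirror-asok> fs mirror status cephfs@<filesystem-id>
ceph --admin-daemon /var/run/ceph/<cephfs-mirror-asok> fs mirror peer status cephfs@<filesystem-id> <peer-uuid>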
root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status
{
"metadata": {
"ceph_sha1":
"d7ff0d10654d2280e08f1ab989c7cdf3064446a5",
"ceph_version": "ceph version
17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)",
"entity_id":
"cephfs-mirror.fl31ca104ja0302.sypagt",
"hostname":
"fl31ca104ja0302",
"pid": "7",
"root": "/"
},
"dentry_count": 0,
"dentry_pinned_count": 0,
"id": 5194553,
"inst": {
"name": {
"type": "client",
"num": 5194553
},
"addr": {
"type": "v1",
"addr":
"10.45.129.5:0",
"nonce": 2497002034
}
},
"addr": {
"type": "v1",
"addr": "10.45.129.5:0",
"nonce": 2497002034
},
"inst_str": "client.5194553
10.45.129.5:0/2497002034",
"addr_str":
"10.45.129.5:0/2497002034",
"inode_count": 1,
"mds_epoch": 118,
"osd_epoch": 6266,
"osd_epoch_barrier": 0,
"blocklisted": false,
"fs_name": "cephfs"
}
root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10
debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7
debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280 0 pidfile_write: ignore empty --pid-file
debug 2023-08-03T05:24:27.445+0000 7f8eb6fc0280 1 mgrc service_daemon_register cephfs-mirror.5184622 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104ja0302,container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.sypagt,instance_id=5184622,kernel_description=#82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_total_kb=527946928,os=Linux}
debug 2023-08-03T05:27:10.419+0000 7f8ea1b2c700 0 client.5194553 ms_handle_reset on v2:10.45.128.141:3300/0
debug 2023-08-03T05:50:10.917+0000 7f8ea1b2c700 0 client.5194553 ms_handle_reset on v2:10.45.128.139:3300/0
Thank you,
Anantha
--
Cheers,
Venky