Hi,
Only now was it possible to continue with this. Below is the requested information.
Thanks in advance,
Gesiel
On Mon, Jan 20, 2020 at 3:06 PM, Mike Christie <mchristi(a)redhat.com> wrote:
On 01/20/2020 10:29 AM, Gesiel Galvão Bernardes wrote:
Hi,
Only now have I been able to act on this problem. My environment is
relatively simple: I have two ESXi 6.7 hosts, connected to two iSCSI
gateways, using two RBD images.
When this thread started, the workaround was to keep only one iSCSI
gateway connected; that way everything works normally. After the reply I
received here suggesting that the problem could be in the VMware
configuration, I reviewed the configuration of both hosts (they are
exactly as described in the Ceph documentation) and rebooted both.
Can you give me some basic info.
The output of:
# "gwcli ls"
o- / ......................................................................... [...]
  o- cluster ........................................................ [Clusters: 1]
  | o- ceph ........................................................... [HEALTH_OK]
  | o- pools ........................................................... [Pools: 7]
  | | o- data_ecpool ... [(x3), Commit: 0.00Y/25836962M (0%), Used: 25011763551408b]
  | | o- pool1 ......... [(x2), Commit: 5.0T/38755444M (13%), Used: 16162480333070b]
  | | o- pool2 ......... [(x3), Commit: 6.0T/25836962M (24%), Used: 24261305669809b]
  | | o- pool3 .......... [(4+0), Commit: 0.00Y/58133164M (0%), Used: 719866819603b]
  | | o- pool_cache ........ [(x3), Commit: 0.00Y/706164M (0%), Used: 422549509342b]
  | | o- poolfs ................. [(x3), Commit: 0.00Y/25836962M (0%), Used: 0.00Y]
  | | o- rbd .................... [(x3), Commit: 0.00Y/25836962M (0%), Used: 5737b]
  | o- topology ............................................... [OSDs: 61, MONs: 2]
  o- disks ...................................................... [11.0T, Disks: 3]
  | o- pool1 ....................................................... [pool1 (5.0T)]
  | | o- vmware_iscsi1 ................................ [pool1/vmware_iscsi1 (5.0T)]
  | o- pool2 ....................................................... [pool2 (6.0T)]
  |   o- iscsi-test ...................................... [pool2/iscsi-test (1.0T)]
  |   o- vmware_iscsi2 ................................ [pool2/vmware_iscsi2 (5.0T)]
  o- iscsi-targets ............................... [DiscoveryAuth: None, Targets: 1]
    o- iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw ......... [Auth: None, Gateways: 2]
      o- disks ......................................................... [Disks: 3]
      | o- pool1/vmware_iscsi1 .................................. [Owner: ceph-iscsi2]
      | o- pool2/iscsi-test ..................................... [Owner: ceph-iscsi2]
      | o- pool2/vmware_iscsi2 .................................. [Owner: ceph-iscsi1]
      o- gateways ........................................... [Up: 2/2, Portals: 2]
      | o- ceph-iscsi1 ................................... [192.168.201.1 (UP)]
      | o- ceph-iscsi2 ................................... [192.168.201.2 (UP)]
      o- host-groups ................................................. [Groups: 0]
      o- hosts ....................................... [Auth: ACL_ENABLED, Hosts: 2]
        o- iqn.1994-05.com.redhat:rh7-client .. [LOGGED-IN, Auth: CHAP, Disks: 2(10.0T)]
        | o- lun 0 .................. [pool1/vmware_iscsi1(5.0T), Owner: ceph-iscsi2]
        | o- lun 1 .................. [pool2/vmware_iscsi2(5.0T), Owner: ceph-iscsi1]
        o- iqn.1994-05.com.redhat:tcnvh8 ...... [LOGGED-IN, Auth: CHAP, Disks: 2(10.0T)]
          o- lun 0 .................. [pool1/vmware_iscsi1(5.0T), Owner: ceph-iscsi2]
          o- lun 1 .................. [pool2/vmware_iscsi2(5.0T), Owner: ceph-iscsi1]
from one of the iscsi nodes, and give me the output of:
# targetcli ls
ceph-iscsi1 ~]# targetcli ls
Warning: Could not load preferences file /root/.targetcli/prefs.bin.
o- / ......................................................................... [...]
  o- backstores .............................................................. [...]
  | o- block ................................................. [Storage Objects: 0]
  | o- fileio ................................................ [Storage Objects: 0]
  | o- pscsi ................................................. [Storage Objects: 0]
  | o- ramdisk ............................................... [Storage Objects: 0]
  | o- user:glfs ............................................. [Storage Objects: 0]
  | o- user:qcow ............................................. [Storage Objects: 0]
  | o- user:rbd .............................................. [Storage Objects: 3]
  | | o- pool1.vmware_iscsi1 ... [pool1/vmware_iscsi1;osd_op_timeout=30 (5.0TiB) activated]
  | | | o- alua ................................................. [ALUA Groups: 3]
  | | |   o- ano1 ................................ [ALUA state: Active/non-optimized]
  | | |   o- ao ...................................... [ALUA state: Active/optimized]
  | | |   o- default_tg_pt_gp ........................ [ALUA state: Active/optimized]
  | | o- pool2.iscsi-test ......... [pool2/iscsi-test;osd_op_timeout=30 (1.0TiB) activated]
  | | | o- alua ................................................. [ALUA Groups: 3]
  | | |   o- ano1 ................................ [ALUA state: Active/non-optimized]
  | | |   o- ao ...................................... [ALUA state: Active/optimized]
  | | |   o- default_tg_pt_gp ........................ [ALUA state: Active/optimized]
  | | o- pool2.vmware_iscsi2 ... [pool2/vmware_iscsi2;osd_op_timeout=30 (5.0TiB) activated]
  | |   o- alua ................................................. [ALUA Groups: 3]
  | |     o- ano2 ................................ [ALUA state: Active/non-optimized]
  | |     o- ao ...................................... [ALUA state: Active/optimized]
  | |     o- default_tg_pt_gp ........................ [ALUA state: Active/optimized]
  | o- user:zbc .............................................. [Storage Objects: 0]
  o- iscsi ............................................................ [Targets: 1]
  | o- iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw ......................... [TPGs: 2]
  |   o- tpg1 ........................................... [no-gen-acls, auth per-acl]
  |   | o- acls ........................................................... [ACLs: 2]
  |   | | o- iqn.1994-05.com.redhat:rh7-client ....... [1-way auth, Mapped LUNs: 2]
  |   | | | o- mapped_lun0 ..................... [lun0 user/pool1.vmware_iscsi1 (rw)]
  |   | | | o- mapped_lun1 ..................... [lun2 user/pool2.vmware_iscsi2 (rw)]
  |   | | o- iqn.1994-05.com.redhat:tcnvh8 ........... [1-way auth, Mapped LUNs: 2]
  |   | |   o- mapped_lun0 ..................... [lun0 user/pool1.vmware_iscsi1 (rw)]
  |   | |   o- mapped_lun1 ..................... [lun2 user/pool2.vmware_iscsi2 (rw)]
  |   | o- luns ........................................................... [LUNs: 3]
  |   | | o- lun0 ............................... [user/pool1.vmware_iscsi1 (ano1)]
  |   | | o- lun1 .................................. [user/pool2.iscsi-test (ano1)]
  |   | | o- lun2 ................................. [user/pool2.vmware_iscsi2 (ao)]
  |   | o- portals ..................................................... [Portals: 1]
  |   |   o- 192.168.201.1:3260 ................................................ [OK]
  |   o- tpg2 ......................................................... [disabled]
  |     o- acls ........................................................... [ACLs: 0]
  |     o- luns ........................................................... [LUNs: 3]
  |     | o- lun0 ................................. [user/pool1.vmware_iscsi1 (ao)]
  |     | o- lun1 .................................... [user/pool2.iscsi-test (ao)]
  |     | o- lun2 ............................... [user/pool2.vmware_iscsi2 (ano2)]
  |     o- portals ..................................................... [Portals: 1]
  |       o- 192.168.201.2:3260 ................................................ [OK]
  o- loopback ......................................................... [Targets: 0]
ceph-iscsi2 ~]# targetcli ls
Warning: Could not load preferences file /root/.targetcli/prefs.bin.
o- / ......................................................................... [...]
  o- backstores .............................................................. [...]
  | o- block ................................................. [Storage Objects: 0]
  | o- fileio ................................................ [Storage Objects: 0]
  | o- pscsi ................................................. [Storage Objects: 0]
  | o- ramdisk ............................................... [Storage Objects: 0]
  | o- user:glfs ............................................. [Storage Objects: 0]
  | o- user:qcow ............................................. [Storage Objects: 0]
  | o- user:rbd .............................................. [Storage Objects: 3]
  | | o- pool1.vmware_iscsi1 ... [pool1/vmware_iscsi1;osd_op_timeout=30 (5.0TiB) activated]
  | | | o- alua ................................................. [ALUA Groups: 3]
  | | |   o- ano1 ................................ [ALUA state: Active/non-optimized]
  | | |   o- ao ...................................... [ALUA state: Active/optimized]
  | | |   o- default_tg_pt_gp ........................ [ALUA state: Active/optimized]
  | | o- pool2.iscsi-test ......... [pool2/iscsi-test;osd_op_timeout=30 (1.0TiB) activated]
  | | | o- alua ................................................. [ALUA Groups: 3]
  | | |   o- ano1 ................................ [ALUA state: Active/non-optimized]
  | | |   o- ao ...................................... [ALUA state: Active/optimized]
  | | |   o- default_tg_pt_gp ........................ [ALUA state: Active/optimized]
  | | o- pool2.vmware_iscsi2 ... [pool2/vmware_iscsi2;osd_op_timeout=30 (5.0TiB) activated]
  | |   o- alua ................................................. [ALUA Groups: 3]
  | |     o- ano2 ................................ [ALUA state: Active/non-optimized]
  | |     o- ao ...................................... [ALUA state: Active/optimized]
  | |     o- default_tg_pt_gp ........................ [ALUA state: Active/optimized]
  | o- user:zbc .............................................. [Storage Objects: 0]
  o- iscsi ............................................................ [Targets: 1]
  | o- iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw ......................... [TPGs: 2]
  |   o- tpg1 ......................................................... [disabled]
  |   | o- acls ........................................................... [ACLs: 0]
  |   | o- luns ........................................................... [LUNs: 3]
  |   | | o- lun0 ............................... [user/pool1.vmware_iscsi1 (ano1)]
  |   | | o- lun1 .................................. [user/pool2.iscsi-test (ano1)]
  |   | | o- lun2 ................................. [user/pool2.vmware_iscsi2 (ao)]
  |   | o- portals ..................................................... [Portals: 1]
  |   |   o- 192.168.201.1:3260 ................................................ [OK]
  |   o- tpg2 ........................................... [no-gen-acls, auth per-acl]
  |     o- acls ........................................................... [ACLs: 2]
  |     | o- iqn.1994-05.com.redhat:rh7-client ....... [1-way auth, Mapped LUNs: 2]
  |     | | o- mapped_lun0 ..................... [lun0 user/pool1.vmware_iscsi1 (rw)]
  |     | | o- mapped_lun1 ..................... [lun2 user/pool2.vmware_iscsi2 (rw)]
  |     | o- iqn.1994-05.com.redhat:tcnvh8 ........... [1-way auth, Mapped LUNs: 2]
  |     |   o- mapped_lun0 ..................... [lun0 user/pool1.vmware_iscsi1 (rw)]
  |     |   o- mapped_lun1 ..................... [lun2 user/pool2.vmware_iscsi2 (rw)]
  |     o- luns ........................................................... [LUNs: 3]
  |     | o- lun0 ................................. [user/pool1.vmware_iscsi1 (ao)]
  |     | o- lun1 .................................... [user/pool2.iscsi-test (ao)]
  |     | o- lun2 ............................... [user/pool2.vmware_iscsi2 (ano2)]
  |     o- portals ..................................................... [Portals: 1]
  |       o- 192.168.201.2:3260 ................................................ [OK]
  o- loopback ......................................................... [Targets: 0]
from both iscsi nodes.
The ceph-iscsi and tcmu-runner versions, and did you build them yourself,
get them from the Ceph repos, or get them from a distro repo?
On both gateways I use: ceph-iscsi-3.3-1.el7.noarch from the ceph-iscsi repo (
http://download.ceph.com/ceph-iscsi/3/rpm/el7/noarch)
tcmu-runner-1.4.0-0.1.51.geef5115.el7.x86_64 from
https://3.chacra.ceph.com/r/tcmu-runner/master/eef511565078fb4e2ed52caaff16…
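For reference, this is how I confirm what is actually installed and running on each gateway (standard RHEL/CentOS 7 tooling; the service names are the usual ceph-iscsi units, adjust if yours differ):

```shell
# On each gateway node: report the installed package versions
rpm -q ceph-iscsi tcmu-runner

# Confirm the gateway services are active
systemctl is-active tcmu-runner rbd-target-api rbd-target-gw
```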
It turns out that, right now, only the gateway "ceph-iscsi1" is working.
When I turn on "ceph-iscsi2", it appears as "Active/non-optimized" for
both RBD images (before, there was one Active/Optimized path for each
image), and if I turn off "ceph-iscsi1", ceph-iscsi2 remains
"Active/non-optimized" and the images become unavailable.
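One cross-check from the Ceph side that may help: if I understand the design correctly, the gateway serving the Active/Optimized path for an image should be the one holding that image's RBD exclusive lock, so listing the lock owners (a sketch using the image names from the gwcli output above) shows which gateway actually owns each image:

```shell
# From a node with an admin keyring: show who holds each image's
# exclusive lock (expected to match the Active/Optimized gateway)
rbd lock ls pool1/vmware_iscsi1
rbd lock ls pool2/vmware_iscsi2
```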
On the ESX side can you give me the output of:
esxcli storage nmp path list -d disk_id
esxcli storage nmp path list -d naa.6001405ba48e0b99e4c418ca13506c8e
iqn.1994-05.com.redhat:rh7-client-00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,1-naa.6001405ba48e0b99e4c418ca13506c8e
   Runtime Name: vmhba68:C0:T0:L0
   Device: naa.6001405ba48e0b99e4c418ca13506c8e
   Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
   Group State: active unoptimized
   Array Priority: 0
   Storage Array Type Path Config: {TPG_id=1,TPG_state=ANO,RTP_id=1,RTP_health=UP}
   Path Selection Policy Path Config: {current path; rank: 0}
esxcli storage core device list -d disk_id
:~] esxcli storage core device list -d naa.6001405ba48e0b99e4c418ca13506c8e
naa.6001405ba48e0b99e4c418ca13506c8e
Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
Has Settable Display Name: true
Size: 5242880
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/naa.6001405ba48e0b99e4c418ca13506c8e
Vendor: LIO-ORG
Model: TCMU device
Revision: 0002
SCSI Level: 5
Is Pseudo: false
Status: degraded
Is RDM Capable: true
Is Local: false
Is Removable: false
Is SSD: false
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: supported
Other UIDs: vml.02000000006001405ba48e0b99e4c418ca13506c8e54434d552064
Is Shared Clusterwide: true
Is SAS: false
Is USB: false
Is Boot Device: false
Device Max Queue Depth: 128
No of outstanding IOs with competing worlds: 32
Drive Type: unknown
RAID Level: unknown
Number of Physical Drives: unknown
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
esxcli storage nmp device list -d disk_id
esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
naa.6001405ba48e0b99e4c418ca13506c8e
   Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba68:C0:T0:L0
   Is USB: false
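One thing I notice in the device config above is action_OnRetryErrors=on, which on ESXi 6.7 can mark a path dead after retry errors instead of failing over cleanly. A sketch of turning it off per device (the option name here is my reading of VMware's SATP deviceconfig options, not something from this thread; please verify against your ESXi build before running):

```shell
# Disable action_OnRetryErrors for this device (hypothetical fix sketch)
esxcli storage nmp satp generic deviceconfig set \
    -c disable_action_OnRetryErrors -d naa.6001405ba48e0b99e4c418ca13506c8e

# Re-check the device config afterwards
esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
```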
esxcli storage nmp satp list
esxcli storage nmp satp list
Name                 Default PSP    Description
-------------------  -------------  -------------------------------------------------------
VMW_SATP_ALUA        VMW_PSP_MRU    Supports non-specific arrays that use the ALUA protocol
VMW_SATP_MSA         VMW_PSP_MRU    Placeholder (plugin not loaded)
VMW_SATP_DEFAULT_AP  VMW_PSP_MRU    Placeholder (plugin not loaded)
VMW_SATP_SVC         VMW_PSP_FIXED  Placeholder (plugin not loaded)
VMW_SATP_EQL         VMW_PSP_FIXED  Placeholder (plugin not loaded)
VMW_SATP_INV         VMW_PSP_FIXED  Placeholder (plugin not loaded)
VMW_SATP_EVA         VMW_PSP_FIXED  Placeholder (plugin not loaded)
VMW_SATP_ALUA_CX     VMW_PSP_RR     Placeholder (plugin not loaded)
VMW_SATP_SYMM        VMW_PSP_RR     Placeholder (plugin not loaded)
VMW_SATP_CX          VMW_PSP_MRU    Placeholder (plugin not loaded)
VMW_SATP_LSI         VMW_PSP_MRU    Placeholder (plugin not loaded)
VMW_SATP_DEFAULT_AA  VMW_PSP_FIXED  Supports non-specific active/active arrays
VMW_SATP_LOCAL       VMW_PSP_FIXED  Supports direct attached devices
and the /var/log/vmkernel.log for when you stop a node and the image goes to the unavailable state.
2020-02-02T00:14:59.094Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
2020-02-02T00:14:59.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:14:59.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
2020-02-02T00:15:01.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405cb71b882378c4138826c2da30" - issuing command 0x45a283482940
2020-02-02T00:15:01.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405cb71b882378c4138826c2da30" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:15:01.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405cb71b882378c4138826c2da30": awaiting fast path state update...
2020-02-02T00:15:06.979Z cpu25:2124996)ScsiDeviceIO: 3449: Cmd(0x459a9abd2380) 0x9e, CmdSN 0x1dc0cf from world 0 to dev "naa.6001405ba48e0b99e4c418ca13506c8e" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x46 0x80 0x41.
2020-02-02T00:15:06.979Z cpu25:2124996)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
2020-02-02T00:15:07.094Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world restore device "naa.6001405ba48e0b99e4c418ca13506c8e" - no more commands to retry
2020-02-02T00:15:07.095Z cpu33:2098078)WARNING: NMP: nmp_IssueCommandToDevice:5726: I/O could not be issued to device "naa.6001405ba48e0b99e4c418ca13506c8e" due to Not found
2020-02-02T00:15:07.095Z cpu33:2098078)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2020-02-02T00:15:07.095Z cpu33:2098078)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
2020-02-02T00:15:08.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
2020-02-02T00:15:08.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:15:08.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
2020-02-02T00:15:08.154Z cpu25:2124996)ScsiDeviceIO: 3449: Cmd(0x45a283482940) 0x12, CmdSN 0x1dc10c from world 0 to dev "naa.6001405cb71b882378c4138826c2da30" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x46 0x80 0x41.
2020-02-02T00:15:08.154Z cpu25:2124996)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405cb71b882378c4138826c2da30" is blocked. Not starting I/O from device.
2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world restore device "naa.6001405cb71b882378c4138826c2da30" - no more commands to retry
2020-02-02T00:15:14.094Z cpu22:2098078)NMP: nmp_ResetDeviceLogThrottling:3575: last error status from device naa.6001405cb71b882378c4138826c2da30 repeated 35 times
2020-02-02T00:15:14.095Z cpu22:2098078)NMP: nmp_ResetDeviceLogThrottling:3575: last error status from device naa.6001405ba48e0b99e4c418ca13506c8e repeated 8 times
2020-02-02T00:15:39.094Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
2020-02-02T00:15:39.095Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:15:39.095Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
2020-02-02T00:15:46.981Z cpu25:2124996)ScsiDeviceIO: 3449: Cmd(0x459a9abd2380) 0x25, CmdSN 0x1dc10d from world 0 to dev "naa.6001405ba48e0b99e4c418ca13506c8e" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2020-02-02T00:15:46.981Z cpu25:2124996)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
2020-02-02T00:15:47.094Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world restore device "naa.6001405ba48e0b99e4c418ca13506c8e" - no more commands to retry
2020-02-02T00:15:47.095Z cpu22:2098078)WARNING: NMP: nmp_IssueCommandToDevice:5726: I/O could not be issued to device "naa.6001405ba48e0b99e4c418ca13506c8e" due to Not found
2020-02-02T00:15:47.095Z cpu22:2098078)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2020-02-02T00:15:47.095Z cpu22:2098078)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
2020-02-02T00:15:48.094Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9ba62b40
2020-02-02T00:15:48.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:15:48.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
2020-02-02T00:15:49.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9ba62b40
2020-02-02T00:15:49.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
2020-02-02T00:15:49.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
2020-02-02T00:15:52.967Z cpu0:2098255)iscsi_vmk: iscsivmk_ConnNetRegister:2170: socket 0x430e49d44a90 network resource pool netsched.pools.persist.iscsi associated
2020-02-02T00:15:52.967Z cpu0:2098255)iscsi_vmk: iscsivmk_ConnNetRegister:2198: socket 0x430e49d44a90 network tracker id 193945594 tracker.iSCSI.192.168.201.1 associated
2020-02-02T00:15:54.712Z cpu0:2098255)WARNING: iscsi_vmk: iscsivmk_StartConnection:880: vmhba68:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
2020-02-02T00:15:54.712Z cpu0:2098255)WARNING: iscsi_vmk: iscsivmk_StartConnection:881: Sess [ISID: 00023d000001 TARGET: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw TPGT: 1 TSIH: 0]
2020-02-02T00:15:54.712Z cpu0:2098255)WARNING: iscsi_vmk: iscsivmk_StartConnection:882: Conn [CID: 0 L: 192.168.201.107:35264 R: 192.168.201.1:3260]
2020-02-02T00:15:54.712Z cpu30:2097595)ScsiDevice: 4481: Handle REPORTED LUNS CHANGED DATA unit attention
2020-02-02T00:15:54.712Z cpu30:2097595)ScsiDevice: 4512: Handle INQUIRY PARAMETERS CHANGED unit attention
2020-02-02T00:15:54.716Z cpu2:2097312)ScsiDevice: 6001: Setting Device naa.6001405ba48e0b99e4c418ca13506c8e state back to 0x2
2020-02-02T00:15:54.716Z cpu2:2097312)ScsiDevice: 8708: No Handlers registered! (naa.6001405ba48e0b99e4c418ca13506c8e)!
2020-02-02T00:15:54.716Z cpu2:2097312)ScsiDevice: 6022: Device naa.6001405ba48e0b99e4c418ca13506c8e is Out of APD; token num:1
2020-02-02T00:15:54.716Z cpu2:2097312)StorageApdHandler: 1315: APD exit for 0x4305de04a9d0 [naa.6001405ba48e0b99e4c418ca13506c8e]
2020-02-02T00:15:54.716Z cpu0:2097602)StorageApdHandler: 507: APD exit event for 0x4305de04a9d0 [naa.6001405ba48e0b99e4c418ca13506c8e]
2020-02-02T00:15:54.716Z cpu0:2097602)StorageApdHandlerEv: 117: Device or filesystem with identifier [naa.6001405ba48e0b99e4c418ca13506c8e] has exited the All Paths Down state.
2020-02-02T00:15:54.717Z cpu11:2097618)ScsiDevice: 6001: Setting Device naa.6001405cb71b882378c4138826c2da30 state back to 0x2
2020-02-02T00:15:54.717Z cpu11:2097618)ScsiDevice: 8708: No Handlers registered! (naa.6001405cb71b882378c4138826c2da30)!
2020-02-02T00:15:54.717Z cpu11:2097618)ScsiDevice: 6022: Device naa.6001405cb71b882378c4138826c2da30 is Out of APD; token num:1
2020-02-02T00:15:54.717Z cpu11:2097618)StorageApdHandler: 1315: APD exit for 0x4305de041660 [naa.6001405cb71b882378c4138826c2da30]
2020-02-02T00:15:54.717Z cpu0:2097602)StorageApdHandler: 507: APD exit event for 0x4305de041660 [naa.6001405cb71b882378c4138826c2da30]
2020-02-02T00:15:54.717Z cpu0:2097602)StorageApdHandlerEv: 117: Device or filesystem with identifier [naa.6001405cb71b882378c4138826c2da30] has exited the All Paths Down state.
2020-02-02T00:15:55.096Z cpu0:2098243)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.6001405ba48e0b99e4c418ca13506c8e"
2020-02-02T00:15:55.096Z cpu8:2099418 opID=17fe6cc5)World: 11943: VC opID sps-Main-52618-962-fe-1-62ed maps to vmkernel opID 17fe6cc5
2020-02-02T00:15:55.096Z cpu8:2099418 opID=17fe6cc5)WARNING: ScsiDeviceIO: 10750: READ CAPACITY on device "naa.6001405ba48e0b99e4c418ca13506c8e" from Plugin "NMP" failed. I/O error
2020-02-02T00:15:55.097Z cpu3:2098243)NMP: nmp_ThrottleLogForDevice:3788: Cmd 0x28 (0x459a9abd2380, 0) to dev "naa.6001405ba48e0b99e4c418ca13506c8e" on path "vmhba68:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER
2020-02-02T00:15:55.097Z cpu3:2098243)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2020-02-02T00:15:56.095Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
2020-02-02T00:15:56.099Z cpu3:2098243)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.6001405ba48e0b99e4c418ca13506c8e"
2020-02-02T00:16:02.095Z cpu23:2097316)qfle3: qfle3_queue_alloc_with_attr:517: [vmnic0] QueueOps.qfle3_queue_alloc_with_attr num_attr :1 attrs: 0x451a4521bcd0
2020-02-02T00:16:02.095Z cpu23:2097316)qfle3: qfle3_queue_alloc_with_attr:545: [vmnic0] Feature LRO requested.
2020-02-02T00:16:02.095Z cpu23:2097316)qfle3: qfle3_rq_alloc:282: [vmnic0] allocating RX queue at 1
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_alloc:316: [vmnic0] Marking RX queue 1 IN_USE
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_queue_start:1529: [vmnic0] QueueOps.queueStart
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_queue_start:1537: [vmnic0] RxQ, QueueIDVal:1
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rx_queue_start:1452: [vmnic0] qfle3_rx_queue_start, QueueIDVal:1
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1287: [vmnic0] qfle3_rq_start 1
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_init_eth_fp:68: [vmnic0] qfle3_init_eth_fp for fp 1
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1334: [vmnic0] RX fp[1]: wrote prods bd_prod=4078 cqe_prod=4030 sge_prod=1024
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1351: [vmnic0] enabled netpoll for q_index 1
2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1357: [vmnic0] Enabling interrupt on vector # 3
2020-02-02T00:16:02.097Z cpu23:2097316)qfle3: qfle3_rq_start:1377: [vmnic0] RX queue setup_queue successful for 1
2020-02-02T00:16:02.097Z cpu23:2097316)qfle3: qfle3_rq_start:1411: [vmnic0] active Rx queue Count 2
2020-02-02T00:16:02.097Z cpu23:2097316)qfle3: qfle3_rq_start:1412: [vmnic0] RX queue 1 successfully started
2020-02-02T00:16:02.098Z cpu23:2097316)qfle3: qfle3_queue_remove_filter:2063: [vmnic0] QueueOps.queueRemoveFilter
2020-02-02T00:16:02.099Z cpu23:2097316)qfle3: qfle3_remove_queue_filter:2012: [vmnic0] NetQ removed RX filter: queue:0 mac: 00:50:56:6f:59:4f filter id:3
2020-02-02T00:16:02.099Z cpu23:2097316)qfle3: qfle3_queue_apply_filter:1923: [vmnic0] QueueOps.queueApplyFilter 1
2020-02-02T00:16:02.101Z cpu23:2097316)qfle3: qfle3_apply_queue_mac_filter:1798: [vmnic0] NetQ set RX filter: queue:1 mac: 00:50:56:6f:59:4f filter id:0
2020-02-02T00:16:04.924Z cpu0:2099418 opID=cff7ff5)World: 11943: VC opID sps-Main-52618-962-d0-d6-6321 maps to vmkernel opID cff7ff5
2020-02-02T00:16:04.924Z cpu0:2099418 opID=cff7ff5)Unmap6: 7133: [Unmap]
'datastore1':device(0x43078fc9a1f0)does not support unmap
2020-02-02T00:16:07.164Z cpu9:2271849)J6: 2651: 'Storage_Ceph_pool1':
Exiting async journal replay manager world
2020-02-02T00:16:09.665Z cpu12:2271850)J6: 2651: 'Storage_Ceph_pool2':
Exiting async journal replay manager world
2020-02-02T00:16:10.930Z cpu4:2271852)J6: 2651: 'datastore1': Exiting async
journal replay manager world
2020-02-02T00:16:11.094Z cpu22:2271853)J6: 2651: 'SSD1': Exiting async
journal replay manager world
2020-02-02T00:16:22.094Z cpu26:2097316)qfle3:
qfle3_queue_remove_filter:2063: [vmnic0] QueueOps.queueRemoveFilter
2020-02-02T00:16:22.097Z cpu26:2097316)qfle3:
qfle3_remove_queue_filter:2012: [vmnic0] NetQ removed RX filter: queue:1
mac: 00:50:56:6f:59:4f filter id:0
2020-02-02T00:16:22.098Z cpu26:2097316)qfle3:
qfle3_queue_apply_filter:1923: [vmnic0] QueueOps.queueApplyFilter 0
2020-02-02T00:16:22.099Z cpu26:2097316)qfle3:
qfle3_apply_queue_mac_filter:1798: [vmnic0] NetQ set RX filter: queue:0
mac: 00:50:56:6f:59:4f filter id:3
2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_queue_quiesce:1061:
[vmnic0] QueueOps.queueQuiesce
2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_queue_quiesce:1069:
[vmnic0] RxQ, QueueIDVal:1
2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_rx_queue_stop:1558:
[vmnic0] qfle3_rx_queue_stop, QueueIDVal:1
2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_rq_stop:740: [vmnic0]
qfle3_rq_stop 1
2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_rq_stop:811: [vmnic0]
Stopping queue 0
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_stop:831: [vmnic0]
disable netpoll for q_index 1
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_stop:842: [vmnic0]
Disabling interrupt on vector # 3
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_stop:867: [vmnic0]
active Rx queue Count 1
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_queue_free:690:
[vmnic0] QueueOps.queueFree
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_queue_free:697:
[vmnic0] RxQ, QueueIDVal:1
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_free:618: [vmnic0]
Loop through 1 RSS queues 1
2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_cmd_remove_q:19191:
[vmnic0] Releasing Q idx 1
2020-02-02T00:17:05.925Z cpu4:2100126 opID=4a500647)World: 11943: VC opID
sps-Main-52618-962-3a-dd-633d maps to vmkernel opID 4a500647
2020-02-02T00:17:05.925Z cpu4:2100126 opID=4a500647)Unmap6: 7133: [Unmap]
'datastore1':device(0x43078fc99b10)does not support unmap
2020-02-02T00:17:11.930Z cpu15:2271855)J6: 2651: 'datastore1': Exiting
async journal replay manager world
2020-02-02T00:17:12.100Z cpu34:2271856)J6: 2651: 'SSD1': Exiting async
journal replay manager world
Nothing is recorded in the logs. Can you help me?
On Thu, Dec 26, 2019 at 5:44 PM, Mike Christie <mchristi(a)redhat.com> wrote:
On 12/24/2019 06:40 AM, Gesiel Galvão Bernardes wrote:
> In addition: I turned off one of the GWs, and with just one it works
> fine. When both are up, one of the images changes between
> "active/optimized" states all the time (which generates the logs above)
> and everything is extremely slow.
Your multipathing in ESX is probably misconfigured and you have set it
up for active/active, or one host can't see all the iSCSI paths, either
because it's not logged into all the sessions or because the network is
not up on one of the paths.
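A sketch of how this could be checked on each ESXi host. The device ID is taken from the vmkernel log above; VMW_PSP_MRU is assumed to be the desired path selection policy per the Ceph iSCSI gateway documentation — verify against the docs for your versions before changing anything:

```shell
# 1. Confirm the host is logged into iSCSI sessions on BOTH gateways:
esxcli iscsi session list

# 2. Check which SATP/PSP claimed the device -- for ceph-iscsi it should
#    be an ALUA SATP with VMW_PSP_MRU, not a round-robin/active-active
#    policy:
esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e

# 3. List all paths and their states (with ALUA, expect one active and
#    one standby path, one per gateway):
esxcli storage core path list -d naa.6001405ba48e0b99e4c418ca13506c8e

# 4. If the path selection policy is wrong, switch it to MRU:
esxcli storage nmp device set -d naa.6001405ba48e0b99e4c418ca13506c8e \
    -P VMW_PSP_MRU
```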
I'm using:
tcmu-runner-1.4
ceph-iscsi-3.3
ceph 13.2.7
Regards,
Gesiel
On Tue, Dec 24, 2019 at 9:09 AM, Gesiel Galvão Bernardes
<gesiel.bernardes(a)gmail.com> wrote:
Hi,
I am having an unusual slowdown using VMware with iSCSI gateways. I have
two iSCSI gateways with two RBD images. I have checked the following in
the logs:
Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:557 rbd/pool1.vmware_iscsi1: Lock acquisition operation is already in process.
2019-12-24 09:00:26.973 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.099 969 [WARN] tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.824 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
2019-12-24 09:00:28.990 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
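The "Acquired exclusive lock" / "Async lock drop" ping-pong above suggests the two gateways are repeatedly taking the RBD exclusive lock from each other. One way to watch this from the Ceph side (a sketch; the pool/image spec `pool1/vmware_iscsi1` is inferred from the `rbd/pool1.vmware_iscsi1` prefix in the log) is:

```shell
# Show the current holder of the exclusive lock on the image; run it
# repeatedly while I/O is active -- if the locker alternates between the
# two gateway clients, the initiators are driving both paths as
# active/active.
watch -n 1 rbd lock ls pool1/vmware_iscsi1
```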
Can anyone help me, please?
Gesiel
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io