Hi
On Sun, Feb 9, 2020 at 18:27, Mike Christie <mchristi(a)redhat.com> wrote:
On 02/08/2020 11:34 PM, Gesiel Galvão Bernardes wrote:
Hi,
On Thu, Feb 6, 2020 at 18:56, Mike Christie <mchristi(a)redhat.com> wrote:
On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
On Sun, Feb 2, 2020 at 00:37, Gesiel Galvão Bernardes <gesiel.bernardes(a)gmail.com> wrote:
Hi,
I was only just now able to continue with this. Below is the information
required. Thanks in advance.
Hey, sorry for the late reply. I just got back from PTO.
esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
naa.6001405ba48e0b99e4c418ca13506c8e
Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
Path Selection Policy: VMW_PSP_MRU
Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
Path Selection Policy Device Custom Config:
Working Paths: vmhba68:C0:T0:L0
Is USB: false
........
Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER
Are you sure you are using tcmu-runner 1.4? Is that the actual daemon
version running? Did you by any chance install the 1.4 rpm, but
you/it did not restart the daemon? The error code above is returned in
1.3 and earlier.
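
For reference, the failure line above can be decoded with a short script. This is only a sketch: the function name and lookup tables are made up for illustration and cover just a handful of values. D:0x2 is the SCSI status CHECK CONDITION, and sense key 0x2 with ASC/ASCQ 0x04/0x0a is NOT READY, "logical unit not accessible, asymmetric access state transition".

```python
import re

# Partial lookup tables: only a few standard SCSI values, not a full list.
SCSI_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {
    (0x04, 0x0A): "LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION",
    (0x04, 0x0B): "LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE",
}

HEX = r"(0x[0-9a-fA-F]+)"
LINE_RE = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX} Valid sense data: {HEX} {HEX} {HEX}"
)


def decode_scsi_failure(line):
    """Hypothetical helper: decode the H/D/P and sense fields of a vmkernel line."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    return {
        "host_status": host,
        "device_status": SCSI_STATUS.get(device, hex(device)),
        "plugin_status": plugin,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
    }


if __name__ == "__main__":
    sample = "Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER"
    print(decode_scsi_failure(sample))
```

Values missing from the tables are returned as hex rather than guessed.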
You are probably hitting a combo of 2 issues.
We had only listed ESX 6.5 in the docs you probably saw, and in 6.7 the
value of action_OnRetryErrors defaulted to on instead of off. You should
set this back to off.
You should also upgrade to the current version of tcmu-runner, 1.5.x. It
should fix the issue you are hitting, so that non-IO commands like
INQUIRY, RTPG, etc. are executed while failing over/back, and you would
not hit the problem where path initialization and path-testing IO is
failed, causing the path to be marked as failed.
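
To check the setting across several devices, the "Storage Array Type Device Config" string from the device list output above can be parsed as below. This is a minimal sketch; the function names are hypothetical, and it assumes the config string keeps the key=value layout shown in this thread.

```python
import re


def parse_device_config(config):
    """Turn the SATP device config string into a flat dict of key -> value."""
    # Matches every key=value pair, including the nested TPG_id/TPG_state ones.
    return dict(re.findall(r"(\w+)=(\w+)", config))


def retry_errors_enabled(config):
    """Return True when action_OnRetryErrors is reported as 'on'."""
    return parse_device_config(config).get("action_OnRetryErrors") == "on"


if __name__ == "__main__":
    sample = (
        "{implicit_support=on; explicit_support=off; explicit_allow=on; "
        "alua_followover=on; action_OnRetryErrors=on; "
        "{TPG_id=1,TPG_state=ANO}}"
    )
    print(parse_device_config(sample))
    print("needs change:", retry_errors_enabled(sample))
```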
I updated tcmu-runner to 1.5.2 and changed action_OnRetryErrors to off,
but the problem continues 😭
Attached is vmkernel.log.
When you stopped the iscsi gw at around 2020-02-09T01:51:25.820Z, how
many paths did your device have? Did:
esxcli storage nmp path list -d your_device
report only one path? Did
esxcli iscsi session connection list
show an iSCSI connection to each gw?
Hmmm, I believe the problem may be here. I verified that each host was
listing only one GW. So I ran a "rescan HBA" in VMware on both ESX hosts.
Now one of them lists all 3 gateways (I added one more), but the other
ESX host, with the same configuration, continues to list only one
gateway. See the different outputs:
[root@tcnvh7:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000001
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.1
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.107
SessionCreateTime: 01/19/20 00:11:25
ConnectionCreateTime: 01/19/20 00:11:25
ConnectionStartTime: 02/13/20 23:03:10
State: logged_in
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000002,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000002
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.2
RemoteAddress: 192.168.201.2
LocalAddress: 192.168.201.107
SessionCreateTime: 02/13/20 23:09:16
ConnectionCreateTime: 02/13/20 23:09:16
ConnectionStartTime: 02/13/20 23:09:16
State: logged_in
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000003,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000003
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.3
RemoteAddress: 192.168.201.3
LocalAddress: 192.168.201.107
SessionCreateTime: 02/13/20 23:09:16
ConnectionCreateTime: 02/13/20 23:09:16
ConnectionStartTime: 02/13/20 23:09:16
State: logged_in
=====
[root@tcnvh8:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000001
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.1
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.108
SessionCreateTime: 01/12/20 02:53:53
ConnectionCreateTime: 01/12/20 02:53:53
ConnectionStartTime: 02/13/20 23:06:40
State: logged_in
Is that the problem? Any ideas on how to proceed from here?
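
The two outputs above can also be compared mechanically. The sketch below assumes the command output has been saved to a string and that the three gateway addresses are the ones shown for tcnvh7; the function names are made up for illustration.

```python
def parse_connections(output):
    """Split 'esxcli iscsi session connection list' output into records.

    A line without ': ' starts a new record; 'Key: Value' lines fill it.
    """
    records = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(": ")
        if sep:
            if current is not None:
                current[key] = value
        else:
            current = {"id": line}
            records.append(current)
    return records


def missing_gateways(output, expected):
    """Return the expected gateway addresses with no logged-in connection."""
    seen = {
        rec.get("RemoteAddress")
        for rec in parse_connections(output)
        if rec.get("State") == "logged_in"
    }
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    # Abbreviated copy of the tcnvh8 output from this thread.
    tcnvh8 = """\
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
RemoteAddress: 192.168.201.1
State: logged_in
"""
    gateways = ["192.168.201.1", "192.168.201.2", "192.168.201.3"]
    print(missing_gateways(tcnvh8, gateways))
```

An empty list would mean the host has a logged-in session to every expected gateway.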
From the logs, it looks like when you brought the gw down, we lost the
only path we had. We then went into all paths down, so IO could not
execute. It looks like the gw was brought back up at the end of the log
and the path seems to have been added back.