Hi
On Sun, Feb 9, 2020 at 18:27, Mike Christie <mchristi(a)redhat.com> wrote:
On 02/08/2020 11:34 PM, Gesiel Galvão Bernardes wrote:
Hi,
On Thu, Feb 6, 2020 at 18:56, Mike Christie <mchristi(a)redhat.com> wrote:
On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
> On Sun, Feb 2, 2020 at 00:37, Gesiel Galvão Bernardes <gesiel.bernardes(a)gmail.com> wrote:
>
> Hi,
>
> Only now was I able to continue with this. Below is the
> requested information. Thanks in advance.
Hey, sorry for the late reply. I just got back from PTO.
esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
naa.6001405ba48e0b99e4c418ca13506c8e
Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
Path Selection Policy: VMW_PSP_MRU
Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
Path Selection Policy Device Custom Config:
Working Paths: vmhba68:C0:T0:L0
Is USB: false
........
Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER
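The `Failed:` line above can be decoded by hand. D:0x2 is the SCSI status CHECK CONDITION, and the three sense bytes are sense key, ASC and ASCQ: 0x2 / 0x04 / 0x0A is NOT READY, "LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION", i.e. the target said an ALUA state change was in progress. Below is a minimal Python sketch of that decoding. The function name `decode_failed_line` and the lookup tables are hypothetical and hold only a handful of codes from the SCSI (SPC) tables; the regex assumes the exact line layout shown above, and the H: (host) and P: (plugin) fields are not decoded.

```python
import re

# Hypothetical, deliberately tiny subset of the SPC tables; not exhaustive.
SCSI_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {
    (0x04, 0x0A): "LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION",
    (0x04, 0x0B): "LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE",
    (0x04, 0x0C): "LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN UNAVAILABLE STATE",
    (0x2A, 0x06): "ASYMMETRIC ACCESS STATE CHANGED",
}

# Captures the D: (device status) field and the three sense bytes.
LINE_RE = re.compile(
    r"D:(0x[0-9a-fA-F]+).*?Valid sense data:\s*"
    r"(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)"
)


def decode_failed_line(line):
    """Return a dict describing status/sense data, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    status, key, asc, ascq = (int(g, 16) for g in m.groups())
    return {
        "status": SCSI_STATUS.get(status, hex(status)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"ASC {asc:#04x} / ASCQ {ascq:#04x}"),
    }


if __name__ == "__main__":
    line = "Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER"
    print(decode_failed_line(line))
```

Unknown codes fall back to their hex values, so the sketch still prints something useful for sense data outside the small table.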
Are you sure you are using tcmu-runner 1.4? Is that the actual daemon
version running? Did you by any chance install the 1.4 rpm but not
restart the daemon? The error code above is returned in 1.3 and
earlier.
You are probably hitting a combination of 2 issues.
We had only listed ESX 6.5 in the docs you probably saw, and in 6.7 the
value of action_OnRetryErrors defaults to on instead of off. You should
set this back to off.
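To confirm the setting across many devices, the "Storage Array Type Device Config" string from `esxcli storage nmp device list` can be parsed. A minimal Python sketch follows; `parse_satp_config` is a hypothetical helper that assumes the brace-and-semicolon layout in the output above, including the nested `{TPG_id=...,TPG_state=...}` block. It only reads the configuration; changing it is still done with esxcli on the host and is not shown here.

```python
import re

# Matches nested per-TPG blocks such as {TPG_id=1,TPG_state=ANO}.
TPG_RE = re.compile(r"\{(TPG_id=[^{}]*)\}")


def parse_satp_config(config):
    """Split an ALUA SATP device config string into (settings, tpg_list)."""
    # Pull out the nested TPG blocks first so they don't confuse the ';' split.
    tpgs = [dict(part.split("=", 1) for part in block.split(","))
            for block in TPG_RE.findall(config)]
    flat = TPG_RE.sub("", config).strip().strip("{}")

    settings = {}
    for item in flat.split(";"):
        item = item.strip()
        if "=" in item:
            key, value = item.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings, tpgs


if __name__ == "__main__":
    cfg = ("{implicit_support=on; explicit_support=off; explicit_allow=on; "
           "alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}")
    settings, tpgs = parse_satp_config(cfg)
    if settings.get("action_OnRetryErrors") == "on":
        print("action_OnRetryErrors is on; it should be set back to off")
    print(tpgs)
```

Run against the device above, it flags action_OnRetryErrors=on and reports a single target port group in the ANO (active/non-optimized) state.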
You should also upgrade to the current version of tcmu-runner, 1.5.x. It
should fix the issue you are hitting: non-IO commands like INQUIRY,
RTPG, etc. are executed while failing over/back, so you will not hit the
problem where path initialization and path-testing IO fail, causing the
path to be marked as failed.
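Since an upgraded rpm does nothing until the daemon is restarted, it can help to compare the installed version with the one the running daemon reports. A minimal Python sketch of that comparison, assuming you have already obtained both version strings (how to query them is left out); the helper names and the `minimum=(1, 5)` default are hypothetical, chosen to match the 1.5.x advice above.

```python
def parse_version(text):
    """Turn '1.5.2' or '1.5.2-1.el7' into a comparable tuple like (1, 5, 2)."""
    core = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def check_tcmu_runner(installed, running, minimum=(1, 5)):
    """Return a list of human-readable problems (empty if everything looks fine)."""
    problems = []
    if parse_version(running) != parse_version(installed):
        problems.append("running daemon differs from installed package; restart tcmu-runner")
    # Tuples compare element-wise, so (1, 5, 2) >= (1, 5) and (1, 4, 0) < (1, 5).
    if parse_version(running) < minimum:
        problems.append("running version is older than " + ".".join(map(str, minimum)))
    return problems


if __name__ == "__main__":
    print(check_tcmu_runner(installed="1.5.2", running="1.4.0"))
```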
I updated tcmu-runner to 1.5.2 and changed action_OnRetryErrors to off,
but the problem continues 😭
Attached is vmkernel.log.
When you stopped the iSCSI gw at around 2020-02-09T01:51:25.820Z, how
many paths did your device have? Did
esxcli storage nmp path list -d your_device
report only one path? Did
esxcli iscsi session connection list
show an iSCSI connection to each gw?
Hmmm, I believe the problem may be here. I checked and only one GW was
listed for each path. So I ran "Rescan HBA" in VMware on both ESX hosts.
Now one of them lists all 3 gateways (I added one more), but the other
ESX host, with the same configuration, still lists only one gateway. See
the different outputs:
[root@tcnvh7:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000001
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.1
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.107
SessionCreateTime: 01/19/20 00:11:25
ConnectionCreateTime: 01/19/20 00:11:25
ConnectionStartTime: 02/13/20 23:03:10
State: logged_in
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000002,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000002
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.2
RemoteAddress: 192.168.201.2
LocalAddress: 192.168.201.107
SessionCreateTime: 02/13/20 23:09:16
ConnectionCreateTime: 02/13/20 23:09:16
ConnectionStartTime: 02/13/20 23:09:16
State: logged_in
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000003,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000003
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.3
RemoteAddress: 192.168.201.3
LocalAddress: 192.168.201.107
SessionCreateTime: 02/13/20 23:09:16
ConnectionCreateTime: 02/13/20 23:09:16
ConnectionStartTime: 02/13/20 23:09:16
State: logged_in
=====
[root@tcnvh8:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000001
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.1
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.108
SessionCreateTime: 01/12/20 02:53:53
ConnectionCreateTime: 01/12/20 02:53:53
ConnectionStartTime: 02/13/20 23:06:40
State: logged_in
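Comparing these listings by eye across several ESX hosts is tedious. A minimal Python sketch that parses saved output of `esxcli iscsi session connection list` and reports which gateway addresses have no logged-in connection. It assumes the record layout shown above (a comma-separated header line followed by `Field: value` lines); `parse_connections`, `missing_gateways` and the expected-gateway list are hypothetical.

```python
import re

# Header line of each record, e.g. "vmhba68,iqn.2003-01...:iscsi-igw,00023d000001,0".
HEADER_RE = re.compile(r"^(\w+),(iqn\.[^,]+),([0-9a-fA-F]+),(\d+)$")
# "Field: value" lines inside a record.
FIELD_RE = re.compile(r"^\s*(\w+):\s*(.*)$")


def parse_connections(output):
    """Return one dict of fields per connection record."""
    sessions = []
    for line in output.splitlines():
        line = line.rstrip()
        if HEADER_RE.match(line.strip()):
            sessions.append({})
            continue
        m = FIELD_RE.match(line)
        if m and sessions:
            sessions[-1][m.group(1)] = m.group(2)
    return sessions


def missing_gateways(output, expected):
    """Expected gateway IPs that have no connection in the logged_in state."""
    connected = {s.get("RemoteAddress") for s in parse_connections(output)
                 if s.get("State") == "logged_in"}
    return sorted(set(expected) - connected)


if __name__ == "__main__":
    tcnvh8 = """vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.108
State: logged_in
"""
    print(missing_gateways(tcnvh8, ["192.168.201.1", "192.168.201.2", "192.168.201.3"]))
```

Fed the (trimmed) tcnvh8 output with the three gateway addresses, it reports 192.168.201.2 and 192.168.201.3 as missing, while the tcnvh7 output would report none.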
Is that the problem? Any ideas on how to proceed from here?
Yes. Normally, you would have the connection already created, and when
one path/gateway goes down, the multipath layer will switch to another
path. When the path/gateway comes back up, the initiator side's iSCSI
layer will reconnect automatically and the multipath layer will set up
the path structure again, so it can fail back if it's a higher-priority
path, or fail over later if other paths go down.
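The failover/failback behavior described above can be illustrated with a toy model: the active path is always the highest-priority path that is up, re-chosen whenever a path goes down or comes back. This is a simplified Python sketch under that assumption only; the `Path` and `ToyMultipath` classes and the priority numbers are hypothetical and do not reproduce any particular VMware path selection policy (such as VMW_PSP_MRU).

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    priority: int  # higher is preferred (think of an active/optimized path)
    up: bool = True


class ToyMultipath:
    """Keeps the highest-priority path that is up as the active one."""

    def __init__(self, paths):
        self.paths = {p.name: p for p in paths}
        self.active = None
        self._select()

    def _select(self):
        candidates = [p for p in self.paths.values() if p.up]
        # With no usable path left, there is nothing to fail over to.
        self.active = max(candidates, key=lambda p: p.priority).name if candidates else None

    def path_down(self, name):
        self.paths[name].up = False
        self._select()  # failover

    def path_up(self, name):
        self.paths[name].up = True
        self._select()  # failback if the returning path has higher priority


if __name__ == "__main__":
    mp = ToyMultipath([Path("gw1", 50), Path("gw2", 10), Path("gw3", 10)])
    print(mp.active)
    mp.path_down("gw1")
    print(mp.active)
    mp.path_up("gw1")
    print(mp.active)
```

With all three sessions present, taking gw1 down moves I/O to gw2 and bringing it back fails back to gw1. With only the single gw1 session that tcnvh8 shows, `path_down("gw1")` leaves `active` as None, which is the situation where the device has no path left.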
Something happened with the automatic path connection process on that
node. We know it works for the one gateway you brought up/down. For the
other gateways, I would check:
1. Check that all target portals are being discovered. On the GUI screen
where you entered the discovery address, you should also see, in the
static section, a list of all target portals that were found. Do you
only see 1 portal?
See here: