Em sex., 14 de fev. de 2020 às 13:25, Mike Christie <mchristi(a)redhat.com
<mailto:mchristi@redhat.com>> escreveu:
On 02/13/2020 08:52 PM, Gesiel Galvão Bernardes wrote:
Hi
Em dom., 9 de fev. de 2020 às 18:27, Mike Christie
<mchristi(a)redhat.com
<mailto:mchristi@redhat.com>
<mailto:mchristi@redhat.com
<mailto:mchristi@redhat.com>>> escreveu:
On 02/08/2020 11:34 PM, Gesiel Galvão Bernardes wrote:
Hi,
Em qui., 6 de fev. de 2020 às 18:56, Mike Christie
<mchristi(a)redhat.com
<mailto:mchristi@redhat.com>
<mailto:mchristi@redhat.com
<mailto:mchristi@redhat.com>>
> <mailto:mchristi@redhat.com
<mailto:mchristi@redhat.com>
<mailto:mchristi@redhat.com
<mailto:mchristi@redhat.com>>>> escreveu:
>
> On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
> > Em dom., 2 de fev. de 2020 às 00:37, Gesiel Galvão
Bernardes
> > <gesiel.bernardes(a)gmail.com
<mailto:gesiel.bernardes@gmail.com>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>>>
>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>>
>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>
<mailto:gesiel.bernardes@gmail.com
<mailto:gesiel.bernardes@gmail.com>>>>> escreveu:
>
> Hi,
>
> Just now was possible continue this. Below is the
information
> > required. Thanks advan
>
>
> Hey, sorry for the late reply. I just back from PTO.
>
> >
> > esxcli storage nmp device list -d
> naa.6001405ba48e0b99e4c418ca13506c8e
> > naa.6001405ba48e0b99e4c418ca13506c8e
> > Device Display Name: LIO-ORG iSCSI Disk
> > (naa.6001405ba48e0b99e4c418ca13506c8e)
> > Storage Array Type: VMW_SATP_ALUA
> > Storage Array Type Device Config:
{implicit_support=on;
> > explicit_support=off;
explicit_allow=on;
alua_followover=on;
> > action_OnRetryErrors=on;
{TPG_id=1,TPG_state=ANO}}
> > Path Selection Policy: VMW_PSP_MRU
> > Path Selection Policy Device Config: Current
> Path=vmhba68:C0:T0:L0
> > Path Selection Policy Device Custom Config:
> > Working Paths: vmhba68:C0:T0:L0
> > Is USB: false
>
> ........
>
> > Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2
0x4
0xa.
> Act:FAILOVER
>
>
> Are you sure you are using tcmu-runner 1.4? Is that the
actual
daemon
> reversion running? Did you by any chance install the 1.4
rpm,
but you/it
did not restart the daemon? The error code
above is returned
in 1.3 and
earlier.
You are probably hitting a combo of 2 issues.
We had only listed ESX 6.5 in the docs you probably saw, and
in 6.7 the
> value of action_OnRetryErrors defaulted to on instead of
off.
You should
> set this back to off.
>
> You should also upgrade to the current version of
tcmu-runner
1.5.x. It
> should fix the issue you are hitting, so non IO commands
like
inquiry,
RTPG, etc are executed while failing
over/back, so you would
not hit the
problem where path initialization and path
testing IO is
failed causing
> the path to marked as failed.
>
>
> I updated tcmu-runner to 1.5.2, and change
action_OnRetryErrors to
off,
but the problem continue 😭
Attached is vmkernel.log.
When you stopped the iscsi gw at around
2020-02-09T01:51:25.820Z, how
many paths did your device have? Did:
esxcli storage nmp path list -d your_device
report only one path? Did
esxcli iscsi session connection list
show a iscsi connection to each gw?
Hmmm, I believe the problem may be here. I verified that I was listing
only one GW for each path. So I ran a "rescan HBA" on VMware on both
ESX, now one of them lists the 3 (I added one more) gateways, but
an ESX
host with the same configuration continues to
list only one
gateway. See
the different outputs:
[root@tcnvh7:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com
<http://iqn.2003-01.com>.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000001
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.1
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.107
SessionCreateTime: 01/19/20 00:11:25
ConnectionCreateTime: 01/19/20 00:11:25
ConnectionStartTime: 02/13/20 23:03:10
State: logged_in
vmhba68,iqn.2003-01.com
<http://iqn.2003-01.com>.redhat.iscsi-gw:iscsi-igw,00023d000002,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000002
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.2
RemoteAddress: 192.168.201.2
LocalAddress: 192.168.201.107
SessionCreateTime: 02/13/20 23:09:16
ConnectionCreateTime: 02/13/20 23:09:16
ConnectionStartTime: 02/13/20 23:09:16
State: logged_in
vmhba68,iqn.2003-01.com
<http://iqn.2003-01.com>.redhat.iscsi-gw:iscsi-igw,00023d000003,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000003
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.3
RemoteAddress: 192.168.201.3
LocalAddress: 192.168.201.107
SessionCreateTime: 02/13/20 23:09:16
ConnectionCreateTime: 02/13/20 23:09:16
ConnectionStartTime: 02/13/20 23:09:16
State: logged_in
=====
[root@tcnvh8:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com
<http://iqn.2003-01.com>.redhat.iscsi-gw:iscsi-igw,00023d000001,0
Adapter: vmhba68
Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
ISID: 00023d000001
CID: 0
DataDigest: NONE
HeaderDigest: NONE
IFMarker: false
IFMarkerInterval: 0
MaxRecvDataSegmentLength: 131072
MaxTransmitDataSegmentLength: 262144
OFMarker: false
OFMarkerInterval: 0
ConnectionAddress: 192.168.201.1
RemoteAddress: 192.168.201.1
LocalAddress: 192.168.201.108
SessionCreateTime: 01/12/20 02:53:53
ConnectionCreateTime: 01/12/20 02:53:53
ConnectionStartTime: 02/13/20 23:06:40
State: logged_in
Is that the problem? Any ideas on how to proceed from here?
Yes. Normally, you would have the connection already created, and when
one path/gateway goes down, then the multipath layer will switch to
another path. When the path/gateway comes back up, the initiator side's
iscsi layer will reconnect automatically and the multipath layer will
re-setup the path structure, so it can failback if its a higher priority
path or failover later if other paths go down.
Something happened with the automatic path connection process on that
node. We know it works for that one gateway you brought up/down. For the
other gateways I would check:
1. Check that all target portals are being discovered. In the GUI screen
you entered in the discovery address, you should also see a list of all
target portals that were found in the static section. Do you only see 1
portal?
Yes, all targers are listen in "Targets" section. But in "Paths",
just
paths of host 192.168.201.1 are showing.
See here:
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.storage.do…
2. If you see all the portals then when you hit the rescan HBA button,
do you see any errors on the target side in /var/log/messages? Maybe
something about CHAP/login/auth errors?
When I hit rescan HBA, I received this messages in Logs (below). He
seems to complain about the username / password, but if that is the
case, the password is the same for all gateways, why does one connect
and another does not?
Did you change your config, but then maybe not re-update everything? It
looks like some config data is stale. For example, in your
gwcli/targetcli output there are only 2 iscsi gws at .1 and .2, but in
the vmware log we also see it trying to log into something at 192.168.201.3.
Did you maybe set up a portal/portals as static, then not modify the
settings used, or delete the static ones when rediscovering the new
target config?
I would reset the target, portal and chap settings on the vmware side.
ceph-iscsi2 - /var/log/messages
Feb 16 07:31:21 ceph-iscsi2 kernel: Security negotiation failed.
Feb 16 07:31:21 ceph-iscsi2 kernel: iSCSI Login negotiation failed.
tcnvh8 - /var/log/vmkernel.log
2020-02-16T07:25:44.718Z cpu13:2097316)qfle3:
qfle3_queue_remove_filter:2063: [vmnic0] QueueOps.queueRemoveFilter
2020-02-16T07:25:44.720Z cpu13:2097316)qfle3:
qfle3_remove_queue_filter:2012: [vmnic0] NetQ removed RX filter: queue:1
mac: 00:50:56:6c:24:9d filter id:0
2020-02-16T07:25:44.721Z cpu13:2097316)qfle3:
qfle3_queue_apply_filter:1923: [vmnic0] QueueOps.queueApplyFilter 0
2020-02-16T07:25:44.721Z cpu13:2097316)WARNING: qfle3:
qfle3_apply_queue_mac_filter:1734: [vmnic0] NetQ could not add RX
filter, no filters for queue 0
2020-02-16T07:25:44.721Z cpu13:2097316)qfle3: qfle3_queue_quiesce:1061:
[vmnic0] QueueOps.queueQuiesce
2020-02-16T07:25:44.721Z cpu13:2097316)qfle3: qfle3_queue_quiesce:1069:
[vmnic0] RxQ, QueueIDVal:1
2020-02-16T07:25:44.721Z cpu13:2097316)qfle3: qfle3_rx_queue_stop:1558:
[vmnic0] qfle3_rx_queue_stop, QueueIDVal:1
2020-02-16T07:25:44.721Z cpu13:2097316)qfle3: qfle3_rq_stop:740:
[vmnic0] qfle3_rq_stop 1
2020-02-16T07:25:44.721Z cpu13:2097316)qfle3: qfle3_rq_stop:811:
[vmnic0] Stopping queue 0
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_rq_stop:831:
[vmnic0] disable netpoll for q_index 1
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_rq_stop:842:
[vmnic0] Disabling interrupt on vector # 3
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_rq_stop:867:
[vmnic0] active Rx queue Count 1
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_queue_free:690:
[vmnic0] QueueOps.queueFree
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_queue_free:697:
[vmnic0] RxQ, QueueIDVal:1
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_rq_free:618:
[vmnic0] Loop through 1 RSS queues 1
2020-02-16T07:25:44.724Z cpu13:2097316)qfle3: qfle3_cmd_remove_q:19191:
[vmnic0] Releasing Q idx 1
2020-02-16T07:25:49.833Z cpu37:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2170: socket 0x430fd072fc00 network resource
pool netsched.pools.persist.iscsi associated
2020-02-16T07:25:49.833Z cpu37:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2198: socket 0x430fd072fc00 network tracker id
974086138 tracker.iSCSI.192.168.201.2 associated
2020-02-16T07:25:49.833Z cpu37:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2170: socket 0x430fd097ae20 network resource
pool netsched.pools.persist.iscsi associated
2020-02-16T07:25:49.833Z cpu37:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2198: socket 0x430fd097ae20 network tracker id
974086138 tracker.iSCSI.192.168.201.3 associated
2020-02-16T07:25:50.341Z cpu20:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:517: vmhba68:CH:1 T:0 CN:0: Failed to receive
data: Connection closed by peer
2020-02-16T07:25:50.341Z cpu20:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:525: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:50.341Z cpu20:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:526: Conn [CID: 0 L: 192.168.201.108:39679
<http://192.168.201.108:39679> R: 192.168.201.2:3260
<http://192.168.201.2:3260>]
2020-02-16T07:25:50.341Z cpu20:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1235: vmhba68:CH:1 T:0 CN:0: Connection rx
notifying failure: Failed to Receive. State=Bound
2020-02-16T07:25:50.341Z cpu20:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1236: Sess [ISID: TARGET: (null) TPGT: 0
TSIH: 0]
2020-02-16T07:25:50.341Z cpu20:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1237: Conn [CID: 0 L: 192.168.201.108:39679
<http://192.168.201.108:39679> R: 192.168.201.2:3260
<http://192.168.201.2:3260>]
2020-02-16T07:25:50.341Z cpu20:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:517: vmhba68:CH:2 T:0 CN:0: Failed to receive
data: Connection closed by peer
2020-02-16T07:25:50.341Z cpu20:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:525: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:50.341Z cpu20:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:526: Conn [CID: 0 L: 192.168.201.108:45598
<http://192.168.201.108:45598> R: 192.168.201.3:3260
<http://192.168.201.3:3260>]
2020-02-16T07:25:50.341Z cpu20:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1235: vmhba68:CH:2 T:0 CN:0: Connection rx
notifying failure: Failed to Receive. State=Bound
2020-02-16T07:25:50.342Z cpu20:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1236: Sess [ISID: TARGET: (null) TPGT: 0
TSIH: 0]
2020-02-16T07:25:50.342Z cpu20:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1237: Conn [CID: 0 L: 192.168.201.108:45598
<http://192.168.201.108:45598> R: 192.168.201.3:3260
<http://192.168.201.3:3260>]
2020-02-16T07:25:50.595Z cpu37:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:699: vmhba68:CH:1 T:0 CN:0: iSCSI connection is
being marked "OFFLINE" (Event:2)
2020-02-16T07:25:50.595Z cpu37:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:700: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:50.595Z cpu37:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:701: Conn [CID: 0 L: 192.168.201.108:39679
<http://192.168.201.108:39679> R: 192.168.201.2:3260
<http://192.168.201.2:3260>]
2020-02-16T07:25:50.600Z cpu37:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:699: vmhba68:CH:2 T:0 CN:0: iSCSI connection is
being marked "OFFLINE" (Event:2)
2020-02-16T07:25:50.600Z cpu37:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:700: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:50.600Z cpu37:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:701: Conn [CID: 0 L: 192.168.201.108:45598
<http://192.168.201.108:45598> R: 192.168.201.3:3260
<http://192.168.201.3:3260>]
2020-02-16T07:25:51.204Z cpu0:2837132)J6: 2651: 'Storage_Ceph_pool1':
Exiting async journal replay manager world
2020-02-16T07:25:55.360Z cpu38:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2170: socket 0x430fd06ef2a0 network resource
pool netsched.pools.persist.iscsi associated
2020-02-16T07:25:55.360Z cpu38:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2198: socket 0x430fd06ef2a0 network tracker id
974086138 tracker.iSCSI.192.168.201.2 associated
2020-02-16T07:25:55.361Z cpu38:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2170: socket 0x430fd0938540 network resource
pool netsched.pools.persist.iscsi associated
2020-02-16T07:25:55.361Z cpu38:2098255)iscsi_vmk:
iscsivmk_ConnNetRegister:2198: socket 0x430fd0938540 network tracker id
974086138 tracker.iSCSI.192.168.201.3 associated
2020-02-16T07:25:55.868Z cpu21:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:517: vmhba68:CH:1 T:0 CN:0: Failed to receive
data: Connection closed by peer
2020-02-16T07:25:55.868Z cpu21:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:525: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:55.868Z cpu21:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:526: Conn [CID: 0 L: 192.168.201.108:17752
<http://192.168.201.108:17752> R: 192.168.201.2:3260
<http://192.168.201.2:3260>]
2020-02-16T07:25:55.868Z cpu21:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1235: vmhba68:CH:1 T:0 CN:0: Connection rx
notifying failure: Failed to Receive. State=Bound
2020-02-16T07:25:55.868Z cpu21:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1236: Sess [ISID: TARGET: (null) TPGT: 0
TSIH: 0]
2020-02-16T07:25:55.868Z cpu21:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1237: Conn [CID: 0 L: 192.168.201.108:17752
<http://192.168.201.108:17752> R: 192.168.201.2:3260
<http://192.168.201.2:3260>]
2020-02-16T07:25:55.868Z cpu21:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:517: vmhba68:CH:2 T:0 CN:0: Failed to receive
data: Connection closed by peer
2020-02-16T07:25:55.868Z cpu21:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:525: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:55.868Z cpu21:2098243)WARNING: iscsi_vmk:
iscsivmk_ConnReceiveAtomic:526: Conn [CID: 0 L: 192.168.201.108:15616
<http://192.168.201.108:15616> R: 192.168.201.3:3260
<http://192.168.201.3:3260>]
2020-02-16T07:25:55.868Z cpu21:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1235: vmhba68:CH:2 T:0 CN:0: Connection rx
notifying failure: Failed to Receive. State=Bound
2020-02-16T07:25:55.868Z cpu21:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1236: Sess [ISID: TARGET: (null) TPGT: 0
TSIH: 0]
2020-02-16T07:25:55.868Z cpu21:2098243)iscsi_vmk:
iscsivmk_ConnRxNotifyFailure:1237: Conn [CID: 0 L: 192.168.201.108:15616
<http://192.168.201.108:15616> R: 192.168.201.3:3260
<http://192.168.201.3:3260>]
2020-02-16T07:25:56.121Z cpu38:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:699: vmhba68:CH:1 T:0 CN:0: iSCSI connection is
being marked "OFFLINE" (Event:2)
2020-02-16T07:25:56.121Z cpu38:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:700: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:56.121Z cpu38:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:701: Conn [CID: 0 L: 192.168.201.108:17752
<http://192.168.201.108:17752> R: 192.168.201.2:3260
<http://192.168.201.2:3260>]
2020-02-16T07:25:56.127Z cpu38:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:699: vmhba68:CH:2 T:0 CN:0: iSCSI connection is
being marked "OFFLINE" (Event:2)
2020-02-16T07:25:56.127Z cpu38:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:700: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2020-02-16T07:25:56.127Z cpu38:2098255)WARNING: iscsi_vmk:
iscsivmk_StopConnection:701: Conn [CID: 0 L: 192.168.201.108:15616
<http://192.168.201.108:15616> R: 192.168.201.3:3260
<http://192.168.201.3:3260>]
2020-02-16T07:30:49.971Z cpu7:2098243)NMP:
nmp_ThrottleLogForDevice:3788: Cmd 0x28 (0x45a296a54040, 0) to dev
"naa.6001405ba48e0b99e4c418ca13506c8e" on path "vmhba68:C0:T0:L0"
Failed: H:0x0 D:0x8 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:NONE
What about in the /var/log/vmkernel.log on the initiator side? Any iscsi
errors?