On 4/29/20 2:11 AM, Simone Lazzaris wrote:
On Tuesday, 28 April 2020 at 18:41:27 CEST, Mike Christie wrote:
Could you send me:
1. The /var/log/messages for the initiator when you do IO and see those
lock messages.
On the initiator (XenServer 7.1 which is based on CentOS AFAIK) the
/var/log/messages is empty.
I (sporadically) see:
Apr 29 09:00:36 xs-n1 systemd[1]: Starting Multipath Count Service...
Apr 29 09:00:36 xs-n1 systemd[1]: Started Multipath Count Service.
Apr 29 09:00:36 xs-n1 systemd[1]: Started Session 146 of user root.
Apr 29 09:00:36 xs-n1 systemd[1]: Starting Session 146 of user root.
Apr 29 09:00:40 xs-n1 multipathd: dm-3: remove map (uevent)
Apr 29 09:00:40 xs-n1 multipathd: dm-3: devmap not registered, can't remove
Apr 29 09:00:40 xs-n1 multipathd: dm-3: remove map (uevent)
Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="PBD.get_all_records"];
Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
[the host.get_uuid / host.get_name_label pair repeats five more times]
Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_all_records"];
2. The output of
From one of the gateways:
# gwcli ls
Attached (gwcli.txt)
From the initiator node, send the output of:
# iscsiadm -m session -P 3
Attached (iscsi-session.txt)
# multipath -ll
36001405d7480e5f84b94ab19ebeebd6c dm-0 LIO-ORG ,TCMU device
size=3.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 2:0:0:0 sdc 8:32 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 3:0:0:0 sdb 8:16 active ready running
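As an aside for readers interpreting output like the above: with ALUA, the path group with status=active and the higher priority (prio=50 here) is the active/optimized (AO) one, and its member disk (sdc in this output) is served by the owning gateway. A minimal sketch of pulling the AO disk out of a captured copy of that output (quoting simplified for the example):

```shell
# Captured `multipath -ll` output from above, simplified into a shell string
# (the original quoting around features/hwhandler is dropped for readability).
mp='36001405d7480e5f84b94ab19ebeebd6c dm-0 LIO-ORG ,TCMU device
size=3.0T features=1 queue_if_no_path hwhandler=1 alua wp=rw
|-+- policy=queue-length 0 prio=50 status=active
| `- 2:0:0:0 sdc 8:32 active ready running
`-+- policy=queue-length 0 prio=10 status=enabled
  `- 3:0:0:0 sdb 8:16 active ready running'
# The line after "status=active" names the AO path member; extract its sdX name.
printf '%s\n' "$mp" | grep -A1 'status=active' | grep -oE 'sd[a-z]+'
```

This prints sdc for the output above, which is the disk whose gateway should show up as the LUN owner in "gwcli ls".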
3. version info:
# uname -a
On the Initiator:
Linux xs-n1 4.4.0+2 #1 SMP Thu Jun 15 16:38:02 UTC 2017 x86_64 x86_64
x86_64 GNU/Linux
On the Target:
Linux iscsi1 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux
If you are using rpm, run:
# rpm -q ceph-iscsi
# rpm -q tcmu-runner
# rpm -q python-rtslib
No, I've installed them from source on the target.
What version of tcmu-runner did you use? Was it one of the 1.4 or 1.5
releases or from the github master branch?
There was a bug in the older 1.4 release: due to a Linux kernel change on
the initiator side, the behavior for an error code we used went from
retrying for up to 5 minutes to retrying only 5 times. Those 5 retries
could be exhausted in under a second, which would produce the issue you
are seeing.
To map that to an iscsi gateway you can do the following.
If sdb is the AO one, then run
iscsiadm -m session -P 3
There you can see the sdXYZ name to iscsi session mapping. The iscsi
session/connection's target IP address from that command should match
the gateway listed as the "owner" of the LUN in the "gwcli ls" output.
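The mapping step above can be sketched over a captured dump. The session text below is a made-up excerpt (the IQN and addresses are hypothetical, not taken from this thread); the filter just pairs each portal IP with the sdX device attached through that session:

```shell
# Hypothetical excerpt of `iscsiadm -m session -P 3` output for two sessions
# to the same target, one per gateway.
sample='Target: iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
    Current Portal: 192.168.1.11:3260,1
            Attached scsi disk sdc          State: running
Target: iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
    Current Portal: 192.168.1.12:3260,2
            Attached scsi disk sdb          State: running'
# Each "Current Portal" line that precedes an "Attached scsi disk" line ties
# an sdX device to the gateway IP its session is logged in to.
printf '%s\n' "$sample" | grep -E 'Current Portal|Attached scsi disk'
```

Whichever portal IP pairs with the AO disk should belong to the gateway that "gwcli ls" reports as the LUN owner.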
I see... thanks for the hint.
I've done a test: I unmapped all the drives, then mapped the first
gateway (iscsi1) on all the nodes, waited, then mapped the second
gateway, to be sure that all the nodes would see the first node as the
active/master.
Now things seem a little better in "normal" VM use: I only see the
"Cannot send after transport endpoint shutdown." messages on the
secondary target node.
I do see some hopping between the nodes when importing a disk drive, but
at this point I'm starting to suspect some strange activity from the Xen
infrastructure in that circumstance.
--
*Simone Lazzaris*
*Qcom S.p.A. a socio unico*
simone.lazzaris(a)qcom.it <mailto:simone.lazzaris@qcom.it> |
www.qcom.it <https://www.qcom.it>
*LinkedIn* <https://www.linkedin.com/company/qcom-spa> | *Facebook*
<http://www.facebook.com/qcomspa>