On Monday, 27 April 2020 at 18:46:09 CEST, Mike Christie wrote:
[snip]
> Are you using the ceph-iscsi tools with tcmu-runner or did you setup
> tcmu-runner directly with targetcli?
I followed this guide: https://docs.ceph.com/docs/master//rbd/iscsi-target-cli/[1] and
configured the target with gwcli, so I think I'm using the ceph-iscsi tools.
[snip]
> You would see these:
> 1. when paths are discovered initially. The initiator is sending IO to
> all paths at the same time, so the lock is bouncing between all the paths.
OK, but the nodes are already configured and all paths have been discovered, so that's
not the case.
> You should only see this for 10-60 seconds depending on how many paths
> you have, number of nodes, etc. When the multipath layer kicks in and
> adds the paths to the dm-multipath device then they should stop.
I see NO such logs while the system is running unless I start USING the LUNs.
> 2. during failover/failback when the multipath layer switches paths and
> one path takes the lock from the previously used one.
No failover/failback is occurring.
> Or, if you exported a disk to multiple initiator nodes, and some
> initiator nodes can't reach the active optimized path, so some
> initiators are using the optimized path and some are using the
> non-optimized path.
I have exported the disk to multiple initiator nodes. How can I tell whether they are
all using the active path?
> 3. If you have misconfigured the system. If you used active/active or
> had initiator nodes discover different paths for the same disk or not
> log into all the paths.
That may be the case, as I don't have much experience with multipath. Anyway, following
the ceph guide, I've set up the device in /etc/multipath.conf like this:
device {
        vendor                 "LIO-ORG"
        product                ".*"
        path_grouping_policy   "failover"
        path_selector          "queue-length 0"
        path_checker           "tur"
        hardware_handler       "1 alua"
        prio                   "alua"
        prio_args              "exclusive_pref_bit"
        failback               60
        no_path_retry          "queue"
        fast_io_fail_tmo       25
}
and multipath -ll shows this on all six nodes:
36001405d7480e5f84b94ab19ebeebd6c dm-1 LIO-ORG ,TCMU device
size=1.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 16:0:0:0 sdb 8:16 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  `- 15:0:0:0 sdc 8:32 active ready running
On all nodes, one path (sdb 8:16) is always "active" with prio 50 and the other
(sdc 8:32) is always "enabled" with prio 10. I haven't figured out how to check which
iSCSI gateway is mapped to the "active" path...
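For what it's worth, one way to pick the active path's block device out of that output is to grab the sdX device listed under the status=active path group. The snippet below is only a sketch (not from the thread): it parses the `multipath -ll` output quoted above, embedded here as a literal string so it is self-contained; on a live node you would pipe `multipath -ll` straight into the awk command instead.

```shell
#!/bin/sh
# Sample `multipath -ll` output from the mail; on a real node use:
#   multipath -ll | awk '...'
mpll='36001405d7480e5f84b94ab19ebeebd6c dm-1 LIO-ORG ,TCMU device
size=1.0T features='\''1 queue_if_no_path'\'' hwhandler='\''1 alua'\'' wp=rw
|-+- policy='\''queue-length 0'\'' prio=50 status=active
| `- 16:0:0:0 sdb 8:16 active ready running
`-+- policy='\''queue-length 0'\'' prio=10 status=enabled
  `- 15:0:0:0 sdc 8:32 active ready running'

# Extract the sd device that sits inside the "status=active" path group.
active_dev=$(printf '%s\n' "$mpll" | awk '
  /status=active/ { in_active = 1; next }   # entering the active group
  /status=/       { in_active = 0 }         # any other group ends it
  in_active {
      for (i = 1; i <= NF; i++)             # find the sdX field on the path line
          if ($i ~ /^sd[a-z]+$/) { print $i; exit }
  }')
echo "active path device: $active_dev"     # -> sdb for the sample above
```

On a live initiator, the gateway behind that device can then be read from the iSCSI session state, e.g. with open-iscsi's `iscsiadm -m session -P 3`, which lists each session's portal address together with its attached scsi disks.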
[snip]
> > Apr 27 17:36:01 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516
> > rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot
> > send after transport endpoint shutdown.
>
> What are you using for path_checker in /etc/multipath.conf on the
> initiator side?
path_checker is set to "tur".
> This is a bug but can be ignored. I am working on a fix. Basically, the
> multipath layer is checking our state. We report correctly to the initiator
> that we do not have the lock, but we also get this log message over and
> over when the multipath layer sends its path checker command.
And that's ok...
Thanks for all the help you can provide!
*Simone Lazzaris*
*Qcom S.p.A. a socio unico*
simone.lazzaris(a)qcom.it[2] |
www.qcom.it[3]
* LinkedIn[4]* | *Facebook[5]*
--------
[1] https://docs.ceph.com/docs/master//rbd/iscsi-target-cli/
[2] mailto:simone.lazzaris@qcom.it
[3] https://www.qcom.it
[4] https://www.linkedin.com/company/qcom-spa
[5] http://www.facebook.com/qcomspa