Hi,
On Wed, Dec 4, 2019 at 00:31, Mike Christie <mchristi(a)redhat.com> wrote:
On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
Thanks. If I am reading this correctly, the ability to remove an iSCSI
gateway would allow the remaining iSCSI gateways to take over for the
removed gateway's LUNs as of 3.0. That's good; we run 3.2. However,
because the actual update of the central config object happens from the
to-be-deleted iSCSI gateway, regardless of where the gwcli command is
issued, it will fail to actually remove said gateway from the object if
that gateway is not functioning.
Yes.
I guess this still leaves the question of how to proceed when one of the
iSCSI gateways fails permanently. Is that possible, or is it potentially
possible other than manually intervening on the config
You could edit the gateway.cfg manually, but I would not do it, because
it's error prone.
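For reference, the manual edit being warned against here would look roughly like the sketch below. This is a hypothetical illustration only: it assumes gateway.cfg is a JSON object with a top-level "gateways" map and per-target "portals" maps keyed by hostname, which varies between ceph-iscsi versions, so check the layout of your own object before trusting any of it. The hostnames and target IQN are placeholders.

```python
import json

def remove_gateway(config: dict, host: str) -> dict:
    """Drop a dead gateway from a gateway.cfg-style dict.

    Assumed layout: top-level "gateways" map plus per-target "portals"
    maps keyed by hostname. The real schema differs across ceph-iscsi
    versions -- verify against your own gateway.cfg first.
    """
    config = json.loads(json.dumps(config))  # work on a deep copy
    config.get("gateways", {}).pop(host, None)
    for target in config.get("targets", {}).values():
        target.get("portals", {}).pop(host, None)
    return config

# Toy stand-in for the real object, which you would fetch with:
#   rados -p rbd get gateway.cfg /tmp/gateway.cfg
sample = {
    "gateways": {"gw1": {}, "gw2": {}},
    "targets": {
        "iqn.2003-01.com.example:target": {"portals": {"gw1": {}, "gw2": {}}}
    },
}
cleaned = remove_gateway(sample, "gw2")
print(json.dumps(cleaned))
# After editing you would write the object back with:
#   rados -p rbd put gateway.cfg /tmp/gateway.cfg
# and restart rbd-target-api on the surviving nodes.
```

Again: this is exactly the error-prone path being discouraged above; a typo here corrupts the config for every gateway.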
It's probably safest to run in degraded mode and wait for an updated
ceph-iscsi package with a fix. If you are running into the problem right
now, I can bump the priority.
I permanently lost a gateway. I cannot leave it running degraded,
because I need to add another redundant gateway, and it does not allow
that while a gateway is offline.
In this case, what can I do? If I create a new gateway with the same
name and IP as the lost one, and then try to use "delete" in gwcli, will
it work?
Yes.
If you can tolerate a temporary stop in services, you can also do the
following as a workaround:
0. Stop applications accessing the iSCSI LUNs, and have the initiators
log out of the iSCSI target.
1. Stop the ceph-iscsi service. On all iSCSI gateway nodes do:
systemctl stop rbd-target-api
2. Delete gateway.cfg. This deletes configuration info like the target
and its ACL and LUN mappings. It does not delete the actual images or
pools that hold your data.
rados -p rbd rm gateway.cfg
3. Start the ceph-iscsi service again. On all iSCSI gateway nodes do:
systemctl start rbd-target-api
4. Re-set up the target with gwcli. For the image/disk setup stage,
instead of running the "create" command, run the "attach" command:
attach pool=your_pool image=image_name
Then re-add your target, ACLs and LUN mappings.
5. On the initiator side, log back in to the iSCSI target.
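The non-interactive part of the steps above can be strung together as a script. A minimal sketch, dry-run by default so it only prints what it would do; it assumes the pool and object names from the steps ("rbd", "gateway.cfg"), and the systemctl steps still have to be run on every gateway node, not just one. Step 4 (the gwcli re-setup) is interactive and is left out.

```python
import subprocess

POOL = "rbd"              # pool holding the config object (step 2)
CONFIG_OBJ = "gateway.cfg"

STEPS = [
    # Step 1: run on ALL iSCSI gateway nodes.
    ["systemctl", "stop", "rbd-target-api"],
    # Step 2: run once, from any node with cluster access.
    ["rados", "-p", POOL, "rm", CONFIG_OBJ],
    # Step 3: run on ALL iSCSI gateway nodes.
    ["systemctl", "start", "rbd-target-api"],
]

def run_steps(dry_run=True):
    """Print (or execute) each step; returns the rendered command lines."""
    rendered = []
    for cmd in STEPS:
        line = " ".join(cmd)
        rendered.append(line)
        if dry_run:
            print("DRY RUN:", line)
        else:
            # Only do this with initiators logged out (step 0).
            subprocess.run(cmd, check=True)
    return rendered

commands = run_steps(dry_run=True)
```

Keep dry_run=True until you have confirmed step 0 is done everywhere; deleting gateway.cfg while initiators are still logged in is exactly the kind of mistake this thread is about.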
object? If it's not possible, would the best course of action be to have
standby hardware and quickly recreate the node, or perhaps to run the
gateways more ephemerally, from a VM or container?
Thanks again.
Respectfully,
Wes Dillingham
wes(a)wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <mchristi(a)redhat.com> wrote:
I do not think it's going to do what you want when the node you want to
delete is down.
It looks like we only temporarily stop the gw from being exported. It
does not update the gateway.cfg, because we do the config removal call
on the node we want to delete.
So gwcli would report success, and the ls command will show it as no
longer running/exported, but if you restart the rbd-target-api service
then it will show up again.
There is an internal command to do what you want. I will post a PR for
gwcli so it can also be used by the dashboard.
On 12/03/2019 01:19 PM, Jason Dillaman wrote:
> If I recall correctly, the recent ceph-iscsi release supports the
> removal of a gateway via the "gwcli". I think the Ceph dashboard can
> do that as well.
On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham <wes(a)wesdillingham.com> wrote:
>
> We utilize 4 iSCSI gateways in a cluster and have noticed the
> following during patching cycles when we sequentially reboot single
> iSCSI gateways:
>>
>> "gwcli" often hangs on the still-up iSCSI GWs but sometimes still
>> functions and gives the message:
>
> "1 gateway is inaccessible - updates will be disabled"
>
> This got me thinking about what the course of action would be should
> an iSCSI gateway fail permanently or semi-permanently, say a hardware
> issue. What would be the best course of action to instruct the
> remaining iSCSI gateways that one of them is no longer available, and
> that they should allow updates again and take ownership of the
> now-defunct node's LUNs?
>
> I'm guessing pulling down the RADOS config object, rewriting it, and
> re-put'ing it, followed by an rbd-target-api restart, might do the
> trick, but am hoping there is a more "in-band" and less potentially
> devastating way to do this.
>>
>> Thanks for any insights.
>>
>> Respectfully,
>>
>> Wes Dillingham
>> wes(a)wesdillingham.com
>> LinkedIn
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io