[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] block-iscsi with Xen 4.5 / 4.6

On 4/05/2016 6:25 PM, George Dunlap wrote:
> On Wed, May 4, 2016 at 8:34 AM, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>> Hello,
>> I'm re-adding xen-devel in case someone else also wants to provide feedback.
>> On Wed, May 04, 2016 at 03:06:23PM +1000, Steven Haigh wrote:
>>> Hi Roger,
>>> I've been getting some good progress with iSCSI thanks to your insights.
>>> I'm now trying to add support for locking via Persistent Reservations to
>>> ensure that only one Dom0 can attach / use a single iSCSI target at once.
>> This might be problematic with migrations. IIRC there's a point during the
>> migration where both the sending and the receiving side have the disk open
>> at the same time. However Xen always makes sure that only one guest is
>> actually accessing the disk, either the one on the receiving side (if
>> everything has gone OK) or the one on the senders side (if migration has
>> failed).
>>> In a nutshell, my thoughts are to use the following to 'lock' a device:
>>>       ## Create a hex key for the lock from the systems IP.
>>>       key=$(gethostip -x $(uname -n))
>>>       sg_persist -d ${dev} -o -G -S ${key}
>>>       sg_persist -d ${dev} -o -R -K ${key} -T 6
>>> This registers the device, and sets an Exclusive Access (-T 6) flag on
>>> the iSCSI device which means nothing else will be able to open the
>>> device until the lock is removed.
>>> To unlock the device, on remove, we should do something like:
>>>       key=$(gethostip -x $(uname -n))
>>>         sg_persist -d ${dev} -o -L -K ${key} -T 6
>>>         sg_persist -d ${dev} -o -G -K ${key} -S 0
>>> This releases the device for other things to use.
>>> I've tried putting these in block-iscsi - by using a lock_device and
>>> unlock_device function and calling it after find_device in both attach()
>>> and remove().
>>> My problems:
>>> 1) -e is set on the script - and maybe elsewhere - so any time something
>>> returns non-zero, you can't clean up. For example, if you can't get a
>>> lock, you should make sure all locks are removed from the host in
>>> question and then detach the iSCSI target.
>> You can avoid this by adding something like:
>> sg_persist ... || true
>> Of course you can replace the "true" command with something else, like a
>> fatal message or some cleanup code. You can also place the command inside of
>> a conditional if you know it might fail:
>> if ! sg_persist ...; then
>>         fatal ...
>> fi
>> It is important for us to use the '-e' in order to make sure all the failure
>> points are correctly handled, without the '-e' some command might fail and
>> the script wouldn't realize.
> I realize I'm a bit in the minority here, but I've always thought this
> was rather a strange habit of bash scripts.  In every other language,
> you check the error codes of things that can fail and you handle them
> appropriately.  If you're just hacking something together, then "set
> -e" is probably OK, but wouldn't it make more sense in an
> infrastructure script like this to actually go through and handle all
> the errors?   Worst case you could just if ! [command] ; then exit 1 ;
> fi.

Then I'm in the minority too ;)

> Regarding the "maybe elsewhere" -- AFAIK the block-scsi script itself
> is run directly from libxl, so nothing "above" it should be setting
> -e; and it only includes block-common, which does not seem to be
> setting it.

Right now, even removing -e from line 1 causes something like this to
exit the script:

if [ $? != 0 ]; then

So, if I run the following, the script will always fail:

key=$(gethostip -x $(uname -n))
sg_persist -d ${dev} -o -G -S ${key} > /dev/null
if [ $? != 0 ]; then
        iscsiadm -m node -T ${iqn} --logout > /dev/null
        fatal "Could not obtain lock on $iqn"

man 3 sg3_utils (http://linux.die.net/man/8/sg3_utils) shows a possible
list of exit codes - which it would be useful to consult at least *some*
of them with useful errors - such as:

Exit status 3:
the DEVICE reports that it is not ready for the operation requested. The
device may be in the process of becoming ready (e.g. spinning up but not
at speed) so the utility may work after a wait.

Also possibly a case for locking not being supported.

> That said, in theory as Roger said, if you actually are checking the
> error code of all your commands (which you should be if you need to do
> clean-up on failure), then 'set -e' shouldn't actually be causing an
> exit.
>  -George

Steven Haigh

Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

Attachment: signature.asc
Description: OpenPGP digital signature

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.