
Re: Effect in guest of a hard read error from the host's storage ?




George Dunlap <dunlapg@xxxxxxxxx> wrote:

>> On Sat, Apr 2, 2022 at 8:28 PM Simon Hobson <simon@xxxxxxxxxxxxxxxx> wrote:
>> Not having experienced it on any of my Xen systems I’ve managed in the past, 
>> I just wanted to check what happens when the host’s storage encounters an 
>> unrecoverable read error ?
>> My expectation would be that the error would be passed up the chain (host’s 
>> OS, virtual disk emulator, guest OS disk driver, guest OS filesystem, guest 
>> OS program), appear as an unrecoverable error to the guest, and hence passed 
>> up to the program attempting to read the file - so really nothing different 
>> to the same error on bare metal.
>> 
>> Has anyone encountered this, and can confirm if that’s how Xen+Linux handles 
>> it ?
>> 
> Juergen / Roger / Anthony -- any idea what would happen with either blkback 
> or qdisk on a hard read error?  (Or blktap, for that matter?)
>  
>> The reason for asking is that I also use Parallels on a Mac and have this 
>> situation. Cloning the host’s drive (Carbon Copy Cloner) “does the right 
>> thing” - CCC copies the files it can and reports the ones it can’t. A couple 
>> of the ones it can’t are virtual disk files for Parallels guests.
>> When I try to recover what I can (and any unreadable files might not be 
>> important anyway but I don’t know which they are) using the same technique 
>> in the guest, Parallels doesn’t behave like that. It pops up a dialog to say 
>> a critical error has occurred, with two options - retry (which of course 
>> doesn’t do anything useful), or to stop the guest. So the guest OS never 
>> gets to see the error, it just gets killed. Thus meaningful recovery is 
>> impossible as the guest gets stopped without being able to tell me what 
>> file(s) is affected.
>> I’ve been “discussing” this with Parallels support and at the moment I’ve 
>> reached the point where they’re telling me that this is normal when things 
>> are virtualised - which is setting off the BS detector for me.
>> 
> I think it probably depends on a lot of factors; it wouldn't surprise me if 
> more consumer-grade virtual machine software, like Parallels or VirtualBox 
> don't do the hard work of writing and testing those sorts of paths, while 
> more enterprise-grade software might.  That said, I don't think I've ever 
> heard anyone ask the question of Xen, so it's possible that it simply hasn't 
> been considered.
> 
> Another thing to consider is that the ability to pass the error on to the 
> guest in a system like Parallels depends not only on the correctness of its 
> own block datapath in the presence of errors, but also the correctness of the 
> surrounding operating system.  If the host OS doesn't handle those kinds of 
> errors gracefully in a reliable fashion, there's nothing Parallels can really 
> do to make up for it.

Yes, I imagine it’s “quite complicated” to catch all the possible exceptions.
But given that they do some versions which are definitely not “consumer 
orientated” and they have been doing it for a long time* I’d have expected 
better.

* I vaguely recall having been using it since an architecture change (dropping 
of a different endian emulation ?) between PowerPC 4 and PowerPC 5 stopped the 
old Virtual-PC emulator working. Something like 20 years ago.



Juergen Gross <jgross@xxxxxxxx> wrote:

> Any I/O error for a request issued via the PV block device protocol will be
> handed back to the frontend via a generic error "BLKIF_RSP_ERROR", which is
> handed back to the block layer as I/O error on the guest side.

Thanks, that sounds like exactly what I’d expect to happen.
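
To check my own understanding of that path, here is a toy model in Python. It
is only a sketch and not Xen code: the two status values are the ones I believe
Xen's public blkif.h header defines, while backend_read, frontend_read and the
dict standing in for a disk are names I have made up for illustration.

```python
import errno

# Response status values, as I understand them from Xen's public blkif.h.
BLKIF_RSP_OKAY = 0
BLKIF_RSP_ERROR = -1


def backend_read(disk, bad_sectors, sector):
    """Hypothetical backend: return (status, data) for a one-sector read."""
    if sector in bad_sectors:
        # The host's storage reported an unrecoverable read error, so the
        # backend answers with the generic error status and no data.
        return BLKIF_RSP_ERROR, None
    return BLKIF_RSP_OKAY, disk[sector]


def frontend_read(disk, bad_sectors, sector):
    """Hypothetical frontend: turn a failed response into a guest I/O error."""
    status, data = backend_read(disk, bad_sectors, sector)
    if status != BLKIF_RSP_OKAY:
        # This is the point where the guest's block layer would see EIO.
        raise OSError(errno.EIO, "I/O error", f"sector {sector}")
    return data


if __name__ == "__main__":
    disk = {0: b"good", 1: b"lost", 2: b"good"}
    for sector in disk:
        try:
            frontend_read(disk, {1}, sector)
            print(sector, "ok")
        except OSError as exc:
            print(sector, "failed:", errno.errorcode[exc.errno])
```

The point is simply that only the read touching the bad sector fails; the guest
keeps running and every other read carries on as normal.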



Anthony PERARD <anthony.perard@xxxxxxxxxx> wrote:

> It seems to be the same when the backend is "qdisk", QEMU sets the
> request status to BLKIF_RSP_ERROR on I/O error.




Thanks all for taking the time to reply - and apologies for the delay in 
responding.

I had a third remote session with a support agent today (Indian ? call centre - 
nothing against them personally, but the old saying about “pay peanuts, get 
monkeys” comes to mind) - and as I’ve come to expect he’d not read all the 
notes, so I had to explain yet again what the problem is, and tactfully explain
that “no, none of this is inherent in virtualisation; it’s only Parallels’
implementation that makes it fail in this way”. Today I had the “pleasure” of
demonstrating that no, I can’t just copy the whole virtual disk file (it fails
when it hits the unreadable block), and no, I can’t just copy files using
Finder within the guest (that triggers Parallels to pop up the same critical
error dialog and only offer the options to retry, as if that’s going to do
anything different, or to kill the guest).
It will be interesting to see whether my request for escalation to a higher 
level gets noticed.
Almost as much fun as dealing with the hell desk when I have a problem with the 
Windows laptop I have to use for ${day_job} !
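
For what it's worth, the kind of recovery I'm after only needs the reader to
get an ordinary I/O error back, skip the bad block and carry on. A minimal
sketch of that idea in Python, assuming the unreadable block shows up as EIO
(as it would on Linux); salvage_copy and its parameters are my own hypothetical
names, and unreadable blocks are zero-filled in the copy, much as
"dd conv=noerror,sync" would do:

```python
import errno
import os


def salvage_copy(src, dst, block_size=4096):
    """Copy src to dst block by block; return offsets of unreadable blocks."""
    bad_offsets = []
    size = os.path.getsize(src)
    # buffering=0 so each read maps to one request for exactly that block.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            try:
                fin.seek(offset)
                data = fin.read(length)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # anything other than a read error is not ours to hide
                bad_offsets.append(offset)
                data = b""
            # Zero-fill whatever could not be read so later offsets stay aligned.
            fout.write(data.ljust(length, b"\0"))
    return bad_offsets
```

Run against a damaged virtual disk file, the returned offsets would at least
say which parts of the image are gone, which is more than a "retry or stop"
dialog does.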


Thanks, Simon



