[Xen-devel] RE: Question on XenVbd_HwScsiResetBus in PV driver
> Hi:
>
> Well, AFAIK, there is a KeSwapProcessOrStack thread in the Windows kernel that
> swaps thread kernel stacks in and out, and it can cause BSOD codes 0x77/0x7E,
> meaning that a paging I/O request could not be completed successfully because
> of a disk failure. This is reproducible by periodically attaching gdb to the
> tapdisk process in dom0 ("gdb attach tapdisk") to simulate I/O responses that
> take longer than 10 s.
>
> In fact, the I/O stream from tapdisk is written to our own storage cluster,
> which supports failover, but failover takes time, so while it is in progress
> the I/O hangs from the VM's point of view. When this happens, we hit some
> blue screens.
>
> I've also done some experiments, testing two scenarios:
> 1) use the current XenVbd_HwScsiResetBus, which completes I/O with
>    SRB_STATUS_BUS_RESET
> 2) do nothing in XenVbd_HwScsiResetBus
> Using gdb on tapdisk to hold I/O periodically, 1) shows a higher probability
> of a blue screen than 2) (in fact, we haven't seen a blue screen in 2)).
>
> From the log, I see XenVbd_HwScsiResetBus called every 14 seconds (10 s +
> 4 s hold time) in scenario 1), but in 2) I saw only a few calls (fewer than
> 10); it looks like the driver calls reset bus only a few times.
>
> So, I have the following assumptions or questions:
> 1) Only some of the I/O failures will cause a BSOD.
> 2) Doing nothing in XenVbd_HwScsiResetBus is relatively good for minimizing
>    the probability of a blue screen.
> 3) I am still confused about how XenVbd_HwScsiResetBus is called, and why it
>    is called so rarely when it does nothing.
> 4) Is it OK to do nothing in XenVbd_HwScsiResetBus?
>
> Could you help to clarify? Many thanks.

When Windows calls a driver's HwScsiResetBus routine, the driver is supposed to perform the procedure described at http://msdn.microsoft.com/en-us/library/ff565331%28v=vs.85%29.aspx, which is basically to cancel the I/O and return all SRBs with a status of SRB_STATUS_BUS_RESET.
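To make that procedure concrete, here is a minimal C sketch of the emulated reset, assuming a much-simplified model of the driver's bookkeeping. Every type and function name in it (`sim_srb`, `sim_vbd`, `sim_submit`, `sim_reset_bus_immediate`) is hypothetical, and the `SIM_SRB_*` values only mirror the names of the DDK's `SRB_STATUS_*` codes; none of it is scsiport or actual XenVbd code. The hypothetical `backend_owns_buffer` flag records whether Dom0 may still be using a request's buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a PV disk driver's bookkeeping.
 * Status names only mirror SRB_STATUS_PENDING / SRB_STATUS_SUCCESS /
 * SRB_STATUS_BUS_RESET; this is not scsiport or XenVbd code. */
enum sim_srb_status { SIM_SRB_PENDING, SIM_SRB_SUCCESS, SIM_SRB_BUS_RESET };

struct sim_srb {
    enum sim_srb_status status;
    int returned_to_windows;   /* 1 once the SRB has been completed */
    int backend_owns_buffer;   /* 1 while Dom0 may still write to the buffer */
    struct sim_srb *next;      /* link in the outstanding list */
};

struct sim_vbd {
    struct sim_srb *outstanding;   /* submitted to the ring, no response yet */
    int outstanding_count;
};

/* Queue a request to the (simulated) backend; Dom0 now owns the buffer. */
static void sim_submit(struct sim_vbd *vbd, struct sim_srb *srb)
{
    srb->status = SIM_SRB_PENDING;
    srb->returned_to_windows = 0;
    srb->backend_owns_buffer = 1;
    srb->next = vbd->outstanding;
    vbd->outstanding = srb;
    vbd->outstanding_count++;
}

/* Reset as the documentation describes it: every outstanding SRB is
 * completed at once with a bus-reset status. Nothing here tells Dom0 to
 * stop, so backend_owns_buffer stays set: Windows gets the buffer back
 * while the backend may still write to it. */
static void sim_reset_bus_immediate(struct sim_vbd *vbd)
{
    while (vbd->outstanding != NULL) {
        struct sim_srb *srb = vbd->outstanding;
        vbd->outstanding = srb->next;
        srb->next = NULL;
        srb->status = SIM_SRB_BUS_RESET;
        srb->returned_to_windows = 1;
        vbd->outstanding_count--;
    }
    assert(vbd->outstanding_count == 0);
}
```

After `sim_reset_bus_immediate` runs, every SRB has been returned to Windows while `backend_owns_buffer` is still set on each one, which is exactly the ownership problem raised in the reply below.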

It occurs to me that completing the requests while Dom0 still owns the buffers really is the wrong thing to do. Windows might reuse the buffers for something else while Dom0 might still write to them, which may well cause the crash you are seeing.

There really is no mechanism for a DomU to indicate a reset to Dom0, so all we are doing is emulating it. We still have to wait for any outstanding requests. The only alternative would be to close and re-open the device, but that can't be done from within scsiport.

I'll see what needs to be done to fix this bug.

James
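One way to follow the "wait for any outstanding requests" approach is to have the reset only record that a reset is pending, and to hand each SRB back to Windows only when Dom0's response for it arrives. The C sketch below uses the same kind of hypothetical model (`sim_srb`, `sim_vbd`, `sim_submit`), with `sim_reset_bus_deferred` and `sim_backend_response` as illustrative names. Completing the drained requests with a bus-reset status rather than their real result is an assumption made for the sketch, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; status names only mirror the DDK's
 * SRB_STATUS_* codes. Not scsiport or XenVbd code. */
enum sim_srb_status { SIM_SRB_PENDING, SIM_SRB_SUCCESS, SIM_SRB_BUS_RESET };

struct sim_srb {
    enum sim_srb_status status;
    int returned_to_windows;   /* 1 once the SRB has been completed */
    int backend_owns_buffer;   /* 1 while Dom0 may still write to the buffer */
    struct sim_srb *next;      /* link in the outstanding list */
};

struct sim_vbd {
    struct sim_srb *outstanding;   /* submitted to the ring, no response yet */
    int outstanding_count;
    int reset_pending;             /* set by a reset, cleared once the ring drains */
};

/* Queue a request to the (simulated) backend; Dom0 now owns the buffer. */
static void sim_submit(struct sim_vbd *vbd, struct sim_srb *srb)
{
    srb->status = SIM_SRB_PENDING;
    srb->returned_to_windows = 0;
    srb->backend_owns_buffer = 1;
    srb->next = vbd->outstanding;
    vbd->outstanding = srb;
    vbd->outstanding_count++;
}

/* Emulated reset: nothing is completed here, because Dom0 still owns the
 * buffers of every outstanding request. Only remember that a reset happened. */
static void sim_reset_bus_deferred(struct sim_vbd *vbd)
{
    if (vbd->outstanding_count > 0)
        vbd->reset_pending = 1;
}

/* Called when Dom0 posts the response for srb, i.e. it has released the
 * buffer. Only now is it safe to return the SRB to Windows. */
static void sim_backend_response(struct sim_vbd *vbd, struct sim_srb *srb)
{
    struct sim_srb **link = &vbd->outstanding;

    while (*link != NULL && *link != srb)
        link = &(*link)->next;
    assert(*link == srb);              /* response must match a request */
    *link = srb->next;                 /* unlink from the outstanding list */
    srb->next = NULL;
    vbd->outstanding_count--;

    srb->backend_owns_buffer = 0;      /* Dom0 has finished with the buffer */
    srb->status = vbd->reset_pending ? SIM_SRB_BUS_RESET : SIM_SRB_SUCCESS;
    srb->returned_to_windows = 1;

    if (vbd->outstanding_count == 0)
        vbd->reset_pending = 0;        /* emulated reset has finished */
}
```

The property this sketch aims for is that `returned_to_windows` is never set while `backend_owns_buffer` is still 1. A real driver would presumably also have to hold off new requests, or otherwise cope with scsiport's expectations, until the ring drains; that part is not modelled here.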