
RE: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu used when we use VTd?

Yuji Shimada <mailto:shimada-yxb@xxxxxxxxxxxxxxx> wrote:
> On Fri, 26 Sep 2008 12:36:21 +0800
> "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:

I changed the subject to reflect what's discussed.

> We have to solve many difficulties to keep the guest domain running.
> How about the following idea as a first step?

Yes, agree.

>    Non-fatal error on I/O device:
>        - kill the domain with error source function.
>        - reset the function.

From the following statement in PCI-E 2.0 section 6.6.2: "Note that Port state
machines associated with Link functionality including those in the Physical
and Data Link Layers are not reset by FLR", I'm not sure FLR is the right
method to handle the error situation. That's the reason I asked how to
handle multiple-function devices.

>    Non-fatal error on PCI-PCI bridge.
>        - kill all domains with the functions under the PCI-PCI bridge.
>        - reset PCI-PCI bridge and secondary bus.
>    Fatal error:
>        - kill all domains with the functions under the same root port.
>        - reset the link (secondary bus reset on root port).

Agreed. Basically, I think the "reset PCI-PCI bridge and secondary bus" and
"reset the link" actions are already done by the AER core. What we need to
define is PCI back's error handler. As a first step, the error handler will
trigger a domain reset; in the future, more elegant actions can be defined and
implemented. Any ideas?
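
As a rough illustration only, the first-step policy quoted above could be
written as a small decision function. All names here are hypothetical (none of
them are Xen or Linux symbols), and the sketch only picks which domains to kill
and what to reset; it assumes the reset itself is carried out elsewhere, for
example by the AER core.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none are Xen or Linux symbols. */

enum err_severity { ERR_NONFATAL, ERR_FATAL };
enum err_source   { SRC_IO_FUNCTION, SRC_PCI_BRIDGE };

/* Which guest domains are killed. */
enum kill_scope {
    KILL_OWNER_OF_FUNCTION,   /* domain owning the error source function */
    KILL_ALL_UNDER_BRIDGE,    /* domains with functions under the PCI-PCI bridge */
    KILL_ALL_UNDER_ROOT_PORT  /* domains with functions under the same root port */
};

/* What is reset afterwards. */
enum reset_scope {
    RESET_FUNCTION,             /* method left open: FLR may not be sufficient */
    RESET_BRIDGE_SECONDARY_BUS, /* reset PCI-PCI bridge and its secondary bus */
    RESET_LINK_AT_ROOT_PORT     /* secondary bus reset on the root port */
};

struct recovery_plan {
    enum kill_scope  kill;
    enum reset_scope reset;
};

/* Fill in *plan according to the first-step policy from the thread. */
void first_step_plan(enum err_severity sev, enum err_source src,
                     struct recovery_plan *plan)
{
    assert(plan != NULL);

    if (sev == ERR_FATAL) {
        /* Fatal error: the source does not matter, act at the root port. */
        plan->kill  = KILL_ALL_UNDER_ROOT_PORT;
        plan->reset = RESET_LINK_AT_ROOT_PORT;
    } else if (src == SRC_PCI_BRIDGE) {
        /* Non-fatal error on a PCI-PCI bridge. */
        plan->kill  = KILL_ALL_UNDER_BRIDGE;
        plan->reset = RESET_BRIDGE_SECONDARY_BUS;
    } else {
        /* Non-fatal error on an I/O device function. */
        plan->kill  = KILL_OWNER_OF_FUNCTION;
        plan->reset = RESET_FUNCTION;
    }
}
```

Keeping the decision separate from the action would let a later, more elegant
handler replace the kill step without touching the reset scope.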

> Note: we have to consider how to prevent a device from destroying other
> domains' memory.

Why do we need to consider a device destroying another domain's memory? I
think VT-d should guarantee that this cannot happen.

>>>>> in guest side is required in the long term, because guest OS will be
>>>>> able to handle AER and recover error condition.
>>>> Yes, agree that if the guest can do AER, it will enhance reliability and
>>>> availability. But a more elegant design is needed. For example, if
>>>> the guest decides that AER needs a root port link reset (switch link
>>>> reset should be ok unless SR-IOV), what shall the host do? If the host
>>>> acts according to the guest's suggestion, that may not be safe, I suspect.
>>> I agree with you.  Host should NOT act according to guest's
>>> suggestion. I think host should recover error condition with dom0
>>> linux's AER driver. AER emulation for guest is needed to make guest
>>> survive.
>> Have you considered implementing just a virtual root port in qemu, not
>> the whole RC? Not sure if there is any effort/function difference between
>> these two methods.
> I think it is good to implement just a virtual root port in qemu. The
> reasons are as follows.
> - The OS does not control chipset-specific devices in the RC much. We don't
>   need to provide chipset-specific devices to the guest OS.
> - Firmware controls chipset-specific devices in the RC. But guest firmware
>   is Xen-specific. We don't need to provide chipset-specific devices to
>   guest firmware.
> Note: as a first step, we don't need to implement a virtual root port in
> qemu, because we kill the guest domain.


> Thanks,
> --
> Yuji Shimada
