
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.



> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> Behalf Of Keir Fraser
> Sent: Friday, January 23, 2009 10:42 AM
>
> Ah, I know what it is! We actually free up bits of the Xen image at the end
> of Xen bootstrap, and these can now be allocated to a domain (e.g., dom0)
> and DMAed to. But these will be contained within the bounds of __pa(&_start)
> and __pa(&_end) and hence will not have been mapped in dom0's VT-d tables.
>
> Sadly, the fact is that Xen relies on the validity of memory from the domain
> heap as well as the Xen heap, so the avoidance of mapping Xen-critical memory
> in dom0's VT-d tables is inadequate anyway, even on x86_32 and ia64.
>
> Also, it's going to be hard to do better while keeping efficiency: if you
> only map dom0's pages in its VT-d tables then PV backend drivers will not
> work (they rely on DMAing to/from other domains' pages via grant
> references). You'd have to dynamically map/unmap as grants get
> mapped/unmapped, and you may not want the performance hit of that.
>
> I'd personally vote for getting rid of xen_in_range(). Alternatively, we
> could have it merely check is_kernel_text(), but since it is not in any way
> full protection from dom0, I wonder whether it is worth the bother at all.
>
> What do you think?
>
>  -- Keir

Since this is somewhat similar to the issue I'm facing with the TXT patch, it 
does seem useful to have a good way of knowing where all of the hypervisor 
memory is.

I looked at is_kernel_text(), and it only compares against _stext/_etext, 
which, after looking at the xen.lds file, covers only some of the hypervisor's 
code.  Is there any reason not to use [_stext, __init_begin) + 
[__per_cpu_start, __per_cpu_end] + [__bss_start, _end] + 
[bootsym_phys(trampoline_start), bootsym_phys(trampoline_end)] as a first 
approximation of hypervisor memory (I'm assuming that the code within 
[__init_begin, __init_end] is what you reclaim)?
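To make that concrete, here is a minimal standalone sketch of the overlap 
check over that union of ranges.  The addresses below are made-up 
placeholders for the linker symbols (_stext, __init_begin, 
__per_cpu_start/__per_cpu_end, __bss_start, _end, and the trampoline bounds); 
in Xen itself they would come from __pa() of the symbols in xen.lds and from 
bootsym_phys(), so treat this as an illustration of the shape of the check, 
not the real implementation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Placeholder physical ranges, half-open [start, end).  Real values
 * would come from __pa() of the xen.lds symbols and bootsym_phys(). */
static const struct range {
    paddr_t start, end;
} xen_ranges[] = {
    { 0x00100000, 0x00300000 },  /* [_stext, __init_begin)             */
    { 0x00400000, 0x00410000 },  /* [__per_cpu_start, __per_cpu_end)   */
    { 0x00500000, 0x00600000 },  /* [__bss_start, _end)                */
    { 0x00090000, 0x000a0000 },  /* [trampoline_start, trampoline_end) */
};

/* Return true if [start, end) overlaps any hypervisor range. */
static bool xen_in_range(paddr_t start, paddr_t end)
{
    for ( size_t i = 0; i < sizeof(xen_ranges) / sizeof(xen_ranges[0]); i++ )
        if ( start < xen_ranges[i].end && end > xen_ranges[i].start )
            return true;
    return false;
}
```

Note that with these ranges the [__init_begin, __bss_start) hole is 
deliberately left out, matching the assumption that the init section is 
reclaimed and may legitimately end up in a domain.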

While this still doesn't get the xen heap or domain heap, it at least gets us a 
little farther.

For the MAC aspect of the TXT patch, we need to know all of the code and data 
that could be used during resume, before the xen code that MACs everything 
else has run.  This includes the stack, page tables, etc.

We've also added a function that checks the ACPI Sx addresses against xen 
memory (hypervisor + domain) to ensure that tboot can't be tricked into 
overwriting xen as part of S3.  This needs to be a more comprehensive check 
than the MAC one, since there is no way of detecting whether we missed some 
range.

Joe

>
> On 23/01/2009 17:30, "Kay, Allen M" <allen.m.kay@xxxxxxxxx> wrote:
>
> > I have not figured out why this is the problem yet, but I know that
> > commenting it out makes the problem go away.  Leaving tboot_in_range() in
> > does not cause this problem.
> >
> > Allen
> >
> > -----Original Message-----
> > From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> > Sent: Friday, January 23, 2009 12:34 AM
> > To: Kay, Allen M; Li, Xin; Li, Haicheng; 'xen-devel@xxxxxxxxxxxxxxxxxxx'
> > Subject: Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or
> > Dom0 kernel panic.
> >
> > Are you sure that is the problem? The xen_in_range() change should make the
> > dom0 VT-d table more permissive, and hence if anything less likely to
> > experience VT-d faults. Also it wouldn't seem to explain problems for HVM
> > guest passthrough.
> >
> >  -- Keir
> >
> > On 23/01/2009 01:01, "Kay, Allen M" <allen.m.kay@xxxxxxxxx> wrote:
> >
> >> Looks like the problem is caused by xen_in_range() call in
> >> vtd/iommu.c/intel_iommu_domain_init().  Definition of xen_in_range() was
> >> changed as part of the heap patch.
> >>
> >> I'm looking into changing intel_iommu_domain_init() to map just the pages
> >> in dom0->page_list.  However, this looks to be more complicated, as
> >> d->page_list is not yet initialized at this stage of boot.
> >>
> >> Allen
> >>
> >> -----Original Message-----
> >> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Keir Fraser
> >> Sent: Thursday, January 22, 2009 1:23 AM
> >> To: Li, Xin; Li, Haicheng; 'xen-devel@xxxxxxxxxxxxxxxxxxx'
> >> Subject: Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or
> >> Dom0 kernel panic.
> >>
> >> Mmm well not really. :-)
> >>
> >> Is there any assumption in the VT-d setup about preventing access to the 
> >> Xen
> >> heap, and could that be broken?
> >>
> >> Perhaps the VT-d pagetables are broken causing bad DMAs leading to data
> >> corruption and bad command packets?
> >>
> >>  -- Keir
> >>
> >> On 22/01/2009 08:58, "Li, Xin" <xin.li@xxxxxxxxx> wrote:
> >>
> >>> We are looking into the issue too. If you have any idea on how it's 
> >>> caused,
> >>> please tell us :-)
> >>> Thanks!
> >>> -Xin
> >>>
> >>>> -----Original Message-----
> >>>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >>>> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Keir Fraser
> >>>> Sent: Thursday, January 22, 2009 3:40 PM
> >>>> To: Li, Haicheng; 'xen-devel@xxxxxxxxxxxxxxxxxxx'
> >>>> Subject: Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption 
> >>>> or
> >>>> Dom0
> >>>> kernel panic.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> I haven't seen any problems outside of VT-d since c/s 19057, btw.
> >>>>
> >>>> -- Keir
> >>>>
> >>>> On 22/01/2009 03:42, "Li, Haicheng" <haicheng.li@xxxxxxxxx> wrote:
> >>>>
> >>>>> All,
> >>>>>
> >>>>> We have hit several system failures on different hardware platforms,
> >>>>> all caused by VT-d faults:
> >>>>> err 1: disk is corrupted by a VT-d fault on SATA.
> >>>>> err 2: Dom0 kernel panics at boot, caused by a VT-d fault on UHCI.
> >>>>> err 3: Dom0 complains of disk errors while creating HVM guests.
> >>>>>
> >>>>> The culprit appears to be changeset 19054, "x86_64: Remove
> >>>>> statically-partitioned Xen heap".
> >>>>>
> >>>>> Detailed error logs can be found via BZ#,
> >>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1409.
> >>>>>
> >>>>>
> >>>>> -haicheng
> >>>>> _______________________________________________
> >>>>> Xen-devel mailing list
> >>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> http://lists.xensource.com/xen-devel
> >>>>
> >>>>
> >>>>
> >>
> >>
> >>
> >
> >
>
>
>
