
Re: [Xen-devel] [RFC 0 PATCH 3/3] PVH dom0: construct_dom0 changes



>>> On 04.10.13 at 15:35, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On Fri, Oct 04, 2013 at 07:53:20AM +0100, Jan Beulich wrote:
>> >>> On 03.10.13 at 02:53, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
>> > On Fri, 27 Sep 2013 07:54:39 +0100
>> > "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>> > 
>> >> >>> On 27.09.13 at 02:17, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
>> >> >>> wrote:
>> >> > On Thu, 26 Sep 2013 09:02:41 +0100 "Jan Beulich"
>> >> > <JBeulich@xxxxxxxx> wrote:
>> >> >> >>> On 25.09.13 at 23:03, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
>> >> >> >>> wrote:
>> >> >> > +/*
>> >> >> > + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
>> >> >> > + * the entire io region mapped in the EPT/NPT.
>> >> >> > + *
>> >> >> > + * PVH FIXME: The following doesn't map MMIO ranges when they sit above the
>> >> >> > + *            highest E820 covered address.
>> >> >> 
>> >> >> This absolutely needs fixing before this can go in.
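For readers following the patch: the mapping described in the quoted comment can be sketched roughly as below. This is a minimal, hypothetical illustration, not the code from the patch; `struct e820_entry`, `struct pfn_range` and `collect_non_ram()` are made-up names, and the sketch assumes a sorted, non-overlapping, page-aligned E820 map. It collects the frame ranges that would receive a 1:1 EPT/NPT entry and makes visible why nothing above the end of the last E820 entry is ever considered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define E820_RAM   1   /* E820 type 1 is usable RAM */

/* Simplified E820 entry for illustration; not Xen's actual definition. */
struct e820_entry {
    uint64_t addr;   /* start address, bytes */
    uint64_t size;   /* length, bytes */
    uint32_t type;   /* E820_RAM or some non-RAM type */
};

/* A run of page frames that would get a 1:1 (gfn == mfn) p2m entry. */
struct pfn_range {
    uint64_t start;
    uint64_t nr;
};

static void push(struct pfn_range *out, size_t max_out, size_t *n,
                 uint64_t start, uint64_t nr)
{
    if (*n < max_out) {
        out[*n].start = start;
        out[*n].nr = nr;
    }
    (*n)++;   /* count even when out is full, so callers can size the buffer */
}

/*
 * Collect every non-RAM frame range (gaps between entries plus non-RAM
 * entries) below the end of the highest E820 entry. Frames at or above
 * *end_pfn are never examined: that is the gap the PVH FIXME refers to.
 */
static size_t collect_non_ram(const struct e820_entry *map, size_t nr_entries,
                              struct pfn_range *out, size_t max_out,
                              uint64_t *end_pfn)
{
    uint64_t cur = 0;
    size_t n = 0;

    assert(end_pfn != NULL);
    for (size_t i = 0; i < nr_entries; i++) {
        uint64_t s = map[i].addr >> PAGE_SHIFT;
        uint64_t e = (map[i].addr + map[i].size) >> PAGE_SHIFT;

        if (s > cur)                          /* hole before this entry */
            push(out, max_out, &n, cur, s - cur);
        if (map[i].type != E820_RAM && e > s) /* reserved, ACPI, etc. */
            push(out, max_out, &n, s, e - s);
        if (e > cur)
            cur = e;
    }
    *end_pfn = cur;
    return n;
}
```

Mapping everything up to the top of the physical address space instead would remove the blind spot, but, as noted in the thread, at the cost of a potentially huge HAP table.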
>> >> > 
>> >> > Any suggestions on how to fix it? Mapping all the way to the end could
>> >> > result in a huge hap table.
>> >> 
>> >> You'll probably need a call down from Dom0 telling you where it
>> >> finds/puts MMIO resources. Or perhaps that could be mapped
>> >> in on demand from the EPT fault handler (since these regions
>> >> shouldn't be subject to DMA, and hence IOMMU faults shouldn't
>> >> occur - perhaps that's even a reason to not share page tables
>> >> at least in dom0-strict mode)?
>> > 
>> > Thinking about mapping in on demand from the EPT fault handler, how
>> > would I know if the access beyond the last e820 entry is genuine and not
>> > a faulty pte in a buggy guest? Could I consult the mmconfig table (?) 
>> > or the ACPI table in xen? Any pointers would be helpful... my 
>> > knowledge runs out quickly here.
>> 
>> You'd have to inspect all the BARs of the devices the domain owns.
>> Hence the thought of having Dom0 tell you about those resource
>> assignments.
> 
> Doesn't that happen via PHYSDEVOP_pci_device_add hypercalls?

That may (and I think does) happen before resource assignment.

>> > FWIW, at present pv-ops linux doesn't allow any mmio access beyond
>> > the last e820 entry. So, we'd need a fix there too. In my very original
>> > patch, I was updating all IO mappings on demand by putting a hook
>> > in Linux native_pte_update if it was _PAGE_BIT_IOMAP. Another
>> > possibility would be to do that for any mappings above the last
>> > e820 entry. What do you think?
>> 
>> Special casing IOMAP page table creation might be an option, but
>> has the downside of allowing kernel bugs to propagate into Xen's
>> view of the world.
>> 
>> > For testing purposes, do you have a reference for such hardware? I don't see
>> > any here with such a configuration.
>> 
>> Nothing specific, but I know that SR-IOV virtual functions easily
>> cause kernels to run out of MMIO space below 4G (namely when
>> the hole is only around 1GB or even less), and Intel must have
>> knowledge of graphics cards having so huge a frame buffer that
>> it can only be mapped above 4G.
> 
> Right, but the BIOS Writer's Guide and docs all talk about setting the MCFG
> up for that. Granted, the MCFG (or was it the ACPI spec?) says that the
> MCFG regions do not have to be defined in the E820.

What do MCFG regions have to do with device MMIO ones?

> You also pointed out that the MCFG entries might come from
> the ACPI DSDT, which I think all comes back to Dom0 parsing this and
> providing this sort of information back to the hypervisor?

For the MCFG, yes. But not for individual BARs of devices.
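As a rough illustration of the per-BAR check suggested earlier in the thread (deciding on an EPT fault whether an access beyond the E820 map is a genuine MMIO access rather than a stray PTE), something like the following could work, assuming Dom0 has already reported the assigned BAR ranges of each device it owns. The structures and `addr_in_owned_bar()` are hypothetical and not part of any Xen interface; how Dom0 would report the ranges, given that PHYSDEVOP_pci_device_add may precede resource assignment, is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BARS 6   /* a PCI type 0 header has six BAR slots */

/* Hypothetical record of one assigned memory BAR, as Dom0 might report it.
 * For simplicity a 64-bit BAR is kept in a single slot here. */
struct bar_range {
    uint64_t base;   /* physical address, bytes */
    uint64_t size;   /* length in bytes; 0 means unassigned/unused */
};

/* Hypothetical per-device view of the BARs of a device the domain owns. */
struct owned_dev {
    struct bar_range bar[NR_BARS];
};

/*
 * Return true only if the faulting address lies inside an assigned BAR
 * of some owned device; anything else (e.g. a bogus PTE from a buggy
 * kernel) would be refused instead of being identity-mapped on demand.
 */
static bool addr_in_owned_bar(const struct owned_dev *devs, size_t nr_devs,
                              uint64_t gpa)
{
    assert(devs != NULL || nr_devs == 0);
    for (size_t d = 0; d < nr_devs; d++) {
        for (size_t b = 0; b < NR_BARS; b++) {
            const struct bar_range *r = &devs[d].bar[b];
            /* gpa - base < size avoids overflow for BARs near 2^64. */
            if (r->size != 0 && gpa >= r->base && gpa - r->base < r->size)
                return true;
        }
    }
    return false;
}
```

Note that the MCFG (ECAM) window never appears in this check: it only describes where configuration space lives, not where device MMIO resources are placed.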

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

