
Re: [Xen-devel] [RFC 0 PATCH 3/3] PVH dom0: construct_dom0 changes



On Fri, Oct 04, 2013 at 03:05:57PM +0100, Jan Beulich wrote:
> >>> On 04.10.13 at 15:35, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> 
> >>> wrote:
> > On Fri, Oct 04, 2013 at 07:53:20AM +0100, Jan Beulich wrote:
> >> >>> On 03.10.13 at 02:53, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
> >> > On Fri, 27 Sep 2013 07:54:39 +0100
> >> > "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> >> > 
> >> >> >>> On 27.09.13 at 02:17, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
> >> >> >>> wrote:
> >> >> > On Thu, 26 Sep 2013 09:02:41 +0100 "Jan Beulich"
> >> >> > <JBeulich@xxxxxxxx> wrote:
> >> >> >> >>> On 25.09.13 at 23:03, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
> >> >> >> >>> wrote:
> >> >> >> > +/*
> >> >> >> > + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
> >> >> >> > + * the entire io region mapped in the EPT/NPT.
> >> >> >> > + *
> >> >> >> > + * PVH FIXME: The following doesn't map MMIO ranges when they sit above the
> >> >> >> > + *            highest E820 covered address.
> >> >> >> 
> >> >> >> This absolutely needs fixing before this can go in.
> >> >> > 
> >> >> > Any suggestions on how to fix it? Mapping all the way to end could
> >> >> > result in a huge hap table. 
> >> >> 
> >> >> You'll probably need a call down from Dom0 telling you where it
> >> >> finds/puts MMIO resources. Or perhaps that could be mapped
> >> >> in on demand from the EPT fault handler (since these regions
> >> >> shouldn't be subject to DMA, and hence IOMMU faults shouldn't
> >> >> occur - perhaps that's even a reason to not share page tables
> >> >> at least in dom0-strict mode)?
> >> > 
> >> > Thinking about mapping in on demand from the EPT fault handler, how
> >> > would I know if the access beyond last e820 entry is genuine and not 
> >> > a faulty pte in a buggy guest? Could I consult the mmconfig table (?) 
> >> > or the ACPI table in xen? Any pointers would be helpful... my 
> >> > knowledge runs out quickly here.
> >> 
> >> You'd have to inspect all the BARs of the devices the domain owns.
> >> Hence the thought of having Dom0 tell you about those resource
> >> assignments.
> > 
> > Doesn't that happen via PHYSDEVOP_pci_device_add hypercalls?
> 
> That may (and I think does) happen before resource assignment.
> 
> >> > FWIW, at present pv-ops linux doesn't allow any mmio access beyond
> >> > the last e820 entry. So, we'd need a fix there too. In my very original
> >> > patch, I was updating all IO mappings on demand by putting a hook
> >> > in linux native_pte_update if it was _PAGE_BIT_IOMAP. Another 
> >> > possibility would be to do that for any mappings above the last
> >> > e820 entry. What do you think?
> >> 
> >> Special casing IOMAP page table creation might be an option, but
> >> has the downside of allowing kernel bugs to propagate into Xen's
> >> view of the world.
> >> 
> >> > For testing purposes, do you have reference for hardware? I don't see 
> >> > any here with such configuration.
> >> 
> >> Nothing specific, but I know that SR-IOV virtual functions easily
> >> cause kernels to run out of MMIO space below 4G (namely when
> >> the hole is only around 1GB or even less), and Intel must have
> >> knowledge of graphics cards having so huge a frame buffer that
> >> it can only be mapped above 4G.
> > 
> > Right, but the BIOS Writers Guide and docs all talk about setting up
> > the MCFG for that. Granted, the MCFG (or was it the ACPI spec?) says
> > that the MCFG regions do not have to be defined in the E820.
> 
> What do MCFG regions have to do with device MMIO ones?

Actually, nothing at all. I somehow was under the impression that
MCFG and MMIO regions would be in the same memory area (as in, the
MCFG region follows the end of the MMIO region). But of course
nothing would be that simple.
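
For reference, an MCFG entry only says where PCIe configuration space
(ECAM) is mapped. A minimal sketch of how a config register address is
derived from one MCFG entry, assuming the entry's base address
corresponds to bus 0 of its segment; the struct and function names are
illustrative, not Xen's or Linux's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One MCFG allocation entry (field names are illustrative). */
struct mcfg_window {
    uint64_t base;      /* ECAM base, assumed to correspond to bus 0 */
    uint16_t segment;   /* PCI segment group */
    uint8_t  start_bus; /* first bus decoded by this window */
    uint8_t  end_bus;   /* last bus decoded by this window */
};

/*
 * Compute the physical address of config register 'reg' of
 * seg:bus:dev.fn inside an MCFG (ECAM) window. Each function gets
 * 4 KiB of config space, each device 32 KiB, each bus 1 MiB.
 * Returns false if the function is not covered by this window.
 */
static bool ecam_address(const struct mcfg_window *w, uint16_t seg,
                         uint8_t bus, uint8_t dev, uint8_t fn,
                         uint16_t reg, uint64_t *out)
{
    /* ECAM layout: dev is 5 bits, fn is 3 bits, reg is 12 bits. */
    assert(dev < 32 && fn < 8 && reg < 4096);

    if (seg != w->segment || bus < w->start_bus || bus > w->end_bus)
        return false;

    *out = w->base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
           ((uint64_t)fn << 12) + reg;
    return true;
}
```

Reading the dwords at offsets 0x10 through 0x24 this way yields a
function's BARs, but the MMIO ranges those BARs point at live elsewhere
in the physical address space (possibly above 4G), so the MCFG alone
cannot tell Xen where device MMIO is.
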
> 
> > You also pointed out that the MCFG entries might come from the
> > ACPI DSDT, which I think all comes back to dom0 parsing this and
> > providing that information back to the hypervisor?
> 
> For the MCFG, yes. But not for individual BARs of devices.

So are we back to hooking up a new hypercall in the PCI subsystem,
invoked once resource assignment has been completed? It would also
have to be invoked whenever the PCI subsystem decides to rewrite the
resource addresses to odd locations.
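
To make that concrete: a minimal sketch, assuming a hypothetical
hypercall through which dom0 reports each BAR once it is assigned (and
again whenever it is moved), of the bookkeeping Xen could keep so that
the EPT/NPT fault handler can tell a genuine MMIO access above the last
E820 entry from a stray one. The table, its size and all function names
are hypothetical; this is not existing Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MMIO_RANGES 64  /* hypothetical fixed table size */

/* One BAR-backed MMIO range as dom0 might report it (hypothetical). */
struct mmio_range {
    uint32_t sbdf;   /* segment/bus/device/function of the owner */
    uint8_t  bar;    /* BAR index, 0..5 */
    uint64_t start;  /* first byte of the range */
    uint64_t size;   /* length in bytes; 0 marks an unused slot */
};

static struct mmio_range ranges[MAX_MMIO_RANGES];

/*
 * Record or replace the range for (sbdf, bar). Replacing covers dom0
 * moving a BAR later on; reporting size 0 removes the entry.
 * Returns false if the table is full.
 */
static bool mmio_report(uint32_t sbdf, uint8_t bar, uint64_t start,
                        uint64_t size)
{
    struct mmio_range *free_slot = NULL;

    assert(bar < 6);
    for (size_t i = 0; i < MAX_MMIO_RANGES; i++) {
        if (ranges[i].size && ranges[i].sbdf == sbdf && ranges[i].bar == bar) {
            ranges[i].start = start;
            ranges[i].size = size;
            return true;
        }
        if (!ranges[i].size && !free_slot)
            free_slot = &ranges[i];
    }
    if (!free_slot)
        return false;
    *free_slot = (struct mmio_range){ sbdf, bar, start, size };
    return true;
}

/* Would an EPT/NPT fault at 'gpa' hit a reported MMIO range? */
static bool mmio_fault_is_legit(uint64_t gpa)
{
    for (size_t i = 0; i < MAX_MMIO_RANGES; i++)
        /* Unsigned wrap makes gpa < start fail this test too. */
        if (ranges[i].size && gpa - ranges[i].start < ranges[i].size)
            return true;
    return false;
}
```

On a fault, the handler would map the page 1:1 only if
mmio_fault_is_legit() returns true, so a buggy PTE in dom0 pointing at
an unreported address still faults instead of silently gaining access.
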

Can't one also trap configuration-space writes to the PCI devices
and extract the physical locations from them?
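
A minimal sketch of the decoding step such a trap would need, using
the standard PCI BAR layout (bit 0 selects I/O vs. memory; for memory
BARs, bits 2:1 give the type, where 0b10 means 64-bit, and bit 3 marks
prefetchable). The struct and function names are illustrative. A real
trap would also have to ignore the all-ones writes an OS uses to size
a BAR, and a 64-bit BAR only holds a valid address once both halves
have been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one BAR (names are illustrative). */
struct bar_info {
    bool     is_io;
    bool     is_64bit;
    bool     prefetchable;
    uint64_t base;
};

/*
 * Decode the BAR at index 'idx' (0..5) from the six raw dwords found at
 * config offsets 0x10..0x24. Returns the number of BAR slots consumed
 * (1, or 2 for a 64-bit BAR), or 0 if a 64-bit BAR would run past the
 * last slot.
 */
static unsigned decode_bar(const uint32_t bars[6], unsigned idx,
                           struct bar_info *out)
{
    assert(idx < 6);
    uint32_t lo = bars[idx];

    out->is_io = lo & 0x1;
    if (out->is_io) {
        /* I/O BAR: bits 1:0 are flags, the rest is the port base. */
        out->is_64bit = false;
        out->prefetchable = false;
        out->base = lo & ~UINT32_C(0x3);
        return 1;
    }

    /* Memory BAR: bits 3:0 are flags, the rest is the address. */
    out->prefetchable = (lo >> 3) & 0x1;
    out->is_64bit = ((lo >> 1) & 0x3) == 0x2;
    out->base = lo & ~UINT32_C(0xF);
    if (out->is_64bit) {
        if (idx + 1 >= 6)
            return 0;
        /* The next BAR slot holds the upper 32 address bits. */
        out->base |= (uint64_t)bars[idx + 1] << 32;
        return 2;
    }
    return 1;
}
```

A 64-bit BAR decoding to a base above 4G is exactly the case the PVH
FIXME misses, since such an address is typically above the highest
E820-covered address.
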

> 
> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

