Xen project Mailing List

Re: [Xen-devel] [RFC 0 PATCH 3/3] PVH dom0: construct_dom0 changes

From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Date: Fri, 4 Oct 2013 12:02:53 -0400

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, keir.xen@xxxxxxxxx

Delivery-date: Fri, 04 Oct 2013 16:03:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, Oct 04, 2013 at 03:05:57PM +0100, Jan Beulich wrote:
> >>> On 04.10.13 at 15:35, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> > On Fri, Oct 04, 2013 at 07:53:20AM +0100, Jan Beulich wrote:
> >> >>> On 03.10.13 at 02:53, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
> >> > On Fri, 27 Sep 2013 07:54:39 +0100
> >> > "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> >> >
> >> >> >>> On 27.09.13 at 02:17, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
> >> >> > On Thu, 26 Sep 2013 09:02:41 +0100 "Jan Beulich"
> >> >> > <JBeulich@xxxxxxxx> wrote:
> >> >> >> >>> On 25.09.13 at 23:03, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
> >> >> >> > +/*
> >> >> >> > + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
> >> >> >> > + * the entire io region mapped in the EPT/NPT.
> >> >> >> > + *
> >> >> >> > + * PVH FIXME: The following doesn't map MMIO ranges when they sit above the
> >> >> >> > + * highest E820 covered address.
> >> >> >>
> >> >> >> This absolutely needs fixing before this can go in.
> >> >> >
> >> >> > Any suggestions on how to fix it? Mapping all the way to end could
> >> >> > result in a huge hap table.
> >> >>
> >> >> You'll probably need a call down from Dom0 telling you where it
> >> >> finds/puts MMIO resources. Or perhaps that could be mapped
> >> >> in on demand from the EPT fault handler (since these regions
> >> >> shouldn't be subject to DMA, and hence IOMMU faults shouldn't
> >> >> occur - perhaps that's even a reason to not share page tables
> >> >> at least in dom0-strict mode)?
> >> >
> >> > Thinking about mapping in on demand from the EPT fault handler, how
> >> > would I know if the access beyond last e820 entry is genuine and not
> >> > a faulty pte in a buggy guest? Could I consult the mmconfig table (?)
> >> > or the ACPI table in xen? Any pointers would be helpful... my
> >> > knowledge runs out quickly here.
> >>
> >> You'd have to inspect all the BARs of the devices the domain owns.
> >> Hence the thought of having Dom0 tell you about those resource
> >> assignments.
> >
> > Doesn't that happen via PHYSDEVOP_pci_device_add hypercalls?
>
> That may (and I think does) happen before resource assignment.
>
> >> > FWIW, at present pv-ops linux doesn't allow any mmio access beyond
> >> > the last e820 entry. So, we'd need a fix there too. In my very orig
> >> > patch, I was updating all IO mappings on demand by putting hook
> >> > in linux native_pte_update if it was _PAGE_BIT_IOMAP. Another
> >> > possibility would be do that for any mappings above the last
> >> > e820 entry. What do you think?
> >>
> >> Special casing IOMAP page table creation might be an option, but
> >> has the downside of allowing kernel bugs to propagate into Xen's
> >> view of the world.
> >>
> >> > For testing purposes, do you have reference for hardware? I don't see
> >> > any here with such configuration.
> >>
> >> Nothing specific, but I know that SR-IOV virtual functions easily
> >> cause kernels to run out of MMIO space below 4G (namely when
> >> the hole is only around 1Gb or even less), and Intel must have
> >> knowledge of graphics cards having so huge a frame buffer that
> >> it can only be mapped above 4G.
> >
> > Right, but the BIOS Writers Guide and docs all talk setting the MCFG
> > up for that. Granted the MCFG (or was the ACPI spec?) says that the
> > MCFG regions do not have to be defined in the E820.
>
> What do MCFG regions have to do with device MMIO ones?

Actually - nothing at all. I somehow was under the impression that
MCFG and MMIO regions would be in the same memory area (as in MCFG
follows the end of MMIO region). But of course nothing would be that
simple.

>
> > You pointed out also that the MCFG entries might come out from
> > the ACPI DSDT. Which I think all comes back to dom0 parsing this and
> > providing this sort of information back to the hypervisor?
>
> For the MCFG, yes. But not for individual BARs of devices.

So back to hooking up a new hypercall in the PCI subsystem when
resource assignment has been completed? And also if the PCI subsystem
decides to re-write the resource addresses to odd locations.

Can't one also trap for the configuration changes on the PCI devices
and extract the physical locations then?

>
> Jan
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
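
Editorial illustration, not part of the original message: the closing question above (trapping PCI configuration changes and extracting the physical locations) comes down to decoding BAR values. The following is a minimal C sketch of that decoding step only, assuming the standard PCI BAR encoding (bit 0 selects I/O space, bits 2:1 give the memory type, the low four bits of a memory BAR are flags). The names mmio_range and decode_mem_bar are hypothetical and are not Xen or Linux interfaces; how the register values would be captured (trap, hypercall, or otherwise) is deliberately left out, since the thread does not settle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not a Xen or Linux structure. */
struct mmio_range {
    uint64_t base; /* first byte of the decoded range */
    uint64_t size; /* length in bytes */
};

/*
 * Decode a memory BAR into a base/size pair.
 *
 *   lo, hi             current BAR register and, for a 64-bit BAR, the
 *                      register that follows it
 *   probe_lo, probe_hi values read back after writing all ones to the
 *                      same registers (the usual sizing probe)
 *
 * Returns false for I/O port BARs and for unimplemented BARs.
 */
static bool decode_mem_bar(uint32_t lo, uint32_t hi,
                           uint32_t probe_lo, uint32_t probe_hi,
                           struct mmio_range *out)
{
    bool is64;
    uint64_t addr, mask;

    assert(out != NULL);

    /* Bit 0 set means an I/O port BAR, which is not MMIO. */
    if (lo & 0x1u)
        return false;

    /* Bits 2:1 == 10b marks a 64-bit BAR; anything else is treated as 32-bit. */
    is64 = ((lo >> 1) & 0x3u) == 0x2u;

    /* Strip the four flag bits to get the address and the size mask. */
    addr = lo & 0xfffffff0u;
    mask = probe_lo & 0xfffffff0u;
    if (is64) {
        addr |= (uint64_t)hi << 32;
        mask |= (uint64_t)probe_hi << 32;
    }

    /* No writable address bits at all: the BAR is not implemented. */
    if (mask == 0)
        return false;

    /* A 32-bit BAR has no upper half; treat those bits as fixed ones. */
    if (!is64)
        mask |= UINT64_C(0xffffffff) << 32;

    out->base = addr;
    out->size = ~mask + 1; /* lowest writable address bit gives the size */
    return true;
}
```

With a range decoded this way, a caller could compare base against the highest E820-covered address to spot the case flagged in the PVH FIXME, for example a 64-bit BAR placed above 4G. Whether that check would live in the hypervisor or in dom0 is exactly the open question in the thread.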
