
Re: [Xen-devel] [DRAFT RFC] PVHv2 interaction with physical devices



On 10/11/16 10:54, Roger Pau Monné wrote:
> On Wed, Nov 09, 2016 at 06:51:49PM +0000, Andrew Cooper wrote:
>> On 09/11/16 15:59, Roger Pau Monné wrote:
>>> Low 1MB
>>> -------
>>>
>>> When booted with a legacy BIOS, the low 1MB contains firmware-related data
>>> that should be identity mapped to Dom0. This includes the EBDA, video
>>> memory and possibly ROMs. All non-RAM regions below 1MB will be identity
>>> mapped to Dom0 so that it can access this data freely.
>> Are you proposing a unilateral identity map of the first 1MB, or just
>> the interesting regions?
> The current approach identity maps the first 1MB except for RAM regions,
> which are instead populated in the p2m, with the data from the original
> pages copied over. This is done because the AP boot trampoline is placed
> in the RAM regions below 1MB, and the emulator is not able to execute
> code from pages marked as p2m_mmio_direct.
>  
>> One thing to remember is the iBFT, for iSCSI boot, which lives in
>> regular RAM and needs searching for.
> And I guess this is not static data that just needs to be read by the OS? 
> Then I will have to look into fixing the emulator to deal with 
> p2m_mmio_direct regions.

It lives in plain RAM, but is static iirc.  It should just need copying
into dom0's view.
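
Roughly what I'd expect the low 1MB handling to look like (purely
illustrative sketch: populate_and_copy() and identity_map_mmio() are
made-up names, not existing Xen functions):

    static int pvh_setup_low_1mb(struct domain *d)
    {
        unsigned int i;

        for ( i = 0; i < e820.nr_map; i++ )
        {
            uint64_t start = e820.map[i].addr;
            uint64_t end = start + e820.map[i].size;

            if ( start >= MB(1) )
                continue;
            end = min(end, (uint64_t)MB(1));

            if ( e820.map[i].type == E820_RAM )
                /* Populate regular RAM in the p2m and copy the host
                 * contents (AP trampoline area, iBFT, ...) into Dom0's
                 * view, since the emulator can't run code from
                 * p2m_mmio_direct pages. */
                populate_and_copy(d, PFN_DOWN(start), PFN_UP(end));
            else
                /* EBDA, video memory, ROMs, etc.: identity map as
                 * p2m_mmio_direct. */
                identity_map_mmio(d, PFN_DOWN(start), PFN_UP(end));
        }

        return 0;
    }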

>
>>> ACPI regions
>>> ------------
>>>
>>> ACPI regions will be identity mapped to Dom0; this covers regions with
>>> types 3 and 4 in the e820 memory map. Also, since some firmware reports
>>> incorrect memory maps, the top-level tables discovered by Xen (as listed
>>> in the {X/R}SDT) that are not in RAM regions will also be mapped to Dom0.
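
To check I follow the intention here, something along these lines
(sketch only; identity_map_mmio() is a made-up helper, and the
{X/R}SDT walk is only outlined in a comment)?

    static void pvh_map_acpi_regions(struct domain *d)
    {
        unsigned int i;

        for ( i = 0; i < e820.nr_map; i++ )
        {
            if ( e820.map[i].type != E820_ACPI &&
                 e820.map[i].type != E820_NVS )
                continue;

            identity_map_mmio(d, PFN_DOWN(e820.map[i].addr),
                              PFN_UP(e820.map[i].addr +
                                     e820.map[i].size));
        }

        /*
         * On top of that, walk the {X/R}SDT entries Xen has already
         * parsed and identity map any top-level table that doesn't
         * fall inside a RAM region, to cope with firmware reporting a
         * bogus memory map.
         */
    }
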
>>>
>>> PCI memory BARs
>>> ---------------
>>>
>>> PCI devices discovered by Xen will have their BARs scanned in order to
>>> detect memory BARs, and those will be identity mapped to Dom0. Since BARs
>>> can be freely moved by the Dom0 OS by writing to the appropriate PCI
>>> config space register, Xen must trap those accesses, unmap the previous
>>> region and map the new one as set by Dom0.
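
Presumably the BAR write handling will be roughly along these lines
(struct pci_bar and the map/unmap helpers are invented for
illustration, and 64-bit BARs are ignored for brevity):

    static void bar_write(struct domain *d, struct pci_bar *bar,
                          uint32_t val)
    {
        /* Size-align the new base the same way the hardware would. */
        paddr_t new_base = (val & PCI_BASE_ADDRESS_MEM_MASK) &
                           ~(bar->size - 1);

        if ( bar->mapped )
            unmap_mmio(d, PFN_DOWN(bar->base),
                       PFN_UP(bar->base + bar->size));

        bar->base = new_base;
        identity_map_mmio(d, PFN_DOWN(new_base),
                          PFN_UP(new_base + bar->size));
        bar->mapped = true;

        /* Finally forward the write to the physical config space
         * register (elided). */
    }
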
>>>
>>> Limitations
>>> -----------
>>>
>>>  - Xen needs to be aware of any PCI device before Dom0 tries to interact
>>>    with it, so that the MMIO regions are properly mapped.
>>>
>>> Interrupt management
>>> ====================
>>>
>>> Overview
>>> --------
>>>
>>> On x86 systems there are three different mechanisms that can be used in order
>>> to deliver interrupts: IO APIC, MSI and MSI-X. Note that each device might
>>> support different methods, but those are never active at the same time.
>>>
>>> Legacy PCI interrupts
>>> ---------------------
>>>
>>> The only way to deliver legacy PCI interrupts to PVHv2 guests is using the
>>> IO APIC, since PVHv2 domains don't have an emulated PIC. As a consequence
>>> the ACPI _PIC method must be set to APIC mode by the Dom0 OS.
>>>
>>> Xen will always provide a single IO APIC, whose number of pins will match
>>> the number of possible GSIs of the underlying hardware. This is possible
>>> because ACPI uses a system-wide cookie (the GSI) in order to name
>>> interrupts, so the IO APIC device ID or pin number is not used in _PRT
>>> methods.
>>>
>>> XXX: is it possible to have more than 256 GSIs?
>> Yes.  There is no restriction on the number of IO-APIC in a system, and
>> no restriction on the number of PCI bridges these IO-APICs serve.
>>
>> However, I would suggest it would be better to offer a 1-to-1 view of
>> system IO-APICs as vIO-APICs in PVHv2 dom0, or the pin mappings are
>> going to get confused when reading the ACPI tables.
> Hm, I've been searching for this, but it seems to me that ACPI tables will 
> always use GSIs in APIC mode in order to describe interrupts, so it doesn't 
> seem to matter whether those GSIs are scattered across multiple IO APICs or 
> just a single one.

I will not be surprised if this plan turns out to cause problems.

Perhaps we can start out with just a single vIOAPIC and see if that
works in reality.
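
For reference, the MADT IO APIC entry handed to Dom0 would then look
something like this (using the ACPI CA layout from actbl1.h;
DOM0_IOAPIC_ADDRESS is just a placeholder constant):

    static void pvh_madt_set_ioapic(struct acpi_madt_io_apic *io_apic)
    {
        io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
        io_apic->header.length = sizeof(*io_apic);
        io_apic->id = 0;
        io_apic->reserved = 0;
        io_apic->address = DOM0_IOAPIC_ADDRESS;  /* placeholder */
        io_apic->global_irq_base = 0;

        /*
         * The pin count isn't encoded here: the emulated IO APIC
         * reports (nr_gsis - 1) as its maximum redirection entry in
         * the version register, so a single entry with GSI base 0 can
         * cover every host GSI.
         */
    }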

>
>>> The following registers reside in memory, and are pointed to by the Table
>>> and PBA fields found in the PCI configuration space:
>>>
>>>  - Message address and data: writes and reads to those registers are trapped
>>>    by Xen, and the value is stored into an internal structure. This is later
>>>    used by Xen in order to configure the interrupt injected to the guest.
>>>    Writes to those registers with MSI-X already enabled will not cause a
>>>    reconfiguration of the interrupt.
>>>
>>>  - Vector control: writes and reads are trapped; clearing the mask bit
>>>    (bit 0) will cause Xen to set up the configured interrupt if MSI-X is
>>>    globally enabled in the message control field.
>>>
>>>  - Pending bits array: writes and reads to this register are not trapped by
>>>    Xen.
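
So the table write handler would be roughly along these lines (struct
vmsix_entry and the helpers are invented for illustration; the
per-entry offsets follow the MSI-X table layout from the PCI spec)?

    static void vmsix_table_write(struct vmsix_entry *entry,
                                  unsigned int offset, uint32_t val,
                                  bool msix_enabled)
    {
        switch ( offset )
        {
        case 0x0:  /* Message address (low) */
        case 0x4:  /* Message address (high) */
        case 0x8:  /* Message data */
            /* Just latch the value: writes with MSI-X already enabled
             * don't trigger a reconfiguration of the interrupt. */
            store_entry_field(entry, offset, val);
            break;

        case 0xc:  /* Vector control */
            entry->masked = val & 1;  /* mask bit (bit 0) */
            /* Clearing the mask bit sets up the interrupt, provided
             * MSI-X is globally enabled in the message control
             * field. */
            if ( !entry->masked && msix_enabled )
                setup_interrupt(entry);
            break;
        }
    }
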
>>>
>>> Limitations
>>> -----------
>>>
>>>  - Due to the fact that Dom0 is not able to parse dynamic ACPI tables,
>>>    some UART devices might only function in polling mode, because Xen
>>>    will be unable to properly configure the interrupt pins without Dom0
>>>    collaboration. The UART in use by Xen should be explicitly blacklisted
>>>    from Dom0 access.
>> This reminds me that we need to include some HPET quirks in Xen as well.
>>
>> There is an entire range of Nehalem era machines where Linux finds an
>> HPET in the IOH via quirks alone, and not via the ACPI tables, and
>> nothing in Xen currently knows to disallow this access.
> Hm, if it's using quirks it's going to be hard to prevent this. At worst
> Linux is going to discover that the HPET is non-functional, at least I
> assume?

It is a PCI quirk on the southbridge to know how to find the system HPET
even though it isn't described in any ACPI tables.

As Xen doesn't know how to find this HPET and deny dom0 access to it,
dom0 finds it, disables legacy broadcast mode and reconfigures
interrupts behind Xen's back.  It also causes a hang during kexec
because the new kernel can't complete its timer calibration.
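
If we do grow such a quirk in Xen, I'd expect it to look something
like the sketch below (the 00:1f.0 / 0xF0 / 0x3404 offsets are the
ones the Linux ICH quirk uses iirc, deny_mmio_access() is a made-up
helper, and the exact registers differ per chipset so they would need
checking against the datasheets):

    static void hpet_quirk_hide_from_dom0(struct domain *d)
    {
        /* RCBA lives at config offset 0xF0 of the 00:1f.0 LPC bridge
         * on ICH-style chipsets; bits 31:14 are the base address. */
        uint32_t rcba = pci_conf_read32(0, 0, 0x1f, 0, 0xF0) & 0xffffc000;
        void __iomem *p;
        uint32_t hptc;
        paddr_t hpet_base;

        if ( !rcba )
            return;

        /* HPTC selects which of the four possible HPET addresses is
         * decoded. */
        p = ioremap(rcba + 0x3404, 4);
        hptc = readl(p);
        iounmap(p);

        hpet_base = 0xfed00000UL + ((hptc & 3) << 12);

        /* Remove the page from Dom0's p2m / iomem permissions so it
         * can't reprogram the HPET behind Xen's back. */
        deny_mmio_access(d, PFN_DOWN(hpet_base), 1);
    }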

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
