[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 18/30] xen/x86: setup PVHv2 Dom0 ACPI tables



On Thu, Oct 06, 2016 at 09:40:50AM -0600, Jan Beulich wrote:
> >>> On 27.09.16 at 17:57, <roger.pau@xxxxxxxxxx> wrote:
> > FWIW, I think that the current approach that I've used in order to craft the
> > MADT is not the best one, IMHO it would be better to place the MADT at the
> > end of the E820_ACPI region (expanding it's size one page), and modify the
> > XSDT/RSDT in order to point to it, that way we avoid shadowing any other
> > ACPI data that might be at the same page as the native MADT (and that needs
> > to be modified by Dom0).
> 
> I agree with the idea of placing MADT elsewhere, but I don't think you
> can rely on E820_ACPI to have room to grow into right after its end.

Right, I think picking some memory from a RAM region and marking it as 
E820_ACPI is the best approach.

> > @@ -50,6 +53,8 @@ static long __initdata dom0_max_nrpages = LONG_MAX;
> >  #define HVM_VM86_TSS_SIZE   128
> >  
> >  static unsigned int __initdata hvm_mem_stats[MAX_ORDER + 1];
> > +static unsigned int __initdata acpi_intr_overrrides = 0;
> > +static struct acpi_madt_interrupt_override __initdata *intsrcovr = NULL;
> 
> Pointless initializers.

Removed.

> > +static void __init acpi_zap_table_signature(char *name)
> > +{
> > +    struct acpi_table_header *table;
> > +    acpi_status status;
> > +    union {
> > +        char str[ACPI_NAME_SIZE];
> > +        uint32_t bits;
> > +    } signature;
> > +    char tmp;
> > +    int i;
> > +
> > +    status = acpi_get_table(name, 0, &table);
> > +    if ( ACPI_SUCCESS(status) )
> > +    {
> > +        memcpy(&signature.str[0], &table->signature[0], ACPI_NAME_SIZE);
> > +        for ( i = 0; i < ACPI_NAME_SIZE / 2; i++ )
> > +        {
> > +            tmp = signature.str[ACPI_NAME_SIZE - i - 1];
> > +            signature.str[ACPI_NAME_SIZE - i - 1] = signature.str[i];
> > +            signature.str[i] = tmp;
> > +        }
> > +        write_atomic((uint32_t*)&table->signature[0], signature.bits);
> > +    }
> > +}
> 
> Together with moving MADT elsewhere we should also move
> XSDT/RSDT, at which point we can simply avoid copying the
> pointers for tables we don't want Dom0 to see (and down the
> road we also have the option of adding tables).
> 
> The downside to both approaches is that this once again is a
> black-listing model, i.e. new structure types we may need to
> also suppress will remain visible to Dom0 until a patched
> hypervisor would become available.

Maybe we could do whitelisting instead? I can see that at least the 
following tables should be available to Dom0 if present, but TBH, it's hard 
to tell:

MADT, DSDT, FADT, SSDT, FACS, SBST, NFIT, MCFG, TCPA. Then ECDT, CPEP and 
RASF also seem fine to make available to Dom0, but I'm dubious.

But I agree that crafting a custom XSDT/RSDT is the best option here.
 
> > +                   pfn, pfn + nr_pages);
> > +            return rc;
> > +        }
> > +    }
> > +    /*
> > +     * Since most memory maps provided by hardware are wrong, make sure 
> > each
> > +     * top-level table is properly mapped into Dom0.
> > +     */
> 
> Please be more specific here - wrong in which way? Not all ACPI tables
> living in one of the two E820 type regions? But see also below.
> 
> In any event you need to be more careful here: Mapping ordinary RAM
> 1:1 into Dom0 can't end well. Also, once working with a cloned XSDT you
> won't need to cover tables here which have got hidden from Dom0.

I've found systems where some ACPI tables reside in memory holes or reserved 
regions. I don't think I've found a system where an ACPI table would reside 
in a RAM region, but I agree that it's worth adding a check here to make 
sure.

> > +    for( i = 0; i < acpi_gbl_root_table_list.count; i++ )
> > +    {
> > +        pfn = PFN_DOWN(acpi_gbl_root_table_list.tables[i].address);
> > +        nr_pages = DIV_ROUND_UP(acpi_gbl_root_table_list.tables[i].length,
> > +                                PAGE_SIZE);
> > +        rc = modify_mmio_11(d, pfn, nr_pages, true);
> > +        if ( rc )
> > +        {
> > +            printk(
> > +                "Failed to map ACPI region %#lx - %#lx into Dom0 memory 
> > map\n",
> > +                   pfn, pfn + nr_pages);
> > +            return rc;
> > +        }
> > +    }
> > +
> > +    /*
> > +     * Special treatment for memory < 1MB:
> > +     *  - Copy the data in e820 regions marked as RAM (BDA, EBDA...).
> 
> Copy? What if some of it needs to get modified to interact with
> firmware? I think you'd be better off mapping everything Xen
> doesn't use into Dom0, and only mapping fresh RAM pages
> over regions Xen does use (namely the trampoline).

Hm, that was my first approach (mapping the whole first MB into Dom0), but 
sadly it doesn't seem to work because FreeBSD at least places the AP boot 
trampoline there, and since APs are launched in 16b mode, the emulator 
cannot handle executing memory from MMIO regions and crashes the domain. 
That's why I had to map and copy data from RAM regions below 1MB. The BDA 
it's all static data AFAICT, and the EBDA should reside in a reserved 
region (or at least does on my systems).

Might it be possible to solve this by identity mapping the first 1MB, and 
marking the RAM regions there as p2m_ram_rw? Or would that create even 
further problems?

> > +     *  - Map any reserved regions as 1:1.
> 
> Why would reserved regions need such treatment only when below
> 1Mb?

The video RAM for the VGA display needs to be mapped into Dom0, and that's 
in the reserved region below 1MB (usually starting at 0xA0000).

I would like to avoid mapping all the reserved regions into Dom0 by default 
(and the IOMMU), but this might be needed on some systems as a workaround 
(specially those not providing or with wrong RMRR tables).

> > +    acpi_get_table_phys(ACPI_SIG_MADT, 0, &madt_addr[0], &size);
> > +    if ( !madt_addr[0] )
> > +    {
> > +        printk("Unable to find ACPI MADT table\n");
> > +        return -EINVAL;
> > +    }
> > +    if ( size > PAGE_SIZE )
> > +    {
> > +        printk("MADT table is bigger than PAGE_SIZE, aborting\n");
> > +        return -EINVAL;
> > +    }
> 
> This restriction for sure needs to go away.

Sure.

> > +    acpi_get_table_phys(ACPI_SIG_MADT, 2, &madt_addr[1], &size);
> 
> Why 2 (and not 1) when the first invocation used 0? But this is not
> going to be needed anyway with the alternative model.
> 
> > +    io_apic = (struct acpi_madt_io_apic *)(madt + 1);
> > +    io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
> > +    io_apic->header.length = sizeof(*io_apic);
> > +    io_apic->id = 1;
> > +    io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
> 
> I think you need to make as many IO-APICs available to Dom0 as
> there are hardware ones.

Right, I've been wondering about this, and then I need to expand the IO APIC 
emulation code so that Xen is able to emulate two IO-APICs.

Can I ask why do you think this is needed? If the number of pins in the 
multiple IO APIC case doesn't exceed 48 (which is what the emulated IO APIC 
provides), shouldn't this be enough?

> > +    if ( dom0_max_vcpus() > num_online_cpus() )
> > +    {
> > +        printk("CPU overcommit is not supported for Dom0\n");
> 
> ???

This is going away with the new MADT approach.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.