
[Xen-devel] RE: [PATCH] Fix xen hang on intel westmere-EP



>>> On 24.08.11 at 04:16, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx] 
>> Sent: Tuesday, August 23, 2011 4:02 PM
>> >> Also, are you certain this problem exists only with this single ICH variant?
>> > So far, we only observed it on ICH10 based chipset. Did you see
>> > another platform have the problems?
>> 
>> No, I haven't observed it on other platforms but
>> - the problematic bits exist in earlier ICH versions too, and
>> - there are ICH10 based systems that aren't showing any such bad
>>   behavior.
> 
> It depends on the BIOS. Some already disable the legacy USB emulation and 
> obviously you cannot see this problem on those platforms.

With the help of the debug printk()-s in the patch I was able to see that
on at least one unaffected platform, at least one of the USB emulation
bits was set nevertheless.
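
For reference, the check those printk()-s feed can be sketched in isolation. This is only an illustration of the arithmetic in the patch further down: the bit positions are inferred from the 0x20008 mask, and the macro and helper names are made up for this example rather than taken from Xen.

```c
#include <stdint.h>

/* Bits implied by the 0x20008 mask in the patch (names are illustrative). */
#define SMI_EN_LEGACY_USB_EN   (1u << 3)    /* 0x00008 */
#define SMI_EN_LEGACY_USB2_EN  (1u << 17)   /* 0x20000 */
#define SMI_EN_LEGACY_USB_MASK (SMI_EN_LEGACY_USB_EN | SMI_EN_LEGACY_USB2_EN)

/*
 * I/O port of the SMI control register: the ACPI base read from config
 * offset 0x40, with the low seven bits masked off, plus 0x30.
 */
static uint16_t smi_en_port(uint16_t acpi_base_reg)
{
    return (uint16_t)((acpi_base_reg & 0xff80u) + 0x30u);
}

/* Non-zero if either legacy USB emulation bit is set. */
static int legacy_usb_enabled(uint32_t smictl)
{
    return (smictl & SMI_EN_LEGACY_USB_MASK) != 0;
}

/* Value to write back with both legacy USB emulation bits cleared. */
static uint32_t clear_legacy_usb(uint32_t smictl)
{
    return smictl & ~SMI_EN_LEGACY_USB_MASK;
}
```

Feeding the "smictl=" value from the debug output into legacy_usb_enabled() tells whether a given platform has either bit set, independent of whether it hangs.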

>> 
>> The keys I'm currently using (successfully tested on customer's and my
>> affected machines) are BIOS and board vendor being "Intel". Only on those
>> I'm then looking for the ICH10 (and then all known variants). See below for
>> the patch (still containing some debugging bits).
>> 
>> > Actually, the best way to solve it is to enable ACPI mode in Xen
>> > instead of in dom0. To enable ACPI, we need to write the value from
>> > FADT.ACPI_ENABLE to SMI_CMD. After the value is written, the SMI
>> > ownership will be disabled by the ACPI hardware, and it will also
>> > disable some logic which is able to cause SMIs.
>> > For example, the legacy USB circuit will be masked too, because at
>> > this point there is no need to use legacy USB emulation. This is
>> > also what upstream Linux does. But I think it is too complicated to
>> > port this logic to Xen. Anyway, if you are interested, you can add
>> > this logic to Xen, and then there is no need for this patch any more.
>> 
>> Was this a pretty recent change in Linux? Otherwise, how do you explain
>> that, with apparently lower probability, I'm observing the very same hang
>> with native Linux? My assumption is that ACPI mode enabling is happening
>> just too late for masking this problem.
> 
> Our BIOS team and I also tried with native Linux, but we cannot
> reproduce it. So this may be another potential issue, different from what
> we are discussing now.

Pretty unlikely. Remember that for months you (as a company) claimed
not to be able to reproduce the issue on Xen. As it's even more
sporadic on native Linux, I'm not really surprised there's a problem
reproducing it there.
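
To make the ACPI-enable handshake suggested in the quoted text above concrete, here is a rough sketch, not Xen or Linux code. It assumes the usual sequence: write FADT.ACPI_ENABLE to the SMI_CMD port, then poll SCI_EN (bit 0 of the PM1 control register). The structure, the function names, the port numbers and the simulated firmware are all made up so that the example can run outside a hypervisor.

```c
#include <stdint.h>

#define PM1_CNT_SCI_EN 0x0001u   /* bit 0 of the PM1 control register */

/* Hypothetical subset of the FADT fields needed for the handshake. */
struct fadt_subset {
    uint16_t smi_cmd;       /* SMI command port */
    uint8_t  acpi_enable;   /* value that requests ACPI mode */
    uint16_t pm1a_cnt_blk;  /* PM1a control register port */
};

/* Port accessors are passed in so that the logic can be tested with a mock. */
struct port_io {
    void     (*outb)(uint8_t val, uint16_t port);
    uint16_t (*inw)(uint16_t port);
};

/* Returns 0 once SCI_EN is observed, -1 if ACPI mode could not be entered. */
static int acpi_enable_mode(const struct fadt_subset *f,
                            const struct port_io *io,
                            unsigned int max_polls)
{
    if ( io->inw(f->pm1a_cnt_blk) & PM1_CNT_SCI_EN )
        return 0;                      /* firmware already in ACPI mode */

    if ( !f->smi_cmd || !f->acpi_enable )
        return -1;                     /* no way to request the switch */

    io->outb(f->acpi_enable, f->smi_cmd);

    while ( max_polls-- )
        if ( io->inw(f->pm1a_cnt_blk) & PM1_CNT_SCI_EN )
            return 0;

    return -1;
}

/* Simulated firmware: writing 0xa0 to port 0xb2 sets SCI_EN (made-up values). */
static uint16_t fake_pm1_cnt;

static void fake_outb(uint8_t val, uint16_t port)
{
    if ( port == 0xb2 && val == 0xa0 )
        fake_pm1_cnt |= PM1_CNT_SCI_EN;
}

static uint16_t fake_inw(uint16_t port)
{
    return port == 0x804 ? fake_pm1_cnt : 0xffff;
}

static const struct port_io fake_io = { fake_outb, fake_inw };
```

Whether the switch happens early enough to mask the hang is exactly the open question here; the sketch only shows the handshake itself.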

>> --- a/xen/arch/x86/dmi_scan.c
>> +++ b/xen/arch/x86/dmi_scan.c
>> @@ -10,6 +10,8 @@
>>  #include <asm/system.h>
>>  #include <xen/dmi.h>
>>  #include <xen/efi.h>
>> +#include <xen/pci.h>
>> +#include <xen/pci_regs.h>
>> 
>>  #define bt_ioremap(b,l)  ((void *)__acpi_map_table(b,l))
>>  #define bt_iounmap(b,l)  ((void)0)
>> @@ -278,6 +280,31 @@ static __init int broken_toshiba_keyboar
>>      return 0;
>>  }
>> 
>> +static int __init ich10_bios_quirk(struct dmi_system_id *d)
>> +{
>> +    u32 port, smictl;
>> +
>> +    if ( pci_conf_read16(0, 0x1f, 0, PCI_VENDOR_ID) != 0x8086 )
>> +        return 0;
>> +
>> +    switch ( pci_conf_read16(0, 0x1f, 0, PCI_DEVICE_ID) ) {
>> +    case 0x3a14:
>> +    case 0x3a16:
>> +    case 0x3a18:
>> +    case 0x3a1a:
>> +printk("ACPI base=%04x\n", port = pci_conf_read16(0, 0x1f, 0, 0x40));//temp
>> +        port = (port & 0xff80) + 0x30;
>> +        smictl = inl(port);
>> +printk("smictl=%08x\n", smictl);//temp
>> +        /* turn off LEGACY_USB{,2}_EN if enabled */
>> +        if ( smictl & 0x20008 )
>> +            outl(smictl & ~0x20008, port);
>> +printk("smictl:%08x\n", inl(port));//temp
>> +        break;
>> +    }
>> +
>> +    return 0;
>> +}
>> 
>>  #ifdef CONFIG_ACPI_SLEEP
>>  static __init int reset_videomode_after_s3(struct dmi_blacklist *d)
>> @@ -342,6 +369,18 @@ static __initdata struct dmi_blacklist d
>>                      } },
>>  #endif
>> 
>> +    { ich10_bios_quirk, "Intel board & BIOS",
>> +            /*
>> +             * BIOS leaves legacy USB emulation enabled while
>> +             * SMM can't properly handle it.
>> +             */
>> +            {
>> +                    MATCH(DMI_BOARD_VENDOR, "Intel Corp"),
>> +                    MATCH(DMI_BIOS_VENDOR, "Intel Corp"),
>> +                    NO_MATCH, NO_MATCH
>> +            }
>> +    },
>> +
>>  #ifdef      CONFIG_ACPI_BOOT
>>      /*
>>       * If your system is blacklisted here, but you find that acpi=force
> 
> This patch is better. But do we really need to disable it on other ICHs? If
> you have seen this issue with other ICHs, then you can do it. Otherwise, it
> may cause some potential issues. Have you tested your patch on those platforms?

Obviously not, given the very limited set of systems on which we
observe the problem. But I really just listed the various ICH10 PCI
IDs. Are you saying that the problematic BIOSes are reasonably certain
to have been used with only one of them? After all, not affecting other
platforms is what is being aimed at with the DMI match strings.

Jan



 

