
RE: [Xen-devel] Re: [Xci-devel] Porblem with disabling and then re-enabling a PT device in Windows



Tom Rotenberg wrote:
> How do I know whether I'm using 'ack_type_new', and what does it mean?
> Do you have any idea how I can check from inside the Windows XP domU
> (using WinDbg, of course) whether the virtual local APIC/IOAPIC has
> EOI'd the interrupt?

I haven't tried using WinDbg to inspect the local APIC or IOAPIC, but I
suppose you can, since they are simply accessed through MMIO.
You can also add a hotkey (i.e. a key pressed after the three "Ctrl+a"
presses) to the Xen hypervisor that dumps the guest's virtual local
APIC/IOAPIC context.
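
For reference, here is a minimal sketch of what to look for once you have the register contents, whether read through WinDbg or dumped by the hypervisor. It assumes you have captured a 4 KiB snapshot of the guest's local APIC page and the 64-bit redirection entry for the pin from the guest's virtual IOAPIC. The helper names (apic_vector_bit, eoi_pending, and so on) are made up for illustration and are not Xen or Windows APIs. The offsets and bit positions follow the standard local APIC and IOAPIC layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local APIC register banks: 8 x 32-bit registers each, 16 bytes apart. */
#define APIC_ISR_BASE 0x100u /* In-Service Register */
#define APIC_IRR_BASE 0x200u /* Interrupt Request Register */

/* Return the bit for `vector` in the ISR or IRR bank of a local APIC page
 * snapshot. `page` is a hypothetical 4 KiB capture of the register page. */
static bool apic_vector_bit(const uint8_t *page, uint32_t base, uint8_t vector)
{
    assert(page != NULL);
    size_t off = base + (size_t)(vector / 32) * 0x10u;
    /* Assemble the 32-bit register from little-endian bytes. */
    uint32_t reg = (uint32_t)page[off]
                 | (uint32_t)page[off + 1] << 8
                 | (uint32_t)page[off + 2] << 16
                 | (uint32_t)page[off + 3] << 24;
    return (reg >> (vector % 32)) & 1u;
}

/* Fields of an IOAPIC redirection table entry (low 32 bits). */
static uint8_t rte_vector(uint64_t rte)     { return (uint8_t)(rte & 0xffu); }
static bool    rte_remote_irr(uint64_t rte) { return (rte >> 14) & 1u; }
static bool    rte_level(uint64_t rte)      { return (rte >> 15) & 1u; }
static bool    rte_masked(uint64_t rte)     { return (rte >> 16) & 1u; }

/* A level-triggered interrupt is still waiting for the guest's EOI when the
 * entry has Remote IRR set and the vector is still in service in the APIC. */
static bool eoi_pending(const uint8_t *page, uint64_t rte)
{
    return rte_level(rte) && rte_remote_irr(rte) &&
           apic_vector_bit(page, APIC_ISR_BASE, rte_vector(rte));
}
```

If eoi_pending() is true for the wlan device's pin, the guest took the interrupt but never EOI'd it. If the vector only shows up in the IRR bank (APIC_IRR_BASE), it was requested but not yet serviced, and rte_masked() tells you whether the guest has the pin masked. Note that the vector to look for is the one the guest programmed into its virtual IOAPIC entry, which need not match the host's vector.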

--jyh

> 
> It happens every time... it's 100% reproducible on that Dell machine.
> 
> On Thu, Nov 26, 2009 at 3:40 AM, Jiang, Yunhong
> <yunhong.jiang@xxxxxxxxx> wrote:
>> 
>> 
>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote:
>>> After digging further into this, I found that the problem is that
>>> the interrupt generated by the wlan device is, for some reason, not
>>> being delivered to the domain after the device is re-enabled in
>>> Windows. I saw this by connecting to the Xen console and pressing
>>> 'i', which printed the following lines:
>>> ...
>>> (XEN)    Vec192 IRQ 17: type=IO-APIC-level   status=00000010 in-flight=1 domain-list=0: 17(----),3: 17(---M),
>>> ...
>>> (XEN)       Apic 0x00, Pin 17: vector=192, delivery_mode=1, dest_mode=logical, delivery_status=1, polarity=1, irr=1, trigger=level, mask=0
>>> ....
>>> 
>>> You can see that interrupt 17, which belongs to my Windows domU, was
>>> generated but still hasn't been injected into the CPU ('irr' is 1).
>>> So I guess that this is what is causing the problem.
>>> Now the only question left is why the hell the interrupt isn't being
>>> injected into the domain.
>> 
>> I assume you are using ack_type_new on your system, am I right?
>> Usually this means the guest has not EOI'd the interrupt, so the
>> host has no chance to EOI the physical IOAPIC. Can you check
>> the guest's virtual local APIC/IOAPIC to see whether that turns
>> up anything?
>> BTW, does it happen every time?
>> 
>> --jyh
>> 
>>> 
>>> Does anyone have any idea about this?
>>> 
>>> On Wed, Nov 25, 2009 at 6:31 PM, Tom Rotenberg
>>> <tom.rotenberg@xxxxxxxxx> wrote:
>>>> Well, I just ran some tests, and it doesn't look like the
>>>> disable_msi/enable_msi functions in pciback are being called at all
>>>> (not even during the disable/enable from the Windows XP domU), so I
>>>> don't think it's related. Also, since when does a config space write
>>>> from a guest domU trigger code in pciback?
>>>>
>>>> I don't think that's the problem here...
>>>> Maybe someone from XCI can shed some light here and tell us
>>>> how they solved it (or not), since their code should run on the
>>>> same Dell machines, no?
>>>> 
>>>> On Wed, Nov 25, 2009 at 5:13 PM, Kamala Narasimhan
>>>> <Kamala.Narasimhan@xxxxxxxxxx> wrote:
>>>>> I shouldn't have suggested that you build without pciback;
>>>>> I got carried away trying to make it simple for you :-).
>>>>> Obviously you need it, and I should have stopped at
>>>>> suggesting that you tweak it.
>>>>> 
>>>>> Here is the thought process that led to my suggestion -
>>>>> 
>>>>> Clearly, that bit is getting changed, as your log indicates.
>>>>> It is unlikely that the guest is triggering that change, which
>>>>> makes pciback a candidate to suspect, as it does change PCI
>>>>> configuration space bits. I need to add some tracing and look
>>>>> at the path of execution to answer some of your specific
>>>>> questions accurately, and I won't be able to do that right now.
>>>>> But I can give some context based on what I have experienced
>>>>> in comparable situations, and based on that I would say pciback
>>>>> is one place to suspect. To be a bit more specific: look into
>>>>> the pciback_enable_msi/pciback_disable_msi code, add some
>>>>> tracing there, and observe whether that code path is taken when
>>>>> the device is disabled/re-enabled within the guest. To
>>>>> reiterate, these are mere suggestions, but they look plausible
>>>>> based on prior observations.
>>>>> 
>>>>> Kamala
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Tom Rotenberg [mailto:tom.rotenberg@xxxxxxxxx]
>>>>>> Sent: Wednesday, November 25, 2009 9:22 AM
>>>>>> To: Kamala Narasimhan
>>>>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; xci-devel@xxxxxxxxxxxxxxxxxxx
>>>>>> Subject: Re: [Xci-devel] Porblem with disabling and then
>>>>>> re-enabling a PT device in Windows
>>>>>> 
>>>>>> I am not sure I understand how to test it...
>>>>>> 1) Avoid doing FLR for the device - isn't that done only when
>>>>>> building the domain? Does it happen when I disable the device
>>>>>> in domU?
>>>>>> 2) Don't build pciback - and then I won't bind the wlan
>>>>>> device to pciback? And change the xend scripts which check for
>>>>>> it?
>>>>>> 3) Comment out the relevant code - which code??
>>>>>>
>>>>>> I also don't understand how the pciback device could be
>>>>>> "messing" with it. Isn't it supposed to be inactive while the
>>>>>> device is being used in PT?
>>>>>> 
>>>>>> Tom
>>>>>> 
>>>>>> On Wed, Nov 25, 2009 at 4:12 PM, Kamala Narasimhan
>>>>>> <Kamala.Narasimhan@xxxxxxxxxx> wrote:
>>>>>>> There is a chance pciback is changing the bit you are referring
>>>>>>> to. To confirm that, just for testing purposes, you might want to
>>>>>>> avoid FLR for that device, or simply not build pciback, or comment
>>>>>>> out the relevant code in that module, whichever is easier, and see
>>>>>>> if that helps. If it does, you can then look into fixing the
>>>>>>> problem the right way.
>>>>>>> 
>>>>>>> Kamala
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: xci-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>>>>>>> [mailto:xci-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tom Rotenberg
>>>>>>>> Sent: Wednesday, November 25, 2009 8:09 AM
>>>>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx; xci-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>> Subject: [Xci-devel] Porblem with disabling and then
>>>>>>>> re-enabling a PT device in Windows
>>>>>>>> 
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> (This is a continuation of my previous mail, but since it looks
>>>>>>>> like a different problem, I decided to open a new thread for
>>>>>>>> it.)
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> Problem Description:
>>>>>>>> ----
>>>>>>>> I am passing an Intel wireless LAN device through to a
>>>>>>>> Windows XP domU (my machine is a Dell e6400), and it appears
>>>>>>>> to work fine. I then disable the device using the Windows
>>>>>>>> Device Manager, and the device is disabled. After that I
>>>>>>>> re-enable the device, and Windows re-enables it
>>>>>>>> correctly. However, the wlan device then malfunctions (it
>>>>>>>> can't turn on the computer's WiFi) and can't connect to
>>>>>>>> wireless networks. I tried it both with MSI translation on
>>>>>>>> and with MSI translation off; it makes no difference.
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> My analysis:
>>>>>>>> ----
>>>>>>>> 1) Comparing the real PCI config space before the disable
>>>>>>>> and after the (last) enable shows that the difference is in
>>>>>>>> the INTx bit (read-only bit 3 of the status register, offset
>>>>>>>> 0x6 in the PCI config space). Before the disable that bit was 0,
>>>>>>>> and after the last enable it was 1. This, according to my
>>>>>>>> understanding, means that the device is asserting its INTx
>>>>>>>> and probably waiting for someone to handle it, no?
>>>>>>>> 
>>>>>>>> 2) To track when this bit changed, I added code that checks,
>>>>>>>> on every PCI config read, whether the bit has changed, and
>>>>>>>> prints a message when it has. The relevant lines in the qemu
>>>>>>>> log look like this:
>>>>>>>> ...
>>>>>>>> pt_pci_read_config: [00:01.0]: address=00f0 val=0x00000000 len=2
>>>>>>>> ACPI PCI hotplug: read addr=0x10c6, val=0x0f.
>>>>>>>> ACPI PCI hotplug: read addr=0x10c6, val=0x0f.
>>>>>>>> pt_pci_read_config: TEST CODE: STATUS CHNAGED! OLD: 0x10, NEW: 0x18
>>>>>>>> pt_pci_read_config: [00:01.0]: address=0000 val=0x00008086 len=2
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> This implies that the bit changed at about the same time that
>>>>>>>> Windows started using the device (I assume that it
>>>>>>>> started using it just after querying ACPI for the
>>>>>>>> existence of the device). No?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Can someone help me with this?
>>>>>>>> 
>>>>>>>> (BTW - i am using Xen 3.4)
>>>>>>>> 
>>>>>>>> Tom
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Xci-devel mailing list
>>>>>>>> Xci-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>> http://lists.xensource.com/mailman/listinfo/xci-devel
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel


 

