
Re: [Xen-devel] Re: [Xci-devel] Porblem with disabling and then re-enabling a PT device in Windows



Well, as an update, I found the following:
after I disable and then re-enable the device, the interrupt is "kind of"
stuck for ~1:30 minutes, and after that the interrupt is somehow
released and everything works OK.
I tested it a few times, and each time the interrupt was released
about 1:30 minutes later.

Does that imply anything? Or is it probably just the Windows driver
recovering?

On Thu, Nov 26, 2009 at 9:55 AM, Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> wrote:
>
> Tom Rotenberg wrote:
>> How do I know if I'm using 'ack_type_new', and what does it mean?
>> Do you have any idea how I can check inside the domU Windows XP (using
>> WinDBG, of course) whether the virtual local APIC/IOAPIC has EOI'd the
>> interrupt?
>
> I haven't tried using windbg to check the local/IO APIC before, although I suppose you can
> do that, since it is simply MMIO access.
> Also, you can add a hotkey (i.e. the key pressed after the three "ctrl+a") to the Xen
> hypervisor to dump the guest's virtual local APIC/IOAPIC context.
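
For anyone trying the MMIO route mentioned above, here is a minimal sketch of where to look, assuming the guest uses the standard xAPIC register layout at the default base address. The guest vector below is hypothetical: vector 192 in the Xen dump is the host vector, and the guest's own vector would have to be read from its virtual IOAPIC redirection entry. If the In-Service Register bit for that vector is still set, the guest has not yet EOI'd it.

```python
# Architectural xAPIC register layout (offsets from the local APIC base).
DEFAULT_LAPIC_BASE = 0xFEE00000  # default base; a guest may relocate it
ISR_BASE = 0x100  # In-Service Register: 8 x 32-bit words, 0x10 apart
IRR_BASE = 0x200  # Interrupt Request Register: same layout


def vector_location(vector, reg_base):
    """Return (offset, bit) of a vector inside a 256-bit APIC register bank."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    return reg_base + 0x10 * (vector // 32), vector % 32


def is_set(word, bit):
    """Test one bit in a 32-bit word read from the APIC page."""
    return bool(word & (1 << bit))


guest_vector = 0x51  # hypothetical guest vector, for illustration only
offset, bit = vector_location(guest_vector, ISR_BASE)
print(f"read dword at {DEFAULT_LAPIC_BASE + offset:#x}, test bit {bit}")
```

The same helper works for the IRR bank by passing IRR_BASE, which shows whether the vector is pending but not yet in service.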
>
> --jyh
>
>>
>> It happens every time... it's 100% reproducible on that Dell machine.
>>
>> On Thu, Nov 26, 2009 at 3:40 AM, Jiang, Yunhong
>> <yunhong.jiang@xxxxxxxxx> wrote:
>>>
>>>
>>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote:
>>>> After digging more into this problem, I found that the problem
>>>> is that the interrupt generated by the wlan device isn't being
>>>> transferred to the domain, for some reason, after the device was
>>>> re-enabled in Windows. I saw that by connecting to the Xen
>>>> console and pressing 'i', which gave the following lines:
>>>> ...
>>>> (XEN)    Vec192 IRQ 17: type=IO-APIC-level   status=00000010
>>>> in-flight=1 domain-list=0: 17(----),3: 17(---M),
>>>> ...
>>>> (XEN)       Apic 0x00, Pin 17: vector=192, delivery_mode=1,
>>>> dest_mode=logical, delivery_status=1, polarity=1, irr=1,
>>>> trigger=level, mask=0 ....
>>>>
>>>> You can see that interrupt 17, which belongs to my Windows domU, was
>>>> generated but still wasn't injected to the CPU (the 'irr' is 1).
>>>> So I guess that this is what is causing the problem.
>>>> Now, the only issue left is why the hell the interrupt isn't being
>>>> injected to the domain?
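
A small sketch for reading the two dump lines quoted above, assuming they are pasted as text from the Xen console. The function names and the "looks stuck" heuristic are illustrative and not part of Xen. Note that irr in an IO-APIC redirection entry is the Remote IRR flag, which stays set for a level-triggered interrupt until an EOI arrives, so irr=1 together with in-flight=1 points at a missing EOI.

```python
import re


def parse_fields(line):
    """Collect every key=value pair on a dump line into a dict of strings."""
    return dict(re.findall(r"([\w-]+)=([\w-]+)", line))


def looks_stuck(irq_fields, pin_fields):
    """Heuristic: level-triggered, still in flight, and Remote IRR still set."""
    return (pin_fields.get("trigger") == "level"
            and pin_fields.get("irr") == "1"
            and irq_fields.get("in-flight") == "1")


# The two lines from the quoted console output, joined back together.
irq_line = ("(XEN)    Vec192 IRQ 17: type=IO-APIC-level   status=00000010 "
            "in-flight=1 domain-list=0: 17(----),3: 17(---M),")
pin_line = ("(XEN)       Apic 0x00, Pin 17: vector=192, delivery_mode=1, "
            "dest_mode=logical, delivery_status=1, polarity=1, irr=1, "
            "trigger=level, mask=0")

irq = parse_fields(irq_line)
pin = parse_fields(pin_line)
print(irq["in-flight"], pin["irr"], pin["trigger"], looks_stuck(irq, pin))
```

Running the same check on a dump taken before the disable/enable cycle would show whether the fields differ when the device is working.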
>>>
>>> I assume you are using ack_type_new on your system, am I right?
>>> Usually it means the guest has not EOI'd the interrupt, so the
>>> host has no chance to EOI the physical IOAPIC. Can you check
>>> the virtual local APIC/IOAPIC for the guest to see if we find
>>> anything?
>>> BTW, does it happen every time?
>>>
>>> --jyh
>>>
>>>>
>>>> Does anyone have any idea about it?
>>>>
>>>> On Wed, Nov 25, 2009 at 6:31 PM, Tom Rotenberg
>>>> <tom.rotenberg@xxxxxxxxx> wrote:
>>>>> Well, I just performed some tests, and it doesn't look like the
>>>>> disable_msi/enable_msi functions in pciback are being called at all
>>>>> (moreover, not during the disable/enable from the domU Windows XP), so I
>>>>> don't think it's related. Also, since when does a config space write
>>>>> from a guest domU trigger code in pciback?
>>>>>
>>>>> I think that's not the problem here...
>>>>> Maybe someone from XCI can shed some light here and tell us
>>>>> how they solved it (or not), since their code should run on the
>>>>> same Dell machines, no?
>>>>>
>>>>> On Wed, Nov 25, 2009 at 5:13 PM, Kamala Narasimhan
>>>>> <Kamala.Narasimhan@xxxxxxxxxx> wrote:
>>>>>> I shouldn't have suggested that you build without pciback;
>>>>>> I got carried away trying to make it simple for you :-).
>>>>>> Obviously you would need it, and I should have stopped at
>>>>>> suggesting that you tweak it.
>>>>>>
>>>>>> Here is the thought process that led to my suggestion -
>>>>>>
>>>>>> Clearly, that bit is getting changed, as indicated in your
>>>>>> log.  It is unlikely that the guest is triggering that change,
>>>>>> which makes pciback a potential suspect, as it
>>>>>> does change PCI configuration space bits.  I need to add some
>>>>>> tracing and look at the path of execution to answer some of
>>>>>> your specific questions accurately, and I won't be able to do
>>>>>> that right now. But I can give some context to help you, based
>>>>>> on what I have experienced in comparable situations, and based
>>>>>> on that I would say pciback is one place to suspect.  To be a
>>>>>> bit more specific, I would say look into the
>>>>>> pciback_enable_msi/pciback_disable_msi code, add some tracing
>>>>>> there, and observe whether or not that code path is taken when the
>>>>>> device is disabled/re-enabled within the guest.  To reiterate,
>>>>>> these are mere suggestions, but they look plausible based on prior
>>>>>> observations.
>>>>>>
>>>>>> Kamala
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Tom Rotenberg [mailto:tom.rotenberg@xxxxxxxxx]
>>>>>>> Sent: Wednesday, November 25, 2009 9:22 AM
>>>>>>> To: Kamala Narasimhan
>>>>>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; xci-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>> Subject: Re: [Xci-devel] Porblem with disabling and then
>>>>>>> re-enabling a PT device in Windows
>>>>>>>
>>>>>>> I am not sure I understand how to test it...
>>>>>>> 1) Avoid doing FLR for the device - isn't that done only when
>>>>>>> building the domain? Does that happen when I disable the device
>>>>>>> in domU?
>>>>>>> 2) Don't build pciback - and then I won't bind the wlan
>>>>>>> device to pciback? And change the xend scripts which check for
>>>>>>> it?
>>>>>>> 3) Comment out the relevant code - which code??
>>>>>>>
>>>>>>> I also don't understand how it could be that the pciback device
>>>>>>> is "messing" with it. Isn't it supposed to be inactive when the
>>>>>>> device is being used in PT?
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On Wed, Nov 25, 2009 at 4:12 PM, Kamala Narasimhan
>>>>>>> <Kamala.Narasimhan@xxxxxxxxxx> wrote:
>>>>>>>> There is a chance pciback is changing the bit you are referring
>>>>>>>> to.  To confirm that, just for testing purposes, you might want to
>>>>>>>> avoid FLR for that device, or simply not build pciback, or comment
>>>>>>>> out the relevant code in that module, whichever is easier, and see if
>>>>>>>> that helps.  If it does, you can then look into fixing the
>>>>>>>> problem the right way.
>>>>>>>>
>>>>>>>> Kamala
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: xci-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> [mailto:xci-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tom Rotenberg
>>>>>>>>> Sent: Wednesday, November 25, 2009 8:09 AM
>>>>>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx;
>>>>>>>>> xci-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> Subject: [Xci-devel] Porblem
>>>>>>>>> with disabling and then re-enabling a PT device in Windows
>>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> (This is a continuation to my previous mail, but since it looks
>>>>>>>>> like a different problem - i decided to open a new thread for
>>>>>>>>> it)
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>> Problem Description:
>>>>>>>>> ----
>>>>>>>>> I am doing pass-through of an Intel wireless LAN device to a
>>>>>>>>> Windows XP domU (my machine is a Dell e6400), and it looks like
>>>>>>>>> it's working OK. Then I disable the device using the Windows
>>>>>>>>> device manager, and the device is now disabled. After that I
>>>>>>>>> re-enable the device, and Windows re-enables the device
>>>>>>>>> correctly. However, the wlan device seems to malfunction (it
>>>>>>>>> can't turn on the WiFi of the computer) and can't connect to
>>>>>>>>> wireless networks. I tried it both with MSI translation on
>>>>>>>>> and with MSI translation off - it doesn't matter.
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>> My analysis:
>>>>>>>>> ----
>>>>>>>>> 1) Well, taking a look at the real PCI config space before
>>>>>>>>> the disable and after the (last) enable shows that the difference
>>>>>>>>> is in the INTx bit (read-only bit 3 of the status register (offset
>>>>>>>>> 0x6) in the PCI config space). Before the disable, that bit was 0,
>>>>>>>>> and after the last enable that bit was 1. This, according to my
>>>>>>>>> understanding, means that the device is asserting its INTx
>>>>>>>>> and probably waiting for someone to handle it, no?
>>>>>>>>>
>>>>>>>>> 2) When I tried to track when this bit changed, I
>>>>>>>>> added code which, on every PCI config read, checks whether that
>>>>>>>>> bit changed, and added a print for when it did. The
>>>>>>>>> relevant lines in the qemu log look like this:
>>>>>>>>> ...
>>>>>>>>> pt_pci_read_config: [00:01.0]: address=00f0 val=0x00000000 len=2
>>>>>>>>> ACPI PCI hotplug: read addr=0x10c6, val=0x0f.
>>>>>>>>> ACPI PCI hotplug: read addr=0x10c6, val=0x0f.
>>>>>>>>> pt_pci_read_config: TEST CODE: STATUS CHNAGED! OLD: 0x10, NEW: 0x18
>>>>>>>>> pt_pci_read_config: [00:01.0]: address=0000 val=0x00008086 len=2
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> This implies that the bit changed at about the same time that
>>>>>>>>> Windows tried to start using the device (I assume that it
>>>>>>>>> tried using it just after querying ACPI for the
>>>>>>>>> existence of the device). No?
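
To make the two status values in the quoted log concrete, here is a minimal sketch that decodes them using the standard PCI Status register bit layout (config offset 0x06). The helper names are illustrative and this is not the test code that produced the log. Bits 9-10 (DEVSEL timing) are a two-bit field and are left out of the table.

```python
# Single-bit flags in the 16-bit PCI Status register (config offset 0x06).
STATUS_BITS = {
    3: "Interrupt Status (INTx asserted)",
    4: "Capabilities List",
    5: "66 MHz Capable",
    7: "Fast Back-to-Back Capable",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_status(value):
    """Return the names of the status flags set in value, lowest bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items())
            if value & (1 << bit)]


def changed_bits(old, new):
    """Return the bit positions that differ between two register reads."""
    diff = old ^ new
    return [bit for bit in range(16) if diff & (1 << bit)]


old, new = 0x10, 0x18  # OLD and NEW values from the quoted qemu log
print(decode_status(old))
print(decode_status(new))
print(changed_bits(old, new))
```

With these values the only bit that differs is bit 3, which matches the observation in point 1: the capabilities-list flag is set in both reads, and only the interrupt status flag changes.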
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can someone help me with this?
>>>>>>>>>
>>>>>>>>> (BTW - i am using Xen 3.4)
>>>>>>>>>
>>>>>>>>> Tom
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Xci-devel mailing list
>>>>>>>>> Xci-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> http://lists.xensource.com/mailman/listinfo/xci-devel
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
