
Re: [Xen-devel] [xen-devel] create irq failed due to move_cleanup_count always being set



On 06/01/12 11:50, Liuyongan wrote:
>
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
>> Sent: Friday, January 06, 2012 7:01 PM
>> To: Liuyongan
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Keir (Xen.org); Qianhuibin
>> Subject: Re: [xen-devel] create irq failed due to move_cleanup_count
>> always being set
>>
>> Could you please avoid top posting.
>>
>> On 06/01/12 06:04, Liuyongan wrote:
>>>    As only 33 domains were successfully created (and destroyed) before
>>> the problem occurred, there should be enough free IRQ numbers and
>>> vector numbers to allocate (even supposing that irqs and vectors failed
>>> to deallocate). And destroy_irq() will clear move_in_progress, so
>>> move_cleanup_count must be set?  Is this the case?
>>
>> Is it repeatably 33 domains, or was that a one-off experiment?  Can you
>   No, it's not repeatable; this occurred 2 times, the other one after 152
> domains.

Can you list all the failures you have seen, with the number of domains?
So far it seems to have been 33 twice but many more on other occasions,
which doesn't lend itself to saying "33 domains is a systematic failure"
for certain at the moment.

>> confirm exactly which version of Xen you are using, including changeset
>> if you know it?  Without knowing your hardware, it is hard to say if
>> there are actually enough free IRQs, although I do agree that what you
>> are currently seeing is buggy behavior.
>>
>> The per-cpu IDT functionality introduced in Xen-4.0 is fragile at the
>> best of times, and has had several bugfixes and tweaks to it which I am
>> not certain have actually found their way back to Xen-4.0.  Could you
>> try with Xen-4.1 and see if the problem persists?
>>
>> ~Andrew
>   As I could not make it reoccur in xen-4.0, trying xen-4.1 seems
> pointless. I noticed a scenario:

I am confused.  Above, you say that the problem is repeatable, but here
you say it is not.

>    1) move_in_progress occurs;
>    2) the IRQ_MOVE_CLEANUP_VECTOR IPI is sent;
>    3) the irq is destroyed, so cfg->vector is cleared, etc.;
>    4) the IRQ_MOVE_CLEANUP_VECTOR interrupt is serviced.
>  
>   In xen-4.1, in step 3, the vector_irq entries for
> old_cpu_mask/old_domain are also reset, so in step 4 move_cleanup_count
> will fail to be decremented, finally leading to create_irq failure
> (right?).
>
>   In xen-4.0, in step 3, vector_irq is not reset in my code (this is a
> bug, as you've mentioned), yet I still cannot figure out why create_irq
> should fail.
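
Your step 4) theory matches how the cleanup IPI works: the handler
recovers the irq number from the per-cpu vector_irq table, so if
destroy_irq() has already cleared those entries, the handler can never
reach the move_cleanup_count decrement.  A minimal sketch of that path
(simplified from memory of arch/x86/irq.c -- the details differ between
4.0 and 4.1, so treat it as illustrative, not authoritative):

    void smp_irq_move_cleanup_interrupt(void)
    {
        unsigned int vector, me = smp_processor_id();

        for ( vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++ )
        {
            int irq = per_cpu(vector_irq, me)[vector];
            struct irq_cfg *cfg;

            /* If destroy_irq() already cleared this entry, everything
             * below is skipped and move_cleanup_count is never
             * decremented for this vector. */
            if ( irq == -1 )
                continue;

            cfg = irq_cfg(irq);
            if ( !cfg->move_cleanup_count )
                continue;

            /* Leave the vector the irq currently lives on alone. */
            if ( vector == cfg->vector && cpu_isset(me, cfg->domain) )
                continue;

            per_cpu(vector_irq, me)[vector] = -1;
            cfg->move_cleanup_count--;
        }
    }

If move_cleanup_count gets stuck nonzero like that, any later
create_irq() which reuses the same irq number would see it in
__assign_irq_vector() and fail, which would be consistent with your log.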

The first point of debugging should be to see how create_irq is
failing.  Is it failing because of find_unassigned_irq() or because of
__assign_irq_vector()?
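
A quick way to tell those two apart is a couple of debug printks in
create_irq().  A sketch against the 4.0-style layout (the exact
structure of create_irq() in your tree may differ, so adjust to taste):

    int create_irq(void)
    {
        unsigned long flags;
        int irq, ret;

        spin_lock_irqsave(&vector_lock, flags);

        irq = find_unassigned_irq();
        if ( irq < 0 )
        {
            /* Would point at stale entries in the per-cpu vector tables. */
            printk("create_irq: no unassigned irq\n");
            goto out;
        }

        ret = __assign_irq_vector(irq, irq_cfg(irq), TARGET_CPUS);
        if ( ret < 0 )
        {
            /* The same irq number failing every time would point at a
             * stuck move_in_progress/move_cleanup_count on its cfg. */
            printk("create_irq: __assign_irq_vector(irq %d) = %d\n",
                   irq, ret);
            irq = ret;
        }

    out:
        spin_unlock_irqrestore(&vector_lock, flags);
        return irq;
    }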

Another piece of useful information would be what your guests are and
what they are trying to do with interrupts.  Are you using PCI passthrough?

~Andrew

>>>> -----Original Message-----
>>>> From: Liuyongan
>>>> Sent: Thursday, January 05, 2012 2:14 PM
>>>> To: Liuyongan; xen-devel@xxxxxxxxxxxxxxxxxxx;
>>>> andrew.cooper3@xxxxxxxxxx; keir@xxxxxxx
>>>> Cc: Qianhuibin
>>>> Subject: RE: [xen-devel] create irq failed due to move_cleanup_count
>>>> always being set
>>>>
>>>>> On 04/01/12 11:38, Andrew Cooper wrote:
>>>>>> On 04/01/12 04:37, Liuyongan wrote:
>>>>>> Hi, all
>>>>>>
>>>>>>     I'm using xen-4.0 to do a test, and when I create a domain it
>>>>>> fails due to create_irq() failure. As only 33 domains were
>>>>>> successfully created and destroyed before I got the continuous
>>>>>> failures, and the domain just before the failure was properly
>>>>>> destroyed (at least destroy_irq() was properly called, which will
>>>>>> clear move_in_progress, according to the printk message), I can
>>>>>> conclude for certain that __assign_irq_vector failed due to
>>>>>> move_cleanup_count always being set.
>>>>> Is it always 33 domains it takes to cause the problem, or does it
>>>>> vary?
>>>>> If it varies, then I think you want this patch
>>>>> http://xenbits.xensource.com/hg/xen-unstable.hg/rev/68b903bb1b01
>>>>> which corrects the logic which works out which moved vectors it
>>>>> should clean up.  Without it, stale irq numbers build up in the
>>>>> per-cpu irq_vector tables, leading to __assign_irq_vector failing
>>>>> with -ENOSPC as it cannot find a vector to allocate.
>>>>   Yes, I've noticed this patch. As only 33 domains were created
>>>> before the failures, the vectors of a given cpu should not have been
>>>> used up. Besides, I got this problem after 143 domains were created
>>>> another time. But I could not reproduce this problem manually, as
>>>> 4000+ domains were created successfully without it.
>>>>
>>>>>> //this is the normal case when creating and destroying the domain
>>>>>> //whose id is 31;
>>>>>> (XEN) irq.c:1232:d0 bind pirq 79, irq 77, share flag:0
>>>>>> (XEN) irq.c:1377: dom31: pirq 79, irq 77 force unbind
>>>>>> (XEN) irq.c:1593: dom31: forcing unbind of pirq 79
>>>>>> (XEN) irq.c:223, destroy irq 77
>>>>>>
>>>>>> //domain id 32 is also created and destroyed correctly.
>>>>>> (XEN) irq.c:1232:d0 bind pirq 79, irq 77, share flag:0
>>>>>> (XEN) irq.c:1377: dom32: pirq 79, irq 77 force unbind
>>>>>> (XEN) irq.c:1593: dom32: forcing unbind of pirq 79
>>>>>> (XEN) irq.c:223, destroy irq 77
>>>>>>
>>>>>> //all the subsequent domain creations failed; below lists only 3
>>>>>> //of them:
>>>>>> (XEN) physdev.c:88: dom33: can't create irq for msi!
>>>>>> (XEN) physdev.c:88: dom34: can't create irq for msi!
>>>>>> (XEN) physdev.c:88: dom35: can't create irq for msi!
>>>>>>
>>>>>>      I think this might be a bug and might have been fixed, so I
>>>>>> compared my code with 4.1.2 and searched the mailing list for
>>>>>> potential patches. One mail
>>>>>> (http://xen.markmail.org/search/?q=move_cleanup_count#query:move_cleanup_count+page:6+mid:fpkrafqbeyiauvhs+state:results)
>>>>>> submits a patch which adds locks in __assign_irq_vector. Can anybody
>>>>>> explain why this lock is needed? Or is there a patch that might fix
>>>>>> my bug? Thx.
>>>>> This patch fixes a problem where IOAPIC line-level interrupts cease
>>>>> for a while.  It has nothing to do with MSI interrupts.  (Also, there
>>>>> are no locks altered, and xen-4.0-testing seems to have gained an
>>>>> additional hunk in hvm/vmx code unrelated to the original patch.)
>>>>>
>>>>>>     Additional information: my board is x86, no domains were left
>>>>>> when creating new ones failed, the create_irq failures lasted one
>>>>>> day until I rebooted the board, and the allocated irq number is
>>>>>> certainly used for an MSI device.
>>>>>> Yong an Liu
>>>>>> 2012.1.4
>>>>>>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

