
Re: [Xen-devel] [xen-devel] create irq failed due to move_cleanup_count always being set




> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Friday, January 06, 2012 7:01 PM
> To: Liuyongan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Keir (Xen.org); Qianhuibin
> Subject: Re: [xen-devel] create irq failed due to move_cleanup_count
> always being set
> 
> Could you please avoid top posting.
> 
> On 06/01/12 06:04, Liuyongan wrote:
> >    As only 33 domains were successfully created (and destroyed) before
> > the problem occurred, there should be enough free IRQ numbers and
> > vector numbers to allocate (even supposing that some irqs and vectors
> > failed to be deallocated). And destroy_irq() will clear
> > move_in_progress, so move_cleanup_count must be set?  Is this the case?
> 
> Is it repeatably 33 domains, or was that a one-off experiment?  Can you

  No, it's not repeatable; it has occurred twice, the other time after 152
domains.

> confirm exactly which version of Xen you are using, including changeset
> if you know it?  Without knowing your hardware, it is hard to say if
> there are actually enough free IRQs, although I do agree that what you
> are currently seeing is buggy behavior.
> 
> The per-cpu IDT functionality introduced in Xen-4.0 is fragile at the
> best of times, and has had several bugfixes and tweaks to it which I am
> not certain have actually found their way back to Xen-4.0.  Could you
> try with Xen-4.1 and see if the problem persists?
> 
> ~Andrew

  As I could not make it recur on xen-4.0, trying xen-4.1 seems of little
use. I noticed a scenario (rough sketches of the relevant code paths follow
below):

   1) a vector move starts, so move_in_progress is set;
   2) the IRQ_MOVE_CLEANUP_VECTOR IPI is sent;
   3) the irq is destroyed, so cfg->vector is cleared, etc.;
   4) the IRQ_MOVE_CLEANUP_VECTOR interrupt is handled.
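  To make step 3) concrete, here is a rough sketch of the vector teardown,
loosely based on my reading of xen-4.1's __clear_irq_vector() (simplified,
not verbatim xen code):

    /* Step 3: destroy_irq() tears down the vector bookkeeping.  When a
     * move is still in progress, xen-4.1 also resets the *old* per-cpu
     * vector_irq[] entries here, before the cleanup IPI from step 2 has
     * been handled. */
    static void clear_irq_vector_sketch(int irq)
    {
        struct irq_cfg *cfg = irq_cfg(irq);
        unsigned int cpu, vector;

        /* Drop the current vector on every cpu that owns it. */
        for_each_cpu_mask ( cpu, cfg->cpu_mask )
            per_cpu(vector_irq, cpu)[cfg->vector] = -1;
        cfg->vector = IRQ_VECTOR_UNASSIGNED;

        if ( !cfg->move_in_progress )
            return;

        /* A move was pending: wipe the old per-cpu entries as well ... */
        for_each_cpu_mask ( cpu, cfg->old_cpu_mask )
            for ( vector = FIRST_DYNAMIC_VECTOR;
                  vector <= LAST_DYNAMIC_VECTOR; vector++ )
                if ( per_cpu(vector_irq, cpu)[vector] == irq )
                    per_cpu(vector_irq, cpu)[vector] = -1;

        /* ... and clear the move flag (but NOT move_cleanup_count). */
        cfg->move_in_progress = 0;
    }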
 
  In xen-4.1, at step 3, the vector_irq entries for old_cpu_mask/old_domain
are also reset, so at step 4 move_cleanup_count will fail to be decremented,
finally leading to create_irq failure (right?).
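  For reference, a heavily simplified sketch of step 4, loosely following
xen-4.1's smp_irq_move_cleanup_interrupt() (not verbatim xen code): the
handler walks this cpu's vector_irq[] table, so an entry that step 3 already
reset to -1 is skipped and the matching move_cleanup_count-- is never
reached.

    void irq_move_cleanup_sketch(void)
    {
        unsigned int me = smp_processor_id(), vector;

        for ( vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++ )
        {
            int irq = per_cpu(vector_irq, me)[vector];
            struct irq_cfg *cfg;

            if ( irq == -1 )
                continue;   /* entry already cleared: the count leaks */

            cfg = irq_cfg(irq);
            if ( !cfg->move_cleanup_count )
                continue;   /* no move pending for this irq */
            if ( vector == cfg->vector && cpu_isset(me, cfg->cpu_mask) )
                continue;   /* this vector is the irq's new home */

            /* Release the old vector and account for it. */
            per_cpu(vector_irq, me)[vector] = -1;
            cfg->move_cleanup_count--;
        }
    }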

  In xen-4.0, at step 3, vector_irq is not reset in my code (this is a bug,
as you've mentioned), so I still cannot figure out why create_irq should
fail.
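  For completeness, the reason a stuck move_cleanup_count turns into
create_irq() failures is the early bail-out in the allocator when the irq
number is reused; roughly (a sketch of the check in __assign_irq_vector(),
not the exact xen source):

    static int assign_irq_vector_sketch(int irq, struct irq_cfg *cfg)
    {
        /* While either flag is set, no new vector may be assigned, so a
         * count that is never decremented blocks allocation forever. */
        if ( cfg->move_in_progress || cfg->move_cleanup_count )
            return -EAGAIN;  /* surfaces as "can't create irq for msi!" */

        /* ... normal search of the per-cpu vector_irq[] tables ... */
        return 0;
    }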

> 
> >> -----Original Message-----
> >> From: Liuyongan
> >> Sent: Thursday, January 05, 2012 2:14 PM
> >> To: Liuyongan; xen-devel@xxxxxxxxxxxxxxxxxxx;
> >> andrew.cooper3@xxxxxxxxxx; keir@xxxxxxx
> >> Cc: Qianhuibin
> >> Subject: RE: [xen-devel] create irq failed due to move_cleanup_count
> >> always being set
> >>
> >>> On 04/01/12 11:38, Andrew Cooper wrote:
> >>>> On 04/01/12 04:37, Liuyongan wrote:
> >>>> Hi, all
> >>>>
> >>>>     I'm using xen-4.0 to do a test, and when I create a domain it
> >>>> fails due to create_irq() failure. Only 33 domains were successfully
> >>>> created and destroyed before I got the continuous failures, and the
> >>>> domain just before the failure was properly destroyed (at least
> >>>> destroy_irq() was properly called, which will clear move_in_progress,
> >>>> according to the printk messages). So I can conclude for certain that
> >>>> __assign_irq_vector failed due to move_cleanup_count always being set.
> >>> Is it always 33 domains it takes to cause the problem, or does it
> >>> vary?  If it varies, then I think you want this patch
> >>> http://xenbits.xensource.com/hg/xen-unstable.hg/rev/68b903bb1b01
> >>> which corrects the logic that works out which moved vectors it should
> >>> clean up.  Without it, stale irq numbers build up in the per-cpu
> >>> irq_vector tables, leading to __assign_irq_vector failing with -ENOSPC
> >>> as it cannot find a vector to allocate.
> >>   Yes, I've noticed this patch, but as only 33 domains were created
> >> before the failures, the vectors of a given cpu should not have been
> >> used up. Besides, another time I got this problem after 143 domains
> >> were created. But I could not reproduce the problem manually, as 4000+
> >> domains were then created successfully without it.
> >>
> >>>> // this is the normal case: the domain whose id is 31 is created
> >>>> // and destroyed;
> >>>> (XEN) irq.c:1232:d0 bind pirq 79, irq 77, share flag:0
> >>>> (XEN) irq.c:1377: dom31: pirq 79, irq 77 force unbind
> >>>> (XEN) irq.c:1593: dom31: forcing unbind of pirq 79
> >>>> (XEN) irq.c:223, destroy irq 77
> >>>>
> >>>> // domain id 32 is also created and destroyed correctly.
> >>>> (XEN) irq.c:1232:d0 bind pirq 79, irq 77, share flag:0
> >>>> (XEN) irq.c:1377: dom32: pirq 79, irq 77 force unbind
> >>>> (XEN) irq.c:1593: dom32: forcing unbind of pirq 79
> >>>> (XEN) irq.c:223, destroy irq 77
> >>>>
> >>>> // all subsequent domain creations failed; only 3 attempts are
> >>>> // listed below:
> >>>> (XEN) physdev.c:88: dom33: can't create irq for msi!
> >>>> (XEN) physdev.c:88: dom34: can't create irq for msi!
> >>>> (XEN) physdev.c:88: dom35: can't create irq for msi!
> >>>>
> >>>>      I think this might be a bug that has already been fixed, so I
> >>>> compared my code with 4.1.2 and searched the mailing list for
> >>>> potential patches. This thread
> >>>> (http://xen.markmail.org/search/?q=move_cleanup_count#query:move_cleanup_count+page:6+mid:fpkrafqbeyiauvhs+state:results)
> >>>> submits a patch which adds locks in __assign_irq_vector. Can anybody
> >>>> explain why this lock is needed? Or is there a patch that might fix
> >>>> my bug? Thx.
> >>> This patch fixes a problem where IOAPIC line level interrupts cease
> >>> for a while.  It has nothing to do with MSI interrupts.  (Also, there
> >>> are no locks altered, and xen-4.0-testing seems to have gained an
> >>> additional hunk in hvm/vmx code unrelated to the original patch.)
> >>>
> >>>>     Additional information: my board is x86; no domains were left
> >>>> when creating new ones started to fail; the create_irq failures
> >>>> lasted one day, until I rebooted the board; and the irq number being
> >>>> allocated is certainly used for an MSI device.
> >>>> Yong an Liu
> >>>> 2012.1.4
> >>>>
> 
> --
> Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
> T: +44 (0)1223 225 900, http://www.citrix.com


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

