
Re: [Xen-devel] RE: [PATCH] [RFC] VT-d: always clean up dpci timers.



At 18:08 -0700 on 20 Jul (1311185311), Kay, Allen M wrote:
> Hi Tim,
> 
> Can you provide the code flow that can cause this failure?
> 
> In pci_release_devices(), pci_clean_dpci_irqs() is called before
> "d->need_iommu = 0" in deassign_device().  If this path is taken, then
> it should not return from !need_iommu(d) in pci_clean_dpci_irqs().

The problem is that the xl toolstack has already deassigned the domain's
devices, using a hypercall to invoke deassign_device(), so by the time
the domain is destroyed, pci_release_devices() can't tell that it once
had a PCI device passed through.

It seems like the Right Thing[tm] would be for deassign_device() to find
and undo the relevant IRQ plumbing but I couldn't see how.  Is there a
mapping from bdf to irq in the iommu code or are they handled entirely
separately?

Tim.

> Allen
> 
> -----Original Message-----
> From: Tim Deegan [mailto:Tim.Deegan@xxxxxxxxxx] 
> Sent: Monday, July 18, 2011 9:39 AM
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Cc: keir@xxxxxxx; Kay, Allen M
> Subject: [PATCH] [RFC] VT-d: always clean up dpci timers.
> 
> If a VM has all its PCI devices deassigned, need_iommu(d) becomes false
> but it might still have DPCI EOI timers that were init_timer()d but not
> yet kill_timer()d.  That causes xen to crash later because the linked
> list of inactive timers gets corrupted, e.g.:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480126256>] set_timer+0x1c2/0x24f
> (XEN)    [<ffff82c48011fbf8>] schedule+0x129/0x5dd
> (XEN)    [<ffff82c480122c1e>] __do_softirq+0x7e/0x89
> (XEN)    [<ffff82c480122c9d>] do_softirq+0x26/0x28
> (XEN)    [<ffff82c480153c85>] idle_loop+0x5a/0x5c
> (XEN)    
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'entry->next->prev == entry' failed at 
> /local/scratch/tdeegan/xen-unstable.hg/xen/include:172
> (XEN) ****************************************
> 
> The following patch makes sure that the domain destruction path always
> clears up the DPCI state even if !need_iommu(d). 
> 
> Although it fixes the crash for me, I'm sufficiently confused by this
> code that I don't know whether it's enough.  If the dpci timer state
> gets freed earlier than pci_clean_dpci_irqs() then there's still a race,
> and some other function (reassign_device_ownership() ?) needs to sort
> out the timers when the PCI card is deassigned.
> 
> Allen, can you comment?
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> 
> diff -r ab6551e30841 xen/drivers/passthrough/pci.c
> --- a/xen/drivers/passthrough/pci.c   Mon Jul 18 10:59:44 2011 +0100
> +++ b/xen/drivers/passthrough/pci.c   Mon Jul 18 17:22:48 2011 +0100
> @@ -269,7 +269,7 @@ static void pci_clean_dpci_irqs(struct d
>      if ( !iommu_enabled )
>          return;
>  
> -    if ( !is_hvm_domain(d) || !need_iommu(d) )
> +    if ( !is_hvm_domain(d) )
>          return;
>  
>      spin_lock(&d->event_lock);
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
