
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)



On Fri, Aug 26, 2011 at 10:26:06AM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 25, 2011 at 09:31:46PM +0100, Anthony Wright wrote:
> > On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote:
> > >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote:
> > >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote:
> > >>>> I've just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with
> > >>>> the vga-support patch backported). I can't get my DomUs to work due to
> > >>>> the phy disks and vifs timing out in DomU, and looking through my logs
> > >>>> this morning I'm getting a consistent kernel bug report with xen
> > >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on
> > >>> Yikes! Ian, any ideas what to try?
> > >>>
> > >>> Anthony, can you compile the kernel with debug=y and, when this happens,
> > >>> see what 'xl dmesg' gives? There is also 'xl debug-keys g', which
> > >>> should dump the grants in use... that might help a bit.
> > >> I've compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other
> > >> config values appeared at this point, and I took defaults for them).
> > >>
> > >> The output from /var/log/messages & 'xl dmesg' is attached. There was no
> > >> output from 'xl debug-keys g'.
> > > Ok, so I am hitting this too - I was hoping that the patch from Stefano
> > > would have fixed the issue, but sadly it did not.
> > >
> > > Let me (I am traveling right now) see if I can come up with an interim
> > > solution until Ian comes up with the right fix.
> > >
> > Hi Konrad - any progress on this? It's a bit of a show-stopper for me.
> 
> What is interesting is that it happens only with 32-bit guests and on
> not-so-fast hardware: an Atom D510 for me, and in your case an MSI MS-7309
> motherboard (with what kind of processor?). I have a 64-bit hypervisor - not
> sure whether you are running a 32-bit or a 64-bit one.
> 
> I haven't tried to reproduce this on the Atom D510 with a 64-bit dom0.
> But I was wondering if you had run this setup before - with a 64-bit dom0?
> Or is that really not an option with your CPU?

So while I am still looking at the hypervisor code to figure out why
it would give me:

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

I've cobbled together this patch^H^H^Hhack to retry the grant-map operation,
to see whether this is a temporary issue (a race) or whether that L1 PTE
really is somehow gone.

If you could, please try it out and see whether the errors that are spit
out get repeated - mainly the "Could not find L1 PTE" one. You will need to
compile the hypervisor with debug=y and run it with "loglvl=all" to get
that information.
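
Roughly, the steps look like this (the exact make invocation and the boot
entry syntax are assumptions about your tree and bootloader; adjust to
your setup):

    # rebuild the hypervisor with debugging enabled
    make debug=y xen

    # boot Xen with the full log level, i.e. append to the xen.gz line
    # in your bootloader config:
    #   loglvl=all

    # after reproducing the failure, look for the PTE error:
    xl dmesg | grep 'Could not find L1 PTE'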

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index fd00f25..7bee981 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1607,7 +1607,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
        struct gnttab_map_grant_ref op;
        struct xen_netif_tx_sring *txs;
        struct xen_netif_rx_sring *rxs;
-
+       int retry = 3;
        int err = -ENOMEM;
 
        vif->tx_comms_area = alloc_vm_area(PAGE_SIZE);
@@ -1620,7 +1620,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 
        gnttab_set_map_op(&op, (unsigned long)vif->tx_comms_area->addr,
                          GNTMAP_host_map, tx_ring_ref, vif->domid);
-
+       op.status = 0;
+retry_tx:
        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
                BUG();
 
@@ -1628,6 +1629,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
                netdev_warn(vif->dev,
                            "failed to map tx ring. err=%d status=%d\n",
                            err, op.status);
+               if (retry-- > 0)
+                       goto retry_tx;
                err = op.status;
                goto err;
        }
@@ -1641,6 +1644,9 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
        gnttab_set_map_op(&op, (unsigned long)vif->rx_comms_area->addr,
                          GNTMAP_host_map, rx_ring_ref, vif->domid);
 
+       retry = 3;
+       op.status = 0;
+retry_rx:
        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
                BUG();
 
@@ -1648,6 +1654,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
                netdev_warn(vif->dev,
                            "failed to map rx ring. err=%d status=%d\n",
                            err, op.status);
+               if (retry-- > 0)
+                       goto retry_rx;
                err = op.status;
                goto err;
        }
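
For clarity, the retry logic in both hunks boils down to something like the
hypothetical helper below (not part of the patch - the helper name and the
bounded loop are mine, but the grant-table calls are the same ones netback
already uses):

    /* Map a frontend ring grant at addr, retrying a few times in case
     * the "Could not find L1 PTE" failure is a transient race rather
     * than a permanent loss of the mapping. */
    static int map_ring_with_retry(void *addr, grant_ref_t ring_ref,
                                   domid_t domid, int retries)
    {
            struct gnttab_map_grant_ref op;

            do {
                    /* (Re)build the map op on each attempt; the
                     * hypercall overwrites op.status with the result. */
                    gnttab_set_map_op(&op, (unsigned long)addr,
                                      GNTMAP_host_map, ring_ref, domid);
                    if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
                                                  &op, 1))
                            BUG();
                    if (op.status == GNTST_okay)
                            return 0;
            } while (retries-- > 0);

            /* Still failing after the retries: return the GNTST_* error. */
            return op.status;
    }

The patch above keeps the original goto-based flow and simply re-issues the
hypercall with the op structure left as-is, which amounts to the same thing.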

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

