
Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL



On Thu, 2015-10-22 at 11:28 +0100, Wei Liu wrote:
> On Thu, Oct 22, 2015 at 10:50:54AM +0100, Ian Campbell wrote:
> > On Wed, 2015-10-21 at 18:34 +0100, Wei Liu wrote:
> > > On Wed, Oct 21, 2015 at 05:47:06PM +0100, Ian Campbell wrote:
> > > > On Tue, 2015-10-20 at 16:34 +0100, Ian Jackson wrote:
> > > > > > Wei Liu writes ("Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL"):
> > > > > > > From mere code inspection and document of lwip 1.3.0 I think
> > > > > > > mini-os does send gratuitous ARP.
> > > > > 
> > > > > > The guest is using the PVHVM drivers at this point, with the
> > > > > > backend directly in dom0, so it is the guest's gratuitous arp
> > > > > > which is needed, I think.
> > > > 
> > > > > It would be worth investigating whether mini-os's gratuitous ARP
> > > > > might also be occurring and confusing things, e.g. by coming after
> > > > > and therefore taking precedence over the one coming from the guest.
> > > > 
> > > 
> > > Several observations:
> > > 
> > > > 1. The guest doesn't always send gratuitous arp -- but this might
> > > >    not be the cause of this failure. Guest works fine when using
> > > >    qemu-trad only.
> > 
> > As in it always sends the arp when using qemu-trad, or that it is fine
> > irrespective of not always sending it?
> > 
> 
> Whether or not stubdom is in use, the guest behaves the same -- it
> doesn't always send gratuitous arp.
> 
> > When using qemu-trad alone, it's always fine when it doesn't send
> > gratuitous arp because either there is a cache entry in dom0 that already
> > has the guest mac address or the guest responds instantly to dom0's arp
> > request.

Where has this cache entry come from? Any preexisting ARP cache entry would
be associated with vifX.0 and would go away when that device was destroyed
and replaced with vif(X+1).0.

Also, this only works for localhost migration. If the domain actually moved
to another host then the ARP is required in order for the physical switch
to learn the new location.

Thus it seems to me that not always sending the gratuitous ARP is the most
important thing to get to the bottom of here.
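
For reference while debugging, here is a minimal Python sketch of what a
gratuitous ARP looks like on the wire and how to recognise one in a capture
from xenbr0. It is an illustration only, not code from the guest or mini-os:
the function names, the MAC 00:16:3e:00:00:01 and the IP 192.0.2.10 are
assumptions made up for the example. It uses the common "ARP announcement"
form, i.e. a broadcast request whose sender and target IP are both the
sender's own address.

```python
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build a 42-byte Ethernet frame carrying a gratuitous ARP request.

    mac and ip are raw bytes (6 and 4 bytes long). Hypothetical helper for
    illustration; not taken from any Xen or Linux source.
    """
    if len(mac) != 6 or len(ip) != 4:
        raise ValueError("expected a 6-byte MAC and a 4-byte IPv4 address")
    # Ethernet header: broadcast destination, our MAC as source, type ARP.
    eth = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4,
    # oper=1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip            # sender hardware and protocol address
    arp += b"\x00" * 6 + ip    # target hardware unset, target IP = our own IP
    return eth + arp


def is_gratuitous_arp(frame: bytes) -> bool:
    """Return True if frame is an ARP packet with sender IP == target IP."""
    if len(frame) < 42 or frame[12:14] != struct.pack("!H", ETHERTYPE_ARP):
        return False
    sender_ip = frame[28:32]   # offset 14 (Ethernet) + 14 into the ARP body
    target_ip = frame[38:42]
    return sender_ip == target_ip


if __name__ == "__main__":
    frame = build_gratuitous_arp(bytes.fromhex("00163e000001"),
                                 bytes([192, 0, 2, 10]))
    print(len(frame), is_gratuitous_arp(frame))
```

Because the frame is broadcast with the guest's MAC as the Ethernet source,
every bridge and switch on the path relearns which port that MAC lives
behind, which is why it matters for anything other than localhost migration.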

> So it comes down to this: the responsiveness of the guest is the key.
> 
[...]
> > > 3. When using stubdom, guest is a lot less responsive. See two
> > >    experiments and analysis below.
> > 
> > Less responsive in use or only while migrating, or to ssh after
> > migration, or to something else?
> > 
> 
> For every activity after migration for a period of time, including both
> arp request / reply and ssh connection.
> 
> > > Scenario 1:
> > >   xl shows "Migration successful."
> > >   ...30s...
> > >   xenbr0 receives gratuitous arp
> > >   ...1s...
> > >   ssh date command comes back
> > > 
> > > Scenario 2:
> > >   xenbr0 receives gratuitous arp
> > >   ...1s...
> > >   xl shows "Migration successful."
> > >   ssh date command comes back
> > > 
> > > When stubdom was not present I never saw scenario 1.

So in that case you only saw Scenario 2, which includes "receives
gratuitous ARP". But above you state that even in the non-stubdom case the
gratuitous ARP is sometimes not sent. Is this a third case which isn't
mentioned here?

> > It would be worth looking at the possibility of a delay between
> > "Migration successful" and the target domain actually running. A 30s
> > delay between the guest restarting and it sending the ARP would be
> > pretty strange IMHO
> > 
> 
> The guest is in a weird state.
> 
> xl list shows the stubdom is in "b" state while guest has no state at
> all, heh.

Has it actually been started/unpaused then?

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel