Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL
On Thu, Oct 22, 2015 at 11:39:39AM +0100, Ian Campbell wrote:
> On Thu, 2015-10-22 at 11:28 +0100, Wei Liu wrote:
> > On Thu, Oct 22, 2015 at 10:50:54AM +0100, Ian Campbell wrote:
> > > On Wed, 2015-10-21 at 18:34 +0100, Wei Liu wrote:
> > > > On Wed, Oct 21, 2015 at 05:47:06PM +0100, Ian Campbell wrote:
> > > > > On Tue, 2015-10-20 at 16:34 +0100, Ian Jackson wrote:
> > > > > > Wei Liu writes ("Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL"):
> > > > > > > From mere code inspection and document of lwip 1.3.0 I think
> > > > > > > mini-os does send gratuitous ARP.
> > > > > >
> > > > > > The guest is using the PVHVM drivers at this point, with the backend
> > > > > > directly in dom0, so it is the guest's gratuitous arp which is needed,
> > > > > > I think.
> > > > >
> > > > > It would be worth investigating whether mini-os's gratuitous ARP might
> > > > > also be occurring and confusing things, e.g. by coming after and
> > > > > therefore taking precedence over the one coming from the guest.
> > > > >
> > > >
> > > > Several observations:
> > > >
> > > > 1. The guest doesn't always send gratuitous arp -- but this might not be
> > > > the cause of this failure. Guest works fine when using qemu-trad only.
> > >
> > > As in it always sends the arp when using qemu-trad, or that it is fine
> > > irrespective of not always sending it?
> > >
> >
> > Whether or not stubdom is in use, the guest behaves the same -- it
> > doesn't always send gratuitous arp.
> >
> > When using qemu-trad alone, it's always fine when it doesn't send
> > gratuitous arp because either there is a cache entry in dom0 that already
> > has the guest mac address or the guest responds instantly to the dom0 arp
> > request.
>
> Where has this cache entry come from? Any preexisting ARP cache would be
> associated with vifX.0 and would go away when that device was destroyed and
> replaced with vif(X+1).0.
>

No, vif-bridge script has two runes for off-lining a vif:

    brctl delif $bridge $vif
    ifconfig $vif down

Neither of these causes the cache entry to be flushed.

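Whether such an entry really survives can be checked by looking at dom0's neighbour table before and after the vif is taken down. A minimal sketch, assuming the iproute2 `ip neigh show` text format; the helper names are hypothetical and nothing here is taken from the test logs:

```python
import subprocess


def parse_neigh(output):
    """Parse `ip neigh show` text into a list of dicts (ip, dev, lladdr, state)."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The address comes first and the NUD state (REACHABLE, STALE, ...) last.
        entry = {"ip": fields[0], "dev": None, "lladdr": None, "state": fields[-1]}
        for key in ("dev", "lladdr"):
            if key in fields:
                idx = fields.index(key)
                if idx + 1 < len(fields):
                    entry[key] = fields[idx + 1]
        entries.append(entry)
    return entries


def entries_for_mac(entries, mac):
    """Return the neighbour entries whose link-layer address matches `mac`."""
    mac = mac.lower()
    return [e for e in entries if e["lladdr"] and e["lladdr"].lower() == mac]


def read_neigh_table():
    """Run `ip neigh show` in dom0 and parse its output (needs iproute2)."""
    result = subprocess.run(
        ["ip", "neigh", "show"], capture_output=True, text=True, check=True
    )
    return parse_neigh(result.stdout)
```

Calling `entries_for_mac(read_neigh_table(), guest_mac)` right after the two runes have run would show whether an entry for the guest is still present and on which device it is recorded.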
> Also this only works for localhost migration. If the domain actually moved
> to another host then the ARP is required in order for the physical switch
> to learn the new location.
>
> Thus it seems to me that not always sending the gratuitous ARP is the most
> important thing to get to the bottom of here.
>

That's another issue, but it would cause a different error (no route to
host) instead of a timeout. The failure exhibits a timeout error -- let's do
one thing at a time.

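For reference, a gratuitous ARP is an ARP packet whose sender and target protocol addresses are both the sender's own IP, sent to the Ethernet broadcast address so that bridges, switches and peers relearn where the MAC lives. A minimal sketch of the frame layout in its request form; the function name is hypothetical and this only illustrates the packet, it is not how the guest kernel or mini-os generates it:

```python
import socket
import struct


def build_gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request for (mac, ip)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    addr = socket.inet_aton(ip)  # 4-byte IPv4 address in network order

    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth = struct.pack("!6s6sH", b"\xff" * 6, hw, 0x0806)

    # ARP payload: Ethernet/IPv4, opcode 1 (request).
    # Sender and target protocol addresses are both our own IP.
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        hw,           # sender hardware address
        addr,         # sender protocol address
        b"\x00" * 6,  # target hardware address (unused here)
        addr,         # target protocol address == sender's
    )
    return eth + arp
```

The resulting frame is 42 bytes long before any padding added by the NIC; seeing a packet of this shape on xenbr0 is what "xenbr0 receives gratuitous arp" refers to in the scenarios.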
> > So it comes down to the responsiveness of the guest.
> >
> [...]
> > > > 3. When using stubdom, guest is a lot less responsive. See two
> > > > experiments and analysis below.
> > >
> > > Less responsive in use or only while migrating, or to ssh after migration,
> > > or to something else?
> > >
> >
> > For every activity after migration for a period of time, including both
> > arp request / reply and ssh connection.
> >
> > > > Scenario 1:
> > > > xl shows "Migration successful."
> > > > ...30s...
> > > > xenbr0 receives gratuitous arp
> > > > ...1s...
> > > > ssh date command comes back
> > > >
> > > > Scenario 2:
> > > > xenbr0 receives gratuitous arp
> > > > ...1s...
> > > > xl shows "Migration successful."
> > > > ssh date command comes back
> > > >
> > > > When stubdom was not present I never saw scenario 1.
>
> So in that case you only saw Scenario 2 which includes a "receives
> gratuitous ARP". But above you state that even with non-stub case sometimes
> the gratuitous ARP is not sent. Is this a 3rd case which isn't mentioned
> here?
>

Scenario 3:
xl shows "Migration successful."
dom0 sends arp request because arp cache entry not available
guest takes a long time to respond when using stubdom or responds
instantly when not using stubdom

Scenario 4:
xl shows "Migration successful."
(arp cache entry still available)
guest takes a long time to respond to ssh when using stubdom or
responds instantly when not using stubdom

> > > It would be worth looking at the possibility of a delay between "Migration
> > > successful" and the target domain actually running. A 30s delay between the
> > > guest restarting and it sending the ARP would be pretty strange IMHO
> > >
> >
> > The guest is in a weird state.
> >
> > xl list shows the stubdom is in "b" state while guest has no state at
> > all, heh.
>
> Has it actually been started/unpaused then?
>
Yes, of course -- otherwise the state would have been "p". And I
observed the transition from "p" to "weird state".
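
The letters above are the flags in the State column of `xl list`. A minimal sketch of decoding that column, assuming the six flags documented in the xl man page (r, b, p, s, c, d); the function name is hypothetical:

```python
# Flags shown in the State column of `xl list`, per the xl man page.
XL_STATE_FLAGS = (
    ("r", "running"),
    ("b", "blocked"),
    ("p", "paused"),
    ("s", "shutdown"),
    ("c", "crashed"),
    ("d", "dying"),
)


def decode_xl_state(state):
    """Return the names of the flags set in an `xl list` state string."""
    return [name for flag, name in XL_STATE_FLAGS if flag in state]
```

With this, a stubdom shown as `-b----` decodes to `["blocked"]`, a paused domain `--p---` to `["paused"]`, and a state string of only dashes decodes to an empty list, which is the "no state at all" case.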
Wei.
> Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel