Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL
On Wed, Oct 21, 2015 at 05:47:06PM +0100, Ian Campbell wrote:
> On Tue, 2015-10-20 at 16:34 +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL"):
> > > From mere code inspection and document of lwip 1.3.0 I think mini-os
> > > does send gratuitous ARP.
> >
> > The guest is using the PVHVM drivers at this point, with the backend
> > directly in dom0, so it is the guest's gratuitous arp which is needed,
> > I think.
>
> It would be worth investigating whether mini-os's gratuitous ARP might
> also be occurring and confusing things, e.g. by coming after and
> therefore taking precedence over the one coming from the guest.
>

Several observations:

1. The guest doesn't always send a gratuitous arp -- but this might not
   be the cause of this failure. The guest works fine when using
   qemu-trad only.
2. The guest sends at most one gratuitous arp.
3. When using stubdom, the guest is a lot less responsive.

See the two experiments and analysis below.

I statically added an arp entry for the guest interface because the arp
entry sometimes gets deleted. Note that this is not covering up the root
cause of the failure, because the arp entry is normally deleted only
after a few migration iterations, whereas the failures on merlot* mostly
happen on the first iteration. Also, when the arp entry is not
available, the error from ssh should be "No route to host", not "timed
out".

Furthermore, when the arp entry is not available, dom0 naturally sends
an arp request to the guest. When stubdom is not in use, the guest
responds instantly; when stubdom is in use, the guest is a lot less
responsive.

I use a script to repeat migration and ssh:

    i=1
    while true; do
        echo "#### iteration $i"
        ssh localhost xl migrate wheezy-hvm localhost
        # Save the exit status first: "$?" inside the echo would
        # otherwise report the status of the "[" test, not of xl.
        rc=$?
        if [ $rc != 0 ]; then echo "migration failed $rc"; exit 1; fi
        timeout 40 ssh -o BatchMode=yes -o ConnectTimeout=100 -o ServerAliveInterval=100 root@xxxxxxxxxxxx date
        st=$?
        if [ $st != 0 ]; then echo "failed $st"; exit 1; fi
        i=$((i+1))
    done

At the same time I run:

    tcpdump -i xenbr0 arp and host $GUEST_IP

When stubdom is present:

Scenario 1:
    xl shows "Migration successful."
    ...30s...
    xenbr0 receives gratuitous arp
    ...1s...
    ssh date command comes back

Scenario 2:
    xenbr0 receives gratuitous arp
    ...1s...
    xl shows "Migration successful."
    ssh date command comes back

When stubdom was not present I never saw scenario 1.

Note that my machine is relatively old (>6 years). It would never pass
the test in osstest, because in osstest the timeout is 10s.

The slowness in osstest seems to be host specific, because all failures
in the guest migrate test were on merlot*. It's not only linux-4.1 that
is failing; other branches fail the same test step on merlot*, too.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
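The two scenarios above differ only in the order of, and the gap between, xl's "Migration successful." message and the gratuitous arp on xenbr0. As a rough sketch, that comparison could be made mechanical given two timestamps per iteration. The function name classify_iteration and its arguments are hypothetical and not part of the original test setup. The default limit of 10 simply mirrors the osstest timeout mentioned above, so it is only a proxy for whether the ssh step would pass.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): classify one
# migration iteration from two timestamps in seconds since the epoch.
#   $1 = time at which xl printed "Migration successful."
#   $2 = time at which the gratuitous arp was seen on xenbr0
#   $3 = allowed delay in seconds (assumed default: 10)
classify_iteration() {
    migrated=$1
    arp_seen=$2
    limit=${3:-10}
    delay=$((arp_seen - migrated))
    if [ "$delay" -le 0 ]; then
        # The arp arrived no later than the completion message.
        echo "scenario 2: ARP seen at or before completion"
    elif [ "$delay" -le "$limit" ]; then
        echo "scenario 1: ARP ${delay}s after completion, within ${limit}s limit"
    else
        echo "scenario 1: ARP ${delay}s after completion, exceeds ${limit}s limit"
    fi
}
```

For example, classify_iteration 100 130 corresponds to the 30s gap described in scenario 1. The timestamps would have to be collected separately, for instance by calling date +%s when each event is observed.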