[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] stubdom migration failure on merlot* XSM related (Was: [adhoc test] 65682: tolerable FAIL])



On Fri, 2015-12-11 at 14:05 +0000, Ian Campbell wrote:

So returning to the logs for 65682 (the first repro, which exhibited the
fail with the usual timeout).

http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/info.html

First thing I noticed was the numa config
http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1-output-xl_vcpu-list
dom0 has 32 vcpus, allowed hard/soft on all processors.

the domU has 2 vcpus, with a soft preference for pcpu 16-23, while the stub
dom has a soft preference for 0-7. I think that's normal NUMA placement
doing what we want and while I've not thought about the wiseness (or
otherwise) of having the stubdom on another node I don't see why that
should cause a 10s+ delay to the network device forwarding.

I've confirmed inÂ
http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1--etc-network-interfaces.new
that STP is disabled.

The hotplug log is empty.

The migration was of dom5 + stubdom6 into dom7 + stubdom8.

Time line (from 13.ts-guest-localmigrate.log):

2015-12-10 20:10:30 Z migration begins
2015-12-10 20:13:43 Z migration completes ("execution took 193 seconds[<=2x400]"

2015-12-10 20:13:43 Z ssh to guest `date` begins
2015-12-10 20:13:53 Z ssh guest timesoutÂ
2015-12-10-20:13:58 Z ssh produces: "Thu Dec 10 20:13:58 UTC 2015."

http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/serial-merlot1.log
(very abridged)

Dec 10 20:13:36.441014 (XEN) HVM5 save: CPU
[...]
Dec 10 20:13:36.641048 (XEN) HVM7 restore: CPU 0
[...]
Dec 10 20:13:36.705029 [ 1391.363224] device vif8.0 entered promiscuous mode
Dec 10 20:13:38.753001 [ 1391.369957] IPv6: ADDRCONF(NETDEV_UP): vif8.0: link 
is not ready
Dec 10 20:13:38.760973 (d8) Bootstrapping...
Dec 10 20:13:38.985061 (d8) backend at /local/domain/0/backend/vif/8/0
[...]
Dec 10 20:13:38.993010 (d8) mac is 5a:36:0e:92:00:02
Dec 10 20:13:38.993038 [ 1391.467483] IPv6: ADDRCONF(NETDEV_CHANGE): vif8.0: 
link becomes ready
Dec 10 20:13:39.001015 [ 1391.467574] xenbr0: port 4(vif8.0) entered forwarding 
state
Dec 10 20:13:39.009015 [ 1391.467608] xenbr0: port 4(vif8.0) entered forwarding 
state
[...]
Dec 10 20:13:39.273017 (d8) xs_read_watch() -> 
/local/domain/0/backend/vbd/8/5632/params hdc
[...]
Dec 10 20:13:39.281011 [ 1391.850153] device vif7.0 entered promiscuous mode
Dec 10 20:13:39.281047 [ 1391.856999] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link 
is not ready
Dec 10 20:13:39.289000 (XEN) irq.c:385: Dom7 callback via changed to Direct 
Vector 0xf3
[...]
Dec 10 20:13:39.369050 (d8) close(3)
Dec 10 20:13:39.377010 (d8) close network: backend at 
/local/domain/0/backend/vif/8/0
Dec 10 20:13:39.377045 [ 1391.996438] xen_netback:set_backend_state: 
backend/vif/8/0: prepare for reconnect
Dec 10 20:13:39.385011 [ 1392.034769] xen-blkback:ring-ref 9, event-channel 24, 
protocol 1 (x86_64-abi) 
Dec 10 20:13:39.424993 [ 1392.125011] xenbr0: port 2(vif6.0) entered disabled 
state
Dec 10 20:13:39.513026 [ 1392.125835] device vif6.0 left promiscuous mode
Dec 10 20:13:39.513062 [ 1392.125868] xenbr0: port 2(vif6.0) entered disabled 
state
Dec 10 20:13:39.520999 [ 1392.185454] xenbr0: port 3(vif5.0) entered disabled 
state
Dec 10 20:13:39.569013 [ 1392.186308] device vif5.0 left promiscuous mode
Dec 10 20:13:39.577026 [ 1392.186393] xenbr0: port 3(vif5.0) entered disabled 
state
Dec 10 20:13:39.584976 (XEN) avc:ÂÂdeniedÂÂ{ pcilevel } for domid=8 target=7 
scontext=system_u:system_r:dm_dom_t tcontext=system_u:system_r:domU_t_target 
tclass=hvm
Dec 10 20:13:39.793039 [ 1392.468784] xenbr0: port 4(vif8.0) entered disabled 
state
Dec 10 20:13:39.856990 [ 1409.562306] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: 
link becomes ready

Dec 10 20:13:56.953048 [ 1409.562434] xenbr0: port 5(vif7.0) entered forwarding 
state
Dec 10 20:13:56.953097 [ 1409.562450] xenbr0: port 5(vif7.0) entered forwarding 
state

(the end, until ts-log-capture sends some keys)

I think what we can see at Dec 10 20:13:39.289000 is the PVHVM stuff
initialising direct vector injection of event channels and the stubdom8
network close at Dec 10 20:13:39.377010 is the guest unplugging the
emulated devices, then vif8.0 (the corresponding stubdom vif) goes away and
vif7.0 (the guest's vif) becomes ready.

All of that is well within the window of the "xl migrate".

However as Wei noticed there is then a 17 second delay until vif7.0 enters
the forwarding state.

The source stubdom log (which includes the kernel serial log) is
http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1---var-log-xen-qemu-dm-debianhvm.guest.osstest.log
and the destination is
http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1---var-log-xen-qemu-dm-debianhvm.guest.osstest--incoming.log

Both lack timestamps so it's not easy to discern anything.

In the non-XSM caseÂ
2015-12-11 12:58:54 Z migration finishes
2015-12-11 12:58:54 Z ssh guest `date` starts
2015-12-11 12:58:55 Z date output from guest: Fri Dec 11 12:58:55 UTC 2015

and in the host serial logs:
Dec 11 12:58:50.573034 [ 1007.966381] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: 
link becomes ready

Dec 11 12:58:54.133079 [ 1007.966487] xenbr0: port 5(vif7.0) entered forwarding 
state
Dec 11 12:58:54.133149 [ 1007.966502] xenbr0: port 5(vif7.0) entered forwarding 
state

Note that there is a 4s (not 16s as with XSM) delay there.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.