
Re: [Xen-devel] test report for Xen 4.3 RC1



Sorry for replying late. :-)

> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Tuesday, May 28, 2013 11:16 PM
> To: Ren, Yongjie; george.dunlap@xxxxxxxxxxxxx
> Cc: xen-devel@xxxxxxxxxxxxx; Xu, YongweiX; Liu, SongtaoX; Tian, Yongxue
> Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> 
> On Mon, May 27, 2013 at 03:49:27AM +0000, Ren, Yongjie wrote:
> > Hi All,
> > This is a report based on our testing for Xen 4.3.0 RC1 on Intel platforms.
> > (Sorry it's a little late. :-)  If the status changes, I'll have an update
> later.)
> 
> OK, I've some updates and ideas that can help with narrowing some of these
> issues down. Thank you for doing this.
> 
> >
> > Test environment:
> > Xen: Xen 4.3 RC1 with qemu-upstream-unstable.git
> > Dom0: Linux kernel 3.9.3
> 
> Could you please test v3.10-rc3? There have been some changes
> for VCPU hotplug added in v3.10 that I am not sure are in v3.9.
I didn't try every bug with v3.10-rc3, but most of them still exist there.

> > Hardware: Intel Sandy Bridge, Ivy Bridge, Haswell systems
> >
> > Below are the features we tested.
> > - PV and HVM guest booting (HVM: Ubuntu, Fedora, RHEL, Windows)
> > - Save/Restore and live migration
> > - PCI device assignment and SR-IOV
> > - power management: C-state/P-state, Dom0 S3, HVM S3
> > - AVX and XSAVE instruction set
> > - MCE
> > - CPU online/offline for Dom0
> > - vCPU hot-plug
> > - Nested Virtualization  (Please look at my report in the following link.)
> >  http://lists.xen.org/archives/html/xen-devel/2013-05/msg01145.html
> >
> > New bugs (4): (some of which are not regressions)
> > 1. sometimes failed to online cpu in Dom0
> >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1851
> 
> That looks like you are hitting the udev race.
> 
> Could you verify that these patches:
> https://lkml.org/lkml/2013/5/13/520
> 
> fix the issue? (They are destined for v3.11.)
> 
Not tried yet. I'll update you later.
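For reference, a Dom0 CPU is onlined or offlined through the standard
sysfs interface, whether by hand or by the udev-triggered path Konrad
mentions (cpu1 below is just an example); the online step is what
sometimes fails for us:
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # echo 1 > /sys/devices/system/cpu/cpu1/online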

> > 2. dom0 call trace when running sriov hvm guest with igbvf
> >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1852
> >   -- a regression in Linux kernel (Dom0).
> 
> Hm, the call-trace you refer to:
> 
> [   68.404440] Already setup the GSI :37
> [   68.405105] igb 0000:04:00.0: Enabling SR-IOV VFs using the module parameter is deprecated - please use the pci sysfs interface.
> [   68.506230] ------------[ cut here ]------------
> [   68.506265] WARNING: at /home/www/builds_xen_unstable/xen-src-27009-20130509/linux-2.6-pvops.git/fs/sysfs/dir.c:536 sysfs_add_one+0xcc/0xf0()
> [   68.506279] Hardware name: S2600CP
> 
> is a deprecated warning. Did you follow the 'pci sysfs' interface way?
> 
> Looking at da36b64736cf2552e7fb5109c0255d4af804f5e7
>     ixgbe: Implement PCI SR-IOV sysfs callback operation
> it says it is using this:
> 
> commit 1789382a72a537447d65ea4131d8bcc1ad85ce7b
> Author: Donald Dutile <ddutile@xxxxxxxxxx>
> Date:   Mon Nov 5 15:20:36 2012 -0500
> 
>     PCI: SRIOV control and status via sysfs
> 
>     Provide files under sysfs to determine the maximum number of VFs
>     an SR-IOV-capable PCIe device supports, and methods to enable and
>     disable the VFs on a per-device basis.
> 
>     Currently, VF enablement by SR-IOV-capable PCIe devices is done
>     via driver-specific module parameters.  If not setup in modprobe files,
>     it requires admin to unload & reload PF drivers with number of desired
>     VFs to enable.  Additionally, the enablement is system wide: all
>     devices controlled by the same driver have the same number of VFs
>     enabled.  Although the latter is probably desired, there are PCI
>     configurations setup by system BIOS that may not enable that to occur.
> 
>     Two files are created for the PF of PCIe devices with SR-IOV support:
> 
>         sriov_totalvfs  Contains the maximum number of VFs the device
>                         could support as reported by the TotalVFs register
>                         in the SR-IOV extended capability.
> 
>         sriov_numvfs    Contains the number of VFs currently enabled on
>                         this device as reported by the NumVFs register in
>                         the SR-IOV extended capability.
> 
>                         Writing zero to this file disables all VFs.
> 
>                         Writing a positive number to this file enables that
>                         number of VFs.
> 
>     These files are readable for all SR-IOV PF devices.  Writes to the
>     sriov_numvfs file are effective only if a driver that supports the
>     sriov_configure() method is attached.
> 
>     Signed-off-by: Donald Dutile <ddutile@xxxxxxxxxx>
> 
> 
> Can you try that please?
> 
Recently, one of my colleagues posted a fix for this:
https://lkml.org/lkml/2013/5/30/20
It also seems to have been fixed independently here:
https://patchwork.kernel.org/patch/2613481/
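For reference, the sysfs method Konrad points at (instead of the
deprecated module parameter) would look roughly like this, using the
PF address from the call trace above as an example:
  # cat /sys/bus/pci/devices/0000:04:00.0/sriov_totalvfs    (max VFs the PF supports)
  # echo 2 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs (enable 2 VFs)
  # echo 0 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs (disable all VFs)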

> 
> > 3. Booting multiple guests will lead Dom0 call trace
> >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1853
> 
> That one worries me. Did you do a git bisect to figure out which
> commit is causing this?
> 
So far I have only seen this bug on an Intel EX-class server.
I don't know which Xen/Dom0 version works fine, so I haven't been able to bisect it yet.
It would be good if anyone else could try to reproduce or debug it;
our team is trying to debug it internally first.

> > 4. After live migration, guest console continuously prints "Clocksource tsc unstable"
> >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1854
> 
> This looks like a current bug with QEMU unstable missing an ACPI table?
> 
> Did you try booting the guest with the old QEMU?
> 
> device_model_version = 'qemu-xen-traditional'
> 
This issue still exists with qemu-xen-traditional.
After more testing, the bug can't be reproduced with some other guests:
a RHEL 6.4 guest hits it after live migration, while RHEL 6.3,
Fedora 17 and Ubuntu 12.10 guests work fine.
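For what it's worth, the clocksource switch can be observed from inside
the guest before and after migration via the standard kernel sysfs
interface (just a way to watch it, not a fix):
  # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
  # cat /sys/devices/system/clocksource/clocksource0/available_clocksource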

> >
> > Old bugs: (11)
> > 1. [ACPI] Dom0 can't resume from S3 sleep
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
> 
> That should be fixed in v3.11 (as now we have the fixes)
> Could you try v3.10 with Rafael's ACPI tree merged in?
> (so the patches that he wants to submit for v3.11)
> 
I re-tested with Rafael's linux-pm.git tree (both the master and acpi-hotplug
branches), and Dom0 S3 sleep/resume still doesn't work.
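For reference, Dom0 S3 is entered through the standard interface; the
failure is on the resume side:
  # echo mem > /sys/power/state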

> > 2. [XL]"xl vcpu-set" causes dom0 crash or panic
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
> 
> That I think is fixed in v3.10. Could you please check v3.10-rc3?
> 
Still exists on v3.10-rc3.
The following command lines can reproduce it:
# xl vcpu-set 0 1
# xl vcpu-set 0 20

> > 3. Sometimes Xen panic on ia32pae Sandybridge when restore guest
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1747
> 
> That looks to be with v2.6.32. Is the issue present with v3.9
> or v3.10-rc3?
>
We haven't tested ia32pae Xen for a long time; we now only cover ia32e
Xen/Dom0, so this is a legacy issue for us.
If we find the time to verify it, we'll update the Bugzilla entry.

> > 4. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
> 
> That I believe was a QEMU bug:
> http://lists.xen.org/archives/html/xen-devel/2013-05/msg01054.html
> 
> which should be in QEMU traditional now (05-21 was when it went
> in the tree)
> 
This bug has been present for the past year or so (at least in our testing):
'xl vcpu-set' still can't decrease the vCPU number of an HVM guest.
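A rough sketch of how we hit it (domain ID and numbers are just examples):
  guest config:  vcpus = 4
  # xl vcpu-set <domid> 2
The vCPU count seen inside the HVM guest does not drop to 2.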

- Jay

> > 5. Dom0 cannot be shutdown before PCI device detachment from guest
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1826
> 
> Ok, I can reproduce that too.
> 
> > 6. xl pci-list shows one PCI device (PF or VF) could be assigned to two
> different guests
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1834
> 
> OK, I can reproduce that too:
> 
> > xl create  /vm-pv.cfg
> Parsing config from /vm-pv.cfg
> libxl: error: libxl_pci.c:1043:libxl__device_pci_add: PCI device 0:1:0.0 is not assignable
> Daemon running with PID 3933
> 
> 15:11:17 # 16 :/mnt/lab/latest/
> > xl pci-list 1
> Vdev Device
> 05.0 0000:01:00.0
> 
> > xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  2047     4     r-----      26.7
> latest                                       1  2043     1     -b----       5.3
> latestadesa                                  4  1024     3     -b----       5.1
> 
> 15:11:20 # 20 :/mnt/lab/latest/
> > xl pci-list 4
> Vdev Device
> 00.0 0000:01:00.0
> 
> 
> The rest I haven't had a chance to look at. George, have you seen
> these issues?
> 
> > 7. [upstream qemu] Guest free memory with upstream qemu is 14MB lower than that with qemu-xen-unstable.git
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1836
> > 8. [upstream qemu] 'maxvcpus=NUM' item is not supported in upstream QEMU
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1837
> > 9. [upstream qemu] Guest console hangs after save/restore or live-migration when setting 'hpet=0' in guest config file
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1838
> > 10. [upstream qemu] 'xen_platform_pci=0' setting cannot make the guest use emulated PCI devices by default
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1839
> > 11. Live migration fail when migrating the same guest for more than 2 times
> >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1845
> >
> > Best Regards,
> >      Yongjie (Jay)
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > http://lists.xen.org/xen-devel
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

