Xen project Mailing List

Re: [Xen-devel] test report for Xen 4.3 RC1

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

From: "Ren, Yongjie" <yongjie.ren@xxxxxxxxx>

Date: Wed, 5 Jun 2013 10:14:53 +0000

Accept-language: zh-CN, en-US

Cc: "george.dunlap@xxxxxxxxxxxxx" <george.dunlap@xxxxxxxxxxxxx>, "Xu, YongweiX" <yongweix.xu@xxxxxxxxx>, "Liu, SongtaoX" <songtaox.liu@xxxxxxxxx>, "Tian, Yongxue" <yongxue.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Wed, 05 Jun 2013 10:16:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: Ac5h1YPU9PILZzh9SxGNfMXvRWPk+g==

Thread-topic: [Xen-devel] test report for Xen 4.3 RC1

> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Wednesday, June 05, 2013 12:36 AM
> To: Ren, Yongjie
> Cc: george.dunlap@xxxxxxxxxxxxx; Xu, YongweiX; Liu, SongtaoX; Tian, Yongxue; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
>
> On Tue, Jun 04, 2013 at 03:59:33PM +0000, Ren, Yongjie wrote:
> > Sorry for replying late. :-)
> >
> > > -----Original Message-----
> > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> > > Sent: Tuesday, May 28, 2013 11:16 PM
> > > To: Ren, Yongjie; george.dunlap@xxxxxxxxxxxxx
> > > Cc: xen-devel@xxxxxxxxxxxxx; Xu, YongweiX; Liu, SongtaoX; Tian, Yongxue
> > > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> > >
> > > On Mon, May 27, 2013 at 03:49:27AM +0000, Ren, Yongjie wrote:
> > > > Hi All,
> > > > This is a report based on our testing for Xen 4.3.0 RC1 on Intel platforms.
> > > > (Sorry it's a little late. :-) If the status changes, I'll have an update later.)
> > >
> > > OK, I've some updates and ideas that can help with narrowing some of these
> > > issues down. Thank you for doing this.
> > >
> > > >
> > > > Test environment:
> > > > Xen: Xen 4.3 RC1 with qemu-upstream-unstable.git
> > > > Dom0: Linux kernel 3.9.3
> > >
> > > Could you please test v3.10-rc3. There have been some changes
> > > for the VCPU hotplug added in v3.10 that I am not sure whether
> > > they are in v3.9?
> > I didn't try every bug with v3.10-rc3, but most of them still exist.
> >
> > > > Hardware: Intel Sandy Bridge, Ivy Bridge, Haswell systems
> > > >
> > > > Below are the features we tested.
> > > > - PV and HVM guest booting (HVM: Ubuntu, Fedora, RHEL, Windows)
> > > > - Save/Restore and live migration
> > > > - PCI device assignment and SR-IOV
> > > > - power management: C-state/P-state, Dom0 S3, HVM S3
> > > > - AVX and XSAVE instruction set
> > > > - MCE
> > > > - CPU online/offline for Dom0
> > > > - vCPU hot-plug
> > > > - Nested Virtualization (Please look at my report in the following link.)
> > > >   http://lists.xen.org/archives/html/xen-devel/2013-05/msg01145.html
> > > >
> > > > New bugs (4): (some of which are not regressions)
> > > > 1. sometimes failed to online cpu in Dom0
> > > >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1851
> > >
> > > That looks like you are hitting the udev race.
> > >
> > > Could you verify that these patches:
> > > https://lkml.org/lkml/2013/5/13/520
> > >
> > > fix the issue (They are destined for v3.11)
> > >
> > Not tried yet. I'll update it to you later.
>
> Thanks!
> >
We tested kernel 3.9.3 with the 2 patches you mentioned, and found this bug still exists. For example, we ran CPU online/offline for Dom0 100 times, and 2 of the 100 runs failed.
> > > > 2. dom0 call trace when running sriov hvm guest with igbvf
> > > >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1852
> > > >   -- a regression in Linux kernel (Dom0).
> > >
> > > Hm, the call-trace you refer to:
> > >
> > > [ 68.404440] Already setup the GSI :37
> > > [ 68.405105] igb 0000:04:00.0: Enabling SR-IOV VFs using the module parameter is deprecated - please use the pci sysfs interface.
> > > [ 68.506230] ------------[ cut here ]------------
> > > [ 68.506265] WARNING: at /home/www/builds_xen_unstable/xen-src-27009-20130509/linux-2.6-pvops.git/fs/sysfs/dir.c:536 sysfs_add_one+0xcc/0xf0()
> > > [ 68.506279] Hardware name: S2600CP
> > >
> > > is a deprecated warning. Did you follow the 'pci sysfs' interface way?
> > >
> > > Looking at da36b64736cf2552e7fb5109c0255d4af804f5e7
> > > ixgbe: Implement PCI SR-IOV sysfs callback operation
> > > it says it is using this:
> > >
> > > commit 1789382a72a537447d65ea4131d8bcc1ad85ce7b
> > > Author: Donald Dutile <ddutile@xxxxxxxxxx>
> > > Date: Mon Nov 5 15:20:36 2012 -0500
> > >
> > >     PCI: SRIOV control and status via sysfs
> > >
> > >     Provide files under sysfs to determine the maximum number of VFs
> > >     an SR-IOV-capable PCIe device supports, and methods to enable and
> > >     disable the VFs on a per-device basis.
> > >
> > >     Currently, VF enablement by SR-IOV-capable PCIe devices is done
> > >     via driver-specific module parameters. If not setup in modprobe files,
> > >     it requires admin to unload & reload PF drivers with number of desired
> > >     VFs to enable. Additionally, the enablement is system wide: all
> > >     devices controlled by the same driver have the same number of VFs
> > >     enabled. Although the latter is probably desired, there are PCI
> > >     configurations setup by system BIOS that may not enable that to occur.
> > >
> > >     Two files are created for the PF of PCIe devices with SR-IOV support:
> > >
> > >     sriov_totalvfs  Contains the maximum number of VFs the device
> > >                     could support as reported by the TotalVFs register
> > >                     in the SR-IOV extended capability.
> > >
> > >     sriov_numvfs    Contains the number of VFs currently enabled on
> > >                     this device as reported by the NumVFs register in
> > >                     the SR-IOV extended capability.
> > >
> > >                     Writing zero to this file disables all VFs.
> > >
> > >                     Writing a positive number to this file enables that
> > >                     number of VFs.
> > >
> > >     These files are readable for all SR-IOV PF devices. Writes to the
> > >     sriov_numvfs file are effective only if a driver that supports the
> > >     sriov_configure() method is attached.
> > >
> > >     Signed-off-by: Donald Dutile <ddutile@xxxxxxxxxx>
> > >
> > >
> > > Can you try that please?
> > >
> > Recently, one of my workmates already had a fix as below.
> > https://lkml.org/lkml/2013/5/30/20
> > And, seems also already been fixed by another guy.
> > https://patchwork.kernel.org/patch/2613481/
>
> Great! Care to update the bug with said relevant information?
Yes, updated in bugzilla.
> > >
> > > > 3. Booting multiple guests will lead Dom0 call trace
> > > >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1853
> > >
> > > That one worries me. Did you do a git bisect to figure out what
> > > commit is causing this?
> > >
> > I only found this bug on some Intel ~EX server.
> > I don't know which version on Xen/Dom0 can work fine.
> > If anyone wants to reproduce or debug it, it should be good.
> > And our team is trying to debug it internally first.
>
> Ah, OK. Then please continue on debugging it. Thanks!
>
> > > > 4. After live migration, guest console continuously prints "Clocksource tsc unstable"
> > > >   http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1854
> > >
> > > This looks like a current bug with QEMU unstable missing an ACPI table?
> > >
> > > Did you try booting the guest with the old QEMU?
> > >
> > > device_model_version = 'qemu-xen-traditional'
> > >
> > This issue still exists with traditional qemu-xen.
> > After more testing, this bug can't be reproduced with some other guests.
> > RHEL6.4 guest will have this issue after live migration, while RHEL6.3 &
> > Fedora 17 & Ubuntu 12.10 guests can work fine.
>
> There is a recent thread on this where the culprit was the PV timeclock
> not being updated correctly. But that would seem to be at odds with
> your reporting - where you are using Fedora 17 and it works fine.
>
> Hm, I am at loss on this one.
>
Hm, but my test result is as I described.
> > > >
> > > > Old bugs: (11)
> > > > 1. [ACPI] Dom0 can't resume from S3 sleep
> > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
> > >
> > > That should be fixed in v3.11 (as now we have the fixes)
> > > Could you try v3.10 with the Rafael's ACPI tree merged in?
> > > (so the patches that he wants to submit for v3.11)
> > >
> > I re-tested with Rafael's linux-pm.git tree (master and acpi-hotplug branch),
> > and found Dom0 S3 sleep/resume can't work, either.
>
> The patches he has to submit for v3.11 are in the linux-next branch.
> You need to use that branch.
>
Dom0 S3 sleep/resume doesn't work with the linux-next branch, either. The log is attached.
> > > >
> > > > 2. [XL]"xl vcpu-set" causes dom0 crash or panic
> > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
> > >
> > > That I think is fixed in v3.10. Could you please check v3.10-rc3?
> > >
> > Still exists on v3.10-rc3.
> > The following command lines can reproduce it:
> > # xl vcpu-set 0 1
> > # xl vcpu-set 0 20
>
> Ugh, same exact stack trace? And can you attach the full dmesg or serial
> output (so that I can see what there is at bootup)
>
Yes, the same. Also attached in this mail.
> > > >
> > > > 3. Sometimes Xen panic on ia32pae Sandybridge when restore guest
> > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1747
> > >
> > > That looks to be with v2.6.32. Is the issue present with v3.9
> > > or v3.10-rc3?
> > >
> > We didn't test ia32pae Xen for a long time.
> > Now, we only cover ia32e Xen/Dom0.
> > So, this bug is only a legacy issue.
> > If we have effort to verify it, we'll update it in the bugzilla.
>
> How about just dropping that bug as 'WONTFIX'.
>
Agree. I'll close it as "WONTFIX".
> > > >
> > > > 4. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
> > >
> > > That I believe was a QEMU bug:
> > > http://lists.xen.org/archives/html/xen-devel/2013-05/msg01054.html
> > >
> > > which should be in QEMU traditional now (05-21 was when it went
> > > in the tree)
> > >
> > In this year or past year, this bug always exists (at least in our testing).
> > 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
>
> Could you retry with Xen 4.3 please?
>
With Xen 4.3 & Linux:3.10.0-rc3, I can't decrease the vCPU number of a guest.
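
For reference on bug 1852: the sysfs interface described in the quoted commit message (sriov_totalvfs and sriov_numvfs) could be driven roughly as sketched below. This is only a minimal sketch and not what was run in the tests reported here. The helper name set_vfs is made up for illustration, and it takes the PF's sysfs directory as an argument rather than assuming where the device lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Xen or kernel tooling):
#   set_vfs <pf-sysfs-dir> <count>
# Requests <count> VFs on an SR-IOV PF via the sriov_numvfs file.
set_vfs() {
    dev_dir=$1
    want=$2

    # sriov_totalvfs reports the maximum number of VFs the device supports.
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi

    # Writing zero disables all VFs. Done first here on the assumption
    # that the kernel refuses to change one non-zero VF count directly
    # into another.
    echo 0 > "$dev_dir/sriov_numvfs" || return 1

    # Writing a positive number enables that many VFs.
    if [ "$want" -gt 0 ]; then
        echo "$want" > "$dev_dir/sriov_numvfs" || return 1
    fi
}
```

Assuming the igb PF from the log shows up under the usual PCI sysfs location, running `set_vfs /sys/bus/pci/devices/0000:04:00.0 2` as root would request two VFs. As the commit message notes, the write only takes effect if the attached driver supports sriov_configure().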

Attachment: dom0-s3.log
Description: dom0-s3.log

Attachment: xl-vcpu-set-dom0-trace.log
Description: xl-vcpu-set-dom0-trace.log
