
Re: [Xen-devel] [xen-unstable test] APIC error on CPU[x] in xcp dom0


  • To: "xen.org" <ian.jackson@xxxxxxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Tue, 25 Jan 2011 10:52:42 +0000
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 25 Jan 2011 02:53:24 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Something strange about this error.  Comparing the serial logs of the
failure on "leaf-beetle":
http://www.chiark.greenend.org.uk/~xensrcts/logs/5231/test-i386-xcpkern-i386-xl/serial-leaf-beetle.log

to a successful boot on the same machine:
http://www.chiark.greenend.org.uk/~xensrcts/logs/5161/test-i386-xcpkern-i386-xl/serial-leaf-beetle.log

the thing that stands out is a set of scary messages from Xen during the
failed boot:
---
Jan 24 17:37:24 (XEN) CPU counter reports 4094 correctable hardware errors that were not reported by the status MSRs
...
Jan 24 17:37:27 (XEN) APIC error on CPU3: 00(08)
Jan 24 17:37:27 (XEN) APIC error on CPU2: 00(08)
Jan 24 17:37:27 (XEN) APIC error on CPU0: 00(08)
Jan 24 17:37:27 (XEN) APIC error on CPU1: 00(08)
...
Jan 24 17:37:30 (XEN) CPU counter reports 4094 correctable hardware errors that were not reported by the status MSRs
---
Immediately after that, the SATA driver complains that the IDENTIFY
command (0xec) timed out:
---
Jan 24 17:37:33 ata1.00: qc timeout (cmd 0xec)
Jan 24 17:37:33 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 24 17:37:33 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 24 17:37:43 ata1.00: qc timeout (cmd 0xec)
Jan 24 17:37:43 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 24 17:37:43 ata1: limiting SATA link speed to 1.5 Gbps
Jan 24 17:37:43 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 24 17:38:13 ata1.00: qc timeout (cmd 0xec)
Jan 24 17:38:13 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 24 17:38:14 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
---
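
For reference, the numeric codes in the two excerpts above can be decoded
mechanically.  This is only a rough sketch under a few assumptions: the
bit names for the local APIC Error Status Register follow the Intel SDM,
the two hex values in the Xen message are taken to be the ESR before and
after it is rewritten (as in the Linux handler this code derives from),
and the err_mask names follow my reading of the AC_ERR_* flags in Linux's
include/linux/libata.h, which may differ in the xcp kernel.  The helper
names (decode_bits, decode_apic_line) are made up for illustration.

```python
import re

# Local APIC Error Status Register bits, index = bit number (Intel SDM).
ESR_BITS = [
    "send checksum error",
    "receive checksum error",
    "send accept error",
    "receive accept error",
    "redirectable IPI",
    "send illegal vector",
    "received illegal vector",
    "illegal register address",
]

# libata err_mask bits, index = bit number (assumed from AC_ERR_* in libata.h).
ATA_ERR_BITS = [
    "device error",
    "HSM violation",
    "timeout",
    "media error",
    "ATA bus error",
    "host bus error",
    "internal error",
    "invalid argument",
    "unknown error",
    "no-device hint",
    "NCQ error",
]

APIC_LINE = re.compile(r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)")


def decode_bits(value, names):
    """Return the names of all bits set in value."""
    return [name for bit, name in enumerate(names) if value & (1 << bit)]


def decode_apic_line(line):
    """Return (cpu, flags in first value, flags in second value), or None."""
    match = APIC_LINE.search(line)
    if match is None:
        return None
    cpu = int(match.group(1))
    first = int(match.group(2), 16)
    second = int(match.group(3), 16)
    return cpu, decode_bits(first, ESR_BITS), decode_bits(second, ESR_BITS)


print(decode_apic_line("Jan 24 17:37:27 (XEN) APIC error on CPU3: 00(08)"))
# (3, [], ['receive accept error'])
print(decode_bits(0x4, ATA_ERR_BITS))
# ['timeout']
```

If those assumptions hold, 00(08) means a receive accept error on each
CPU, and err_mask=0x4 is a plain timeout, which matches the "qc timeout"
lines.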

The "correctable hardware errors" messages are present in the
successful log as well, but not the APIC error messages.
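
This sort of comparison can be done mechanically rather than by eye.  The
following is a minimal sketch, assuming each log line starts with a
syslog-style timestamp as in the excerpts above; the function names are
made up for illustration, and the "good" sample is a stand-in line rather
than text copied from the real successful log.

```python
import re

# Syslog-style prefix such as "Jan 24 17:37:27 " at the start of a line.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} +")


def xen_messages(log_text):
    """Return the set of (XEN) messages in a log, timestamps stripped."""
    messages = set()
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line.strip())
        if line.startswith("(XEN)"):
            messages.add(line)
    return messages


def only_in_first(first_log, second_log):
    """Return (XEN) messages that appear in first_log but not in second_log."""
    return sorted(xen_messages(first_log) - xen_messages(second_log))


# Lines taken from the failed boot quoted above.
failed = """\
Jan 24 17:37:27 (XEN) APIC error on CPU3: 00(08)
Jan 24 17:37:27 (XEN) APIC error on CPU2: 00(08)
Jan 24 17:37:27 (XEN) APIC error on CPU0: 00(08)
Jan 24 17:37:27 (XEN) APIC error on CPU1: 00(08)
Jan 24 17:37:30 (XEN) CPU counter reports 4094 correctable hardware errors that were not reported by the status MSRs
"""

# Hypothetical stand-in for the successful boot (timestamp is invented).
good = """\
Jan 20 09:00:00 (XEN) CPU counter reports 4094 correctable hardware errors that were not reported by the status MSRs
"""

for message in only_in_first(failed, good):
    print(message)
```

On these samples only the four APIC error lines are printed, since the
"correctable hardware errors" message is common to both.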

Who does development / maintenance on the xcp kernel?  Is that a Novell thing?

 -George

On Mon, Jan 24, 2011 at 6:17 PM, xen.org <ian.jackson@xxxxxxxxxxxxx> wrote:
> flight 5231 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/5231/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking:
>  test-i386-xcpkern-i386-xl     5 xen-boot                   fail REGR. vs. 5174
>
> Tests which did not succeed, but are not blocking,
> including regressions (tests previously passed) regarded as allowable:
>  test-amd64-amd64-win                    16 leak-check/check   fail never pass
>  test-amd64-amd64-xl-win                  7 windows-install    fail never pass
>  test-amd64-i386-rhel6hvm-amd             8 guest-saverestore  fail never pass
>  test-amd64-i386-rhel6hvm-intel           8 guest-saverestore  fail never pass
>  test-amd64-i386-win-vcpus1              16 leak-check/check   fail never pass
>  test-amd64-i386-win                     16 leak-check/check   fail never pass
>  test-amd64-i386-xl-win-vcpus1            7 windows-install    fail never pass
>  test-amd64-xcpkern-i386-rhel6hvm-amd     8 guest-saverestore  fail never pass
>  test-amd64-xcpkern-i386-rhel6hvm-intel   8 guest-saverestore  fail never pass
>  test-amd64-xcpkern-i386-win             16 leak-check/check   fail never pass
>  test-amd64-xcpkern-i386-xl-win           7 windows-install    fail never pass
>  test-i386-i386-win                      16 leak-check/check   fail never pass
>  test-i386-i386-xl-win                    7 windows-install    fail never pass
>  test-i386-xcpkern-i386-win              16 leak-check/check   fail never pass
>
> version targeted for testing:
>  xen                  b59f04eb8978
> baseline version:
>  xen                  003acf02d416
>
> ------------------------------------------------------------
> People who touched revisions under test:
>  Allen Kay <allen.m.kay@xxxxxxxxx>
>  Fengzhe Zhang <fengzhe.zhang@xxxxxxxxx>
>  George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>  Ian Campbell <ian.campbell@xxxxxxxxxx>
>  Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>  Kamala Narasimhan <kamala.narasimhan@xxxxxxxxxx>
>  Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>  Tim Deegan <Tim.Deegan@xxxxxxxxxx>
>  Wei Wang <wei.wang2@xxxxxxx>
> ------------------------------------------------------------
>
> jobs:
>  build-i386-xcpkern                                           pass
>  build-amd64                                                  pass
>  build-i386                                                   pass
>  build-amd64-oldkern                                          pass
>  build-i386-oldkern                                           pass
>  build-amd64-pvops                                            pass
>  build-i386-pvops                                             pass
>  test-amd64-amd64-xl                                          pass
>  test-amd64-i386-xl                                           pass
>  test-i386-i386-xl                                            pass
>  test-amd64-xcpkern-i386-xl                                   pass
>  test-i386-xcpkern-i386-xl                                    fail
>  test-amd64-i386-rhel6hvm-amd                                 fail
>  test-amd64-xcpkern-i386-rhel6hvm-amd                         fail
>  test-amd64-i386-xl-credit2                                   pass
>  test-amd64-xcpkern-i386-xl-credit2                           pass
>  test-amd64-i386-rhel6hvm-intel                               fail
>  test-amd64-xcpkern-i386-rhel6hvm-intel                       fail
>  test-amd64-i386-xl-multivcpu                                 pass
>  test-amd64-xcpkern-i386-xl-multivcpu                         pass
>  test-amd64-amd64-pair                                        pass
>  test-amd64-i386-pair                                         pass
>  test-i386-i386-pair                                          pass
>  test-amd64-xcpkern-i386-pair                                 pass
>  test-i386-xcpkern-i386-pair                                  pass
>  test-amd64-amd64-pv                                          pass
>  test-amd64-i386-pv                                           pass
>  test-i386-i386-pv                                            pass
>  test-amd64-xcpkern-i386-pv                                   pass
>  test-i386-xcpkern-i386-pv                                    pass
>  test-amd64-i386-win-vcpus1                                   fail
>  test-amd64-i386-xl-win-vcpus1                                fail
>  test-amd64-amd64-win                                         fail
>  test-amd64-i386-win                                          fail
>  test-i386-i386-win                                           fail
>  test-amd64-xcpkern-i386-win                                  fail
>  test-i386-xcpkern-i386-win                                   fail
>  test-amd64-amd64-xl-win                                      fail
>  test-i386-i386-xl-win                                        fail
>  test-amd64-xcpkern-i386-xl-win                               fail
>
>
> ------------------------------------------------------------
> sg-report-flight on woking.cam.xci-test.com
> logs: /home/xc_osstest/logs
> images: /home/xc_osstest/images
>
> Logs, config files, etc. are available at
>    http://www.chiark.greenend.org.uk/~xensrcts/logs
>
> Test harness code can be found at
>    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
>
>
> Not pushing.
>
> ------------------------------------------------------------
> changeset:   22799:b59f04eb8978
> tag:         tip
> user:        Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> date:        Fri Jan 21 18:06:23 2011 +0000
>
>    libxl, minios: stubdom console based save/restore
>
>    Add two "special" PV consoles to stubdoms that are going to be used
>    to send and receive the qemu-xen save files on save/restore.
>
>    Use the second PV console to send the qemu-xen save file and the third
>    PV console to receive the qemu-xen save file on restore.
>
>    Fix the console shutdown function free_consfront that is called when the
>    qemu save file is closed.
>
>    Stubdom save/restore is still broken with xend.
>
>    Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>    Committed-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>
>
> changeset:   22798:d6a79425a287
> user:        Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> date:        Fri Jan 21 18:03:18 2011 +0000
>
>    QEMU_TAG update
>
>
> changeset:   22797:94b4361852ba
> user:        Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> date:        Fri Jan 21 18:00:39 2011 +0000
>
>    QEMU_TAG update
>
>
> changeset:   22796:8b6e7f43683e
> user:        Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> date:        Fri Jan 21 17:57:31 2011 +0000
>
>    tools: Change the name of the qemu save file at restore time
>
>    Change the name of the qemu save file to qemu-resume at restore time so
>    that it doesn't collide with the name of the next save file.
>
>    Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>    Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>    Committed-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>
>
> changeset:   22795:17181eaedb78
> user:        Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> date:        Fri Jan 21 17:45:56 2011 +0000
>
>    libxl: Fix for Windows UP crash on boot using xl
>
>    When parameter "vcpus" is not explicitly set in config file, cur_vpus
>    is initialized to 0 instead of 1, which causes domU crash.
>
>    Signed-off-by: Fengzhe Zhang <fengzhe.zhang@xxxxxxxxx>
>    Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>    Committed-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>
>
> changeset:   22794:c342aa1476e8
> user:        Kamala Narasimhan <kamala.narasimhan@xxxxxxxxxx>
> date:        Fri Jan 21 17:44:07 2011 +0000
>
>    libxl: fix free of constant string in cdrom insert path
>
>    Signed-off-by: Kamala Narasimhan <kamala.narasimhan@xxxxxxxxxx>
>    Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>    Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>
>
> changeset:   22793:ca64bd257a6f
> parent:      22792:e9277ab43947
> parent:      22791:1637fdbfc21e
> user:        Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> date:        Fri Jan 21 16:04:59 2011 +0000
>
>    Merge
>
>
> changeset:   22792:e9277ab43947
> parent:      22787:003acf02d416
> user:        George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> date:        Fri Jan 21 15:37:36 2011 +0000
>
>    p2m: Allow non-leaf entries to be replaced by leaf entries
>
>    Allow l2 and l3 p2m tables to be replaced with 2MB and 1GB pages
>    respectively, freeing the p2m table page properly.  This allows, for
>    example, a sequence of 512 singleton zero pages to be replaced with a
>    superpage populate-on-demand entry.
>
>    Changes:
>    * Add a p2m_free_ptp() corresponding to p2m_alloc_ptp(), which will
>    handle everything related to the freeing properly.
>    * Add p2m_free_entry(), based on ept_free_entry(), which will free
>    intermediate tables recursively.
>    * For both ept and p2m, when replacing non-leaf entries with leaf
>    entries, keep old entry and call *_free_entry() after new entry
>    has been written and proper flushes have been done.
>
>    Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>    Signed-off-by: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
>
>
> changeset:   22791:1637fdbfc21e
> user:        Keir Fraser <keir@xxxxxxx>
> date:        Fri Jan 21 16:01:12 2011 +0000
>
>    [VTD][QUIRK] turn off Sandybridge IGD quirk by default
>
>    Turn off Sandybridge IGD quirk by default until potential issues such
>    as MMIO register conflict with OS device driver and proper locking in
>    preamble and postamble functions are addressed.
>
>    Signed-off-by: Allen Kay <allen.m.kay@xxxxxxxxx>
>
>
> changeset:   22790:43592043cefc
> user:        Keir Fraser <keir@xxxxxxx>
> date:        Fri Jan 21 16:00:05 2011 +0000
>
>    svm: migrate pirq with vcpu
>
>    hvm_migrate_pirqs() is missing in svm_do_resume().
>
>    Signed-off-by: Wei Wang <wei.wang2@xxxxxxx>
>
>
> changeset:   22789:9bd5f65050f8
> user:        Keir Fraser <keir@xxxxxxx>
> date:        Fri Jan 21 15:58:27 2011 +0000
>
>    x86:x2apic: Disable x2apic on x86-32 permanently
>
>    x2apic initialization on x86_32 uses vcpu pointer before it is
>    initialized. As x2apic is unlikely to be used on x86_32, this patch
>    disables x2apic permanently on x86_32. It also asserts the sanity of
>    vcpu pointer before dereference to prevent further misuse.
>
>    Signed-off-by: Fengzhe Zhang <fengzhe.zhang@xxxxxxxxx>
>
>
> changeset:   22788:5852612cd4c4
> user:        Keir Fraser <keir@xxxxxxx>
> date:        Fri Jan 21 15:57:18 2011 +0000
>
>    x86_32: Assert map_domian_page() not used too early.
>
>    Signed-off-by: Fengzhe Zhang <fengzhe.zhang@xxxxxxxxx>
>
>
> changeset:   22787:003acf02d416
> user:        Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> date:        Thu Jan 20 17:04:06 2011 +0000
>
>    libxl: Make domain_shutdown fail if graceful not possible
>
>    Currently "xl shutdown" (like "xm shutdown") is not capable of doing
>    the proper ACPI negotiation with an HVM no-pv-drivers guest which
>    would be necessary for a graceful shutdown.
>
>    Instead (following the ill-advised lead of "xm shutdown") it simply
>    shoots the guest in the head.
>
>    This patch changes the behaviour so that "xl shutdown" fails if the
>    domain cannot be shut down gracefully for this reason and suggests in
>    the error message using destroy instead.
>
>    Also, check whether the PV shutdown protocol is available before we
>    try to use it.
>
>    Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>    Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>    Committed-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>
>
> ========================================
> commit 6a6ffc07f465d5abc94b45ce6fe389369d1395ab
> Author: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> Date:   Fri Jan 21 18:01:53 2011 +0000
>
>    qemu-xen: disable buffering on the save file for stubdoms
>
>    We need to issue reads with the exact number of bytes to read the
>    qemu-xen save file, and to do that this patch disables buffering on all
>    the savevm reads/writes for stubdoms.
>
>    [This is pretty horrid; hopefully there will be better solution for
>    4.2 -iwj]
>
>    Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>



 

