
Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b


  • To: "Woller, Thomas" <thomas.woller@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
  • Date: Thu, 10 Jan 2008 22:54:23 +0000
  • Delivery-date: Thu, 10 Jan 2008 14:56:04 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AchTvZRmzXPzqiDHRt2NZV4oQGTBBwAAgByAAALmgp8AADCpsAACJ0NGAAHL2Zc=
  • Thread-topic: [Xen-devel] RE: BUG() w/ HVM win2k3 64b

Fixed by c/s 16704.

 K.
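
The diagnosis quoted below (dst.type set to OP_NONE before it is tested for
OP_REG) can be illustrated with a small standalone C sketch. This is not the
content of c/s 16704, which this thread does not show. The names
`struct operand`, `mem_read()`, `SEG_INVALID`, `load_target_buggy()` and
`load_target_fixed()` are hypothetical stand-ins for the emulator's code, and
the `mode_64bit()` / `dst.bytes` condition from the quoted fragment is omitted
for brevity. The sketch assumes that a register operand leaves the memory
segment field unset, which would explain the invalid dst.mem.seg in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the emulator's operand types. */
enum op_type { OP_REG, OP_MEM, OP_NONE };

struct operand {
    enum op_type type;
    unsigned int bytes;
    uint64_t val;
    uint64_t *reg;                          /* meaningful only for OP_REG */
    struct { int seg; uint64_t off; } mem;  /* meaningful only for OP_MEM */
};

#define SEG_INVALID (-1)

/* Hypothetical read hook. It returns an error for an invalid segment;
 * in the report, the equivalent path ends in BUG() inside
 * svm_get_segment_register(). */
static int mem_read(int seg, uint64_t off, uint64_t *val)
{
    if (seg == SEG_INVALID)
        return -1;
    *val = 0x1000 + off;   /* dummy memory contents */
    return 0;
}

/* Ordering as in the quoted fragment: the type is overwritten first, so
 * the OP_REG test can never succeed and a register operand falls through
 * to the memory read with an unset segment. */
static int load_target_buggy(struct operand *dst)
{
    dst->type = OP_NONE;
    dst->bytes = 8;
    if (dst->type == OP_REG)
        dst->val = *dst->reg;
    else
        return mem_read(dst->mem.seg, dst->mem.off, &dst->val);
    return 0;
}

/* One possible reordering: fetch the operand while its real type is still
 * known, and only then mark it OP_NONE to suppress the writeback. */
static int load_target_fixed(struct operand *dst)
{
    int rc = 0;

    dst->bytes = 8;
    if (dst->type == OP_REG)
        dst->val = *dst->reg;
    else
        rc = mem_read(dst->mem.seg, dst->mem.off, &dst->val);
    dst->type = OP_NONE;
    return rc;
}
```

With a register operand whose segment field is SEG_INVALID, the first function
fails in the read hook, while the second returns the register value and still
ends with the type set to OP_NONE.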

On 10/1/08 22:02, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:

> Oh, the bug is obvious actually. It's introduced by 16491, and is because
> dst.type is getting clobbered to OP_NONE before it is tested for OP_REG.
> I'll sort out a fix.
> 
>  Thanks!
>  Keir
> 
> On 10/1/08 21:11, "Woller, Thomas" <thomas.woller@xxxxxxx> wrote:
> 
>>> 16489 and 16491 are obviously suspects. You might also try current tip
>>> (-rc5) as some emulator bugs were fixed in the last day or so.
>> 16491 just failed a few mins ago.  16490 passed at 9 hours, although
>> could use more time.
>> We are down to 3 1P test systems available for use till next week, and
>> will start up:
>> 1) 16701 minus 16491
>> 2) 16701
>> 3) 16701
>> 
>> And let them run overnight, which *should* be enough time.  If above all
>> fail, we'll have to go back and work with 16489/16490 more closely with
>> more time in test.
>> 
>>> Was your successful 16488 test stressful enough to be
>>> confident that it's not a false negative (for the bug)?
>> Yes, 2 systems confirmed 16488 passed.   Btw 3.1.3 passes also.
>> 
>> tom
>> 
>>> -----Original Message-----
>>> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>>> Sent: Thursday, January 10, 2008 2:56 PM
>>> To: Woller, Thomas; xen-devel@xxxxxxxxxxxxxxxxxxx
>>> Subject: Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b
>>> 
>>> 16489 and 16491 are obviously suspects. You might also try current tip
>>> (-rc5) as some emulator bugs were fixed in the last day or so.
>>> Was your successful 16488 test stressful enough to be
>>> confident that it's not a false negative (for the bug)?
>>> 
>>>  -- Keir
>>> 
>>> On 10/1/08 19:36, "Woller, Thomas" <thomas.woller@xxxxxxx> wrote:
>>> 
>>>>> We have seen failures with changesets >= 16492, latest tested was
>>>>> 16676 that fails, and c/s 16488 passes without issue.
>>>> Clarification to my email: I was thinking that c/s 16491 was the
>>>> problem (not 16492 as I indicated).
>>>> 
>>>> 16492 has failed tests, and c/s 16491 is running fine right now, but
>>>> we need more test time on that c/s to see if it will fail.
>>>> 
>>>> So, just to be clear, we still don't have a handle on which specific
>>>> c/s is the problem, but it still seems to be around 1649x.
>>>> 
>>>> Tom
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Woller, Thomas
>>>>> Sent: Thursday, January 10, 2008 1:18 PM
>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Cc: Woller, Thomas
>>>>> Subject: BUG() w/ HVM win2k3 64b
>>>>> 
>>>>> We are observing a BUG() with 3.2/unstable.  This problem takes a
>>>>> number of hours to reproduce (anywhere from 4 to 12+ hours), and so
>>>>> far only with a Windows 2003 64b HVM multi-vcpu guest under heavy
>>>>> stress load.
>>>>> 
>>>>> It is only reproducible using shadow paging; we have not seen the
>>>>> problem using nested paging.
>>>>> 
>>>>> We have seen failures with changesets >= 16492, latest tested was
>>>>> 16676 that fails, and c/s 16488 passes without issue.
>>>>> 
>>>>> We have tried to narrow down the issue to a specific changeset, and
>>>>> overnight testing seems to indicate that changeset 16492 might be the
>>>>> culprit.  This is not confirmed until additional testing completes
>>>>> tomorrow on c/s 16491 and 16490.  We will know more EOD Thursday if
>>>>> these 2 pass testing.
>>>>> 
>>>>> We will also start up some testing using 16701 to make sure that it
>>>>> is not resolved by post-16676 patches.  I'll also try to start up a
>>>>> test with c/s 16492 removed from the 16701 base and see if that helps
>>>>> this specific problem.  All of this testing, though, will not finish
>>>>> until towards the end of next week due to a large-scale move of
>>>>> lab/offices starting tomorrow, and with 3.2 almost out, we would like
>>>>> to see this figured out before release.
>>>>> 
>>>>> Reproduced on 1P family11h and family10h systems, but unable to
>>>>> reproduce on 2P+ systems so far.  We don't believe we are
>>>>> seeing any sort of h/w anomaly at this point.  We have not
>>>>> tried reproducing on VT boxes.
>>>>> 
>>>>> We are able to reproduce using 2 64b Windows guests.  Currently we are
>>>>> using 2 or 4 VCPUs, but have not tried reducing to a single VCPU.
>>>>> 
>>>>> Any debug thoughts are appreciated.
>>>>> 
>>>>> Looks like dst.mem.seg is invalid for the read() in the Grp5 case 2/4
>>>>> (call/jmp) path, which results in the BUG() later.
>>>>> 
>>>>> X86_emulate:
>>>>> ...
>>>>>     case 0xff: /* Grp5 */
>>>>>         switch ( modrm_reg & 7 )
>>>>>         {
>>>>>         case 0: /* inc */
>>>>>             emulate_1op("inc", dst, _regs.eflags);
>>>>>             break;
>>>>>         case 1: /* dec */
>>>>>             emulate_1op("dec", dst, _regs.eflags);
>>>>>             break;
>>>>>         case 2: /* call (near) */
>>>>>         case 4: /* jmp (near) */
>>>>>             dst.type = OP_NONE;
>>>>>             if ( (dst.bytes != 8) && mode_64bit() )
>>>>>             {
>>>>>                 dst.bytes = op_bytes = 8;
>>>>>                 if ( dst.type == OP_REG )
>>>>>                     dst.val = *dst.reg;
>>>>>                 else if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
>>>>>                                           &dst.val, 8, ctxt)) != 0 )
>>>>>                     goto done;
>>>>>          
>>>>> 
>>>>> Guest config:
>>>>> HVM Windows 2003 64b
>>>>> vcpus=4
>>>>> memory=1024
>>>>> pae/acpi/apic=1
>>>>> 
>>>>> BUG() info.
>>>>> (XEN) Xen BUG at svm.c:599
>>>>> (XEN) ----[ Xen-3.2.0-rc3  x86_64  debug=n  Tainted:    C ]----
>>>>> (XEN) CPU:    2
>>>>> (XEN) RIP:    e008:[<ffff828c80165205>] svm_get_segment_register+0x145/0x170
>>>>> (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor
>>>>> (XEN) rax: ffff8300a6e0ff28   rbx: ffff8300a7dde000   rcx: 00000000a6e0fa28
>>>>> (XEN) rdx: ffff830b14f09f54   rsi: 00000000a6e0fa28   rdi: ffff8300a7ddc080
>>>>> (XEN) rbp: ffff830b14f09f54   rsp: ffff8300a6e0f850   r8:  ffff8300a6e0fc98
>>>>> (XEN) r9:  ffff8300a6e0f8c8   r10: 0000000000000000   r11: 0000000000000001
>>>>> (XEN) r12: ffff8300a6e0f8c8   r13: 0000000000000001   r14: 00000000a6e0fa28
>>>>> (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000000006f0
>>>>> (XEN) cr3: 000000003b75b000   cr2: 000000000247f000
>>>>> (XEN) ds: 0000   es: 0000   fs: 0053   gs: 002b   ss: 0000   cs: e008
>>>>> (XEN) Xen stack trace from rsp=ffff8300a6e0f850:
>>>>> (XEN)    ffff830b14f09f54 0000000000000000 ffff828c80178eea ffff8300a6e0fc98
>>>>> (XEN)    ffff828c80179d0c ffff8300a6e0f8d0 ffff8300a6e0fb20 0000000000000001
>>>>> (XEN)    0000000000000008 ffff8300a6e0fc98 ffff8300a6e0fc98 0000000000000004
>>>>> (XEN)    ffff828c80179e46 0000000000000000 fffffadff3c54040 fffffadff04cbde0
>>>>> (XEN)    0000000000000002 ffff828c801c18e0 0000000000000008 0000000000000000
>>>>> (XEN)    ffff828c80146be5 0000000000000001 ffff8300a6e0ff28 000000003a4002e7
>>>>> (XEN)    00000002a6e0fb87 ffff8300a6e0fbc8 0000001100000000 0000000080a572b0
>>>>> (XEN)    ffff8300a6e0f9d8 ffff828c801c18e0 0000000000000000 0000000000000000
>>>>> (XEN)    00000006a6e0fbc8 fffff80000812be8 0000468c8015a2b0 ffff8300a6e0fb03
>>>>> (XEN)    0000000000000296 0000000000000002 ffff8300a7dd2080 0000000000000000
>>>>> (XEN)    ffff828c8013974a 0000000000000000 00000000ffffffff ffff830000000046
>>>>> (XEN)    ffff8300a7dd37e0 fffffadff04cbe00 fffffadff04cbd70 ffff8300a7dcd7e0
>>>>> (XEN)    ffff828c80161206 fffff80000341070 fffffadff410d040 0000000000000000
>>>>> (XEN)    fffffadff41171f0 0000000000000080 fffffadff35ce040 fffff78000000008
>>>>> (XEN)    0000000000000000 0000000000000000 fffffadff35ce040 fffffadff1a73010
>>>>> (XEN)    fffffadff3699f90 fffffadff3699f90 fffffadff35ce040 fffffadff3c54040
>>>>> (XEN)    0000000000000003 fffff80001272bae 0000000000000000 0000000000000246
>>>>> (XEN)    fffffadff04cbd70 0000000000000000 5555555555555555 5555555555555555
>>>>> (XEN)    5555555555555555 5555555555555555 00000001801324cd 0000000000000004
>>>>> (XEN)    ffffffffffffffff ffff8300a7ddc080 000fffff80001272 ffff8300a6e0fba4
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff828c80165205>] svm_get_segment_register+0x145/0x170
>>>>> (XEN)    [<ffff828c80178eea>] hvm_get_seg_reg+0x3a/0x40
>>>>> (XEN)    [<ffff828c80179d0c>] hvm_translate_linear_addr+0x3c/0xa0
>>>>> (XEN)    [<ffff828c80179e46>] hvm_read+0x36/0xe0
>>>>> (XEN)    [<ffff828c80146be5>] x86_emulate+0x3f35/0x9940
>>>>> (XEN)    [<ffff828c8013974a>] smp_send_event_check_mask+0x3a/0x40
>>>>> (XEN)    [<ffff828c80161206>] vlapic_write+0x546/0x7e0
>>>>> (XEN)    [<ffff828c8017f3f5>] sh_gva_to_gfn__shadow_4_guest_4+0xc5/0x150
>>>>> (XEN)    [<ffff828c80152d27>] __hvm_copy+0x97/0x280
>>>>> (XEN)    [<ffff828c8017f2ba>] guest_walk_tables+0x80a/0x880
>>>>> (XEN)    [<ffff828c8017a206>] shadow_init_emulation+0x126/0x160
>>>>> (XEN)    [<ffff828c80182bd5>] sh_page_fault__shadow_4_guest_4+0xdb5/0xe80
>>>>> (XEN)    [<ffff828c80128259>] context_switch+0xb79/0xbc0
>>>>> (XEN)    [<ffff828c8016753c>] svm_vmexit_handler+0x6ac/0x1a70
>>>>> (XEN)    [<ffff828c801160bf>] schedule+0x25f/0x290
>>>>> (XEN)    [<ffff828c8015fcbd>] vlapic_has_pending_irq+0x2d/0x70
>>>>> (XEN)    [<ffff828c80163dc6>] svm_intr_assist+0x46/0x140
>>>>> (XEN)    [<ffff828c801692d4>] svm_stgi_label+0x8/0x14
>>>>> (XEN)    
>>>>> (XEN)
>>>>> (XEN) ****************************************
>>>>> (XEN) Panic on CPU 2:
>>>>> (XEN) Xen BUG at svm.c:599
>>>>> (XEN) ****************************************
>>>>> (XEN)
>>>>> (XEN) Manual reset required ('noreboot' specified)
>>>>> 
>>>>>   --Tom
>>>>> 
>>>>> thomas.woller@xxxxxxx  +1-512-602-0059 AMD Corporation - Operating
>>>>> Systems Research Center
>>>>> 5204 E. Ben White Blvd. UBC1
>>>>> Austin, Texas 78741
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 





 

