
[Xen-devel] BUG() w/ HVM win2k3 64b


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Woller, Thomas" <thomas.woller@xxxxxxx>
  • Date: Thu, 10 Jan 2008 13:18:28 -0600
  • Cc: "Woller, Thomas" <thomas.woller@xxxxxxx>
  • Delivery-date: Thu, 10 Jan 2008 11:19:32 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AchTvZRmzXPzqiDHRt2NZV4oQGTBBw==
  • Thread-topic: BUG() w/ HVM win2k3 64b

We are observing a BUG() with 3.2/unstable.  The problem takes anywhere
from 4 to 12+ hours to reproduce, and so far occurs only with a
Windows 2003 64b HVM multi-VCPU guest under heavy stress load.

Only reproducible using shadow paging; we have not seen the problem
using nested paging.

We have seen failures with changesets >= 16492 (the latest tested,
16676, fails), while c/s 16488 passes without issue.

We have tried to narrow the issue down to a specific changeset, and
overnight testing seems to indicate that changeset 16492 might be the
culprit.  This is not confirmed until additional testing completes
tomorrow on c/s 16491 and 16490.  We will know more EOD Thursday if
these 2 pass testing.
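
The narrowing described above is a bisection between the known-good
c/s 16488 and the known-bad c/s 16492.  A minimal sketch in C, assuming
the regression is monotonic (every changeset after the culprit also
fails).  The names first_bad_changeset and assumed_regression_at_16492
are hypothetical, and the predicate stands in for the multi-hour stress
run:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical oracle: true if the given changeset reproduces the BUG().
 * In the real setup this is a 4 to 12+ hour stress run, not a function. */
typedef bool (*fails_fn)(int changeset);

/* Return the first failing changeset in (good, bad].
 * Assumes 'good' passes, 'bad' fails, and failures are monotonic. */
int first_bad_changeset(int good, int bad, fails_fn fails)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (fails(mid))
            bad = mid;   /* culprit is at or before mid */
        else
            good = mid;  /* culprit is after mid */
    }
    return bad;
}

/* Stand-in predicate encoding the current suspicion, not a confirmed result. */
bool assumed_regression_at_16492(int changeset)
{
    return changeset >= 16492;
}
```

With good = 16488 and bad = 16492, the loop probes 16490 and then 16491,
which are the two runs still outstanding.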

We will also start up some testing using 16701 to make sure that it is
not resolved by post-16676 patches.  I'll also try to start a test
with c/s 16492 removed from a 16701 base and see if that helps this
specific problem.  All of this testing will not finish until
towards the end of next week due to a large-scale move of lab/offices
starting tomorrow, and with 3.2 almost out, we would like to see this
figured out before release.

Reproduced on 1P family11h and family10h systems, but unable to
reproduce on 2P+ systems so far.  We don't believe we are seeing any
sort of h/w anomaly at this point.  We have not tried reproducing on VT
boxes.

We are able to reproduce using two 64b Windows guests.  Currently we are
using 2 or 4 VCPUs, and have not tried reducing to a single VCPU.

Any debug thoughts are appreciated.

Looks like the dst.mem.seg is invalid for the read() in Grp5 case 2/4
(jmp/call), which results in the BUG() later.  

x86_emulate:
...
    case 0xff: /* Grp5 */
        switch ( modrm_reg & 7 )
        {
        case 0: /* inc */
            emulate_1op("inc", dst, _regs.eflags);
            break;
        case 1: /* dec */
            emulate_1op("dec", dst, _regs.eflags);
            break;
        case 2: /* call (near) */
        case 4: /* jmp (near) */
            dst.type = OP_NONE;
            if ( (dst.bytes != 8) && mode_64bit() )
            {
                dst.bytes = op_bytes = 8;
                if ( dst.type == OP_REG )
                    dst.val = *dst.reg;
                else if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
                                          &dst.val, 8, ctxt)) != 0 )
                    goto done;
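
One observation on the excerpt as quoted: dst.type is set to OP_NONE
before it is compared against OP_REG, so the register branch can never
be taken and a register operand would fall through to ops->read() with
whatever dst.mem.seg happens to hold.  Whether this is the actual root
cause is an assumption.  The sketch below uses simplified, hypothetical
types (struct operand, enum path) rather than the real Xen definitions,
and only contrasts the two orderings:

```c
#include <assert.h>

/* Simplified stand-ins for the emulator's operand bookkeeping (assumed). */
enum op_type { OP_REG, OP_MEM, OP_NONE };
enum path { PATH_REG, PATH_MEM };

struct operand {
    enum op_type type;
    unsigned long val;
    unsigned long *reg;
    struct { int seg; unsigned long off; } mem;
};

/* Ordering as in the quoted excerpt: type is cleared before it is tested,
 * so the OP_REG branch is dead and the memory path is always chosen. */
enum path fetch_cleared_first(struct operand *dst)
{
    dst->type = OP_NONE;
    if (dst->type == OP_REG) {
        dst->val = *dst->reg;
        return PATH_REG;
    }
    return PATH_MEM; /* the real code would call ops->read(dst->mem.seg, ...) */
}

/* Alternative ordering: test the operand type first, clear it afterwards. */
enum path fetch_checked_first(struct operand *dst)
{
    enum path p = PATH_MEM;
    if (dst->type == OP_REG) {
        dst->val = *dst->reg;
        p = PATH_REG;
    }
    dst->type = OP_NONE; /* suppress writeback once the value is fetched */
    return p;
}
```

For an operand that starts as OP_REG, the first function reports the
memory path and the second reports the register path, which would be
consistent with an invalid segment reaching svm_get_segment_register().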
         

Guest config:
HVM Windows 2003 64b
vcpus=4
memory=1024
pae/acpi/apic=1

BUG() info.
(XEN) Xen BUG at svm.c:599
(XEN) ----[ Xen-3.2.0-rc3  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff828c80165205>] svm_get_segment_register+0x145/0x170
(XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor
(XEN) rax: ffff8300a6e0ff28   rbx: ffff8300a7dde000   rcx: 00000000a6e0fa28
(XEN) rdx: ffff830b14f09f54   rsi: 00000000a6e0fa28   rdi: ffff8300a7ddc080
(XEN) rbp: ffff830b14f09f54   rsp: ffff8300a6e0f850   r8:  ffff8300a6e0fc98
(XEN) r9:  ffff8300a6e0f8c8   r10: 0000000000000000   r11: 0000000000000001
(XEN) r12: ffff8300a6e0f8c8   r13: 0000000000000001   r14: 00000000a6e0fa28
(XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 000000003b75b000   cr2: 000000000247f000
(XEN) ds: 0000   es: 0000   fs: 0053   gs: 002b   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300a6e0f850:
(XEN)    ffff830b14f09f54 0000000000000000 ffff828c80178eea ffff8300a6e0fc98
(XEN)    ffff828c80179d0c ffff8300a6e0f8d0 ffff8300a6e0fb20 0000000000000001
(XEN)    0000000000000008 ffff8300a6e0fc98 ffff8300a6e0fc98 0000000000000004
(XEN)    ffff828c80179e46 0000000000000000 fffffadff3c54040 fffffadff04cbde0
(XEN)    0000000000000002 ffff828c801c18e0 0000000000000008 0000000000000000
(XEN)    ffff828c80146be5 0000000000000001 ffff8300a6e0ff28 000000003a4002e7
(XEN)    00000002a6e0fb87 ffff8300a6e0fbc8 0000001100000000 0000000080a572b0
(XEN)    ffff8300a6e0f9d8 ffff828c801c18e0 0000000000000000 0000000000000000
(XEN)    00000006a6e0fbc8 fffff80000812be8 0000468c8015a2b0 ffff8300a6e0fb03
(XEN)    0000000000000296 0000000000000002 ffff8300a7dd2080 0000000000000000
(XEN)    ffff828c8013974a 0000000000000000 00000000ffffffff ffff830000000046
(XEN)    ffff8300a7dd37e0 fffffadff04cbe00 fffffadff04cbd70 ffff8300a7dcd7e0
(XEN)    ffff828c80161206 fffff80000341070 fffffadff410d040 0000000000000000
(XEN)    fffffadff41171f0 0000000000000080 fffffadff35ce040 fffff78000000008
(XEN)    0000000000000000 0000000000000000 fffffadff35ce040 fffffadff1a73010
(XEN)    fffffadff3699f90 fffffadff3699f90 fffffadff35ce040 fffffadff3c54040
(XEN)    0000000000000003 fffff80001272bae 0000000000000000 0000000000000246
(XEN)    fffffadff04cbd70 0000000000000000 5555555555555555 5555555555555555
(XEN)    5555555555555555 5555555555555555 00000001801324cd 0000000000000004
(XEN)    ffffffffffffffff ffff8300a7ddc080 000fffff80001272 ffff8300a6e0fba4
(XEN) Xen call trace:
(XEN)    [<ffff828c80165205>] svm_get_segment_register+0x145/0x170
(XEN)    [<ffff828c80178eea>] hvm_get_seg_reg+0x3a/0x40
(XEN)    [<ffff828c80179d0c>] hvm_translate_linear_addr+0x3c/0xa0
(XEN)    [<ffff828c80179e46>] hvm_read+0x36/0xe0
(XEN)    [<ffff828c80146be5>] x86_emulate+0x3f35/0x9940
(XEN)    [<ffff828c8013974a>] smp_send_event_check_mask+0x3a/0x40
(XEN)    [<ffff828c80161206>] vlapic_write+0x546/0x7e0
(XEN)    [<ffff828c8017f3f5>] sh_gva_to_gfn__shadow_4_guest_4+0xc5/0x150
(XEN)    [<ffff828c80152d27>] __hvm_copy+0x97/0x280
(XEN)    [<ffff828c8017f2ba>] guest_walk_tables+0x80a/0x880
(XEN)    [<ffff828c8017a206>] shadow_init_emulation+0x126/0x160
(XEN)    [<ffff828c80182bd5>] sh_page_fault__shadow_4_guest_4+0xdb5/0xe80
(XEN)    [<ffff828c80128259>] context_switch+0xb79/0xbc0
(XEN)    [<ffff828c8016753c>] svm_vmexit_handler+0x6ac/0x1a70
(XEN)    [<ffff828c801160bf>] schedule+0x25f/0x290
(XEN)    [<ffff828c8015fcbd>] vlapic_has_pending_irq+0x2d/0x70
(XEN)    [<ffff828c80163dc6>] svm_intr_assist+0x46/0x140
(XEN)    [<ffff828c801692d4>] svm_stgi_label+0x8/0x14
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at svm.c:599
(XEN) ****************************************
(XEN) 
(XEN) Manual reset required ('noreboot' specified)

  --Tom

thomas.woller@xxxxxxx  +1-512-602-0059
AMD Corporation - Operating Systems Research Center
5204 E. Ben White Blvd. UBC1
Austin, Texas 78741




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

