Re: [Xen-devel] Guest-vs-Host MTRR/PAT conflict and a crash?


  • To: "Su, Disheng" <disheng.su@xxxxxxxxx>
  • From: "David Stone" <unclestoner@xxxxxxxxx>
  • Date: Wed, 2 Jan 2008 16:41:50 -0500
  • Cc: Xen Developers <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 02 Jan 2008 13:42:28 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thanks for your response.  I've done a bit more troubleshooting on
this.  Below is the error message again:

> >>> [root@localhost xen]# (XEN) mtrr.c:552:d1 Conflict occurs for a given
> >>> guest l1e flags:63 at 10000000 (the effective mm type:6), because
> >>> the host mtrr type is:0
> >>> (XEN) CPU 1: Machine Check Exception: 0000000000000005
> >>> (XEN) Bank 0: b200004000000800
> >>> (XEN) Bank 5: b200121020080400
> >>> (XEN)
> >>> (XEN) ****************************************
> >>> (XEN) Panic on CPU 1:
> >>> (XEN) CPU context corrupt
> >>> ****************************************
> >>> (XEN)
> >>> (XEN) Reboot in five seconds..

I know that, in theory, a memory cache-type mismatch shouldn't
directly cause a Machine Check, but I can't help thinking the two are
related: I see the machine check if and only if I see the cache-type
mismatch, and the two happen in quick succession.
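
For reference, the register values in the log above can be decoded with
a short sketch like the one below.  It assumes the value printed after
"Machine Check Exception:" is IA32_MCG_STATUS and that each "Bank N:"
value is the raw IA32_MCi_STATUS for that bank; the log itself does not
say so.  The helper names are made up for illustration.  Both banks
decode with PCC (processor context corrupt) set, which would be
consistent with the "CPU context corrupt" panic.

```python
# Architectural flag bits in IA32_MCi_STATUS (bit positions per the Intel SDM).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # IA32_MCi_MISC holds valid data
    58: "ADDRV",  # IA32_MCi_ADDR holds valid data
    57: "PCC",    # processor context corrupt
}

# Flag bits in IA32_MCG_STATUS.
MCG_STATUS_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def decode_flags(value, table):
    """Return the names of the flags set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def mca_error_code(status):
    """Return the MCA error code held in bits 15:0 of a bank status value."""
    return status & 0xFFFF


if __name__ == "__main__":
    # Values copied from the log; their interpretation is an assumption.
    print("MCG_STATUS:", decode_flags(0x0000000000000005, MCG_STATUS_FLAGS))
    for bank, status in ((0, 0xB200004000000800), (5, 0xB200121020080400)):
        print("Bank %d: flags=%s mca_code=0x%04x"
              % (bank, decode_flags(status, MCI_STATUS_FLAGS),
                 mca_error_code(status)))
```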

The guest physical address is 0x10000000 as shown above.  I added more
tracing and found that it corresponds to host address 0x80020000.
From the qemu logs I also found that this is a PCI BAR for the PCI
Express graphics card that I am trying to pass through via IOMMU:
  pt_register_regions: IO region registered (size=0x00010000 base_addr=0x80020000).
With lspci on Dom0 I confirmed that 0x80020000 is a 64KB region of
address space assigned to the PCI Express graphics card.  It is marked
non-prefetchable.  (The card also has a 256MB region assigned to it as
prefetchable.)  I also found that both the guest PAT and the guest
MTRR classify 0x10000000 as type 6 (MTRR_TYPE_WRBACK), making the
guest effective type 6 as well.
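
As an illustration of how the PAT side of that lookup works, here is a
minimal sketch that decodes the "l1e flags:63" value from the warning.
It assumes the flags are printed in hex, that they follow the hardware
layout for a 4KB PTE, and that the guest's PAT MSR still holds the
power-on default; none of that is confirmed by the log, and the function
names are invented for the example.  Under those assumptions the PTE
selects PAT entry 0, which is type 6 (WB), while the host MTRR type
reported in the warning is 0 (UC).

```python
# Cache-control bits in a 4KB page-table entry (x86 hardware layout).
PAGE_PWT = 1 << 3
PAGE_PCD = 1 << 4
PAGE_PAT = 1 << 7

# Memory type encodings shared by the MTRRs and the PAT (7 is PAT-only).
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}

# Power-on default of IA32_PAT; a guest may have reprogrammed it (assumption).
DEFAULT_PAT = 0x0007040600070406


def pat_index(l1e_flags):
    """Build the 3-bit PAT index from the PAT, PCD and PWT bits of a PTE."""
    return ((int(bool(l1e_flags & PAGE_PAT)) << 2)
            | (int(bool(l1e_flags & PAGE_PCD)) << 1)
            | int(bool(l1e_flags & PAGE_PWT)))


def pat_type(pat_msr, index):
    """Return the memory type stored in entry `index` of a PAT MSR value."""
    return (pat_msr >> (8 * index)) & 0x7


if __name__ == "__main__":
    flags = 0x63          # from the warning, assumed to be hex
    host_mtrr_type = 0    # "the host mtrr type is:0"
    guest_type = pat_type(DEFAULT_PAT, pat_index(flags))
    print("guest PAT type:", guest_type, MEMORY_TYPES[guest_type])
    print("host MTRR type:", host_mtrr_type, MEMORY_TYPES[host_mtrr_type])
    print("mismatch:", guest_type != host_mtrr_type)
```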

So my first question is, does anyone have a guess as to what this 64KB
region assigned to the graphics card is for?  I assume the 256MB
region is the general-purpose video memory for textures, vertices,
etc.

The mtrr warning message is printed when the shadow page table is
updated in response to the guest updating its page tables.  But why
would the guest only update the PTE at the beginning of the 64KB
region, and not all 64KB/4KB = 16 PTEs in the region?  I assume the
guest isn't updating them all, because otherwise I would get 16 of the
mtrr warning messages.  I wonder if the guest updates the page table
(triggering the MTRR warning but succeeding), then tries to read or
write that page, and the access times out, causing the machine
check?
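
To make the 16-PTE arithmetic concrete, the sketch below lists the 4KB
pages that would cover the region.  It assumes the whole 64KB BAR is
mapped linearly, so that guest 0x10000000 + offset corresponds to host
0x80020000 + offset; only the first page of that mapping was actually
observed in the tracing.  The function name is made up for the example.

```python
PAGE_SIZE = 0x1000          # 4KB pages
GUEST_BASE = 0x10000000     # guest physical address from the warning
HOST_BASE = 0x80020000      # base_addr from the qemu log
REGION_SIZE = 0x00010000    # size from the qemu log (64KB)


def region_pages(guest_base, host_base, size, page_size=PAGE_SIZE):
    """List (guest, host) page addresses, assuming a linear mapping."""
    return [(guest_base + off, host_base + off)
            for off in range(0, size, page_size)]


if __name__ == "__main__":
    pages = region_pages(GUEST_BASE, HOST_BASE, REGION_SIZE)
    print("PTEs needed:", len(pages))
    for guest, host in pages:
        print("guest 0x%08x -> host 0x%08x" % (guest, host))
```

If the guest mapped the whole region at once, each of these pages would
be a candidate for its own warning line.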

Any help is much appreciated!
Dave

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

