
Re: [Xen-devel] Fbdev graphics broken in xen/next dom0



On 03/27/2010 02:14 AM, Arvind R wrote:
On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh<ewalsh@xxxxxxxxxxxxx>  wrote:
On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
<  --- snip --->
I have attached the serial console output and dmesg output.  The
initcall and drm debug stuff is present.

Also, I get something new when I run the test program.  It prints out:

# ./silly
Mapped /dev/fb0 at 0x7f3237175000
Killed

Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
  kernel:Bad pagetable: 000f [#1] SMP

<  --- snip --->
silly: Corrupted page table at address 7f3237175000
PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
Bad pagetable: 000f [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 1
Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat nf_nat 
nfsd lockd nfs_acl auth_rpcgss export]
<  --- snip --->
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

You appear to have an i915 framebuffer on your box.

I *think* that the i915 is not using KMS and the TTM stuff, so the
patch that Arvind posted would probably not help you.
http://www.mail-archive.com/dri-devel@xxxxxxxxxxxxxxxxxxxxx/msg48668.html

So, let's boot your kernel with these command line parameters to get more
data: debug initcall_debug drm.debug=255
<  --- snip --->

e-mail thread titled: "Nouveau on dom0". It covers the gamut of things
to troubleshoot this.
This is related and most probably due to the same bit. xf86-video-fbdev works
with the nouveaufb driver when the XenNext kernel is booted on bare metal, but
not when it is booted under Xen. I have upgraded the whole chain to tip except
Xen, which is 3.4.3rc3.
Here is the syslog trace:
kernel: ------------[ cut here ]------------
kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
kernel: Hardware name: System Product Name
kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm
drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc
ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd #1
kernel: Call Trace:
kernel:  [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
kernel:  [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
kernel:  [<ffffffff8103ce54>] ? warn_slowpath_common+0x77/0xa3
kernel:  [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
kernel:  [<ffffffff8100c436>] ? xen_leave_lazy_mmu+0x25/0x43
kernel:  [<ffffffff81090c49>] ? copy_page_range+0x76/0x7f8
kernel:  [<ffffffff8100ddc9>] ? xen_force_evtchn_callback+0x9/0xa
kernel:  [<ffffffff8100e572>] ? check_events+0x12/0x20
kernel:  [<ffffffff8100e55f>] ? xen_restore_fl_direct_end+0x0/0x1
kernel:  [<ffffffff8103b1f2>] ? dup_mm+0x276/0x409
kernel:  [<ffffffff8103bd82>] ? copy_process+0x9c8/0x10ff
kernel:  [<ffffffff8103c5ff>] ? do_fork+0x146/0x2c0
kernel:  [<ffffffff810110a3>] ? stub_clone+0x13/0x20
kernel:  [<ffffffff81010d82>] ? system_call_fastpath+0x16/0x1b
kernel: ---[ end trace c58bf004d15b0c42 ]---

Xorg.log ends with the same misleading message as it originally did when
trying accelerated nouveau:
XKB: Failed to compile keymap

fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
which does an mmap as in silly.c.  As far as X is concerned, everything
is fine, but there is obviously a page-fault problem. Will have to set up
debug options and trace :-(
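
For readers without silly.c at hand, the kind of mapping being discussed can be sketched as below. This is not the actual silly.c or the fbdevHWMapVidmem code; map_device is a hypothetical helper, and the mapping length is simply passed in by the caller (a real framebuffer client would normally ask the driver for it first).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a device node and map `len` bytes of it
 * shared and read/write. Returns NULL if the open or the mmap fails.
 */
void *map_device(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* an established mapping survives closing the descriptor */

    return p == MAP_FAILED ? NULL : p;
}
```

Note that a successful mmap only sets up the mapping. The page tables are filled in later, so a bad pte would show up on the first access to the returned address, which fits the test program printing its "Mapped" line before being killed.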

The 'corrupted page table' syndrome is also present in the accelerated
nouveau with AGP cards - so it may be linked to this problem. At least
this problem can be repeated on many platforms :-)

The "corrupt pagetable" comes from the pte having invalid reserved bits set in it. I think the failure path is this:

The bad bits get set because someone is doing a pfn->mfn conversion on a page which is already an mfn, and doesn't have a valid pfn->mfn mapping, and the result of the conversion is either 0xff... or 0x7f... (I forget right now). But either way, a whole lot of bits get set, but nothing useful. I'm not quite sure why Xen isn't complaining about this at set-pte time, but perhaps it looks vaguely valid to it (perhaps it sees the invalid flags, knows the pte can't be used to access anything, and allows it to be set?). But this fault is happening because usermode gets a tlb miss, and the CPU finds a pte with reserved bits set, and raises the fault.
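
The values in the oops can be checked against this explanation with a small decoder. The following is only a sketch: the function names are made up for illustration, and phys_bits (the CPU's physical address width) is an assumption supplied by the caller, since the trace does not state it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0) /* fault hit a present entry */
#define PF_WRITE   (1u << 1) /* access was a write */
#define PF_USER    (1u << 2) /* access came from user mode */
#define PF_RSVD    (1u << 3) /* a reserved bit was set in a paging entry */

/* Bits 51:12 of a 4 KiB pte hold the frame number. */
#define PTE_FRAME_MASK 0x000ffffffffff000ULL

/* True if the error code reports a reserved-bit violation. */
bool pf_is_reserved_bit_fault(unsigned int error_code)
{
    return (error_code & PF_RSVD) != 0;
}

/* Frame number stored in the pte. */
uint64_t pte_frame(uint64_t pte)
{
    return (pte & PTE_FRAME_MASK) >> 12;
}

/*
 * Frame-address bits at or above the physical address width are
 * reserved. Returns the reserved address bits that are set in the pte,
 * or 0 if none are. phys_bits is a caller-supplied assumption.
 */
uint64_t pte_reserved_bits(uint64_t pte, unsigned int phys_bits)
{
    assert(phys_bits >= 32 && phys_bits <= 52);

    uint64_t implemented = (1ULL << phys_bits) - 1;
    return pte & PTE_FRAME_MASK & ~implemented;
}
```

With the numbers from the trace, error code 000f has the reserved-bit flag set (along with present, write and user), and the frame field of PTE fffffffffffff22f is all ones, so pte_reserved_bits returns non-zero for any physical address width below 52 bits.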

I'm not sure about the mm/pat.c warning though. I had a quick look at that code, but it wasn't obvious to me what's going on there. Something about handling the IO mapping during a fork(). Not sure if it's related or not.

    J



 

