
Re: [Xen-devel] Re: [Patch RFC] ttm: nouveau accelerated on Xen pv-ops kernel



xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote on 03/23/2010 02:21:31 AM:

> On Tue, Mar 23, 2010 at 2:44 AM, Michael D Labriola <mlabriol@xxxxxxxx> wrote:
> > xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote on 03/20/2010 02:01:54 AM:
> >
> >> On Fri, Mar 19, 2010 at 8:59 PM, Michael D Labriola <mlabriol@xxxxxxxx> wrote:
> >> > xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote on 03/18/2010 02:09:08 AM:
> >> >
> >> >> On Wed, Mar 17, 2010 at 1:09 AM, Michael D Labriola <mlabriol@xxxxxxxx> wrote:
> >> >> > Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote on 03/16/2010 01:21:35 PM:
> >> >> >
> >> >> >> > > > And my X log ends abruptly after this line:
> >> >> >> > > > (II) NOUVEAU(0): Opened GPU Channel 1
> >> >> >> > > >
> >> >> >> > > > Any ideas?
> >> >> >> > > >
> >> >> >> > >
> >> >> >> > > Well, this is generally the symptom that someone is confusing mfns and
> >> >> >> > > pfns, and therefore ends up incorrectly setting the _PAGE_IO flag in
> >> >> >> > > some pte.  If you run it under strace, can you identify which mapping
> >> >> >> > > the fault is happening in?
> >> >> >> >
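Jeremy's point about confusing mfns and pfns can be pictured with a toy model. This is a minimal sketch, not pv-ops code: it assumes that a PTE carrying the I/O flag holds a machine frame number (mfn) that is used as-is, while an ordinary PTE starts from a pseudo-physical frame number (pfn) that is translated through a p2m table. The table contents, the `PAGE_IO` bit value and the function names are all hypothetical.

```python
# Toy model of a Xen PV guest building page table entries.
# Every name and value here is hypothetical; this is not the pv-ops code.

P2M = {0: 0x1A3, 1: 0x07F, 2: 0x2C0}  # guest pfn -> machine frame (mfn)
PAGE_PRESENT = 0x001
PAGE_IO = 0x400                        # hypothetical "frame is already an mfn" bit
PAGE_SHIFT = 12


def make_pte(frame: int, io: bool) -> int:
    """Build a toy PTE from a frame number and an I/O flag."""
    if io:
        mfn = frame                    # I/O mapping: use the frame untranslated
        flags = PAGE_PRESENT | PAGE_IO
    else:
        mfn = P2M[frame]               # RAM mapping: translate pfn -> mfn
        flags = PAGE_PRESENT
    return (mfn << PAGE_SHIFT) | flags


def pte_mfn(pte: int) -> int:
    """Extract the machine frame number a toy PTE points at."""
    return pte >> PAGE_SHIFT


# Correct use: a RAM page is mapped by its pfn, without the I/O flag.
print(hex(pte_mfn(make_pte(1, io=False))))  # 0x7f

# The confusion: the same RAM pfn is mapped with the I/O flag set, so no
# translation happens and the PTE points at machine frame 1 instead.
print(hex(pte_mfn(make_pte(1, io=True))))   # 0x1
```

In the toy model, setting the flag on a pfn yields a PTE that points at a machine frame the guest never owned.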
> >> >> >> > I've attached the output of 'strace -o strace-Xorg Xorg'.  Figuring out
> >> >> >> > which mapping the fault is happening in is a little over my head, I'm
> >> >> >> > afraid.  If you need different arguments to strace, let me know and I'll
> >> >> >> > do it again.
> >> >> >>
> >> >> >> So just to be sure, you took the 2.6.32 (xen/next or
> >> >> >> xen/stable-2.6.32.x), copied the include and nouveau directory from ..?
> >> >> >> 2.6.33? and then ran this.
> >> >> >
> >> >> > I actually took a slightly more sadistic route than Arvind... ;-)  A while
> >> >> > back, I backported the important stuff from the Nouveau kernel git tree
> >> >> > back to v2.6.31.  Basically guessed at which commits were important, wrote
> >> >> > a script to cherry pick each and every one, and spent an entire day
> >> >> > reading commit logs, resolving conflicts, and figuring out which other
> >> >> > non-drm commits I needed.  Sounds ridiculous, I know, but it was a pretty
> >> >> > interesting way to get myself up to speed with the code base.  The
> >> >> > resulting 2.6.31-nouveau kernel runs like a champ on all my hardware.
> >> >> >
> >> >> > Then I merged that into my clone of Jeremy's xen/master, which I use with
> >> >> > Xen 3.4.2.
> >> >> >
> >> >> > Since then, I've been periodically cherry picking all new commits off the
> >> >> > nouveau tree.  Also had to rebuild Xorg 7.5 to use xorg-server 1.7.5, new
> >> >> > libdrm, mesa, and xf86-video-nouveau all from their respective git trees
> >> >> > as of yesterday.  (drm and xf86-video-nouveau are on their master
> >> >> > branches, mesa is on the 7.8 branch)
> >> >> >
> >> >> > This all works great using xen/master bare metal.  It used to work fine on
> >> >> > my old GeForce2 MX based systems in Xen.  Arvind's patch made it work on
> >> >> > my nice new systems in Xen, but broke it on the old ones.  Everything
> >> >> > still works fine bare metal.
> >> >> >
> >> >> >> Did you have to edit your xorg.conf file or
> >> >> >> it ran just fine?
> >> >> >
> >> >> > Well, I had to create an xorg.conf that looks like this:
> >> >> >
> >> >> > Section "Device"
> >> >> >  Identifier "foo"
> >> >> >  Driver "nouveau"
> >> >> > EndSection
> >> >> >
> >> >> > Otherwise it uses the 'nv' driver...  and I haven't stumbled onto how to
> >> >> > get nouveau to automatically get used (aside from uninstalling the nv
> >> >> > driver).
> >> >> >
> >> >> >
> >> >> >> Was this Fedora 13 or Fedora 12?
> >> >> >
> >> >> > This is all being done on a custom 32bit Linux distro that we use for our
> >> >> > tightly configuration-controlled system deliveries.  It was Fedora based a
> >> >> > looooooooong time ago (FC5), but is completely unrecognizable now.
> >> >> >
> >> >> >
> >> >> >> Arvind's explanation about the Nvidia driver pointed out that the NVidia
> >> >> >> driver (drm/nouveau) can operate on different channels, where channel 1
> >> >> >> is the framebuffer, and the others are for, well, KMS and other
> >> >> >> applications.
> >> >> >>
> >> >> >> I believe I was looking at the wrong section of the drivers (not the
> >> >> >> drivers/video/gpu ones); this certainly looks to be the issue that
> >> >> >> Jeremy mentioned.
> >> >> >>
> >> >> >> Also I would suggest you load drm with the debug variable set to 255
> >> >> >> to get most of what is happening.
> >> >> >
> >> >> > I'll try that.
> >> >> >
> >> >> >
> >> >> >> Based on your strace, the last call is:
> >> >> >> 4000)                          = 0x9324000
> >> >> >> write(0, "(II) NOUVEAU(0): Opened GPU chan"..., 38) = 38
> >> >> >> ioctl(11, 0xc0106445, 0x930a908)        = 0
> >> >> >> ioctl(11, 0x400c6444, 0xbfd2a210)       = 0
> >> >> >> +++ killed by SIGKILL +++
> >> >> >>
> >> >> >> I cannot find what 0x45 is in the upstream Linux, so you must be using a
> >> >> >> different nouv* driver than that. The 0x44 is:
> >> >> >>
> >> >> >>   DRM_IOCTL_DEF(DRM_NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_AUTH),
> >> >> >>
> >> >> >> Which looks to be pretty harmless. I presume it is the next thing (using
> >> >> >> the address returned) that the X driver tries to do that makes it go
> >> >> >> boom.
> >> >> >
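The two ioctl request numbers at the end of the strace can be split into their fields. This is a minimal sketch, assuming the asm-generic ioctl layout used on x86 (8-bit nr, 8-bit type, 14-bit size, 2-bit direction, with write = 1 and read = 2). The type byte 'd' is the DRM ioctl base, and the nr byte is the 0x44/0x45 value discussed above.

```python
# Decode Linux ioctl request numbers, assuming the asm-generic layout used on x86:
# bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS
DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(request: int) -> dict:
    """Split an ioctl request number into direction, type, nr and size."""
    return {
        "dir": DIR_NAMES[(request >> IOC_DIRSHIFT) & 0x3],
        "type": chr((request >> IOC_TYPESHIFT) & 0xFF),
        "nr": (request >> IOC_NRSHIFT) & 0xFF,
        "size": (request >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1),
    }


for req in (0xC0106445, 0x400C6444):
    f = decode_ioctl(req)
    print(f"{req:#010x}: dir={f['dir']} type={f['type']!r} nr={f['nr']:#04x} size={f['size']}")
# 0xc0106445: dir=read|write type='d' nr=0x45 size=16
# 0x400c6444: dir=write type='d' nr=0x44 size=12
```

The size field gives the length of the argument struct each call passes (16 and 12 bytes on this 32-bit system), which may help when matching the calls against the nouveau_drm.h actually in use.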
> >> >> I suspect that the ioctl is prior to a modeset operation, and the
> >> >> mode-setting is 'booming'.
> >> >> My kernel config has the VGA console built in and fbcon as a module, and
> >> >> I do a switch to nouveaufb at runlevel 2. Also note that the default
> >> >> modeset parameter is -1, and if VGA_CONSOLE is enabled, then modeset is
> >> >> set to 0 in the driver initialisation, which may be the problem. Do you
> >> >> have modeset=1 as a module parameter?
> >> >
> >> > I wasn't setting any module params for nouveau.  Adding 'options nouveau
> >> > modeset=1' to modprobe.conf didn't seem to make any difference.
> >> >
> >> > I've got the following in my .config:
> >> >
> >> > CONFIG_VGA_CONSOLE=y
> >> > CONFIG_FB=y
> >> > CONFIG_FB_VGA16=m
> >> > CONFIG_FB_VESA=y
> >> > CONFIG_FB_EFI=y
> >> > CONFIG_FB_NVIDIA=m
> >> > CONFIG_FB_NVIDIA_I2C=y
> >> > CONFIG_FB_NVIDIA_BACKLIGHT=y
> >> >
> >>  - EMBEDDED - this will enable VGA_CONSOLE selection. Set sub-menu
> >>    choices as needed
> >>  - VGA_CONSOLE built in
> >>  - FB as a module
> >>  - FRAMEBUFFER_CONSOLE as a module. Enables late loading of nouveau
> >>  * The following is required to avoid cfb_copyarea, cfb_fillrectangle and
> >>    cfb_imageblit linking problems with out-of-tree nouveau builds:
> >>  - FB_VGA16 as module - supported by all nVidia cards.
> >>    or
> >>  - FB_NVIDIA as module - only works for older cards.
> >>
> >> For out-of-tree nouveau builds, DO NOT select ANY accelerated drivers;
> >> that would enable the old in-tree DRM. The new TTM / DRM modules are in
> >> the new drivers/gpu tree.
> >>
> >> For in-tree builds, if nouveau is NOT in the initrd image, the system will
> >> boot on the VGA console.
> >> >
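The config advice above can be checked mechanically. This is a hypothetical helper, not a kernel tool: it parses `CONFIG_X=value` lines and compares them with the settings Arvind lists. Run against only the .config excerpt Michael quoted, "unset" may simply mean the line was not part of the excerpt.

```python
# Hypothetical helper: compare a kernel .config against the settings
# recommended in this thread for out-of-tree nouveau builds.
RECOMMENDED = {
    "CONFIG_EMBEDDED": "y",
    "CONFIG_VGA_CONSOLE": "y",
    "CONFIG_FB": "m",
    "CONFIG_FRAMEBUFFER_CONSOLE": "m",
}
# At least one of these should be a module, to provide the cfb_* helpers.
CFB_PROVIDERS = ("CONFIG_FB_VGA16", "CONFIG_FB_NVIDIA")


def parse_config(text: str) -> dict:
    """Parse 'CONFIG_X=value' lines; anything else (comments, blanks) is ignored."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def check_config(text: str) -> list:
    """Return a list of human-readable deviations from the recommendations."""
    opts = parse_config(text)
    problems = []
    for key, want in RECOMMENDED.items():
        have = opts.get(key, "unset")
        if have != want:
            problems.append(f"{key}: want {want}, have {have}")
    if not any(opts.get(k) == "m" for k in CFB_PROVIDERS):
        problems.append("neither CONFIG_FB_VGA16 nor CONFIG_FB_NVIDIA is =m")
    return problems


excerpt = """CONFIG_VGA_CONSOLE=y
CONFIG_FB=y
CONFIG_FB_VGA16=m
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
CONFIG_FB_NVIDIA_BACKLIGHT=y"""

for problem in check_config(excerpt):
    print(problem)
# CONFIG_EMBEDDED: want y, have unset
# CONFIG_FB: want m, have y
# CONFIG_FRAMEBUFFER_CONSOLE: want m, have unset
```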
> >> > How do you force the nouveaufb switch at runlevel 2?  My screen obviously
> >> > switches into KMS mode while udev is starting up.
> >> You can switch to the accelerated framebuffer console with
> >> modprobe nouveau
> >> modprobe fbcon
> >> fbcon will take over the console from the built-in VGA. See
> >> Documentation/fb/fbcon.txt
> >
> > Ok, thanks.  Now I've got everything compiled as modules and can load them
> > post-boot to switch to the nouveau framebuffer console.  That actually
> > didn't change the X behavior at all, though.  I still get the exact same
> > "X: Corrupted page table" messages in dmesg, and my Xorg.log is just ending
> > with "NOUVEAU(0): Opened GPU channel 1".
> This is strange - channel 1 is the console channel. This appears in dmesg on
> nouveaufb initialisation, before the EDID probe to find connected outputs.
> Start X manually to avoid confusion of logs.

I've been testing this by booting to runlevel 3 and starting gdm.  I'll 
double check that I get the same results running Xorg by hand, although 
gdm does something to give me my console back after the X crash...


> I have attached ttm_xen.patch, which updates vm_page_prot after changing flags.
> This is not done in the mainline drm tree, but in the Xen (old) drm tree it
> is done in BOTH ttm_bo_mmap AND ttm_fbdev_mmap, and the attached patch does
> both, along with the conditional VM_IO in bo_mmap. The second vm_page_prot
> update is for fbdev_mmap, which corresponds to channel 1. Cross fingers and
> try!

I'll go try that.
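The idea behind ttm_xen.patch, as Arvind describes it, is that the page protection cached for a mapping has to be recomputed after the mmap handler changes the mapping's flags. The toy model below shows only that ordering; the flag values and function names are hypothetical. In the kernel, such a recomputation would typically go through `vm_get_page_prot(vma->vm_flags)`, though the patch itself is not shown here.

```python
from dataclasses import dataclass

# Hypothetical flag bits for this toy model only.
VM_READ, VM_WRITE, VM_IO = 0x1, 0x2, 0x4
PROT_IO = 0x100


def prot_from_flags(flags: int) -> int:
    """Toy stand-in for deriving page protection from VMA flags."""
    prot = flags & (VM_READ | VM_WRITE)
    if flags & VM_IO:
        prot |= PROT_IO
    return prot


@dataclass
class Vma:
    vm_flags: int
    vm_page_prot: int = 0

    def __post_init__(self) -> None:
        # The protection is computed once, when the mapping is created.
        self.vm_page_prot = prot_from_flags(self.vm_flags)


def driver_mmap(vma: Vma, refresh_prot: bool) -> None:
    """Toy mmap handler that marks the mapping as I/O."""
    vma.vm_flags |= VM_IO
    if refresh_prot:
        # The extra step: recompute the cached protection from the new flags.
        vma.vm_page_prot = prot_from_flags(vma.vm_flags)


stale = Vma(VM_READ | VM_WRITE)
driver_mmap(stale, refresh_prot=False)
fixed = Vma(VM_READ | VM_WRITE)
driver_mmap(fixed, refresh_prot=True)
print(bool(stale.vm_page_prot & PROT_IO), bool(fixed.vm_page_prot & PROT_IO))  # False True
```

Without the refresh, the flags say "I/O" while the cached protection still describes an ordinary mapping, so anything built from vm_page_prot misses the change.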


> >> If the old nvidiafb is loaded, nouveau cannot install (and vice versa).
> >
> > Well, everything seems to load just fine.  I get a nice teeny font and
> > dmesg messages saying I'm using nouveaufb.
> You should have got it earlier too - didn't you?

Yeah, I had that before.


> >> does NOT affect unaccelerated X on the older cards?
> >
> > Which accelerated modes are you referring to?  My understanding was that
> > the old GeForce2 cards should work for nouveaufb, the 2d xf86-nouveau
> > driver, and gallium's swrast_dri stuff (via AIGLX), but not gallium's new
> > dri_nouveau stuff.
> Right. But gallium's swrast_dri AND dri_nouveau are still 'unsupported',
> to be tried at your own risk. nouveau_dri was working well enough to run fgfs
> with mesa-7.7, but now with mesa-7.9, glxgears works but fgfs segfaults in
> libdrm_nouveau.

Correct.  I've been having rather good luck with it until this.  I can 
recompile mesa and leave out the nouveau and swrast stuff to see if that 
helps, but my impression was that this is crashing before any of that code 
even gets used.  And it does work bare-metal and did work in xen prior to 
that last patch.  What's the fallback if both gallium's nouveau and swrast 
libs are missing, anyway?


> >> Xorg used to hang saying 'Opened Channel 2' and not 1.
> >
> > Now that's strange.  Every single one of my boxes says Opened Channel 1,
> > with no reference to channel 2 at all.
> Channel 1 in dmesg/syslog;  Xorg.log snippet:
> (II) LoadModule: "shadowfb"
> (II) Loading /usr/lib/xorg/modules/libshadowfb.so
> (II) Module shadowfb: vendor="X.Org Foundation"
>     compiled for 1.7.5, module version = 1.0.0
>     ABI class: X.Org ANSI C Emulation, version 0.4
> (--) Depth 24 pixmap format is 32 bpp
> (II) NOUVEAU(0): Opened GPU channel 2  <initial hang point>
> (II) NOUVEAU(0): [DRI2] Setup complete    <after patch>
> (II) NOUVEAU(0): GART: 512MiB available
> (II) NOUVEAU(0): GART: Allocated 16MiB as a scratch buffer
> (II) EXA(0): Driver allocated offscreen pixmaps
> (II) EXA(0): Driver registered support for the following operations:
> (II)         Solid
> (II)         Copy
> (II)         Composite (RENDER acceleration)
> (II)         UploadToScreen
> (II)         DownloadFromScreen
> (==) NOUVEAU(0): Backing store disabled
> (==) NOUVEAU(0): Silken mouse enabled
> (II) NOUVEAU(0): [XvMC] Associated with Nouveau GeForce 8/9 Textured Video.
> (II) NOUVEAU(0): [XvMC] Extension initialized.
> 
> 
> Try with
> Option "ShadowFB"  "true"
> in the Device section of xorg.conf (turns off acceleration) to check. The
> option also turns NoAccel on, and X should use the FB device.

Which should make it mind-bogglingly slow, right?  I'll try this as well.


> 
> So the cards that don't work are AGP cards?

Yes.  GeForce2 MX200 AGP.


---
Michael D Labriola
Electric Boat
mlabriol@xxxxxxxx
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

