
Re: [Xen-devel] dom0 / hypervisor hang on dom0 boot



On Thu, May 16, 2013 at 01:07:05PM +0200, Dietmar Hahn wrote:
> On Wednesday, 15 May 2013, 10:42:17, Jan Beulich wrote:
> > >>> On 15.05.13 at 11:12, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> > > On Wednesday, 15 May 2013, 09:35:46, Jan Beulich wrote:
> > >> >>> On 15.05.13 at 08:53, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> > >> > I tried iommu=debug and I can't see any fault messages, but I am not
> > >> > familiar with this code.
> > >> > I attached the log; maybe someone can have a look at it.
> > 
> > Perhaps only (if at all) by instrumenting the hypervisor. The
> > question of course is how easily/quickly you can narrow down the
> > code region that it might be dying in. And whether it's a hypervisor
> > action at all that causes the hang (as opposed to something the
> > DRM code in Dom0 does).
> 
> I added some debug code to the Linux kernel and could track down the
> point of the hang. I used the openSUSE kernel 3.7.10-1.4, but I looked at
> newer kernels and found that the code is similar.
> 
> i915_gem_init_global_gtt(...)
>  ...
>  intel_gtt_clear_range(start / PAGE_SIZE, (end-start) / PAGE_SIZE);
>  ...
> 
> void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
> {
>         unsigned int i;
> 
>     ---> A printk(...) here is seen on serial line!
> 
>         for (i = first_entry; i < (first_entry + num_entries); i++) {
>                 intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
>                                                   i, 0);
>         }
> 
>     ---> A printk(...) here is never seen!
> 
>         readl(intel_private.gtt+i-1);
> }
> 
> The function behind the pointer intel_private.driver->write_entry is
> i965_write_entry(). And the interesting instruction seems to be:
>   writel(addr | pte_flags, intel_private.gtt + entry);
> 
> I added another printk() at the start of the function i965_write_entry().
> Surprisingly, after printing a lot of messages, the kernel came up!
> But now I had other problems, like losing the audio device (maybe timeouts).
> So maybe the hang is a timing problem?
> 
> What I wanted to check is what the hypervisor is doing while the system
> hangs.
> Does anybody have an idea? Maybe a timer that prints a stack dump of all
> CPUs after 30s?

Yes. Can you try the two attached patches, please?
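
For readers following the quoted loop: a minimal user-space sketch of the
"write every entry, then read the last one back" pattern. It is only a model.
fake_gtt, fake_write_entry, fake_clear_range, GTT_ENTRIES and PTE_VALID are
all hypothetical names, and a plain array stands in for the memory-mapped GTT,
so it cannot reproduce the hang. It only shows that after the loop i - 1
indexes the last entry written, which is the entry the trailing readl() in the
quoted code reads (presumably to flush the posted writes).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the GTT: an ordinary array instead of
 * memory-mapped I/O. None of these names are part of the i915/intel-gtt API. */
#define GTT_ENTRIES 64u
#define PTE_VALID   0x1u   /* assumed flag bit, for illustration only */

static volatile uint32_t fake_gtt[GTT_ENTRIES];

/* Models the writel(addr | pte_flags, gtt + entry) step from the thread. */
static void fake_write_entry(uint32_t addr, unsigned int entry, uint32_t flags)
{
    fake_gtt[entry] = addr | flags;
}

/* Points every entry in [first, first + num) at the scratch address, then
 * reads the last written entry back. Returns the value read, or 0 if the
 * range is empty or does not fit into the fake table. */
static uint32_t fake_clear_range(uint32_t scratch_addr,
                                 unsigned int first, unsigned int num)
{
    unsigned int i;

    if (num == 0 || first >= GTT_ENTRIES || num > GTT_ENTRIES - first)
        return 0;

    for (i = first; i < first + num; i++)
        fake_write_entry(scratch_addr, i, PTE_VALID);

    /* The loop exits with i == first + num, so i - 1 is the last entry
     * written; this read stands in for the readl() in the quoted code. */
    return fake_gtt[i - 1];
}
```

Calling fake_clear_range(0x1000, 4, 8) fills entries 4 through 11 and returns
the value stored in entry 11; entries outside that range stay untouched.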

> Thanks.
> 
> Dietmar.
> 
> 
> -- 
> Company details: http://ts.fujitsu.com/imprint.html
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
> 

Attachment: 0001-drm-i915-Don-t-leak-a-page-in-case-of-DMA-error-mapp.patch
Description: Text document

Attachment: 0002-drm-i915-Sync-the-scratch-page-after-writting-values.patch
Description: Text document


 

