
Re: [Xen-devel] dom0 / hypervisor hang on dom0 boot


According to Kevin (our expert on Xen & graphics):

Haven't seen that specific problem.

The only trick for i915 in dom0 that comes to mind is to have CONFIG_DMAR
enabled in dom0, even though no VT-d engine is actually exposed to dom0. This
sets a flag in i915 that makes the driver use the Xen DMA interface instead of
virt_to_phys, so that MFNs are written to the GTT entries. Otherwise the GTT
entries contain GPFNs, which is bogus: the GPU then DMAs to the wrong
locations, causing random issues.
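
To illustrate the difference, here is a minimal user-space sketch in C. It is
not i915 or Xen code: the table toy_p2m, its contents, and all helper names are
hypothetical, and the PTE layout is simplified to "address | flags" in the
spirit of the writel() call quoted further down. It only shows that the same
GPFN produces a different GTT entry depending on whether it is translated to a
machine frame first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical toy p2m table: index is the guest pseudo-physical frame
 * number (GPFN), value is the machine frame number (MFN). The values are
 * made up for illustration. */
static const uint64_t toy_p2m[] = { 0x8a000, 0x12345, 0x00777, 0x9f001 };
#define TOY_P2M_LEN (sizeof(toy_p2m) / sizeof(toy_p2m[0]))

/* What a virt_to_phys-style path would hand out: a guest-physical
 * address, which means nothing to a device doing DMA on the real bus. */
static uint64_t gpfn_to_guest_phys(uint64_t gpfn)
{
    return gpfn << PAGE_SHIFT;
}

/* What a DMA-mapping path is expected to hand out: a machine address,
 * obtained by translating the GPFN through the p2m table. */
static uint64_t gpfn_to_machine(uint64_t gpfn)
{
    assert(gpfn < TOY_P2M_LEN);
    return toy_p2m[gpfn] << PAGE_SHIFT;
}

/* Simplified 32-bit GTT entry: page address ORed with flag bits. */
static uint32_t toy_gtt_pte(uint64_t addr, uint32_t pte_flags)
{
    return (uint32_t)addr | pte_flags;
}
```

With these toy values, GPFN 1 gives the entry 0x00001001 on the
guest-physical path and 0x12345001 on the machine path (flag bit 0 set), so a
GPU programmed with the former would access an unrelated machine page.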

We also once identified a regression in 3.8, where needs_dmar was not honored
in the i915 driver:

commit 20652097dadd9a7fb4d652f25466299974bc78f9
Author: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
Date:   Thu Dec 13 23:47:47 2012 +0800

    drm/i915: Fix missed needs_dmar setting

    From Ben's AGP dependence removal change, "needs_dmar" flag has not
    been properly setup for new chips using new GTT init function. This
    one adds missed setting of that flag to make sure we do pci mappings
    with IOMMU enabled.

    Signed-off-by: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
    Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>

So, I haven't seen this exact phenomenon, but the GTT is always the first
suspect to examine when such an issue shows up.

Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@xxxxxxxx] 
Sent: Thursday, May 16, 2013 6:10 AM
To: Dugger, Donald D; Dietmar Hahn
Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxx; Konrad Rzeszutek Wilk
Subject: Re: [Xen-devel] dom0 / hypervisor hang on dom0 boot

>>> On 16.05.13 at 13:07, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> The function behind the pointer intel_private.driver->write_entry is
> i965_write_entry(). And the interesting instruction seems to be:
>   writel(addr | pte_flags, intel_private.gtt + entry);
> I added another printk() on start of the function i965_write_entry().
> And surprisingly, after printing a lot of messages, the kernel came up!!!
> But now I had other problems like losing the audio device (maybe timeouts).
> So maybe the hang is a timing problem?

Apparently. As the caller is running this in a loop, did you check
whether it's the first or always the same entry that it hangs on?
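
One way to narrow this down while perturbing the timing less than a printk()
per entry is to log only every Nth index. The following is a user-space sketch
in C under that assumption, not driver code: should_log(),
last_logged_before_hang(), the stride, and the simulated hang position are all
hypothetical, and printf() stands in for printk().

```c
#include <stdio.h>

/* Hypothetical helper: log only every `stride`-th entry so that the
 * logging itself changes the timing of the loop as little as possible. */
static int should_log(unsigned int entry, unsigned int stride)
{
    return stride != 0 && (entry % stride) == 0;
}

/* Simulated write loop. The write at index `hang_at` is treated as never
 * returning. Returns the last index that was logged before that point,
 * or -1 if nothing was logged. */
static long last_logged_before_hang(unsigned int first, unsigned int count,
                                    unsigned int stride, unsigned int hang_at)
{
    long last = -1;
    unsigned int e;

    for (e = first; e < first + count; e++) {
        if (should_log(e, stride)) {
            printf("write_entry: entry=%u\n", e); /* printk() in a kernel */
            last = (long)e;
        }
        if (e == hang_at)
            break; /* stands in for the write that never completes */
    }
    return last;
}
```

If the last index printed is the same on every boot, the hang is tied to a
particular entry (within one stride); if it varies, that points more towards
timing.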

> What I wanted to check is, what the hypervisor is doing while the system 
> hangs.

Probably nothing in this case, as it doesn't get involved in the MMIO
write being carried out.

> Does anybody have an idea, maybe a timer that after 30s prints a dump of
> the stack of all cpus?

That would be the watchdog, which you said doesn't kick in either.

I'm afraid this is a problem with the graphics device's processing of
the written data (locking up the machine at the bus level). Without
help from someone knowing what the driver is supposed to do here,
and what therefore might be going wrong, I don't see good chances
of making progress here. Don - any idea who that could be?

