Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled




> -----Original Message-----
> From: Ben Guthro [mailto:ben.guthro@xxxxxxxxx]
> Sent: Thursday, June 06, 2013 11:08 PM
> To: Zhang, Xiantao
> Cc: Jan Beulich; Ben Guthro; Andrew Cooper; xen-devel
> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
> 
> On Jun 6, 2013, at 11:06 AM, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Thursday, June 06, 2013 2:59 PM
> >> To: Ben Guthro
> >> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
> >> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
> >>
> >>>>> On 06.06.13 at 01:53, Ben Guthro <ben@xxxxxxxxxx> wrote:
> >>> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@xxxxxxxxxx> wrote:
> >>>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@xxxxxxxxxx> wrote:
> >>>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>>>>>> Depending on whether ATS is in use, more than one invalidation
> >>>>>>> can be done in the processing here - could you therefore check
> >>>>>>> whether there's any sign of ATS use ("iommu=verbose" should
> >>>>>>> make you see respective messages), and if so see whether
> >>>>>>> disabling it ("ats=off") makes a difference?
> >>>>>>
> >>>>>> ATS does not appear to be running:
> >>>>>>
> >>>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
> >>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> >>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
> >>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
> >>>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> >>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> >>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
> >>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
> >>>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
> >>>>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
> >>>>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
> >>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
> >>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address ba8ebfff
> >>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> >>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address bf9fffff
> >>>>>>
> >>>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
> >>>>>> was found.
> >>>>>
> >>>>> Right. So one less variable.
> >>>>
> >>>> Some more info.
> >>>> Ross Philipson provided me with a handy utility to dump a bunch more
> >>>> info about the DMAR tables, and with some more trace, this appears to
> >>>> be tied to the IGD.
> >>>>
> >>>> Early in the boot process, I see queue_invalidate_wait() called for
> >>>> DRHD unit 0, and 1
> >>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
> >>>>
> >>>> Up until i915 does the following, I see that unit being flushed with
> >>>> queue_invalidate_wait() :
> >>>>
> >>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> >>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
> >>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> >>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> >>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
> >>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
> >>>> [    3.111838] Console: switching to colour frame buffer device 170x48
> >>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
> >>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
> >>>> [    3.173339] acpi device:00: registered as cooling_device1
> >>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
> >>>> [    3.173962] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> >>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
> >>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
> >>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
> >>>> [    3.174274] Already setup the GSI :19
> >>>>
> >>>>
> >>>> After that - the unit never seems to be flushed.
> >>>>
> >>>> ...until we enter into the S3 hypercall, which loops over all DRHD
> >>>> units, and explicitly flushes all of them via iommu_flush_all()
> >>>>
> >>>> It is at that point that it hangs up when talking to the device that
> >>>> the IGD is plumbed up to.
> >>>>
> >>>>
> >>>> Does this point to something in the i915 driver doing something that
> >>>> is incompatible with Xen?
> >>>
> >>> I actually separated it from the S3 hypercall, adding a new debug key
> >>> 'F' - to just call iommu_flush_all()
> >>> I can crash it on demand with this.
> >>>
> >>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
> >>> it does not occur.
> >>> So, that pretty much narrows it down to the IGD, in my mind.
> >>
> >> Indeed, I agree. Yet I can't in any way comment on what or why.
> >> Xiantao (perhaps some graphics person would be good to Cc
> >> here too)?
> > Hi, Jan/Ben
> > Thanks for your analysis! Could you try enabling "snb_igd_quirk"? Thanks!
> > Xiantao
> >
> 
> 
> Thanks for your reply. I tried this param yesterday, but it did not
> change the behavior.
Okay, I recall that a bug was found recently in the IGD i915 driver that
may cause errors in VT-d; it should be fixed in the latest kernel. Could
you try the latest kernel, 3.9.4 or 3.10-rcx?
Xiantao
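
For anyone repeating the ATS check quoted earlier in this thread, the scan of the "iommu=verbose" boot log can be automated. The following is only a minimal sketch, not part of Xen or of any patch discussed here: the function names and the sample text are hypothetical, and it assumes the "found ACPI_DMAR_<type>" message format seen in the quoted log, with "found ACPI_DMAR_ATSR" being the line Ben expected if ATS were reported.

```python
import re

# Matches lines such as "(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:".
# The message format is taken from the log quoted in this thread.
DMAR_ENTRY = re.compile(r"found ACPI_DMAR_(\w+)")


def dmar_entry_counts(log_text):
    """Count each ACPI DMAR structure type announced in a Xen boot log."""
    counts = {}
    for line in log_text.splitlines():
        match = DMAR_ENTRY.search(line)
        if match:
            kind = match.group(1)
            counts[kind] = counts.get(kind, 0) + 1
    return counts


def ats_reported(log_text):
    """Return True if the log announces an ATSR structure (assumed marker)."""
    return "ATSR" in dmar_entry_counts(log_text)


# Hypothetical sample, trimmed from the log lines quoted above.
SAMPLE = """\
(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
"""

if __name__ == "__main__":
    print(dmar_entry_counts(SAMPLE))
    print("ATS reported:", ats_reported(SAMPLE))
```

The search is unanchored, so it also works on log lines that carry mail quote prefixes. A missing ATSR entry only shows that the firmware tables do not advertise ATS; it says nothing about the IGD flush hang itself.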

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

