
Re: [Xen-devel] GPU passthrough performance regression in >4GB vms due to XSA-60 changes




On 05/19/2014 12:29 PM, Tomasz Wroblewski wrote:

On 05/16/2014 04:36 PM, Jan Beulich wrote:
On 16.05.14 at 13:38, <JBeulich@xxxxxxxx> wrote:
On 16.05.14 at 13:18, <tomasz.wroblewski@xxxxxxxxx> wrote:
If I coded up a patch to deal with this on -unstable, would you be
able to test that?
Willing to give it a go (Xen major version updates are often problematic to do, though, so I can't promise success). What would your patch be doing?
Adding entries to the MTRRs for the relocated regions?
This, and properly declaring the region in ACPI's _CRS. For starters I'll
probably try keeping the WB default overlaid with UC variable ranges,
as that's going to be the less intrusive change.
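Something along these lines, perhaps - a hypothetical sketch only, where the rdmsr()/wrmsr() helpers and MSR constants are assumptions rather than anything taken from the actual patches:

#include <stdint.h>

#define MSR_MTRRcap          0x00fe
#define MSR_MTRRphysBase(n)  (0x0200 + 2 * (n))
#define MSR_MTRRphysMask(n)  (0x0201 + 2 * (n))
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_PHYSMASK_VALID  (1ull << 11)

extern uint64_t rdmsr(uint32_t idx);             /* assumed helpers */
extern void wrmsr(uint32_t idx, uint64_t val);

/* Overlay a UC variable range over [base, base + size) while leaving
 * the WB default type alone.  The MTRR format requires size to be a
 * power of two and base to be aligned to it. */
static int mtrr_add_uc_range(unsigned int slot, uint64_t base,
                             uint64_t size, unsigned int phys_bits)
{
    uint64_t addr_mask = ((1ull << phys_bits) - 1) & ~0xfffull;
    unsigned int vcnt = rdmsr(MSR_MTRRcap) & 0xff; /* variable-range count */

    if ( slot >= vcnt || !size || (size & (size - 1)) || (base & (size - 1)) )
        return -1;

    wrmsr(MSR_MTRRphysBase(slot), base | MTRR_TYPE_UNCACHABLE);
    wrmsr(MSR_MTRRphysMask(slot),
          (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID);
    return 0;
}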
Okay, here are two patches - the first deals with the above-mentioned
items, and the second further increases correctness and at the same time
shrinks the number of MTRR regions needed.

Afaict they apply equally well to stable-4.3, master, and staging.

But to be honest I don't expect any performance improvement; all
I'd expect is that BARs relocated above 4Gb would now get treated
the same as those below 4Gb - UC in all cases.
Thanks Jan. I've tried the patches and you're correct - putting UC in the MTRRs for the relocated region didn't help the issue. However, I had to hack that in manually: the code paths which would do it in your hvmloader patch were not activating. hvmloader is not programming the guest PCI BARs into 64-bit regions at all, but is still programming them as 32-bit regions... on closer inspection this seems to be because the using_64bar condition, as well as bar64_relocate in hvmloader/pci.c, is always false.
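For reference, the gating logic in hvmloader/pci.c is roughly the following (paraphrased from the 4.3-era source, so exact names and details may differ between versions); both conditions depend on the BARs overflowing the low MMIO hole:

/* bar64_relocate is only set when the BARs do not all fit in the
 * low (32-bit) MMIO hole: */
if ( mmio_total > (pci_mem_end - pci_mem_start) )
{
    printf("Low MMIO hole not large enough for all devices,"
           " relocating some BARs to 64-bit\n");
    bar64_relocate = 1;
}

/* ... and per BAR, 64-bit placement is only chosen when the BAR is
 * 64-bit-capable, relocation is enabled, and what remains still does
 * not fit below 4GB: */
using_64bar = bars[i].is_64bar && bar64_relocate &&
              (mmio_total > (mem_resource.max - mem_resource.base));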

So BAR relocation to 64-bit is not happening, but RAM relocation as per the code tagged /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ is happening. This may be correct (?), although I think the fact that the RAM is relocated but the BARs are not causes the tools (i.e. qemu) to lose sight of what memory is used for MMIO, and, as you mentioned in one of the previous posts, the calls which would set it to mmio_direct in the p2m table are not happening. Our qemu is pretty ancient and doesn't support 64-bit BARs, so it's not trivial to verify whether relocating the BARs to 64-bit would help. Trying to make sense of this...
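The relocation loop itself, in simplified form (a paraphrase of the hvmloader/pci.c code behind that comment - the real code moves 64k-page chunks at a time and logs progress), shows that only the RAM moves; the BARs themselves stay in the 32-bit hole:

/* Pages of guest RAM overlapping the enlarged MMIO hole are remapped
 * above 4GB via XENMEM_add_to_physmap. */
while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
{
    struct xen_add_to_physmap xatp;

    if ( hvm_info->high_mem_pgend == 0 )
        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);

    xatp.domid = DOMID_SELF;
    xatp.space = XENMAPSPACE_gmfn;            /* remap an existing page  */
    xatp.idx   = --hvm_info->low_mem_pgend;   /* gpfn taken from low RAM */
    xatp.gpfn  = hvm_info->high_mem_pgend++;  /* new location above 4GB  */
    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
        BUG();
}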


Actually, it seems like a plausible explanation for the performance issues we see could be the following:

- some region of guest RAM has been relocated by hvmloader out to 64-bit memory to enlarge the PCI MMIO hole (which stays in 32-bit space)
- BUT the caching on that relocated region is UC, since at the time of the relocation the MTRRs were disabled, and that caused the EPT entries to get the UC type (see the sketch below)
- however, since this is just a region of guest memory not actually used for MMIO, merely relocated out of the MMIO hole, the caching should be WB; the guest doesn't use that region for MMIO but for other tasks, so access to it is slow and slows the guest down
- as you mentioned, it might already be fixed on unstable, since the EPTs are updated there when the MTRRs are enabled
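A minimal illustration of that second point (a sketch, not Xen's actual epte_get_entry_emt()): the effective type is resolved from the guest MTRR state at the time the EPT entry is built, and with the MTRRs disabled everything resolves to UC:

#include <stdint.h>

#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRBACK     6

struct guest_mtrr_state {
    int enabled;            /* IA32_MTRR_DEF_TYPE.E */
    unsigned int def_type;  /* default type when enabled, typically WB */
    /* fixed/variable ranges omitted for brevity */
};

static unsigned int effective_type(const struct guest_mtrr_state *m,
                                   uint64_t gfn)
{
    (void)gfn;              /* per-range lookup omitted in this sketch */
    if ( !m->enabled )
        return MTRR_TYPE_UNCACHABLE;  /* MTRRs disabled => everything UC */
    /* ...otherwise consult fixed/variable ranges, falling back to... */
    return m->def_type;
}

If the EPT entry caches this result when it is built and is never recomputed once the guest enables its MTRRs, the relocated RAM stays UC - matching the slowdown described above.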

That would explain why retaining the old loop removed by the XSA-60 changes fixes the perf issue: since it runs at the time the MTRRs are enabled, it reverts the relocated region to WB (which I guess is correct for the non-MMIO regions).
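If that's right, the -unstable fix would amount to something like this (a hypothetical sketch; recompute_ept_memory_types() is an assumed placeholder, not a real Xen function):

#include <stdint.h>

#define MTRR_DEF_TYPE_ENABLE (1ull << 11)

struct mtrr_def_state {
    uint64_t def_type_msr;  /* guest IA32_MTRR_DEF_TYPE */
};

extern void recompute_ept_memory_types(void);  /* assumed placeholder */

static void mtrr_def_type_write(struct mtrr_def_state *m, uint64_t val)
{
    int enable_flipped =
        ((m->def_type_msr ^ val) & MTRR_DEF_TYPE_ENABLE) != 0;

    m->def_type_msr = val;
    if ( enable_flipped )
        /* Re-resolve cached EPT memory types so RAM relocated while
         * the MTRRs were disabled (hence UC) becomes WB again - the
         * effect the old pre-XSA-60 loop had as a side effect. */
        recompute_ept_memory_types();
}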

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
