
Re: [Xen-devel] Xen 4.3 development update

On 04/04/13 18:14, Suravee Suthikulanit wrote:
On 4/3/2013 5:51 AM, George Dunlap wrote:
On 03/04/13 00:48, Suravee Suthikulanit wrote:
On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
On 4/2/2013 11:34 AM, Tim Deegan wrote:
At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
On 02.04.13 at 16:07, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
* AMD NPT performance regression after c/s 24770:7f79475d3de7
     owner: ?
     Reference: http://marc.info/?l=xen-devel&m=135075376805215
This is supposedly fixed with the RTC changes Tim committed the
other day. Suravee, is that correct?
This is a separate problem.  IIRC the AMD XP perf issue is caused by the
emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
_lot_ of vmexits for IRQL reads and writes.
Are there any tools or good ways to count the number of VMEXITs in Xen?


I used the iperf benchmark to compare network performance (bandwidth)
between the two versions of the hypervisor:
1. good: 24769:730f6ed72d70
2. bad: 24770:7f79475d3de7

In the "bad" case, I am seeing network bandwidth drop by about 13-15%.

However, when I use the xentrace utility to count VMEXITs, I actually
see about 25% more VMEXITs in the good case.  This is inconsistent with
what Tim stated above.
I was going to say, what I remember from my little bit of
investigation back in November, was that it had all the earmarks of
micro-architectural "drag", which happens when the TLB or the caches
can't be effective.

Suravee, if you look at xenalyze, a microarchitectural "drag" looks like:
* fewer VMEXITs, but
* each vmexit takes longer

If you post the results of "xenalyze --svm-mode -s" for both traces, I
can tell you what I see.
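
As a rough illustration of the two-part signature described above, here is a minimal Python sketch. The names `ExitStats` and `looks_like_drag` are made up for this example and are not part of xenalyze; the check also assumes both traces cover a similar wall-clock duration, otherwise raw counts are not comparable. The NPF figures are the ones from the xenalyze summaries quoted later in this message.

```python
from typing import NamedTuple


class ExitStats(NamedTuple):
    """Hypothetical container for one exit type in one trace."""
    count: int        # number of exits of this type in the trace
    avg_cycles: int   # mean cycles spent handling each exit


def looks_like_drag(good: ExitStats, bad: ExitStats) -> bool:
    """Return True when the bad run has fewer exits but each one costs more.

    Assumes both traces span roughly the same amount of time.
    """
    return bad.count < good.count and bad.avg_cycles > good.avg_cycles


# VMEXIT_NPF figures from the xenalyze summaries in this thread.
npf_good = ExitStats(count=855101, avg_cycles=5787)
npf_bad = ExitStats(count=108072, avg_cycles=15702)

if __name__ == "__main__":
    print(looks_like_drag(npf_good, npf_bad))
```

This only encodes the heuristic; it says nothing about the cause, which is why the full traces are still needed.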


Here's another version of the outputs from xenalyze with only VMEXIT.
In this case, I pin all the VCPUs (4) and pin my application process to

NOTE: This measurement is without the RTC bug.

-- v3 --
   VMEXIT_CR0_WRITE          305  0.00s  0.00%  1660 cyc { 1158| 1461| 2507}
   VMEXIT_CR4_WRITE            6  0.00s  0.00% 19771 cyc { 1738| 5031|79600}
   VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
   VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}

-- v3 --
   VMEXIT_CR0_WRITE         3099  0.00s  0.01%  1541 cyc { 1157| 1420| 2151}
   VMEXIT_CR4_WRITE           12  0.00s  0.00%  4105 cyc { 1885| 4380| 5515}
   VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
   VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}

So in the good run, we have 855k NPF exits, each of which takes about 5.8k cycles on average. In the bad run, we have only 108k NPF exits, each of which takes an average of 15.7k cycles. (Although the 50th percentile is still only 6.8k cycles -- so most are about the same, but a few take a lot longer.)
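
The arithmetic behind that reading can be sketched in Python by parsing the summary lines directly. This is only an illustration: `parse_exit_line`, `total_cycles` and `mean_over_median` are hypothetical helpers, the regular expression is fitted to the lines shown above rather than to any documented xenalyze format, and the middle value in braces is taken to be the 50th percentile, as the paragraph above reads it.

```python
import re
from typing import NamedTuple, Tuple


class ExitLine(NamedTuple):
    name: str
    count: int
    avg_cycles: int
    percentiles: Tuple[int, int, int]  # the three values inside {a|b|c}


# Fitted to lines such as:
#   VMEXIT_NPF  108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
LINE_RE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+[\d.]+s\s+[\d.]+%\s+(\d+)\s+cyc\s+"
    r"\{\s*(\d+)\|\s*(\d+)\|\s*(\d+)\}"
)


def parse_exit_line(line: str) -> ExitLine:
    """Parse one per-exit summary line into its numeric fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    name, count, avg, p_lo, p_mid, p_hi = m.groups()
    return ExitLine(name, int(count), int(avg),
                    (int(p_lo), int(p_mid), int(p_hi)))


def total_cycles(e: ExitLine) -> int:
    """Approximate cycles spent on this exit type: count times mean."""
    return e.count * e.avg_cycles


def mean_over_median(e: ExitLine) -> float:
    """A ratio well above 1 suggests a few very slow exits skew the mean."""
    return e.avg_cycles / e.percentiles[1]


BAD_NPF = "   VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}"
GOOD_NPF = "   VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}"

if __name__ == "__main__":
    for label, line in (("bad", BAD_NPF), ("good", GOOD_NPF)):
        e = parse_exit_line(line)
        print(label, e.count, total_cycles(e), round(mean_over_median(e), 2))
```

A mean that sits far above the median in the bad run, but close to it in the good run, is the "most are about the same, but a few take a lot longer" pattern expressed as a number.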

It's a bit strange -- the reduced number of NPF exits is consistent with the idea of some micro-architectural thing slowing down the processing of the guest. However, in my experience usually this also has an effect on other processing as well -- i.e., the time to process an IOIO would also go up, because dom0 would be slowed down as well; and time to process any random VMEXIT (say, the CR0 writes) would also go up.

But maybe it only has an effect inside the guest, because of the tagged TLBs or something?
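
One way to check whether the slowdown is spread across all exit types is to compare the average cycles per exit between the two runs, type by type. The sketch below does that in Python with the averages and counts copied from the two summaries above, taking the run with 855k NPF exits as the good one. `slowdown_ratios` and the `MIN_SAMPLES` threshold are assumptions made for this example, not anything xenalyze provides.

```python
# Average cycles per exit and exit counts, copied from the summaries above.
BAD_AVG = {
    "VMEXIT_CR0_WRITE": 1660,
    "VMEXIT_CR4_WRITE": 19771,
    "VMEXIT_IOIO": 82514,
    "VMEXIT_NPF": 15702,
}
GOOD_AVG = {
    "VMEXIT_CR0_WRITE": 1541,
    "VMEXIT_CR4_WRITE": 4105,
    "VMEXIT_IOIO": 87959,
    "VMEXIT_NPF": 5787,
}
BAD_COUNT = {
    "VMEXIT_CR0_WRITE": 305,
    "VMEXIT_CR4_WRITE": 6,
    "VMEXIT_IOIO": 5581,
    "VMEXIT_NPF": 108072,
}

# Hypothetical cut-off: below this many samples the average is too noisy.
MIN_SAMPLES = 100


def slowdown_ratios(bad_avg, good_avg):
    """Return bad/good average-cycle ratio for each exit type in both runs."""
    return {name: bad_avg[name] / good_avg[name]
            for name in bad_avg if name in good_avg}


if __name__ == "__main__":
    for name, ratio in sorted(slowdown_ratios(BAD_AVG, GOOD_AVG).items()):
        note = "" if BAD_COUNT[name] >= MIN_SAMPLES else "  (few samples)"
        print(f"{name:18s} {ratio:5.2f}x{note}")
```

On these figures only NPF (and the sparsely sampled CR4 writes) get noticeably slower per exit, while IOIO and CR0 writes stay close to 1x, which is the oddity described above.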

Suravee, could you run this one again, but:
* Trace everything, not just vmexits
* Send me the trace files somehow (FTP or Dropbox), and/or add "--with-interrupt-eip-enumeration=249 --with-mmio-enumeration" when you run the summary?

That will give us an idea where the guest is spending its time statistically, and what kinds of MMIO it is doing, which may give us a clearer picture of what's going on.

