Re: [Xen-devel] Xen 4.3 development update
On 04/04/13 18:14, Suravee Suthikulanit wrote:
> On 4/3/2013 5:51 AM, George Dunlap wrote:
>> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>>   owner: ?
>>>>>>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>>> This is supposedly fixed with the RTC changes Tim committed the other
>>>>>> day. Suravee, is that correct?
>>>>> This is a separate problem. IIRC the AMD XP perf issue is caused by the
>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>> patches. XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>> Are there any tools or good ways to count the number of VMEXITs in Xen?
>>> Tim/Jan, I have used the iperf benchmark to compare network performance
>>> (bandwidth) between the two versions of the hypervisor:
>>>   1. good: 24769:730f6ed72d70
>>>   2. bad:  24770:7f79475d3de7
>>> In the "bad" case, I am seeing that the network bandwidth has dropped by
>>> about 13-15%. However, when I use the xentrace utility to trace the number
>>> of VMEXITs, I actually see about 25% more VMEXITs in the good case. This
>>> is inconsistent with the statement that Tim mentioned above.
>> I was going to say: what I remember from my little bit of investigation
>> back in November was that it had all the earmarks of micro-architectural
>> "drag", which happens when the TLB or the caches can't be effective.
>> Suravee, if you look at xenalyze, a micro-architectural "drag" looks like:
>>  * fewer VMEXITs, but
>>  * each vmexit takes longer
>> If you post the results of "xenalyze --svm-mode -s" for both traces, I can
>> tell you what I see.
>>  -George
> Here's another version of the outputs from xenalyze, with only the VMEXIT
> records. In this case, I pin all the VCPUs (4) and pin my application
> process to VCPU 3. NOTE: This measurement is without the RTC bug.
>
> BAD:
> -- v3 --
>  VMEXIT_CR0_WRITE        305  0.00s  0.00%   1660 cyc { 1158| 1461|  2507}
>  VMEXIT_CR4_WRITE          6  0.00s  0.00%  19771 cyc { 1738| 5031| 79600}
>  [snip]
>  VMEXIT_IOIO            5581  0.19s  0.85%  82514 cyc { 4250|81909|146439}
>  VMEXIT_NPF           108072  0.71s  3.14%  15702 cyc { 6362| 6865| 37280}
>
> GOOD:
> -- v3 --
>  VMEXIT_CR0_WRITE       3099  0.00s  0.01%   1541 cyc { 1157| 1420|  2151}
>  VMEXIT_CR4_WRITE         12  0.00s  0.00%   4105 cyc { 1885| 4380|  5515}
>  [snip]
>  VMEXIT_IOIO           53835  1.97s  8.74%  87959 cyc { 4996|82423|144207}
>  VMEXIT_NPF           855101  2.06s  9.13%   5787 cyc { 4903| 5328|  8572}
>  [snip]

So in the good run, we have 855k NPF exits, each of which takes about 5.7k
cycles. In the bad run, we have only 108k NPF exits, each of which takes an
average of 15k cycles. (Although the 50th percentile is still only 6.8k
cycles -- so most are about the same, but a few take a lot longer.)

It's a bit strange: the reduced number of NPF exits is consistent with the
idea of some micro-architectural thing slowing down the processing of the
guest. However, in my experience this usually has an effect on other
processing as well -- i.e., the time to process an IOIO would also go up,
because dom0 would be slowed down as well; and the time to process any random
VMEXIT (say, the CR0 writes) would also go up. But maybe it only has an
effect inside the guest, because of the tagged TLBs or something?
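(For reference, the per-exit figures above come from xentrace captures
summarized with xenalyze. A minimal capture-and-summary run looks roughly
like the sketch below; the trace file name is illustrative, and whether -e
accepts the keyword "all" or only a numeric event-class mask depends on the
xentrace version in use.)

    # Capture trace records from all PCPUs into a file; stop with Ctrl-C.
    # On versions whose -e option does not accept "all", pass the
    # all-classes mask (TRC_ALL, 0x0ffff000) instead.
    xentrace -e all /tmp/iperf-run.trace

    # Summarize per-VCPU VMEXIT counts and cycle costs on an AMD (SVM) host,
    # as in the "xenalyze --svm-mode -s" output quoted above.
    xenalyze --svm-mode -s /tmp/iperf-run.trace > summary.txt

Running the same capture against both changesets and comparing the VMEXIT_*
lines of the two summaries gives the good/bad breakdown shown above.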
Suravee, could you run this one again, but:
 * Trace everything, not just vmexits
 * Send me the trace files somehow (FTP or Dropbox), and/or add
   "--with-interrupt-eip-enumeration=249 --with-mmio-enumeration" when you
   run the summary?

That will give us an idea where the guest is spending its time statistically,
and what kinds of MMIO it is doing, which may give us a clearer picture of
what's going on.

Thanks,
 -George
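(For reference, the rerun requested above would look roughly like the sketch
below. The trace file name is illustrative; the --svm-mode, -s,
--with-interrupt-eip-enumeration=249 and --with-mmio-enumeration options are
the ones quoted in this thread.)

    # Capture everything (all trace classes), not just vmexits.
    xentrace -e all /tmp/iperf-rerun.trace

    # Summarize for an SVM host. The extra options enumerate the guest EIPs
    # interrupted by vector 249 (a statistical profile of where the guest
    # spends its time) and the MMIO addresses the guest accesses.
    xenalyze --svm-mode -s \
        --with-interrupt-eip-enumeration=249 \
        --with-mmio-enumeration \
        /tmp/iperf-rerun.trace > summary-full.txt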