
RE: [Xen-devel] More network tests with xenoprofile this time



  William and Andrew,

  Sorry for the delay in replying. I have been traveling
  and did not have email access while away.

> 
> Hi Renato,
> 
> The article was an interesting application of xenoprof.
> 
> It seems like it would be useful to also have data collected using the
> cycle counts (GLOBAL_POWER_EVENTS on P4) to give some indication of
> areas with high-overhead operations. There may be some areas with a
> few very expensive instructions. Calling attention to those areas
> would help improve performance.

  Yes, you are right. We have in fact collected GLOBAL_POWER_EVENTS,
  but did not include them in the paper due to space limitations.
  I have attached oprofile results for our ttcp-like benchmark (receive
  side) for the case with 1 NIC (both cycle counts and instructions).
  As you can see, there are some functions with very expensive
  instructions. For example, "hypercall" adds only 0.6% additional
  instructions, but these consume 3.0% more clock cycles;
  "unmask_IO_APIC_irq" adds 0.25% more instructions but consumes 5%
  more cycles. It would be interesting to investigate these and see
  if we can optimize them.
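
  As a rough back-of-the-envelope reading of those numbers (treating
  the sample percentages as roughly proportional to unhalted cycles
  and retired instructions):

      hypercall:           3.0% cycles / 0.6%  instr  ~  5x average CPI
      unmask_IO_APIC_irq:  5.0% cycles / 0.25% instr  ~ 20x average CPI

  so both look like stall- or serialization-heavy paths rather than
  simply a lot of extra code.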
     
> 
> The increases in I-TLB and D-TLB events for Xen-domain0 shown in
> Figure 4 are surprising. Why would the working sets be that much
> larger for Xen-domain0 than regular Linux, particularly for code?
> Is there a table similar to Table 3 for I-TLB event sample locations?
> 

  Yes, we were also surprised by these results. I have attached
  the complete I-TLB and D-TLB oprofile results (for the 3-NIC case).
  (Note that these were collected on a different type of machine than
  the other two attached oprofile results.)

  Aravind instrumented the macros in xen/include/asm-x86/flushtlb.h.
  I am not sure if he used PERFCOUNTER_CPU or if he added his own
  instrumentation. With this instrumentation we did not observe any
  TLB flushes, but I suppose we could have missed TLB flushes that
  did not use the macros... I think it would be a good idea to
  investigate this further to confirm that TLB flushes are not
  happening.
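
  Something along these lines would be enough to check. This is only a
  rough sketch: the counter name, the perfc_incrc() helper, and the
  exact flush macro are illustrative (and assume perf counters are
  enabled in the build), not copied from the current tree.

      /* xen/include/asm-x86/perfc_defn.h: per-CPU counter for flushes */
      PERFCOUNTER_CPU(tlb_flushes, "TLB flushes via flushtlb.h macros")

      /* xen/include/asm-x86/flushtlb.h: bump the counter inside the
       * flush macro so the hypervisor's perf-counter dump shows whether
       * domain0 networking triggers any flushes at all. */
      #define local_flush_tlb()                                  \
          do {                                                   \
              perfc_incrc(tlb_flushes); /* assumed helper name */ \
              __flush_tlb();                                     \
          } while ( 0 )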

  One additional observation is that, in general, the number of misses
  is NOT proportional to the size of the working set. It is possible
  for a small increase in the working set to significantly increase
  the number of misses. Therefore it is possible that the increase
  in TLB misses is in fact due to a larger working set. But I agree
  we have to investigate this further to get confirmation...
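
  As an illustration (with assumed round numbers, not measured ones):
  with a 64-entry data TLB and 4KB pages, the TLB covers only about
  256KB at a time. A working set that grows from just under that limit
  to just over it goes from almost no D-TLB misses to a miss on nearly
  every newly touched page, so even a modest working-set increase in
  domain0 could produce a large jump in the event counts.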

> Can't the VMM use a 4-MB page? And the Xen-domain0 kernel shouldn't
> be that much larger than the regular Linux kernel.
> How were TLB flushes ruled out as a cause? Could the PERFCOUNTER_CPU
> counters in perfc_defn.h be used to see if the VMM is doing a lot of
> TLB flushes?
> 
> Also, how much of the I-TLB and D-TLB events are due to the P4
> architecture? Are the results as dramatic for Athlon or AMD64
> processors?
> 
  We did not try this on any other architecture.
  Right now xenoprof is only supported on the P4.
  Support for other architectures is not at the top of our priority list.

  Regards

  Renato 

> -Will
> 
> 

Attachment: time_func_xen0.prof
Description: time_func_xen0.prof

Attachment: instr_func_xen0.prof
Description: instr_func_xen0.prof

Attachment: dtlb_3nic.prof
Description: dtlb_3nic.prof

Attachment: itlb_3nic.prof
Description: itlb_3nic.prof

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

