
Re: [Xen-devel] New MPI benchmark performance results (update)


Thanks for the response.

In the graphs presented on the web page, we take the native Linux results as the reference and normalize the other three scenarios to it. We observe a general pattern: dom0 usually performs better than domU with SMP, which in turn performs better than domU without SMP (where better performance means lower latency and higher throughput). However, we also notice a very large performance gap between domU (with or without SMP) and native Linux (or dom0, since dom0 generally performs very similarly to native Linux). Some distinct examples are: 8-node SendRecv latency (max domU/Linux ratio ~18), 8-node Allgather latency (max domU/Linux ratio ~17), and 8-node Alltoall latency (max domU/Linux ratio > 60). The difference in the last example is huge, and we cannot think of a reasonable explanation for why transferring 512B messages behaves so differently from other message sizes. We would appreciate any insight you can provide into such a large performance problem in these benchmarks.
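To make the normalization concrete, here is a minimal Python sketch; the latency values are invented placeholders for illustration, not our measured data:

```python
# Sketch: normalizing benchmark latencies against native Linux.
# The numbers below are made-up placeholders, not measured values.
native = {64: 10.0, 256: 12.0, 512: 15.0}   # message size (B) -> latency (us)
domu   = {64: 25.0, 256: 40.0, 512: 900.0}

def normalize(scores, reference):
    """Return each score divided by the reference (native Linux) result."""
    return {size: scores[size] / reference[size] for size in scores}

ratios = normalize(domu, native)
worst = max(ratios, key=ratios.get)
print(worst, ratios[worst])  # the 512B point stands out with ratio 60.0
```

A ratio of 1.0 means identical performance to native Linux; the anomalous 512B Alltoall point is the one that exceeds 60.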

I still don't quite understand your experimental setup. What version of
Xen are you using? How many CPUs does each node have? How many domU's do
you run on a single node?

The Xen version is 2.0, and each node has 2 CPUs. By "domU with SMP" in the previous email, I mean Xen is booted with SMP support (no "nosmp" option), dom0 is pinned to the first CPU, and domU is pinned to the second CPU. By "domU with no SMP", I mean Xen is booted without SMP support (with the "nosmp" option), so dom0 and domU share the same single CPU. Only one domU runs on each node in every experiment.
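For concreteness, a sketch of the setup; the `xm pincpu` syntax is my recollection of the Xen 2.0 tools and the guest name is hypothetical, so treat both as assumptions:

```shell
# Scenario "domU with SMP": boot Xen with SMP enabled (no "nosmp"
# on the Xen command line), then pin each domain to its own CPU.
# Assumed syntax: xm pincpu <domain> <vcpu> <cpu>
xm pincpu 0 0 0          # pin dom0 (domain 0), VCPU 0, to physical CPU 0
xm pincpu mpi-domu 0 1   # pin the guest (hypothetical name) to CPU 1

# Scenario "domU with no SMP": boot Xen with "nosmp" on its command
# line; dom0 and domU then time-share the single remaining CPU, so
# no pinning is needed.
```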

As regards the anomalous result for 512B AlltoAll performance, the best
way to track this down would be to use xen-oprofile.

I am not very familiar with xen-oprofile. I notice there have been some discussions about it on the mailing list; I wonder if there are any other documents I can refer to. Thanks.

Is it reliably repeatable?

Yes, the anomaly is reliably repeatable. Each data point reported in the graph is the average of 10 runs of the same experiment performed at different times.
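As a sketch of how we aggregate the runs (with invented latency values, not our measured data), we take the mean; the standard deviation indicates whether the anomaly is stable across runs or a one-off outlier:

```python
# Sketch: aggregating 10 repeated runs. Latencies (us) are made-up
# placeholders chosen only to illustrate the computation.
import statistics

runs = [905.0, 890.0, 910.0, 898.0, 902.0,
        895.0, 908.0, 900.0, 903.0, 899.0]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
print(f"mean={mean:.1f}us stdev={stdev:.1f}us")
# A stdev that is small relative to the mean suggests the slow 512B
# result is consistent across runs, not measurement noise.
```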

Really bad results are usually due to packets being dropped
somewhere -- there hasn't been a whole lot of effort put into UDP
performance because so few applications use it.

To clarify: do you mean that a benchmark like Alltoall might use UDP rather than TCP as the transport protocol?

Thanks again for the help.


Xen-devel mailing list


