Hi all,
This is a patch to arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 that mitigates the heavy contention on handle_pmt_io when running an HVM guest configured with 32 cores on a 48-core HP machine (8 x 6-core AMD 2.4GHz Opteron chips).
We profiled a guest HVM (Debian GNU/Linux 5.0, kernel version 2.6.35-rc5) running seven applications on Xen 4.0.0 and Xen 4.0.1, and found that all of them encountered heavy contention on a spin_lock inside handle_pmt_io in Xen.
The patch is a workaround that eliminates the contention on handle_pmt_io. The virtual time must be fresh, so some VCPU has to update it, but a VCPU does not need to update the virtual time while another VCPU is already doing so. The update can therefore be skipped when a VCPU finds that someone else is updating the virtual time. The patch replaces spin_lock with spin_trylock to check whether the lock is already held. If it is, another VCPU must be refreshing the virtual time, and the current VCPU just skips the update and reads the existing value. Otherwise, the current VCPU acquires the lock and updates the virtual time itself.
The performance improvement of each application (running on a 32-core HVM with one thread/process pinned to each core) after applying the patch to Xen 4.0.0 and Xen 4.0.1 is as follows:
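For readers who want to try the idea outside Xen, here is a minimal user-space sketch of the same "try the lock, otherwise skip the update" pattern. It is only an illustration under assumptions: it uses a POSIX mutex instead of Xen's spinlock, and struct pmt_state, pmt_init, pmt_refresh and pmt_read are hypothetical names, not the Xen code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's PMTState; the fields are illustrative. */
struct pmt_state {
    pthread_mutex_t  lock;       /* serialises refreshes of the timer value */
    _Atomic uint32_t tmr_val;    /* last published timer value */
    unsigned         refreshes;  /* refresh count, only touched under lock */
};

static void pmt_init(struct pmt_state *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    atomic_store(&s->tmr_val, 0);
    s->refreshes = 0;
}

/* Placeholder for the real time update: just advances the value by one. */
static void pmt_refresh(struct pmt_state *s)
{
    atomic_fetch_add(&s->tmr_val, 1);
    s->refreshes++;
}

/*
 * Read the timer. If the lock is free, take it and refresh the value.
 * If it is busy, assume the holder is refreshing right now and return
 * the last published value without waiting for the lock.
 */
static uint32_t pmt_read(struct pmt_state *s)
{
    uint32_t val;

    if (pthread_mutex_trylock(&s->lock) == 0) {
        pmt_refresh(s);
        val = atomic_load(&s->tmr_val);
        pthread_mutex_unlock(&s->lock);
    } else {
        /* Lock is held elsewhere: skip the refresh. */
        val = atomic_load(&s->tmr_val);
    }
    return val;
}
```

The point of the pattern is that no reader ever waits on the lock, so the cost of a read no longer grows with the number of VCPUs contending for it. The reader that skips the refresh gets whatever value was published last.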
                                 | original | patched | improvements
gmake (sec)                      |   91.2   |  86.4   |     5.6%
phoenix-histogram (sec)          |   32.79  |  27.43  |    19.5%
phoenix-wordcount (sec)          |  279.22  | 232.85  |    19.9%
phoenix-linear_regression (sec)  |    2.57  |   2.4   |     7.2%
specjvm-compress (ops/min)       |  774.37  | 923.71  |    19.0%
specjvm-crypto (ops/min)         |  209.55  | 251.79  |    20.0%
specjvm-xml-validation (ops/min) |  624.46  | 785.51  |    26.0%
Performance of each application on a 32-core HVM on Xen 4.0.0
                                 | original | patched | improvements
gmake (sec)                      |   89.04  |  85.93  |     3.6%
phoenix-histogram (sec)          |   42.63  |  28.27  |    50.8%
phoenix-wordcount (sec)          |  280.38  | 238.93  |    17.3%
phoenix-linear_regression (sec)  |    2.58  |   2.42  |     6.5%
specjvm-compress (ops/min)       |  751.33  | 923.84  |    23.0%
specjvm-crypto (ops/min)         |  209.33  | 243.28  |    16.2%
specjvm-xml-validation (ops/min) |  620.41  | 772.25  |    24.5%
Performance of each application on a 32-core HVM on Xen 4.0.1
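The mail does not say how the "improvements" column is computed. Judging from the numbers, it looks like a speedup ratio minus one: original/patched for the rows measured in seconds, and patched/original for the rows measured in ops/min. The sketch below encodes that assumption; the function names are hypothetical, and a few reported values differ from it by a few tenths of a percent, presumably due to rounding.

```c
#include <assert.h>

/* Improvement in percent for a "lower is better" metric (seconds). */
static double improvement_time(double original, double patched)
{
    assert(patched > 0.0);
    return (original / patched - 1.0) * 100.0;
}

/* Improvement in percent for a "higher is better" metric (ops/min). */
static double improvement_throughput(double original, double patched)
{
    assert(original > 0.0);
    return (patched / original - 1.0) * 100.0;
}
```

For example, improvement_time(91.2, 86.4) gives roughly 5.6 for gmake on Xen 4.0.0, and improvement_throughput(209.33, 243.28) gives roughly 16.2 for specjvm-crypto on Xen 4.0.1.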
For more details, please refer to our technical report:
The patch is the same for Xen 4.0.0 and Xen 4.0.1:

Index: arch/x86/hvm/pmtimer.c
===================================================================
--- arch/x86/hvm/pmtimer.c	(revision 4651)
+++ arch/x86/hvm/pmtimer.c	(working copy)
@@ -206,10 +206,17 @@
 
     if ( dir == IOREQ_READ )
     {
-        spin_lock(&s->lock);
-        pmt_update_time(s);
-        *val = s->pm.tmr_val;
-        spin_unlock(&s->lock);
+        /*
+         * if acquired the PMTState lock then update the time
+         * else other vcpu is updating it ,it should be up to date.
+         */
+        if (spin_trylock(&s->lock)) {
+            pmt_update_time(s);
+            *val = s->pm.tmr_val;
+            spin_unlock(&s->lock);
+        }
+        else
+            *val = (s->pm.tmr_val & TMR_VAL_MASK);
         return X86EMUL_OKAY;
     }
 