RE: [Xen-ia64-devel]RID virtualization discussion

To: "INAKOSHI Hiroya" <inakoshi.hiroya@xxxxxxxxxxxxxx>
Subject: RE: [Xen-ia64-devel]RID virtualization discussion
From: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
Date: Wed, 13 Jun 2007 14:45:26 +0800
Cc: Xen-ia64-devel <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 12 Jun 2007 23:43:25 -0700
In-reply-to: <466F897C.9030108@xxxxxxxxxxxxxx>
Hi Hiroya,

Thanks for doing all those tests.
The data is very useful.

I'm surprised there are so many vcpu switches on dom0.
I guess most of the small penalty comes from domain0.

Given that, here is my thinking.

Partition the RID space in two:
RIDs with the most significant bit set to 1 are used by dom0.
RIDs with the most significant bit set to 0 are shared by the other domains.

Then the other domains can only see a 23-bit RID, and the RID doesn't need
to be virtualized.
For dom0, Xen doesn't execute a purge-all when switching to dom0, except when
a vcpu migration happens for dom0 (IMO the vcpus of dom0 don't need to migrate).
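
A minimal sketch of that switch-path rule, in C. The helper names, the hook
point, and the 24-bit implemented RID width are my assumptions for
illustration; only local_purge_all() is taken from elsewhere in this thread,
and this is not actual Xen/ia64 code.

/* Sketch only: hypothetical helpers illustrating the proposed MSB split. */
#define RID_BITS       24                        /* assumed implemented RID width */
#define RID_DOM0_BIT   (1UL << (RID_BITS - 1))   /* MSB = 1 marks dom0-owned RIDs */

void local_purge_all(void);   /* existing flush primitive named elsewhere in this thread */

static inline unsigned long dom0_rid(unsigned long rid)
{
    return rid | RID_DOM0_BIT;                   /* dom0 keeps the MSB=1 half to itself */
}

static inline unsigned long domU_rid(unsigned long rid)
{
    return rid & (RID_DOM0_BIT - 1);             /* other domains see only the low 23 bits */
}

/* Hypothetical hook on the vcpu-switch path (e.g. from context_switch()). */
static void flush_on_switch(int next_is_dom0, int dom0_vcpu_migrated)
{
    if (next_is_dom0 && !dom0_vcpu_migrated)
        return;              /* dom0 RIDs never collide with the shared half: skip the purge */
    local_purge_all();       /* shared 23-bit RIDs can collide across domUs: purge everything */
}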


If a domain sees a 23-bit RID, it issues ptc.e less frequently (ptc.e is
time-consuming and a performance killer), because one of the reasons ptc.e is
executed is RID wrap-around. This will offset some of the penalty.
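
To put a rough number on the wrap-around point (my arithmetic only, assuming a
24-bit implemented RID as the 23-bit figure above implies, and assuming the
alternative would be a static per-domain split of that space):

#include <stdio.h>

/* Back-of-the-envelope figures; the 4-guest count is hypothetical. */
int main(void)
{
    unsigned long shared_view  = 1UL << 23;            /* 8,388,608 RIDs before a guest wraps */
    unsigned long nguests      = 4;                     /* hypothetical number of guests */
    unsigned long static_split = (1UL << 24) / nguests; /* 4,194,304 RIDs each in that case */

    printf("shared 23-bit view        : %lu RIDs per guest\n", shared_view);
    printf("static split among %lu    : %lu RIDs per guest\n", nguests, static_split);
    return 0;
}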


Suggestions and feedback are welcome.


Thanks,
-Anthony





 




>-----Original Message-----
>From: INAKOSHI Hiroya [mailto:inakoshi.hiroya@xxxxxxxxxxxxxx]
>Sent: June 13, 2007 14:07
>To: Xu, Anthony
>Cc: Xen-ia64-devel
>Subject: Re: [Xen-ia64-devel]RID virtualization discussion
>
>Hi, Anthony,
>
>here are two experimental results regarding the discussion on rid
>virtualization.  One is SpecJBB, where two VT-i guests execute it
>sharing the same logical processors.  The other is TPC-C, as a more
>practical workload.
>
>1/ SpecJBB
>I employed a 4-core server.  Domain-0 has one vcpu pinned on lp#0.  A
>VT-i guest has two vcpus pinned on lp#2 and #3, so the two guests share
>the same logical processors.
>I will show only the overhead caused by the patch.  It was about 2.2%.
>The number of TLB flushes for each lp in 60 seconds was:
>
>       lp#0    lp#1    lp#2    lp#3
>       ----------------------------
>       36734   0       6733    8104
>
>Most of them occurred in Domain-0.
>
>
>2/ TPC-C
>I employed a different 8-core server for TPC-C.  Domain-0 has one vcpu
>pinned on lp#0.  The VT-i guest has four vcpus pinned on lp#1 through
>lp#4.  Please note that I have one VT-i guest in this case.
>I will show only the overhead caused by the patch.  It was about 1.6%.
>The number of TLB flushes for each lp in 60 seconds was:
>
>       lp#0    lp#1    lp#2    lp#3    lp#4    lp#5    lp#6    lp#7
>       ------------------------------------------------------------
>       505550  17531   23472   21544   21154   0       0       0
>
>Similarly, most of them occurred in Domain-0.
>A TPC-C expert told me that there should be at most 2% of perturbation
>among trials with these server settings.  Note that this comment is about the
>bare-metal case, though I have no evidence that the situation is different on
>virtualized servers.
>
>
>TLB flushing seems infrequent in VT-i guests.  Because the frequency
>would be sub-linear in the number of guests, I suppose that the penalty
>caused by missing rid virtualization would be less significant.
>
>Regards,
>
>Hiroya
>
>
>
>Xu, Anthony wrote:
>> More tests.
>>
>> Test case:
>> Specjbb
>>
>> Platform:
>>          6 physical cpus with HT disabled
>>
>> Guest Env:
>> 2 VT-i guests, each with 4 vcpus pinned on the same physical cpus
>> Guest1:
>>          Vcpu1 pinned on pcpu2
>>          Vcpu2 pinned on pcpu3
>>          Vcpu3 pinned on pcpu4
>>          Vcpu4 pinned on pcpu5
>> Guest2:
>>          Same as guest1
>>
>> Without flushing:
>> Score: 11066
>>
>> With flushing:
>> Score: 11031
>> Number of TLB flushes: 3973286
>> Flushes per second: 3014/s
>>
>>
>> The penalty is less than 0.5%.
>>
>> Definitely, we need to run a "big benchmark" to get an answer on how much ptc.e
>> will impact performance.
>> Hope the community can do more tests.
>>
>>
>> Thanks,
>> Anthony
>>
>>
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Xu,
>Anthony
>>> Sent: May 24, 2007 17:05
>>> To: Isaku Yamahata
>>> Cc: Xen-ia64-devel
>>> Subject: RE: [Xen-ia64-devel]RID virtualization discussion
>>>
>>>> From: Isaku Yamahata
>>>> Sent: May 24, 2007 17:00
>>>> To: Xu, Anthony
>>>> Cc: Xen-ia64-devel
>>>> Subject: Re: [Xen-ia64-devel]RID virtualization discussion
>>>>
>>>>> We have tested the following cases.
>>>>> There are 6 physical processors.
>>>>> And local_purge_all is executed about 2000 times per second on each processor.
>>>>>
>>>>> Dom0(1vcpu) + domU(2vcpu)
>>>>> Dom0(1vcpu) + domU(4vcpu)
>>>>> Dom0(1vcpu) + vti(2vcpu)
>>>>> Dom0(1vcpu) + vti(4vcpu)
>>>>> Dom0(1vcpu) + vti(2vcpu) + vti(2vcpu)
>>>> Thank you for the explanation.
>>>> Given that # of vcpus < # of pcpus, we can assume each vcpu is
>>>> bound to a pcpu.  So context_switch() is called only when a pcpu
>>>> goes idle or is woken up from idle.
>>>>
>>>> Probably you may want to insert a tlb flush into continue_running(),
>>>> which is called when a vcpu uses up its time slice and is chosen again.
>>>> Thus the tlb is flushed every time slice.
>>> There are about 2000 vcpu switches per second on each processor.
>>> That's a lot of vcpu switches.
>>>
>>> I can do a test with #vcpu > #pcpu.
>>>
>>>
>>> Thanks,
>>> Anthony
>>>
>>> _______________________________________________
>>> Xen-ia64-devel mailing list
>>> Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-ia64-devel
>>
>> _______________________________________________
>> Xen-ia64-devel mailing list
>> Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-ia64-devel
>>
>>

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
