
Re: [Xen-devel] How can Xen trigger a context switch in an HVM guest domain?


  • To: XiaYubin <xiayubin@xxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Tue, 3 Nov 2009 11:51:35 +0000
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, "James \(song wei\)" <jsong@xxxxxxxxxx>
  • Delivery-date: Tue, 03 Nov 2009 03:52:07 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

When I first started doing performance analysis, the sedf scheduler
was using a 500us timeslice, which (by my estimate) caused the
first-gen VMX-capable processors to spend at least 5% of their time
handling vmenters and vmexits.  Obviously performance has increased
somewhat since then, but they're still not free. :-)

 -George

On Tue, Nov 3, 2009 at 1:43 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
> James and George, thank you both! The breakpoint approach is
> interesting; I hadn't even thought of it :)
>
> OK, I'm going to try a simpler way to verify my idea first. Before the
> preempting-state VM runs, I will set a timer so that Xen gets to run
> every 100us (maybe longer for the first iteration). The timer handler
> will check whether the preempting VM is in kernel mode or user mode. If
> it is in user mode with the cpu hog's CR3, it will be scheduled out.
> Likewise, if the iteration count goes beyond some threshold (say 5
> times), the VM will also be scheduled out. This seems much simpler than
> the breakpoint approach, and more accurate than the 1ms timer. It may
> add some overhead, but the preemption is not supposed to occur
> frequently, and fairness is more important.
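>
> (Roughly, the timer handler I have in mind is something like the C
> sketch below -- just the decision logic, with made-up helpers
> vcpu_in_user_mode(), vcpu_guest_cr3() and deschedule_vcpu() standing
> in for whatever Xen actually provides:)
>
>     #include <stdbool.h>
>     #include <stdint.h>
>
>     #define PREEMPT_CHECK_PERIOD_US 100  /* poll every 100us         */
>     #define PREEMPT_MAX_ITERATIONS  5    /* hard cap on extra checks */
>
>     struct preempt_state {
>         uint64_t cpuhog_cr3;  /* CR3 of the identified cpu hog         */
>         unsigned iterations;  /* timer ticks spent in preempting-state */
>     };
>
>     /* Hypothetical accessors into the VCPU -- names are made up. */
>     bool     vcpu_in_user_mode(void *vcpu);
>     uint64_t vcpu_guest_cr3(void *vcpu);
>     void     deschedule_vcpu(void *vcpu);
>
>     /* Runs every PREEMPT_CHECK_PERIOD_US while the VM is in
>      * preempting-state; returns true if the VM was scheduled out. */
>     bool preempt_timer_tick(void *vcpu, struct preempt_state *ps)
>     {
>         ps->iterations++;
>
>         /* cpuhog is running in user mode: I/O processing is done. */
>         if (vcpu_in_user_mode(vcpu) &&
>             vcpu_guest_cr3(vcpu) == ps->cpuhog_cr3) {
>             deschedule_vcpu(vcpu);
>             return true;
>         }
>
>         /* Safety net: don't let the boost last forever. */
>         if (ps->iterations >= PREEMPT_MAX_ITERATIONS) {
>             deschedule_vcpu(vcpu);
>             return true;
>         }
>
>         return false;
>     }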
>
> The thread problem also exists on Linux. Currently I have no good way
> to identify different threads from the hypervisor's perspective. I
> have a dream that one day those OS guys will export this information
> to the VMM, a dream that one day our children will live in a world
> where virtualization rules. I have a dream today :)
>
> Thanks!
>
> --
> Yubin
>
> On Tue, Nov 3, 2009 at 12:05 AM, George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> OK, so you want to allow a VM to run so that it can do packet
>> processing in the kernel, but once it's done in the kernel you want to
>> preempt the VM again.
>>
>> An idea I was going to try out is that if a VM receives an interrupt
>> (possibly only certain interrupts, like network), let it run for a
>> very short amount of time (say, 1ms or 500us).  That should be enough
>> for it to do its basic packet processing (or audio processing, video
>> processing, whatever).  True, you're going to run the "cpu hog" during
>> that time, but that will be debited against the time it will run later.  (I
>> haven't tested this idea yet. It may work better with some credit
>> algorithms than others.)
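>>
>> (For concreteness, the accounting I have in mind is nothing fancier
>> than: remember when the boost started, and subtract the elapsed time
>> from the VCPU's credit when it is descheduled.  A sketch, with made-up
>> field names rather than the real credit scheduler structures:)
>>
>>     #include <stdbool.h>
>>     #include <stdint.h>
>>
>>     /* Made-up bookkeeping, not the actual credit scheduler code. */
>>     struct boost_account {
>>         int64_t  credit_us;       /* remaining credit, microseconds */
>>         uint64_t boost_start_us;  /* timestamp when the boost began */
>>         bool     boosted;
>>     };
>>
>>     /* Called when the interrupt is delivered and the VCPU is boosted. */
>>     void boost_begin(struct boost_account *a, uint64_t now_us)
>>     {
>>         a->boost_start_us = now_us;
>>         a->boosted = true;
>>     }
>>
>>     /* Called on deschedule: whatever the VCPU consumed while boosted
>>      * is debited, so over time it gets no more CPU than its share. */
>>     void boost_end(struct boost_account *a, uint64_t now_us)
>>     {
>>         if (a->boosted) {
>>             a->credit_us -= (int64_t)(now_us - a->boost_start_us);
>>             a->boosted = false;
>>         }
>>     }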
>>
>> The problem with inducing a guest to call schedule():
>> * It may not have any other runnable processes, or it may choose the
>> same process to run again; so it may not switch the cr3 anyway.
>> * The only reliable way to do it without some kind of
>> paravirtualization (even if only a kernel driver) would be to give it a
>> timer interrupt, which may mess up other things on the system, such as
>> the system time.
>>
>> If you're really keen to preempt on return to userspace, you could try
>> something like the following.  Before delivering the interrupt, note
>> the EIP the guest is at.  If it's in user space, set a hardware
>> breakpoint at that address.  Then deliver the interrupt.  If the guest
>> calls schedule(), you can catch the CR3 switch; if it returns to the
>> same process, it will hit the breakpoint.
>>
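>> (The breakpoint itself is just DR0 plus a couple of bits in DR7.  A
>> self-contained sketch of the encoding -- in Xen you'd of course have
>> to stash this into the guest's debug register state instead of
>> printing it, and make sure the guest doesn't clobber it:)
>>
>>     #include <stdint.h>
>>     #include <stdio.h>
>>
>>     /* DR7 value for an execute breakpoint in slot 0: G0 (bit 1)
>>      * enables the slot, and R/W0 (bits 16-17) and LEN0 (bits 18-19)
>>      * must both be 00b for an instruction-execution breakpoint.
>>      * DR0 itself holds the linear address to break on. */
>>     static uint64_t dr7_exec_bp_slot0(void)
>>     {
>>         return 1ULL << 1;   /* G0 set; R/W0 = LEN0 = 00b */
>>     }
>>
>>     int main(void)
>>     {
>>         uint64_t guest_eip = 0x0000000000401000ULL;  /* example EIP */
>>         printf("DR0 = %#llx, DR7 = %#llx\n",
>>                (unsigned long long)guest_eip,
>>                (unsigned long long)dr7_exec_bp_slot0());
>>         return 0;
>>     }
>>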
>> Two possible problems:
>> * For reasons of ancient history, the iret instruction may set the RF
>> flag in the EFLAGS register, which will cause the breakpoint not to
>> fire after the guest iret.  You may need to decode the instruction and
>> set the breakpoint at the instruction after, or something like that.
>> * I believe Windows doesn't do a CR3 switch if it does a *thread*
>> switch.  If so, on a thread switch you'll get neither the CR3 switch
>> nor the breakpoint (since the other thread is probably running
>> somewhere else).
>>
>> Peace,
>>  -George
>>
>> On Sun, Nov 1, 2009 at 5:54 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>> Hi, George,
>>>
>>> Thank you for your reply. Actually, I'm looking for a generic
>>> mechanism for cooperative scheduling. Being independent of the guest
>>> OS would make such a mechanism more convincing and practical, just
>>> like the balloon driver.
>>>
>>> Maybe you are wondering why I asked such a weird question, so let me
>>> describe it in more detail. My current work is based on "Task-aware
>>> VM scheduling", which was published at VEE'09. By monitoring CR3
>>> changes at the VMM level, Xen can gather information about tasks' CPU
>>> consumption and identify CPU hogs and I/O tasks. The task-aware
>>> mechanism therefore offers more fine-grained scheduling than the
>>> original VCPU-level scheduler, since a VCPU may run CPU hogs and I/O
>>> tasks in a mixed fashion.
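>>>
>>> (For reference, the per-task accounting is essentially: on every
>>> CR3-load exit, charge the elapsed time to the outgoing address space
>>> and then look at how much CPU each CR3 accumulated.  A sketch, with
>>> my own field and function names rather than the paper's:)
>>>
>>>     #include <stdbool.h>
>>>     #include <stdint.h>
>>>
>>>     #define CPUHOG_THRESHOLD_PCT 90   /* arbitrary cut-off */
>>>
>>>     /* Made-up per-address-space record, keyed by CR3. */
>>>     struct guest_task {
>>>         uint64_t cr3;
>>>         uint64_t cpu_time_us;     /* time this CR3 was loaded        */
>>>         uint64_t last_loaded_us;  /* timestamp of the last switch-in */
>>>     };
>>>
>>>     /* Called from the CR3-write exit handler: charge the outgoing
>>>      * task and start timing the incoming one. */
>>>     void account_cr3_switch(struct guest_task *out,
>>>                             struct guest_task *in, uint64_t now_us)
>>>     {
>>>         out->cpu_time_us += now_us - out->last_loaded_us;
>>>         in->last_loaded_us = now_us;
>>>     }
>>>
>>>     /* Crude heuristic: a task that consumed most of the observation
>>>      * window is a CPU hog; one that runs often but briefly is not. */
>>>     bool task_is_cpuhog(const struct guest_task *t, uint64_t window_us)
>>>     {
>>>         return t->cpu_time_us * 100 >= window_us * CPUHOG_THRESHOLD_PCT;
>>>     }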
>>>
>>> Imagine there are n VMs. One of them, named mix-VM, runs two tasks:
>>> cpuhog and iotask (network). The other VMs, named CPU-VMs, run just
>>> cpuhog. All VMs use PV drivers (the GPLPV driver for Windows).
>>>
>>> Here's what is supposed to happen when iotask receives a network
>>> packet: the NIC raises an IRQ, which is passed to Xen, then domain-0
>>> sends an inter-domain event to mix-VM, which is likely to be in the
>>> run-queue. Xen then schedules it to run immediately and sets its
>>> state to preempting-state. Right after that, the mix-VM *should*
>>> schedule iotask to process the incoming packet, and then schedule
>>> cpuhog once processing is done. When the CR3 changes to cpuhog's,
>>> Xen knows that the mix-VM has finished I/O processing (here we
>>> assume that cpuhog's priority is lower than iotask's, as in most
>>> OSes), and schedules the mix-VM out, ending its preempting-state.
>>> Therefore, the mix-VM can preempt other VMs to process I/O ASAP,
>>> while keeping the preemption time as short as possible to preserve
>>> fairness. The point is: cpuhog should not run in preempting-state.
>>>
>>> However, a problem arises when the mix-VM sends packets. When iotask
>>> sends a large amount of data (over TCP), it blocks and waits to be
>>> woken up after the guest kernel has sent all the data, which may be
>>> split into thousands of TCP packets. The mix-VM receives an ACK
>>> packet every time it sends a packet, which makes it enter
>>> preempting-state. Note that at this moment the CR3 of mix-VM is
>>> cpuhog's (as the only running process). After the guest kernel
>>> processes the ACK packet and sends the next packet, it switches to
>>> user mode, which means cpuhog gets to run in preempting-state. The
>>> point is: as there is no CR3 change, Xen gets no chance to run.
>>>
>>> One way is to add a hook at the user/kernel mode switch, so Xen can
>>> catch the moment when cpuhog gets to run. However, that costs too
>>> much. Another way is to force the VM to schedule when it enters
>>> preempting-state. It will then trap to Xen when CR3 is changed, and
>>> Xen can end its preempting-state when it schedules cpuhog to run.
>>> That's why I want to trigger a guest context switch from Xen. I
>>> don't really care *which* process it switches to, I just want to
>>> give Xen a chance to run. The point is: is there a better/simpler
>>> way to solve this problem?
>>>
>>> I hope I described the problem clearly. Would you please give more
>>> details about the "reschedule event channel" idea? Thanks!
>>>
>>> --
>>> Yubin
>>>
>>> On Sat, Oct 31, 2009 at 11:20 PM, George Dunlap
>>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>> Context switching is a choice the guest OS has to make, and how that's
>>>> done will differ based on the operating system.  I think if you're
>>>> thinking about modifying the guest scheduler, you're probably better
>>>> off starting with Linux.  Even if there's a way to convince Windows to
>>>> call schedule() to pick a new process, I'm not sure you'll be able to
>>>> tell it *which* process to choose.
>>>>
>>>> As far as mechanism on Xen's side, it would be easy enough to allocate
>>>> a "reschedule" event channel for the guest, such that whenever you
>>>> want to trigger a guest reschedule, just raise the event channel.
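>>>>
>>>> (On the guest side -- assuming a Linux guest for a moment, since I
>>>> don't know what the Windows GPLPV code would look like -- the
>>>> receiving end could be as small as the module sketched below: bind
>>>> the event channel and flag the current task for rescheduling, so
>>>> schedule() runs on the way out of the interrupt.  I'm writing the
>>>> bind_interdomain_evtchn_to_irqhandler() call from memory, so check
>>>> the exact signature in your tree; the port number is obviously just
>>>> a placeholder.)
>>>>
>>>>     #include <linux/module.h>
>>>>     #include <linux/interrupt.h>
>>>>     #include <linux/sched.h>
>>>>     #include <xen/events.h>
>>>>
>>>>     static int resched_irq = -1;
>>>>
>>>>     static irqreturn_t resched_interrupt(int irq, void *dev_id)
>>>>     {
>>>>         /* Can't call schedule() from interrupt context; just mark
>>>>          * the current task so the interrupt-return path does it. */
>>>>         set_tsk_need_resched(current);
>>>>         return IRQ_HANDLED;
>>>>     }
>>>>
>>>>     static int __init resched_init(void)
>>>>     {
>>>>         int ret = bind_interdomain_evtchn_to_irqhandler(
>>>>                       0 /* dom0 */, 10 /* placeholder port */,
>>>>                       resched_interrupt, 0, "xen-resched", NULL);
>>>>         if (ret < 0)
>>>>             return ret;
>>>>         resched_irq = ret;
>>>>         return 0;
>>>>     }
>>>>
>>>>     static void __exit resched_exit(void)
>>>>     {
>>>>         if (resched_irq >= 0)
>>>>             unbind_from_irqhandler(resched_irq, NULL);
>>>>     }
>>>>
>>>>     module_init(resched_init);
>>>>     module_exit(resched_exit);
>>>>     MODULE_LICENSE("GPL");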
>>>>
>>>>  -George
>>>>
>>>> On Sat, Oct 31, 2009 at 11:02 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>>>> Hi, all,
>>>>>
>>>>> As I'm doing some research on cooperative scheduling between Xen and
>>>>> the guest domain, I want to know in how many ways Xen can trigger a
>>>>> context switch inside an HVM guest domain (which runs Windows in my
>>>>> case). Do I have to write a driver (like the balloon driver)? Is a
>>>>> user process enough? Or is there an even simpler way?
>>>>>
>>>>> All your suggestions are appreciated. Thanks! :)
>>>>>
>>>>> --
>>>>> Yubin
>>>>>
>>>>
>>>
>>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

