
Re: [Xen-devel] How can Xen trigger a context switch in an HVM guest domain?


  • To: XiaYubin <xiayubin@xxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Tue, 3 Nov 2009 11:51:35 +0000
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, "James \(song wei\)" <jsong@xxxxxxxxxx>
  • Delivery-date: Tue, 03 Nov 2009 03:52:07 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

When I first started doing performance analysis, the sedf scheduler
was using a 500us timeslice, which (by my estimate) caused the
first-gen VMX-capable processors to spend at least 5% of their time
handling vmenters and vmexits.  Obviously performance has increased
somewhat since then, but they're still not free. :-)

 -George

On Tue, Nov 3, 2009 at 1:43 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
> James and George, thank you both! The breakpoint approach is
> interesting; I hadn't even thought of it :)
>
> OK, I'm going to try a simpler way to verify my idea first. Before the
> preempting-state VM runs, I will set a timer so that Xen gets to run
> every 100us (maybe longer for the first iteration). The timer handler
> will check whether the preempting VM is in kernel mode or user mode. If
> it is in user mode with the cpu hog's CR3, it will be scheduled out.
> Likewise, if the iteration count goes beyond some threshold (say 5
> times), the VM will also be scheduled out. This seems much simpler than
> the breakpoint approach, and more accurate than the 1ms timer. It may
> add some overhead, but the preemption is not supposed to occur
> frequently, and fairness is more important.
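>
> (Roughly, the timer handler I have in mind is something like the C
> sketch below -- just the decision logic, with made-up helpers
> vcpu_in_user_mode(), vcpu_guest_cr3() and deschedule_vcpu() standing
> in for whatever Xen actually provides:)
>
>     #include <stdbool.h>
>     #include <stdint.h>
>
>     #define PREEMPT_CHECK_PERIOD_US 100  /* poll every 100us         */
>     #define PREEMPT_MAX_ITERATIONS  5    /* hard cap on extra checks */
>
>     struct preempt_state {
>         uint64_t cpuhog_cr3;  /* CR3 of the identified cpu hog         */
>         unsigned iterations;  /* timer ticks spent in preempting-state */
>     };
>
>     /* Hypothetical accessors into the VCPU -- names are made up. */
>     bool     vcpu_in_user_mode(void *vcpu);
>     uint64_t vcpu_guest_cr3(void *vcpu);
>     void     deschedule_vcpu(void *vcpu);
>
>     /* Runs every PREEMPT_CHECK_PERIOD_US while the VM is in
>      * preempting-state; returns true if the VM was scheduled out. */
>     bool preempt_timer_tick(void *vcpu, struct preempt_state *ps)
>     {
>         ps->iterations++;
>
>         /* cpuhog is running in user mode: I/O processing is done. */
>         if (vcpu_in_user_mode(vcpu) &&
>             vcpu_guest_cr3(vcpu) == ps->cpuhog_cr3) {
>             deschedule_vcpu(vcpu);
>             return true;
>         }
>
>         /* Safety net: don't let the boost last forever. */
>         if (ps->iterations >= PREEMPT_MAX_ITERATIONS) {
>             deschedule_vcpu(vcpu);
>             return true;
>         }
>
>         return false;
>     }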
>
> The thread problem also exists on Linux. Currently I have no good way
> to identify different threads from the hypervisor's perspective. I
> have a dream that one day those OS guys will export this information
> to the VMM, a dream that one day our children will live in a world
> where virtualization rules. I have a dream today :)
>
> Thanks!
>
> --
> Yubin
>
> On Tue, Nov 3, 2009 at 12:05 AM, George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> OK, so you want to allow a VM to run so that it can do packet
>> processing in the kernel, but once it's done in the kernel you want to
>> preempt the VM again.
>>
>> An idea I was going to try out is that if a VM receives an interrupt
>> (possibly only certain interrupts, like network), let it run for a
>> very short amount of time (say, 1ms or 500us).  That should be enough
>> for it to do its basic packet processing (or audio processing, video
>> processing, whatever).  True, you're going to run the "cpu hog" during
>> that time, but that will be debited against the time it will run later.  (I
>> haven't tested this idea yet. It may work better with some credit
>> algorithms than others.)
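>>
>> (For concreteness, the accounting I have in mind is nothing fancier
>> than: remember when the boost started, and subtract the elapsed time
>> from the VCPU's credit when it is descheduled.  A sketch, with made-up
>> field names rather than the real credit scheduler structures:)
>>
>>     #include <stdbool.h>
>>     #include <stdint.h>
>>
>>     /* Made-up bookkeeping, not the actual credit scheduler code. */
>>     struct boost_account {
>>         int64_t  credit_us;       /* remaining credit, microseconds */
>>         uint64_t boost_start_us;  /* timestamp when the boost began */
>>         bool     boosted;
>>     };
>>
>>     /* Called when the interrupt is delivered and the VCPU is boosted. */
>>     void boost_begin(struct boost_account *a, uint64_t now_us)
>>     {
>>         a->boost_start_us = now_us;
>>         a->boosted = true;
>>     }
>>
>>     /* Called on deschedule: whatever the VCPU consumed while boosted
>>      * is debited, so over time it gets no more CPU than its share. */
>>     void boost_end(struct boost_account *a, uint64_t now_us)
>>     {
>>         if (a->boosted) {
>>             a->credit_us -= (int64_t)(now_us - a->boost_start_us);
>>             a->boosted = false;
>>         }
>>     }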
>>
>> The problem with inducing a guest to call schedule():
>> * It may not have any other runnable processes, or it may choose the
>> same process to run again; so it may not switch the cr3 anyway.
>> * The only reliable way to do it without some kind of
>> paravirtualization (even if only a kernel driver) would be to give it a
>> timer interrupt, which may mess up other things on the system, such as
>> the system time.
>>
>> If you're really keen to preempt on return to userspace, you could try
>> something like the following.  Before delivering the interrupt, note
>> the EIP the guest is at.  If it's in user space, set a hardware
>> breakpoint at that address.  Then deliver the interrupt.  If the guest
>> calls schedule(), you can catch the CR3 switch; if it returns to the
>> same process, it will hit the breakpoint.
>>
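>> (The breakpoint itself is just DR0 plus a couple of bits in DR7.  A
>> self-contained sketch of the encoding -- in Xen you'd of course have
>> to stash this into the guest's debug register state instead of
>> printing it, and make sure the guest doesn't clobber it:)
>>
>>     #include <stdint.h>
>>     #include <stdio.h>
>>
>>     /* DR7 value for an execute breakpoint in slot 0: G0 (bit 1)
>>      * enables the slot, and R/W0 (bits 16-17) and LEN0 (bits 18-19)
>>      * must both be 00b for an instruction-execution breakpoint.
>>      * DR0 itself holds the linear address to break on. */
>>     static uint64_t dr7_exec_bp_slot0(void)
>>     {
>>         return 1ULL << 1;   /* G0 set; R/W0 = LEN0 = 00b */
>>     }
>>
>>     int main(void)
>>     {
>>         uint64_t guest_eip = 0x0000000000401000ULL;  /* example EIP */
>>         printf("DR0 = %#llx, DR7 = %#llx\n",
>>                (unsigned long long)guest_eip,
>>                (unsigned long long)dr7_exec_bp_slot0());
>>         return 0;
>>     }
>>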
>> Two possible problems:
>> * For reasons of ancient history, the iret instruction may set the RF
>> flag in the EFLAGS register, which will cause the breakpoint not to
>> fire after the guest iret.  You may need to decode the instruction and
>> set the breakpoint at the instruction after, or something like that.
>> * I believe Windows doesn't do a CR3 switch if it does a *thread*
>> switch.  If so, on a thread switch you'll get neither the CR3 switch
>> nor the breakpoint (since the other thread is probably running
>> somewhere else).
>>
>> Peace,
>>  -George
>>
>> On Sun, Nov 1, 2009 at 5:54 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>> Hi, George,
>>>
>>> Thank you for your reply. Actually, I'm looking for a generic
>>> mechanism for cooperative scheduling. Being independent of the guest
>>> OS would make such a mechanism more convincing and practical, just
>>> like the balloon driver.
>>>
>>> Maybe you are wondering why I asked such a weird question, so let me
>>> describe it in more detail. My current work is based on "Task-aware
>>> VM scheduling", which was published at VEE'09. By monitoring CR3
>>> changes at the VMM level, Xen can gather information about tasks' CPU
>>> consumption and identify CPU hogs and I/O tasks. The task-aware
>>> mechanism therefore offers more fine-grained scheduling than the
>>> original VCPU-level scheduler, since a VCPU may run CPU hogs and I/O
>>> tasks in a mixed fashion.
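>>>
>>> (For reference, the per-task accounting is essentially: on every
>>> CR3-load exit, charge the elapsed time to the outgoing address space
>>> and then look at how much CPU each CR3 accumulated.  A sketch, with
>>> my own field and function names rather than the paper's:)
>>>
>>>     #include <stdbool.h>
>>>     #include <stdint.h>
>>>
>>>     #define CPUHOG_THRESHOLD_PCT 90   /* arbitrary cut-off */
>>>
>>>     /* Made-up per-address-space record, keyed by CR3. */
>>>     struct guest_task {
>>>         uint64_t cr3;
>>>         uint64_t cpu_time_us;     /* time this CR3 was loaded        */
>>>         uint64_t last_loaded_us;  /* timestamp of the last switch-in */
>>>     };
>>>
>>>     /* Called from the CR3-write exit handler: charge the outgoing
>>>      * task and start timing the incoming one. */
>>>     void account_cr3_switch(struct guest_task *out,
>>>                             struct guest_task *in, uint64_t now_us)
>>>     {
>>>         out->cpu_time_us += now_us - out->last_loaded_us;
>>>         in->last_loaded_us = now_us;
>>>     }
>>>
>>>     /* Crude heuristic: a task that consumed most of the observation
>>>      * window is a CPU hog; one that runs often but briefly is not. */
>>>     bool task_is_cpuhog(const struct guest_task *t, uint64_t window_us)
>>>     {
>>>         return t->cpu_time_us * 100 >= window_us * CPUHOG_THRESHOLD_PCT;
>>>     }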
>>>
>>> Imagine there are n VMs. One of them, named mix-VM, runs two tasks:
>>> cpuhog and iotask (network). The other VMs, named CPU-VMs, run just
>>> cpuhog. All VMs use PV drivers (the GPLPV driver for Windows).
>>>
>>> Here's what is supposed to happen when iotask receives a network
>>> packet: the NIC raises an IRQ, which is passed to Xen, then domain-0
>>> sends an inter-domain event to mix-VM, which is likely to be in the
>>> run-queue. Xen then schedules it to run immediately and sets its
>>> state to preempting-state. Right after that, the mix-VM *should*
>>> schedule iotask to process the incoming packet, and then schedule
>>> cpuhog once processing is done. When the CR3 changes to cpuhog's,
>>> Xen knows that the mix-VM has finished I/O processing (here we
>>> assume that cpuhog's priority is lower than iotask's, as in most
>>> OSes), and schedules the mix-VM out, ending its preempting-state.
>>> Therefore, the mix-VM can preempt other VMs to process I/O ASAP,
>>> while keeping the preemption time as short as possible to preserve
>>> fairness. The point is: cpuhog should not run in preempting-state.
>>>
>>> However, a problem arises when the mix-VM sends packets. When iotask
>>> sends a large amount of data (over TCP), it blocks and waits to be
>>> woken up after the guest kernel has sent all the data, which may be
>>> split into thousands of TCP packets. The mix-VM receives an ACK
>>> packet every time it sends a packet, which makes it enter
>>> preempting-state. Note that at this moment the CR3 of mix-VM is
>>> cpuhog's (as the only running process). After the guest kernel
>>> processes the ACK packet and sends the next packet, it switches to
>>> user mode, which means cpuhog gets to run in preempting-state. The
>>> point is: as there is no CR3 change, Xen gets no chance to run.
>>>
>>> One way is to add a hook at the user/kernel mode switch, so Xen can
>>> catch the moment when cpuhog gets to run. However, that costs too
>>> much. Another way is to force the VM to schedule when it enters
>>> preempting-state. It will then trap to Xen when CR3 is changed, and
>>> Xen can end its preempting-state when it schedules cpuhog to run.
>>> That's why I want to trigger a guest context switch from Xen. I
>>> don't really care *which* process it switches to, I just want to
>>> give Xen a chance to run. The point is: is there a better/simpler
>>> way to solve this problem?
>>>
>>> I hope I described the problem clearly. Would you please give more
>>> details about the "reschedule event channel" idea? Thanks!
>>>
>>> --
>>> Yubin
>>>
>>> On Sat, Oct 31, 2009 at 11:20 PM, George Dunlap
>>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>> Context switching is a choice the guest OS has to make, and how that's
>>>> done will differ based on the operating system.  I think if you're
>>>> thinking about modifying the guest scheduler, you're probably better
>>>> off starting with Linux.  Even if there's a way to convince Windows to
>>>> call schedule() to pick a new process, I'm not sure you'll be able to
>>>> tell it *which* process to choose.
>>>>
>>>> As far as mechanism on Xen's side, it would be easy enough to allocate
>>>> a "reschedule" event channel for the guest, such that whenever you
>>>> want to trigger a guest reschedule, just raise the event channel.
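>>>>
>>>> (On the guest side -- assuming a Linux guest for a moment, since I
>>>> don't know what the Windows GPLPV code would look like -- the
>>>> receiving end could be as small as the module sketched below: bind
>>>> the event channel and flag the current task for rescheduling, so
>>>> schedule() runs on the way out of the interrupt.  I'm writing the
>>>> bind_interdomain_evtchn_to_irqhandler() call from memory, so check
>>>> the exact signature in your tree; the port number is obviously just
>>>> a placeholder.)
>>>>
>>>>     #include <linux/module.h>
>>>>     #include <linux/interrupt.h>
>>>>     #include <linux/sched.h>
>>>>     #include <xen/events.h>
>>>>
>>>>     static int resched_irq = -1;
>>>>
>>>>     static irqreturn_t resched_interrupt(int irq, void *dev_id)
>>>>     {
>>>>         /* Can't call schedule() from interrupt context; just mark
>>>>          * the current task so the interrupt-return path does it. */
>>>>         set_tsk_need_resched(current);
>>>>         return IRQ_HANDLED;
>>>>     }
>>>>
>>>>     static int __init resched_init(void)
>>>>     {
>>>>         int ret = bind_interdomain_evtchn_to_irqhandler(
>>>>                       0 /* dom0 */, 10 /* placeholder port */,
>>>>                       resched_interrupt, 0, "xen-resched", NULL);
>>>>         if (ret < 0)
>>>>             return ret;
>>>>         resched_irq = ret;
>>>>         return 0;
>>>>     }
>>>>
>>>>     static void __exit resched_exit(void)
>>>>     {
>>>>         if (resched_irq >= 0)
>>>>             unbind_from_irqhandler(resched_irq, NULL);
>>>>     }
>>>>
>>>>     module_init(resched_init);
>>>>     module_exit(resched_exit);
>>>>     MODULE_LICENSE("GPL");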
>>>>
>>>>  -George
>>>>
>>>> On Sat, Oct 31, 2009 at 11:02 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>>>> Hi, all,
>>>>>
>>>>> As I'm doing some research on cooperative scheduling between Xen and
>>>>> the guest domain, I want to know in how many ways Xen can trigger a
>>>>> context switch inside an HVM guest domain (which runs Windows in my
>>>>> case). Do I have to write a driver (like the balloon driver)? Is a
>>>>> user process enough? Or is there an even simpler way?
>>>>>
>>>>> All your suggestions are appreciated. Thanks! :)
>>>>>
>>>>> --
>>>>> Yubin
>>>>>
>>>>
>>>
>>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

