Re: [Xen-devel] How can Xen trigger a context switch in an HVM guest domain?
James and George, thank you both! The breakpoint way is interesting, I
didn't even think of it :)

OK, I'm going to use a simpler way to verify my idea first. Before the
preempting-state VM runs, I will set a timer so that Xen gets to run
every 100us (maybe longer for the first iteration). The timer handler
will check whether the preempting VM is in kernel mode or user mode. If
it is in user mode with the cpu-hog's CR3, it will be scheduled out.
Meanwhile, if the number of iterations goes beyond some threshold (say 5
times), the VM will also be scheduled out. This way seems much simpler
than the one using a breakpoint, and more accurate than the one using a
1ms timer. It may bring some overhead, but the preemption is not
supposed to occur frequently, and fairness is more important.

The thread problem also exists on the Linux platform. Currently I have
no good idea for identifying different threads from the hypervisor's
perspective. I have a dream that one day those OS guys will export this
information to the VMM, a dream that one day our children will live in a
world where virtualization rules. I have a dream today :)

Thanks!

--
Yubin

On Tue, Nov 3, 2009 at 12:05 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
> OK, so you want to allow a VM to run so that it can do packet
> processing in the kernel, but once it's done in the kernel you want to
> preempt the VM again.
>
> An idea I was going to try out is that if a VM receives an interrupt
> (possibly only certain interrupts, like network), let it run for a
> very short amount of time (say, 1ms or 500us). That should be enough
> for it to do its basic packet processing (or audio processing, video
> processing, whatever). True, you're going to run the "cpu hog" during
> that time, but that will be debited against time he'll run later. (I
> haven't tested this idea yet. It may work better with some credit
> algorithms than others.)
>
> The problem with inducing a guest to call schedule():
> * It may not have any other runnable processes, or it may choose the
> same process to run again; so it may not switch the cr3 anyway.
> * The only reliable way to do it without some kind of
> paravirtualization (if even a kernel driver) would be to give it a
> timer interrupt, which may mess up other things on the system, such as
> the system time.
>
> If you're really keen to preempt on return to userspace, you could try
> something like the following. Before delivering the interrupt, note
> the EIP the guest is at. If it's in user space, set a hardware
> breakpoint at that address. Then deliver the interrupt. If the guest
> calls schedule(), you can catch the CR3 switch; if it returns to the
> same process, it will hit the breakpoint.
>
> Two possible problems:
> * For reasons of ancient history, the iret instruction may set the RF
> flag in the EFLAGS register, which will cause the breakpoint not to
> fire after the guest iret. You may need to decode the instruction and
> set the breakpoint at the instruction after, or something like that.
> * I believe Windows doesn't do a cr3 switch if it does a *thread*
> switch. If so, on a thread switch you'll get neither the CR3 switch
> nor the breakpoint (since the other thread is probably running
> somewhere else).
>
> Peace,
> -George
>
> On Sun, Nov 1, 2009 at 5:54 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>> Hi, George,
>>
>> Thank you for your reply. Actually, I'm looking for a generic
>> mechanism of cooperative scheduling. Independence from the guest OS
>> can make such a mechanism more convincing and practical, just like
>> the balloon driver does.
>>
>> Maybe you are wondering why I asked such a weird question; let me
>> describe it in more detail. My current work is based on "Task-aware
>> VM scheduling", which was published at VEE'09.
>> By monitoring CR3 changes at the VMM level, Xen can get information
>> about tasks' CPU consumption to identify CPU hogs and I/O tasks.
>> Therefore, the task-aware mechanism offers a more fine-grained
>> scheduler than the original VCPU-level scheduler, as a VCPU may run
>> CPU hogs and I/O tasks in a mixed style.
>>
>> Imagine there are n VMs. One of them, named mix-VM, runs two tasks:
>> cpuhog and iotask (network). The other VMs, named CPU-VM, run just
>> cpuhog. All VMs are using PV drivers (GPLPV driver for Windows).
>>
>> Here's what is supposed to happen when iotask receives a network
>> packet: The NIC raises an IRQ, which passes to Xen, then domain-0
>> sends an inter-domain event to mix-VM, which is likely to be in the
>> run-queue. Xen then schedules it to run immediately and sets its
>> state to preempting-state. Right after that, the mix-VM *should*
>> schedule iotask to process the incoming packet, and then schedule
>> cpuhog after processing. When the CR3 changes to cpuhog's, Xen knows
>> that the mix-VM has finished I/O processing (here we assume that the
>> priority of cpuhog is usually lower than iotask's in most OSes), and
>> schedules the mix-VM out to finish its preempting-state. Therefore,
>> the mix-VM can preempt other VMs to process I/O ASAP, while making
>> the preempting time as short as possible to keep fairness. The point
>> is: cpuhog should not run in preempting-state.
>>
>> However, a problem arises when the mix-VM is sending packets. When
>> iotask sends an amount of data (using the TCP protocol), it will
>> block and wait to be woken up after the guest kernel has sent all the
>> data, which may be split into thousands of TCP packets. The mix-VM
>> will receive an ACK packet every time it sends a packet, which makes
>> it enter preempting-state. Note that at this moment, the CR3 of
>> mix-VM is cpuhog's (as the only running process).
>> After the guest kernel processes the ACK packet and sends the next
>> packet, it switches to user mode, which means the cpuhog gets to run
>> in preempting-state. The point is: as there is no CR3 change, Xen
>> gets no chance to run.
>>
>> One way is to add a hook at user/kernel mode switching, so that Xen
>> can catch the moment when cpuhog gets to run. However, this way costs
>> too much. Another way is to force a VM to schedule when it enters
>> preempting-state. It will then trap to Xen when CR3 is changed, and
>> Xen can finish its preempting-state when it schedules cpuhog to run.
>> That's why I want to trigger a guest context switch from Xen. I
>> don't really care *which* process it will switch to, I just want to
>> give Xen a chance to run. The point is: is there a better/simpler way
>> to solve this problem?
>>
>> Hope I described the problem clearly. And would you please give more
>> details about the idea of a "reschedule event channel"? Thanks!
>>
>> --
>> Yubin
>>
>> On Sat, Oct 31, 2009 at 11:20 PM, George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>> Context switching is a choice the guest OS has to make, and how that's
>>> done will differ based on the operating system. I think if you're
>>> thinking about modifying the guest scheduler, you're probably better
>>> off starting with Linux. Even if there's a way to convince Windows to
>>> call schedule() to pick a new process, I'm not sure you'll be able to
>>> tell it *which* process to choose.
>>>
>>> As far as mechanism on Xen's side, it would be easy enough to allocate
>>> a "reschedule" event channel for the guest, such that whenever you
>>> want to trigger a guest reschedule, just raise the event channel.
>>>
>>> -George
>>>
>>> On Sat, Oct 31, 2009 at 11:02 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>>> Hi, all,
>>>>
>>>> As I'm doing some research in cooperative scheduling between Xen and
>>>> guest domains, I want to know in what ways Xen can trigger a context
>>>> switch inside an HVM guest domain (which runs Windows in my case). Do
>>>> I have to write a driver (like the balloon driver)? Or is a user
>>>> process enough? Or is there an even simpler way?
>>>>
>>>> All your suggestions are appreciated. Thanks! :)
>>>>
>>>> --
>>>> Yubin
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>>>>
>>>
>>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
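
For illustration only, the timer-based check described in the reply at
the top of this message might be sketched roughly as follows. This is
plain C and not Xen code: the names vcpu_sample, preempt_tick, hog_cr3
and MAX_TICKS are all hypothetical, and the sketch assumes the
hypervisor can read the guest's privilege level and CR3 each time the
100us timer fires. It also reads "beyond some threshold (say 5 times)"
as "schedule out on the 5th firing", which is only one possible
interpretation.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; none of them are Xen APIs. */
enum preempt_action { KEEP_RUNNING, SCHEDULE_OUT };

/* Guest state sampled when the periodic timer fires (assumed readable). */
struct vcpu_sample {
    bool     user_mode; /* true if the guest was executing in user mode */
    uint64_t cr3;       /* guest CR3 at the moment of the sample */
};

/* Assumed upper bound on timer firings per preempting-state period. */
#define MAX_TICKS 5u

/*
 * Decide what to do on one timer firing.
 * hog_cr3 is the CR3 previously attributed to the CPU hog.
 * *ticks counts firings since the VM entered preempting-state.
 */
enum preempt_action preempt_tick(const struct vcpu_sample *s,
                                 uint64_t hog_cr3,
                                 unsigned int *ticks)
{
    (*ticks)++;

    /* The CPU hog is running in user mode: I/O handling is over. */
    if (s->user_mode && s->cr3 == hog_cr3)
        return SCHEDULE_OUT;

    /* Too many firings: bound the preempting time for fairness. */
    if (*ticks >= MAX_TICKS)
        return SCHEDULE_OUT;

    /* Still in the kernel, or in some other task: let it continue. */
    return KEEP_RUNNING;
}
```

A caller would reset the tick counter to zero whenever the VM enters
preempting-state and invoke preempt_tick() from the timer handler. Note
that the check keys on CR3 alone, so it cannot distinguish threads that
share an address space, which is the thread problem raised in the
thread above.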