Xen project Mailing List

[Xen-devel] [RFC] Erratic mouse in HVM guest

To: <xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: "Ross Maxfield" <rmaxfiel@xxxxxxxxxx>

Date: Wed, 28 Jun 2006 16:42:05 -0600

Delivery-date: Wed, 28 Jun 2006 15:42:56 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

To whom it may concern,

For many months, some of us at Novell working on and testing Xen have contended with chaotic mouse behavior in HVM Linux guests. This ill-mannered mouse, however, appears to be sensitive to certain hardware. Although I have seen the mouse jump around the screen occasionally on diverse machines, I see it continuously on the Harwich Twin Castle Paxville (3GHz, 8GB, x86_64, 8-way dual-core). The mouse is completely unusable in the guest, as the slightest mouse event produces wild results in the guest, either erratic mouse movement or button presses.

Bug 167187, “Erratic mouse behavior with HVM Linux guest and SDL”, was entered into Novell's Bugzilla on April 17th, 2006, and Intel was informed of the issue. Since Novell's first release of Xen with SLES is with full support of para-virtualized guests, this issue relative to the HVM guest was put aside until recently, when I began to explore the cause of the mouse problem.

Here's what I've learned. First, the mouse behaves erratically because the data coming out of /dev/input/mice is jumbled up, out of order actually. This was rather perplexing because I had been able to determine that qemu was delivering the data in the proper order and, in fact, i8042_interrupt() of linux-2.6.16/drivers/input/serio/i8042.c executing in the HVM guest was also reporting that the data had been read in the proper order, yet the processing of the data occurred out of order.

After exploring a number of possible causes for this behavior, I discovered an assumption in the kernel code that is true when the kernel is running natively but not necessarily true when hosted by the hypervisor. I learned that i8042_interrupt() is polled by the timer interrupt if HZ/20 jiffies have expired since the last 8042 interrupt.

So here's what I believe is happening. Each mouse event generates at least three bytes of data, and each byte of data generates an interrupt.
When the first interrupt is injected into the guest, as with all interrupts, the kernel masks the interrupt vector in the PIC and then EOIs the PIC before actually handling the interrupt. This, of course, allows ANY other interrupt to occur save the one currently being serviced.

When i8042_interrupt() is called, it first calls mod_timer() to delay the timer callback another HZ/20, takes a spin_lock_irqsave() disabling interrupts (interrupts are enabled prior to i8042_interrupt() being called), reads the 8042 obtaining the first byte of data from qemu, and then releases the spinlock. Immediately after releasing the spinlock, this isr is interrupted by a timer interrupt which discovers that the 8042's HZ/20 timer has expired, and i8042_interrupt() is reentered and runs to completion as there is no pending timer interrupt. When the timer interrupt completes, the previously interrupted isr resumes and continues to process what was to be the first byte but is now no longer first. I have been able to determine that the timer is indeed calling i8042_interrupt() and causing the mis-ordered data.

For the timer interrupt handler to believe that HZ/20 jiffies had expired, at least that amount of time must have elapsed between i8042_interrupt() releasing the spinlock and calling serio_interrupt() a dozen lines later, suggesting a lengthy hypervisor preemption followed by a timer isr before resuming from the point of preemption. Or, a considerable amount of time, > HZ/20, expired reading the data from qemu's emulation of port 0x60, followed by a timer isr after the spin_unlock_irqrestore() in i8042_interrupt().

Whichever case it may be, i8042_interrupt() is _assuming_ that HZ/20 jiffies are not going to elapse before its isr completes. This assumption is probably fair enough for running natively, but not a good assumption when hosted by the current implementation of the hypervisor.
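
The sequence described above can be modeled with a short user-space simulation. This is only an illustrative sketch, not kernel code: the class Toy8042, its methods, the `stall` parameter and the value HZ = 250 are all assumptions made for the example (the post does not state the guest's HZ), and the three queued bytes stand in for the data qemu hands out through port 0x60. The point it shows is that when the byte is read in one step and delivered in a later step, a poll that fires in between delivers the next byte first.

```python
from collections import deque

HZ = 250                 # assumed tick rate; the post does not state the guest's HZ
POLL_PERIOD = HZ // 20   # poll interval in jiffies, as described in the post


class Toy8042:
    """Hypothetical toy model of a handler that reads a byte, then delivers it later."""

    def __init__(self, data):
        self.fifo = deque(data)          # bytes waiting in the emulated controller
        self.delivered = []              # order in which bytes reach the input layer
        self.jiffies = 0
        self.poll_deadline = POLL_PERIOD

    def interrupt(self, stall=0):
        """Model of the handler; `stall` is a made-up delay (in jiffies)
        between reading the byte and delivering it."""
        # Re-arm the poll timer, like the mod_timer() call in the description.
        self.poll_deadline = self.jiffies + POLL_PERIOD
        if not self.fifo:
            return
        byte = self.fifo.popleft()       # read happens "under the lock"
        # Lock released here: time may pass and the timer may run.
        self.jiffies += stall
        self.timer_tick()
        self.delivered.append(byte)      # delivery happens after the window

    def timer_tick(self):
        """Model of the timer: polls the handler once the deadline has passed."""
        if self.jiffies >= self.poll_deadline:
            self.interrupt()


def run(stall):
    """Deliver one three-byte mouse packet; only the first handler call stalls."""
    dev = Toy8042([1, 2, 3])
    for _ in range(3):
        dev.interrupt(stall)
        stall = 0
    return dev.delivered


print(run(0))            # [1, 2, 3]
print(run(POLL_PERIOD))  # [2, 1, 3]
```

With no stall the bytes arrive in order. With a stall of HZ/20 jiffies in the first handler call, the nested poll delivers the second byte before the first, which matches the kind of reordering seen in /dev/input/mice.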
The question now is, does the hypervisor change to accommodate the assumption, or is the assumption removed from the kernel, or is there some other fiendish time-consuming bug yet to be discovered?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
