
Re: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?



Hi Mats,

Thank you for your always prompt and knowledgeable replies. I would vote for you as one of the MVPs of this mailing list :)

Best regards,

Liang

----- Original Message -----
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>
Cc: <xen-devel@xxxxxxxxxxxxxxxxxxx>
Sent: Thursday, April 12, 2007 7:00 AM
Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?




-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang Yang
Sent: 12 April 2007 01:21
To: Mark Williamson
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Does Dom0 always get interrupts
first before they are delivered to other guest domains?

Hi Mark,

I'm not Mark, but I'll try to give some answers...

Thanks for your reply. I still have questions about the switch overhead
between rings. It seems the HW support in VT-x is not as efficient as
expected, as there are too many conditions to check on each vmexit and
VM re-entry. But I don't know how to quantify the overhead of a VT-x
based context switch compared with a hypercall-based context switch.

The HVM context switch will be longer. How much longer depends on so
many factors that it's probably easier to measure the difference (in
some way) than to try to guesstimate it by reading documentation or
the like.
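
As a rough illustration of the "measure it" approach, here is a minimal timing-harness sketch. Everything in it is an assumption for illustration: the function names are hypothetical, and os.getppid is only a placeholder for whatever operation you actually want to time. It is an ordinary system call, not a VMEXIT or a hypercall, and Python's own overhead is part of every number, so only compare results from the same script run in different guest types.

```python
import os
import statistics
import time


def time_per_call_ns(operation, iterations=10_000, repeats=5):
    """Return the median per-call time in nanoseconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            operation()
        elapsed = time.perf_counter_ns() - start
        samples.append(elapsed / iterations)
    # The median is less sensitive to scheduling noise than the mean.
    return statistics.median(samples)


def empty_call():
    """Baseline: costs only the Python call overhead."""
    return None


if __name__ == "__main__":
    baseline = time_per_call_ns(empty_call)
    # Placeholder operation (assumption): one cheap kernel entry per call.
    measured = time_per_call_ns(os.getppid)
    print(f"baseline: {baseline:.0f} ns/call")
    print(f"measured: {measured:.0f} ns/call")
```

Subtracting the baseline from the measured figure gives a crude estimate of the cost of the operation itself.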

The reason that HVM (AMD-V/Intel VT-x) isn't as "good" as the
para-virtual case has very little to do with interrupt handling, I
should think (unless you're doing something very peculiar in your
guest), but much more to do with how the guest's hardware accesses are
performed. For example, an interrupt that leads back to the guest will
most likely lead to several DomU VMEXITs just in the interrupt handler.
Take an IDE interrupt that indicates to Dom0 that a sector requested by
an HVM DomU is ready:

Assuming that the HVM guest is currently running, the following is the
set of events:
1. VMEXIT for disk-related IRQ in real hardware. Hypervisor forwards the
IRQ 14 to Dom0 (actually, there's nothing that the hypervisor actually
needs to do here, but the guest needs to exit so that Dom0 can run, and
of course, eventually the guest will have to be restarted)
2. QEMU receives the data from the read() function requesting the
disk-data for DomU. Once the data is in QEMU, QEMU will signal the IRQ
to guest.
3. Guest is restarted with Virtual IRQ pending.
4. Guest takes interrupt (assuming the interrupt mask and the EFLAGS
interrupt-enable flag allow interrupts to be taken). Processor looks up
the IDT entry for the corresponding IRQ and jumps to the location indicated.
5. IRQ handler checks the status of the IDE controller -> VMEXIT IOIO.
6. VMEXIT IOIO leads to QEMU-operation -> Dom0 needs to run -> QEMU
signals the result back to guest, guest is restarted.
7. IRQ handler retrieves the data [a] -> VMEXIT IOIO.
8. VMEXIT IOIO for the IO read/write of the data [if the driver uses
INS/OUTS this is a single VMEXIT IOIO, if it's a "stupid" driver using
individual IN/OUT instructions, it will take 256 (16-bit per transfer)
VMEXIT's]. Again, this leads to QEMU/Dom0 being scheduled and event back
to guest when done.
9. IRQ handler acknowledges the interrupt -> VMEXIT IOIO. [b]
10. VMEXIT IOIO/MMIO (pagefault) due to access to interrupt controller.
This time we just perform the relevant [A]PIC management inside the
hypervisor, as it has models for the interrupt controllers (8259 and
APIC). Guest is restarted when the interrupt controller access is
finished.
11. Done.

That is four VMEXIT operations for one disk-interrupt.

[a] The IRQ handler itself may not actually retrieve the data; it may
instead be some thread/process that is awakened by the IRQ handler. This
is not really important for the discussion or the number of VMEXITs, but
it will of course have some impact on the interrupt latency, as
interrupts are disabled within the IRQ handler but not in the worker
thread that reads the data. The exact order of the events described
above is then also different, but the net number of VMEXITs is unchanged.
[b] On an "old-style" PC, there will be 2 interrupt acknowledge IO
operations, because the IDE controller is wired to IRQ 14/15, which are
on the second 8259 PIC, which means that both the master and the slave
need an ACK operation.
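
The VMEXIT count for the sequence above can be sketched as simple arithmetic. This is an illustrative model only: the function and parameter names are hypothetical, and it counts one VMEXIT per guest IO access exactly as the steps and footnote [b] describe, ignoring any exits the real hypervisor might take for other reasons.

```python
SECTOR_BYTES = 512  # one IDE sector, matching the 256 x 16-bit figure in step 8


def vmexits_per_disk_interrupt(string_io=True, transfer_bytes=2, cascaded_pic=False):
    """Count guest VMEXITs for one disk interrupt, following steps 1-10."""
    if transfer_bytes <= 0:
        raise ValueError("transfer_bytes must be positive")
    hardware_irq = 1  # step 1: physical IRQ arrives while the guest runs
    status_read = 1   # steps 5-6: handler reads the IDE status register
    if string_io:
        data_transfer = 1  # steps 7-8: a single INS/OUTS covers the sector
    else:
        # Steps 7-8 with individual IN/OUT: one exit per transfer.
        data_transfer = SECTOR_BYTES // transfer_bytes
    # Steps 9-10: one ACK, or two when master and slave 8259 both need one.
    irq_ack = 2 if cascaded_pic else 1
    return hardware_irq + status_read + data_transfer + irq_ack


if __name__ == "__main__":
    print(vmexits_per_disk_interrupt())                    # 4
    print(vmexits_per_disk_interrupt(string_io=False))     # 259
    print(vmexits_per_disk_interrupt(cascaded_pic=True))   # 5
```

The default case reproduces the four VMEXITs counted above; the other two calls show how quickly a driver using individual IN instructions, or a cascaded PIC, changes the total.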


If I just consider the pure context-switch overhead, which one has the
bigger overhead: using HW vmexit/vmentry to switch between root and
non-root mode by programming the VT-x vector, or using a SW hypercall to
inject an interrupt to switch from ring 1 to ring 0 (or ring 3 to ring 0
for a 64-bit OS)? Does the switch between ring 1 and ring 0 have the same
overhead as the switch between ring 3 and ring 0?

Ring-switch has the same overhead regardless of which rings the switch
is between [at least in the sense that the processor does exactly the
same thing when switching from ring 2 to 1 or ring 3 to 0 - the exact
time it takes to switch rings is harder to determine, because it depends
on alignments, cache hit/miss rates, and various other things].

BTW, both root and non-root mode have four rings. If ring 0 and ring 3
in non-root mode are used for the guest OS kernel and user applications,
which ring level in root mode will be used when a vmexit happens?

The VMEXIT will end up in ring 0 in the hypervisor [in AMD processors,
VMEXIT "returns", whilst in Intel processors there is a dedicated field
in the VMCS that holds the "vmexit address", but the processor
essentially returns to a state that is identical to the one prior to the
VMRUN/VMLAUNCH/VMRESUME instruction that got the guest code running in
the first place - very similar to a call instruction].

Can I jump from
ring 3 in non-root mode directly to ring 0 in root mode?

Yes, that's perfectly possible (in fact, it's most likely what ALWAYS
happens).

--
Mats

Thanks,

Liang

----- Original Message -----
From: "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>
To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>
Cc: <xen-devel@xxxxxxxxxxxxxxxxxxx>; "'Petersson, Mats'"
<Mats.Petersson@xxxxxxx>
Sent: Saturday, April 07, 2007 9:59 AM
Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before
they are delivered to other guest domains?


>> I have another question about using VT-X and Hypercall to support
>> para-virtualized and full-virtualized domain simultaneously:
>
> Sure, sorry for the delay...
>
>> It seems Xen does not need to use hypercalls to replace all problematic
>> instructions (e.g. HLT, POPF etc.). For example, there is an instruction
>> called CLTS. Instead of replacing it with a hypercall, the Xen hypervisor
>> will first delegate it to ring 0 when a GP fault occurs and then run it
>> from there to solve the ring aliasing issue.
>> (http://www.linuxjournal.com/comment/reply/8909 talked about this).
>>
>
> If instructions are trappable then Xen can catch their execution and
> emulate them - it sometimes does this, even for paravirt guests.  Since
> a GPF occurs it's possible to catch the CLTS instruction.  Some
> instructions fail silently when run outside ring 0, which is one case
> where a hypercall is more important (broadly speaking, the other cases
> for using hypercalls being performance and improved manageability).
>
>> Now my first question comes up: if I'm running both para-virtualized and
>> full-virtualized domains on a single CPU (I think the Xen hypervisor will
>> set up the exception bitmap for the CLTS instruction for the HVM domain),
>> then the Xen hypervisor will be confused and will not know how to handle
>> it when running CLTS in ring 1.
>
> It'll know which form of handling is required because it changes the
> necessary data structures when context switching between the two
> domains.
>
> The other stuff is a bit too specific in HVM-land for me to answer
> fully, but I vaguely remember Mats having already responded.
>
> Cheers,
> Mark
>
>> Does the Xen hypervisor do a VM EXIT or still delegate CLTS to ring 0? How
>> does the Xen hypervisor distinguish whether the instruction is from a
>> para-virtualized domain or from a full-virtualized domain? Does Xen have
>> to replace all problematic instructions with hypercalls for a para-domain
>> (even for CLTS)? Why does Xen need to use different strategies in a
>> para-virtualized domain to handle CLTS (delegation to ring 0) and other
>> problematic instructions (hypercall)?
>>
>>
>> My second question:
>> It seems each processor has its own exception bitmap. If I have
>> multiple processors (VT-x enabled), does the Xen hypervisor use the same
>> exception bitmap on all processors, or does Xen allow each processor to
>> have its own (maybe different) exception bitmap?
>>
>> Best regards,
>>
>> Liang
>>
>> -----Original Message-----
>> From: M.A. Williamson [mailto:maw48@xxxxxxxxxxxxxxxx] On Behalf Of Mark
>> Williamson
>> Sent: Tuesday, March 20, 2007 5:37 PM
>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Cc: Liang Yang; Petersson, Mats
>> Subject: Re: [Xen-devel] Does Dom0 always get interrupts first before
>> they are delivered to other guest domains?
>>
>> Hi,
>>
>> > First, you once gave another excellent explanation about the
>> > communication between an HVM domain and the HV (15 Feb 2007). Here I
>> > quote part of it:
>> > "...Since these IO events are synchronous in a real processor, the
>> > hypervisor will wait for a "return event" before the guest is allowed
>> > to continue. Qemu-dm runs as a normal user-process in Dom0..."
>> > My question is about those synchronous I/O events. Why can't we make
>> > them asynchronous? E.g. whenever the I/O is done, we can interrupt the
>> > HV again and let the HV resume I/O processing. Is there any specific
>> > limitation that forces the Xen hypervisor to do I/O in synchronous mode?
>>
>> Was this talking about IO port reads / writes?
>>
>> The problem with IO port reads is that the guest expects the hardware to
>> have responded to an IO port read and for the result to be available as
>> soon as the inb (or whatever) instruction has finished... Therefore in a
>> virtual machine, we can't return to the guest until we've figured out (by
>> emulating using the device model) what that read should return.
>>
>> Consecutive writes can potentially be batched, I believe, and there has
>> been talk of implementing that.
>>
>> I don't see any reason why other VCPUs shouldn't keep running in the
>> meantime, though.
>>
>> > Second, you just mentioned there is a big difference between the number
>> > of HV-to-domain0 events for the device model and the split driver model.
>> > Could you elaborate on how the split driver model can reduce the
>> > HV-to-domain0 events compared with using the qemu device model?
>>
>> The PV split drivers are designed to minimise events: they'll queue up a
>> load of IO requests in a batch and then notify dom0 that the IO requests
>> are ready.
>>
>> In contrast, the FV device emulation can't do this: we have to consult
>> dom0 for the emulation of any device operations the guest does (e.g. each
>> IO port read the guest does) so the batching is less efficient.
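
The batching contrast described here can be put into numbers with a small counting sketch. It is an assumed, simplified model rather than Xen code: the function names, the batch size and the accesses-per-request figure are all hypothetical parameters chosen for illustration.

```python
import math


def dom0_notifications_pv(requests, batch_size):
    """PV split driver: one notification to dom0 per queued batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(requests / batch_size)


def dom0_round_trips_fv(requests, accesses_per_request):
    """FV emulation: dom0 is consulted for every emulated device access."""
    if accesses_per_request < 0:
        raise ValueError("accesses_per_request must not be negative")
    return requests * accesses_per_request


if __name__ == "__main__":
    requests = 32
    # Hypothetical figures: 32 requests fit in one batch, and each emulated
    # request needs 5 device accesses.
    print(dom0_notifications_pv(requests, batch_size=32))         # 1
    print(dom0_round_trips_fv(requests, accesses_per_request=5))  # 160
```

The absolute numbers are made up; the point is that the PV count grows with the number of batches, while the FV count grows with every individual device access.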
>>
>> Cheers,
>> Mark
>>
>> > Have a wonderful weekend,
>> >
>> > Liang
>> >
>> > ----- Original Message -----
>> > From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
>> > To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>;
>> > <xen-devel@xxxxxxxxxxxxxxxxxxx>
>> > Sent: Friday, March 16, 2007 10:40 AM
>> > Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before
>> > they are delivered to other guest domains?
>> >
>> > > -----Original Message-----
>> > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> > > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang
>> > > Yang
>> > > Sent: 16 March 2007 17:30
>> > > To: xen-devel@xxxxxxxxxxxxxxxxxxx
>> > > Subject: [Xen-devel] Does Dom0 always get interrupts first
>> > > before they are delivered to other guest domains?
>> > >
>> > > Hello,
>> > >
>> > > It seems if HVM domains access devices using emulation mode w/ the
>> > > device model in domain0, the Xen hypervisor will send the interrupt
>> > > event to domain0 first and then the device model in domain0 will send
>> > > an event to the HVM domains.
>> >
>> > Ok, so let's see if I've understood your question first:
>> > If we do a disk-read (for example), the actual disk-read operation
>> > itself will generate an interrupt, which goes into the Xen HV where it's
>> > converted to an event that goes to Dom0, which in turn wakes up the
>> > pending call to read (in this case) that was requesting the disk IO,
>> > and then when the read-call is finished an event is sent to the HVM
>> > DomU. Is this the sequence of events that you're talking about?
>> >
>> > If that's what you are talking about, it must be done this way.
>> >
>> > > However, if I'm using the split driver model and I only run the BE
>> > > driver on domain0, does domain0 still get the interrupt first (assume
>> > > this interrupt is not owned by the Xen hypervisor, e.g. local APIC
>> > > timer), or will the Xen hypervisor send the event directly to the HVM
>> > > domain, bypassing domain0, for the split driver model?
>> >
>> > Not in the above type of scenario. The interrupt must go to the
>> > driver-domain (normally Dom0) to indicate that the hardware is ready to
>> > deliver the data. This will wake up the user-mode call that waited for
>> > the data, and then the data can be delivered to the guest domain from
>> > there (which in turn is awakened by the event sent from the driver
>> > domain).
>> >
>> > There is no difference in the number of events in these two cases.
>> >
>> > There is however a big difference in the number of hypervisor-to-dom0
>> > events that occur: the HVM model will require something in the order of
>> > 5 writes to the IDE controller to perform one disk read/write
>> > operation. Each of those will incur one event to wake up qemu-dm, and
>> > one event to wake the domu (which will most likely just run one or two
>> > instructions forward to hit the next write to the IDE controller).
>> >
>> > > Another question is: for interrupt delivery, does Xen treat a
>> > > para-virtualized domain differently from an HVM domain, considering
>> > > the use of the device model and the split driver model?
>> >
>> > Not in interrupt delivery, no. Except for the fact that HVM domains
>> > obviously have full hardware interfaces for interrupt controllers etc,
>> > which adds a little bit of overhead (because each interrupt needs to be
>> > acknowledged/cancelled on the interrupt controller, for example).
>> >
>> > --
>> > Mats
>> >
>> > > Thanks a lot,
>> > >
>> > > Liang
>> > >
>> > >
>> > > _______________________________________________
>> > > Xen-devel mailing list
>> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
>> > > http://lists.xensource.com/xen-devel
>> >
>>
>
>

