
Re: [Xen-devel] Questioning the Xen Design of the VMM



Petersson, Mats wrote:
> > > Al Boldi wrote:
> > > > I may be missing something, but why should the Xen design
> > > > require the guest to be patched?
>
> The main reason to use a para-virtual kernel is that it performs better
> than the fully virtualized version.
>
> > So HVM solves the problem, but why can't this layer be implemented in
> > software?
>
> It CAN, and has been done.

You mean full virtualization using binary translation in software?

My understanding was that HVM implies full virtualization without the need
for binary translation in software.

> It is, however, a little bit difficult to
> cover some of the "strange" corner cases, as the x86 processor wasn't
> really designed to handle virtualization natively [until these
> extensions were added].

You mean the AMD-V/Intel VT extensions?

If so, then these extensions don't actively participate in the act of
virtualization, but rather fix some x86-arch shortcomings, which makes it
easier for software (i.e. Xen) to virtualize and circumvents the need to
do binary translation.  Is this a correct reading?
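
If that's a correct reading, then my mental model of the flow with the
extensions is the sketch below: the guest kernel runs unmodified at ring 0
in "guest mode", the hardware keeps a complete private copy of its state
(including the hidden segment parts and the interrupt flag, which software
alone cannot fully save and restore), and only the operations the
hypervisor asks to intercept force an exit back into it.  All of the names
here are made up for illustration - this is not the real VT-x/SVM or Xen
interface:

#include <stdio.h>

/* Illustrative only: the shape of a hardware-assisted trap-and-emulate
 * loop.  None of these types or functions are real VT-x/SVM or Xen APIs. */

enum exit_reason { EXIT_CPUID, EXIT_IO, EXIT_SHUTDOWN };

struct vm_exit {
    enum exit_reason reason;
    unsigned long    qual;          /* e.g. the I/O port that was touched */
};

struct guest {
    int exits_taken;
    /* The hardware saves and restores the complete guest state on every
     * exit/entry, including the hidden segment limits and EFLAGS. */
};

/* Stand-in for a VMRUN/VMLAUNCH: run the guest until a sensitive event. */
static struct vm_exit run_guest(struct guest *g)
{
    struct vm_exit e = { EXIT_SHUTDOWN, 0 };
    if (g->exits_taken == 0)
        e.reason = EXIT_CPUID;
    else if (g->exits_taken == 1) {
        e.reason = EXIT_IO;
        e.qual = 0x3f8;
    }
    g->exits_taken++;
    return e;
}

int main(void)
{
    struct guest g = { 0 };

    for (;;) {
        struct vm_exit e = run_guest(&g);

        switch (e.reason) {
        case EXIT_CPUID:
            printf("emulate CPUID, hide features the guest may not see\n");
            break;
        case EXIT_IO:
            printf("emulate the device access on port 0x%lx\n", e.qual);
            break;
        case EXIT_SHUTDOWN:
            return 0;
        }
    }
}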

> This is why you end up with binary translation in VMware, for example.
> Let's say that we use the method of "ring compression" (which is when
> the guest OS is moved from Ring 0 [full privileges] to Ring 1 [less
> than full privileges]), and the hypervisor wants to have full control
> of the interrupt flag:
>
> some_function:
>       ...
>       pushf                   // Save interrupt flag.
>       cli                     // Disable interrupts
>       ...
>       ...
>       ...
>       popf                    // Restore interrupt flag.
>       ...
>
> In Ring 0, all this works just fine - but of course, we don't know that
> the guest-OS tried to disable interrupts, so we have to change
> something. In Ring 1, the guest can't disable interrupts, so the CLI
> instruction can be intercepted. Great. But pushf/popf is a valid
> instruction in all four rings - it just doesn't change the interrupt
> enable flag in the flags register if you're not allowed to use the
> CLI/STI instructions! So, that means that interrupts are disabled
> forever after [until an STI instruction gets found by chance, at least].
>
>
> And if the next bit of code is:
>
>       mov     someaddress, eax        // someaddress is updated by an interrupt!
> $1:
>       cmp     someaddress, eax        // Check it...
>       jz      $1
>
> Then we'd very likely never get out of there, since the VMM believes
> interrupts are disabled and therefore never delivers the interrupt
> that would cause someaddress to change.
>
> There is no real way to make popf trap [other than supplying it with
> invalid arguments in virtual 8086 mode, which isn't really a practical
> thing to do here!]
>
> Another problem is "hidden bits" in registers.
>
> Let's say this:
>
>       mov     cr0, eax        // Read CR0...
>       mov     eax, ecx        // ...and keep the original value in ecx.
>       or      $1, eax         // Set the PE bit...
>       mov     eax, cr0        // ...to enter protected mode.
>       mov     $0x10, eax
>       mov     eax, fs         // Load FS from a descriptor with a 4GB limit.
>       mov     ecx, cr0        // Restore CR0; FS keeps its hidden 4GB limit.
>
>       mov     $0xF000000, eax // Offset well past 64KB...
>       mov     $10000, ecx     // ...and a loop counter.
> $1:
>       mov     $0, fs:eax      // Relies on the hidden (cached) FS limit.
>       add     $4, eax
>       dec     ecx
>       jnz     $1
>
> Let's now say that an interrupt arrives while the guest is in the loop
> in the above code, and the hypervisor handles it. The hypervisor itself
> uses FS for some special purpose, and thus needs to save/restore the FS
> register. When it returns, the system will crash (GP fault) because the
> FS register limit is now 0xFFFF (64KB) and eax is greater than that
> limit - but the limit of FS was 0xFFFFFFFF before we took the
> interrupt... Incorrect
> behaviour like this is terribly difficult to deal with, and there really
> isn't any good way to solve these issues [other than not allowing the
> code to run when it does "funny" things like this - or to perform the
> necessary code in "translation mode" - i.e. emulate each instruction ->
> slow(ish)].

Or introduce the AMD-V/Intel VT extensions?
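
If I follow your pushf/popf example, the para-virtual route avoids that
problem altogether by never touching the real flags register: the guest
keeps a virtual interrupt flag in memory shared with the hypervisor, and
the hypervisor consults that flag before delivering a virtual interrupt.
Below is a small sketch of how I picture it - the names and layout are
purely illustrative, not the actual Xen shared-info structures:

#include <stdio.h>

/* Sketch only -- not the real Xen shared-info layout.  The guest's
 * "interrupt flag" lives in ordinary memory visible to both the guest
 * and the hypervisor, so cli/sti/pushf/popf never need to touch the
 * real EFLAGS register, and nothing has to trap. */

struct shared_flags {
    volatile unsigned char events_masked;   /* 1 = "interrupts disabled" */
};

static struct shared_flags shared;   /* would really be a page shared with the VMM */

static void guest_cli(void)             { shared.events_masked = 1; }
static void guest_sti(void)             { shared.events_masked = 0; }
static unsigned char guest_pushf(void)  { return shared.events_masked; }
static void guest_popf(unsigned char f) { shared.events_masked = f; }

int main(void)
{
    guest_sti();                          /* start with events enabled     */

    unsigned char saved = guest_pushf();  /* pushf: remember current state */
    guest_cli();                          /* cli:   mask event delivery    */
    /* ... critical section: the hypervisor checks events_masked before
     *     injecting a virtual interrupt into this guest ... */
    guest_popf(saved);                    /* popf:  restore the saved state */

    printf("events masked after popf: %d\n", shared.events_masked);
    return 0;
}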

> > I'm sure there can't be a performance issue, as this
> > virtualization doesn't
> > occur on the physical resource level, but is (should be)
> > rather implemented
> > as some sort of a multiplexed routing algorithm, I think :)
>
> I'm not entirely sure what this statement is trying to say, but as I
> understand the situation, performance is entirely the reason why the Xen
> paravirtual model was implemented - all other VMMs are slower [although
> it's often hard to prove that, since for example VMware has a rule
> that you need their permission before publishing benchmarks of their
> product, and of course that permission would only be given in cases
> where there is some benefit to them].
>
> One of the obvious reasons for para-virtual being better than full
> virtualization is that it can be used in a "batched" mode. Let's say we
> have some code that does this:
>
> ...
>       p = malloc(2000 * 4096);
> ...
>
> Let's then say that the guts of malloc end up in something like this:
>
> map_pages_to_user(...)
> {
>       for (v = random_virtual_address, p = start_page;
>            p < end_page; p++, v += 4096)
>               map_one_page_to_user(p, v);
> }
>
> In full virtualization, we have no way to tell that someone is mapping
> 2000 pages into the same user process in one guest; we'd just see
> writes to the page table, one page at a time.
>
> In the para-virtual case, we could do something like:
> map_pages_to_user(...)
> {
>       hypervisor_map_pages_to_user(current_process, start_page, end_page,
>                                    random_virtual_address);
> }
>
> Now, the hypervisor knows "the full story" and can map all those pages
> in one go - much quicker, I would say. There's still more work than in
> the native case, but it's much closer to the native case.

Sure, but wouldn't this come at the price of losing guest-OS transparency?
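
(To make sure I understand the batching idea, here is roughly how I picture
the guest side of it: collect the individual page-table writes into an
array and hand the whole array to the hypervisor in a single call.  The
structure and function names below are only my illustration of the idea,
not the actual Xen hypercall ABI.)

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch only -- names and layout are illustrative, not the exact Xen
 * interface.  The point is that 2000 page-table updates become one
 * guest->hypervisor transition instead of 2000 trapped writes. */

struct pt_update {
    uint64_t pte_addr;       /* which page-table entry to write */
    uint64_t pte_val;        /* what to write into it           */
};

/* Stand-in for the real batched hypercall. */
static int hypervisor_update_page_tables(const struct pt_update *req, size_t count)
{
    printf("hypervisor: applying %zu page-table updates in one go\n", count);
    return 0;
}

static void map_pages_to_user(uint64_t first_pte, uint64_t first_frame, size_t npages)
{
    struct pt_update batch[2000];
    size_t i;

    /* Build the whole batch in guest memory... */
    for (i = 0; i < npages; i++) {
        batch[i].pte_addr = first_pte + i * 8;                /* 8-byte PTEs     */
        batch[i].pte_val  = ((first_frame + i) << 12) | 7;    /* present|rw|user */
    }

    /* ...and cross into the hypervisor exactly once. */
    hypervisor_update_page_tables(batch, npages);
}

int main(void)
{
    map_pages_to_user(0x100000, 0x2000, 2000);   /* the malloc(2000 * 4096) case */
    return 0;
}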


Thanks!

--
Al


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

