RE: [Xen-devel] Questioning the Xen Design of the VMM
> -----Original Message-----
> From: Al Boldi [mailto:a1426z@xxxxxxxxx]
> Sent: 08 August 2006 15:10
> To: Petersson, Mats
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Questioning the Xen Design of the VMM
>
> Petersson, Mats wrote:
> > Al Boldi wrote:
> > > I hoped Xen would be a bit more transparent, by simply exposing
> > > native hw tunneled thru some multiplexed Xen patched host-kernel
> > > driver.
> >
> > On the other hand, to reduce the size of the actual hypervisor (VMM),
> > the approach of Xen is to use Linux as a driver-domain (commonly
> > combined with the management "domain" of Dom0). This means that the
> > Xen hypervisor itself can be driver-less, but of course it also relies
> > on having another OS on top of itself to make up for this. Currently
> > Linux is the only available option for a driver-domain, but there's
> > nothing in the interface between Xen and the driver domain that says
> > it HAS to be so - it's just much easier to do with a well-known,
> > open-source, driver-rich kernel than with a closed-source or
> > driver-poor kernel...
>
> Ok, you are probably describing the state of the host-kernel, which I
> agree needs to be patched for performance reasons.

Yes, but you could have more than one driver domain, each isolated in all
aspects from the other driver domains (host-kernel implies, to me, that
it's also the management of the other domains). Why would you want to have
more than one driver domain? For separation, of course...

1. Competing Company A and Company B are sharing the same hardware - you
   don't want Company A to have even the remotest chance of seeing any
   data that belongs to B, or the other way around, so you definitely want
   them to be separated in as many ways as possible.

2. Let's assume that someone finds a way to "hack" into a system by
   sending some particular pattern on the network (TCP/IP to a particular
   port, causing a buffer overflow, seems to have been popular on Windows
   at least). If you have multiple driver domains, you would only get ONE
   domain broken into by this approach - of course, if the attack is
   widespread it would still break all of them, but if it's targeted
   towards one particular domain, the others will survive [let's say one
   of your client companies is hit with a targeted attack - the other
   companies will then be unaffected].

> > > I may be missing something, but why should the Xen-design
> > > require the guest to be patched?
> >
> > There are two flavours of Xen guests:
> >
> > Para-virtual guests. Those are patched kernels, and have (in past
> > versions of Xen) been implemented for Linux 2.4, Linux 2.6, Windows,
> > <some version of> BSD and perhaps other versions that I don't know of.
> > Current Xen is "Linux only" supplied with the Xen kernel. Other
> > kernels are being worked on.
>
> This is the part I am questioning.

The main reason to use a para-virtual kernel is that it performs better
than the fully virtualized version.

> > HVM guests. These are fully virtualized guests, where the guest
> > contains the same binary as you would use on a non-virtual system.
> > You can run Windows or Linux, or most other OS's, on this. It does
> > require "new" hardware that has virtualization support in hardware
> > (AMD's AMD-V (SVM) or Intel VT) to use this flavour of guest though,
> > so the older model is still maintained.
>
> So HVM solves the problem, but why can't this layer be implemented in
> software?

It CAN, and has been done.
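To give a rough idea of what "in software" means here: the usual trick is
trap-and-emulate - run the guest kernel deprivileged so that privileged
instructions fault into the VMM, which then carries them out against a
*virtual* CPU state instead of the real one. A minimal sketch of that
dispatch (the names here - struct vcpu, emulate_privileged, virtual_if -
are made up for illustration; this is not VMware's or Xen's actual code):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical per-vCPU state kept by the VMM. */
    struct vcpu {
        uint64_t rip;         /* guest instruction pointer            */
        int      virtual_if;  /* the guest's *virtual* interrupt flag */
    };

    /*
     * Called when the deprivileged guest (e.g. in ring 1) faults on a
     * privileged instruction.  Emulates it against the virtual CPU
     * state and returns the instruction length, or -1 to fall back to
     * a slower emulation path.
     */
    int emulate_privileged(struct vcpu *v, const uint8_t *insn)
    {
        switch (insn[0]) {
        case 0xFA: v->virtual_if = 0; return 1;   /* CLI */
        case 0xFB: v->virtual_if = 1; return 1;   /* STI */
        default:   return -1;
        }
    }

    int main(void)
    {
        struct vcpu v = { 0x1000, 1 };
        uint8_t cli = 0xFA;            /* opcode of CLI */

        v.rip += emulate_privileged(&v, &cli);
        printf("virtual IF after emulated CLI: %d\n", v.virtual_if);
        return 0;
    }

The catch, as the next example shows, is that some instructions
(pushf/popf in particular) never fault when deprivileged, so they never
reach a handler like this at all.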
It is, however, a little bit difficult to cover some of the "strange"
corner cases, as the x86 processor wasn't really designed to handle
virtualization natively [until these extensions were added]. This is why
you end up with binary translation in VMware, for example.

For instance, let's say that we use the method of "ring compression"
(which is when the guest-OS is moved from Ring 0 [full privileges] to
Ring 1 [less than full privileges]), and the hypervisor wants to have
full control of the interrupt flag:

    some_function:
        ...
        pushf           // Save interrupt flag.
        cli             // Disable interrupts.
        ...
        ...
        ...
        popf            // Restore interrupt flag.
        ...

In Ring 0 all this works just fine - but of course we don't know that the
guest-OS tried to disable interrupts, so we have to change something. In
Ring 1 the guest can't disable interrupts, so the CLI instruction can be
intercepted. Great. But pushf/popf are valid instructions in all four
rings - they just don't change the interrupt enable flag in the flags
register if you're not allowed to use the CLI/STI instructions! So that
means interrupts are, as far as the VMM knows, disabled forever after
[until an STI instruction gets found by chance, at least]. And if the
next bit of code is:

        mov   someaddress, eax   // someaddress is updated by an interrupt!
    $1:
        cmp   someaddress, eax   // Check it...
        jz    $1

then we'd very likely never get out of there, since the actual interrupt
that would cause someaddress to change is believed by the VMM to be
disabled. There is no real way to make popf trap [other than supplying it
with invalid arguments in virtual 8086 mode, which isn't really a
practical thing to do here!].

Another problem is "hidden bits" in registers. Let's say this:

        mov   cr0, eax          // Read CR0.
        mov   eax, ecx          // Save the original CR0 in ecx.
        or    $1, eax
        mov   eax, cr0          // Set bit 0 (PE) of CR0.
        mov   $0x10, eax
        mov   eax, fs           // Load FS from a descriptor with a 4GB limit.
        mov   ecx, cr0          // Restore the original CR0.
        mov   $0xF000000, eax
        mov   $10000, ecx
    $1:
        mov   $0, fs:eax        // Write well beyond 64KB through FS.
        add   $4, eax
        dec   ecx
        jnz   $1

Let's now say that we take an interrupt, which the hypervisor handles,
while we're in the loop in the above code. The hypervisor itself uses FS
for some special purpose, and thus needs to save/restore the FS register.
When it returns, the system will crash (GP fault) because the FS register
limit is 0xFFFF (64KB) and eax is greater than the limit - but the limit
of FS was set to 0xFFFFFFFF before we took the interrupt...
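As an aside, to spell out what "hidden bits" means: each segment register
has a visible 16-bit selector plus a cached base, limit and attributes
that the CPU loaded when the selector was written, and only the selector
can be read back by software. A save/restore that just reloads the
selector rebuilds the cached part from the current descriptor table,
which is exactly what loses the 0xFFFFFFFF limit above. Roughly, the
state a software VMM would have to track per segment register looks like
this (a hypothetical struct for illustration, not any particular VMM's
code):

    #include <stdint.h>
    #include <stdio.h>

    /* Visible and hidden parts of one x86 segment register. */
    struct seg_register {
        uint16_t selector;    /* the only part software can read back     */
        /* "Hidden" descriptor cache, filled when the selector is loaded: */
        uint32_t base;
        uint32_t limit;       /* e.g. 0xFFFF for a 64KB segment,
                                 0xFFFFFFFF in the example above          */
        uint16_t attributes;  /* type, DPL, granularity, ...              */
    };

    /* What a naive context switch preserves...                           */
    uint16_t save_fs_selector(const struct seg_register *fs)
    {
        return fs->selector;
    }

    /* ...and what would actually have to be preserved to be transparent. */
    struct seg_register save_fs_fully(const struct seg_register *fs)
    {
        return *fs;
    }

    int main(void)
    {
        /* FS as the guest left it above: selector 0x10, hidden 4GB limit. */
        struct seg_register guest_fs = { 0x10, 0x0, 0xFFFFFFFFu, 0x0 };

        uint16_t naive = save_fs_selector(&guest_fs);
        struct seg_register full = save_fs_fully(&guest_fs);

        printf("naive save keeps only selector 0x%x; full save keeps "
               "limit 0x%lx\n",
               (unsigned)naive, (unsigned long)full.limit);
        return 0;
    }

The hardware extensions mentioned earlier keep exactly this per-segment
state for the guest (in the VMCB/VMCS), which is one reason HVM makes
this class of corner case go away.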
Incorrect behaviour like this is terribly difficult to deal with, and
there really isn't any good way to solve these issues [other than not
allowing the code to run when it does "funny" things like this - or
performing the necessary code in "translation mode", i.e. emulating each
instruction -> slow(ish)].

> I'm sure there can't be a performance issue, as this virtualization
> doesn't occur on the physical resource level, but is (should be) rather
> implemented as some sort of a multiplexed routing algorithm, I think :)

I'm not entirely sure what this statement is trying to say, but as I
understand the situation, performance is entirely the reason why the Xen
paravirtual model was implemented - all other VMMs are slower [although
it's often hard to prove that, since for example VMware has the rule that
they have to give permission before benchmarks of their product are
published, and of course that permission would only be given in cases
where there is some benefit to them].

One of the obvious reasons for para-virtual being better than full
virtualization is that it can be used in a "batched" mode. Let's say we
have some code that does this:

    ...
    p = malloc(2000 * 4096);
    ...

Let's then say that the guts of malloc end up in something like this:

    map_pages_to_user(...)
    {
        for (v = random_virtual_address, p = start_page;
             p < end_page;
             p++, v += 4096)
            map_one_page_to_user(p, v);
    }

In full virtualization we have no way to understand that someone is
mapping 2000 pages into the same user-process in one guest; we'd just see
writes to the page-table, one page at a time. In the para-virtual case,
we could do something like:

    map_pages_to_user(...)
    {
        hypervisor_map_pages_to_user(current_process, start_page,
                                     end_page, random_virtual_address);
    }

Now the hypervisor knows "the full story" and can map all those pages in
one go - much quicker, I would say. There's still more work than in the
native case, but it's much closer to the native case.
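To put a number on the "batched" point: what matters is how many
guest-to-VMM transitions the 2000-page mapping costs. A toy sketch of the
two cost models (all names here - map_request, world_switches and so on -
are made up for illustration; Xen's real interface batches page-table
updates in the same spirit through its mmu_update/multicall hypercalls,
but the details differ):

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    /* One guest-page -> guest-virtual-address mapping request. */
    struct map_request {
        uint64_t page_frame;   /* page to map                     */
        uint64_t virt_addr;    /* where the guest wants it mapped */
    };

    static unsigned long world_switches; /* stand-in for trap/hypercall cost */

    /* Full virtualization: each page-table write is seen (and paid for)
       on its own. */
    static void map_pages_trapped(const struct map_request *reqs, size_t n)
    {
        (void)reqs;
        for (size_t i = 0; i < n; i++)
            world_switches++;            /* one trap per page */
    }

    /* Para-virtual: the guest hands the whole batch over in one hypercall. */
    static void map_pages_batched(const struct map_request *reqs, size_t n)
    {
        (void)reqs; (void)n;
        world_switches++;                /* one transition covers all n pages */
    }

    int main(void)
    {
        static struct map_request reqs[2000];   /* the 2000 pages from malloc */

        map_pages_trapped(reqs, 2000);
        printf("fully virtualized: %lu transitions\n", world_switches);

        world_switches = 0;
        map_pages_batched(reqs, 2000);
        printf("para-virtual:      %lu transitions\n", world_switches);
        return 0;
    }

Same amount of page-table work in both cases - the difference is paying
the trap/hypercall overhead once instead of 2000 times.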
> > I hope this is of use to you.
> >
> > Please feel free to ask any further questions...
>
> Thanks a lot for your detailed response!
>
> --
> Al

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel