RE: [Xen-ia64-devel] Re: PMT table for XEN/IA64 (was: RE: Transparent paravirtualization vs. xen paravirtualization)
I am open to considering a design change that exposes a physical-to-machine
translation table (PMT) shared between domain0 and Xen. Domain0:

- is started once by Xen
- is essentially in the same trust domain as Xen
- is unlikely (outside of research projects) to ever be safely rebootable
  without a system/Xen reboot
- will rarely run real customer apps, so it need not use a large portion
  of a system's physical memory
- is not migratable

However, I agree with Matt that a PMT for other domains (domU) is a bad
idea, as it creates many problems for migration, save/restore, ballooning,
and adding new domains to an already loaded system. Further, the grant
table abstraction is the primary mechanism for page sharing for domU in
Xen (on Xen/x86). I think that if domU had any knowledge of actual machine
addresses, the Xen team would consider it a bug to be fixed.

Some of the email discussion in this thread has referred to a PMT for dom0
only, and some to a PMT for both dom0 and domU. At this time, I am willing
to consider a PMT for dom0 only.

If you would like to start proposing a design (and patches) for a dom0
PMT, please start a new thread and describe:

- What is the structure/size of the PMT and how is it allocated (e.g. is
  it a linear table)? Does the table have other attributes (e.g. r/w
  permissions), or is it just a one-to-one map of physical-to-machine
  pages? (See the sketch at the end of this message for one possibility.)
- How do you deal with different page sizes? (Does dom0 need to be
  compiled with PAGE_SIZE=4K?)
- How is dom0 I/O handled (differently than it is now)?
- What is the impact on handling virtual translations (e.g.
  vcpu_translate())?
- What code that is now different for ia64 in the Xen virtual drivers
  would become the same as x86?**
- What code that is now different for ia64 in the Xen virtual drivers
  would still differ between ia64 and x86?**
- What code (outside of Xen drivers) in xenlinux/ia64 would need to
  change, and is it still possible to make the changes transparent?
- Can dom0 and domU still use the same binary?
- What code in grant_table.c changes? (Can we merge back to using
  common/grant_table.c instead of a separate file?)

HOWEVER, unless there is a general consensus that this change will be easy
to implement and debug, and will make fixing of multiple domains and/or
implementation of virtual networking much easier for 3.0, I see this as a
post-3.0 implementation.

Thanks,
Dan

** It would be good to see the patches for the drivers, as I think the
whole point of this proposal is to make the code closer to Xen/x86 to
minimize differences/maintenance. If "before" we have 100 lines different,
and "after" we have 90 lines different, and there are other disadvantages,
adding a PMT might not be a very good tradeoff.
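For the first question above, one concrete possibility is a flat array
holding one machine-frame number per guest physical frame. A minimal
sketch in C, where every name (struct pmt, pmt_translate(), pmt_set()) is
invented for illustration and is not an existing Xen symbol:

    /* Minimal sketch of a linear dom0 PMT: a flat array, indexed by
     * guest physical frame number, shared between Xen and dom0.
     * All names here are illustrative, not existing Xen symbols. */

    #define PMT_INVALID_MFN (~0UL)

    struct pmt {
        unsigned long *pfn_to_mfn;  /* flat array indexed by guest pfn */
        unsigned long  max_pfn;     /* number of entries */
    };

    /* Translate a guest physical frame to its backing machine frame. */
    static inline unsigned long pmt_translate(const struct pmt *p,
                                              unsigned long pfn)
    {
        return (pfn < p->max_pfn) ? p->pfn_to_mfn[pfn] : PMT_INVALID_MFN;
    }

    /* A page flip touches exactly one entry: the guest keeps its pfn,
     * but it is now backed by a different machine frame. */
    static inline void pmt_set(struct pmt *p, unsigned long pfn,
                               unsigned long mfn)
    {
        if (pfn < p->max_pfn)
            p->pfn_to_mfn[pfn] = mfn;
    }

At 8 bytes per entry, a flat table covering 64GB of machine memory with
16KB pages would be 32MB, which is one reason the allocation and page-size
questions above matter.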
> -----Original Message-----
> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf
> Of Dong, Eddie
> Sent: Tuesday, November 01, 2005 12:09 AM
> To: Matt Chapman; Tian, Kevin
> Cc: Ling, Xiaofeng; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-ia64-devel] Re: PMT table for XEN/IA64
> (was: RE: Transparent paravirtualization vs. xen paravirtualization)
>
> Matt:
> Yes, as you mentioned, having domU or a VTI domain do only page
> flipping works under the assumption that the service domain owns
> all system pages (i.e. every other domain's pages come from the
> service domain). But that is ultimately impossible for driver
> domains, since only one domain can own all system pages.
> So either we start with what you proposed and roll back to what
> x86 does now at some later time (for example Xen 3.1), or we start
> aligned with Xen/x86 and save all the maintenance and rework
> effort. I suggest we go with what will eventually be the right
> design.
> Yes, supporting a PMT may require modifications to XenLinux/ia64,
> but as you pointed out, domU has to maintain a PMT anyway
> (migration, memory location, etc.), so why not let dom0 work the
> same way? Letting dom0 and domU share as much code as possible is
> the right approach, IMO.
> The modification to XenLinux/ia64 is not that big, probably only
> the PMT setup for now; the VBD/VNIF work can then reference and
> modify the table. It should be almost the same as the x86
> approach.
> What specific question do you have about x86 shadow_translate? I
> can consult an expert here too if you need :-)
>
> So, it may now be time for us to dig into the details of how to do
> PMTs... :-) And Dan?
> Eddie
>
> Matt Chapman wrote:
> > I'm still not clear about the details. Could you outline the
> > changes that you want to make to Xen/ia64?
> >
> > Would DomU have a PMT? Surely DomU should not know about real
> > machine addresses; those should be hidden behind the grant table
> > interface. Otherwise migration, save/restore, etc. are difficult
> > (as they have found on x86).
> >
> > Do you know how x86 shadow_translate mode works? Perhaps we
> > should use that as an example.
> >
> > Matt
> >
> > On Mon, Oct 31, 2005 at 05:11:09PM +0800, Tian, Kevin wrote:
> >> Matt Chapman wrote:
> >>> 1. Packet arrives in a Dom0 SKB. Of course the buffer needs
> >>> to be page sized/aligned (this is true on x86 too).
> >>> 2. netback steals the buffer.
> >>> 3. netback donates it to DomU *without freeing it*.
> >>> 4. DomU receives the frame and passes it up its network stack.
> >>> 5. DomU gives away other frame(s) to restore balance.
> >>> 6. Dom0 eventually receives extra frames via its balloon
> >>> driver.
> >>>
> >>> 5 and 6 can be done lazily in batches. Alternatively, 4 and 5
> >>> could be a single "flip" operation.
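To make the comparison concrete, steps 2 through 5 of Matt's flow amount
to only a few driver operations. A minimal sketch in C, where every
function name is hypothetical, standing in for real netback/netfront code
rather than taken from it:

    /* All of these are invented names for illustration only. */
    void steal_from_skb(struct page *pg);
    void donate_page(struct page *pg, domid_t dest);
    void pass_up_stack(struct page *pg, unsigned int len);
    void balloon_give_away_pages(int n);

    /* Backend (dom0) side: steps 2 and 3 of the flow quoted above. */
    static void netback_rx_flip(struct page *pg, domid_t dest)
    {
        steal_from_skb(pg);     /* step 2: detach buffer from dom0 skb */
        donate_page(pg, dest);  /* step 3: transfer ownership to DomU,
                                 * without freeing the page locally    */
    }

    /* Frontend (domU) side: steps 4 and 5. */
    static void netfront_rx_flip(struct page *pg, unsigned int len)
    {
        pass_up_stack(pg, len);       /* step 4: zero-copy receive     */
        balloon_give_away_pages(1);   /* step 5: give one frame back so
                                       * the domain's total stays fixed */
    }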
> >> The solution will work with some tweaks. But is there any
> >> obvious benefit over the PMT approach used on x86? (If yes, you
> >> should suggest it to xen-devel ;-) Usually we want a different
> >> approach either because we can't do something on this
> >> architecture or for far better performance than the existing
> >> one. Otherwise, why diverge from the Xen design at the cost of
> >> extra maintenance effort? That extra effort has already cost us
> >> 2+ weeks getting VBD to support DomU over the last 2 upstream
> >> merges.
> >>
> >>> I think this is not significantly different from x86.
> >>>
> >>> I'm not saying this is necessarily better than a PMT solution,
> >>> but I want to discuss the differences and trade-offs. By PMT
> >>> I assume you mean to make Dom0 not 1:1 mapped, and then give
> >>> it access to the translation table? Can you describe how the
> >>> above works differently with a PMT?
> >>
> >> In terms of work flow, the PMT approach is similar, with the
> >> backend/frontend needing to touch the PMT table for ownership
> >> changes. However, have you evaluated how many tricky changes are
> >> required to support Domain0 with gpn=mfn on top of the existing
> >> code? For example:
> >> - Backend drivers are not bound to dom0; they can also be used
> >> by a domU acting as a driver domain, where a 1:1 mapping makes
> >> no sense. There is already some talk of domU servers doing
> >> driver I/O.
> >> - You need to ensure all available pages are granted to dom0.
> >> That means you need to change the current dom0 allocation code.
> >> - You need to change the current vnif code with an unknown
> >> number of #ifdefs and workarounds, since you would be
> >> implementing new behavior on top of a different approach.
> >> - ... (maintenance!)
> >>
> >> So if you were implementing a VM from scratch, your approach
> >> would definitely be worth trying, since there would be no such
> >> constraints. But since we work on Xen, we should take advantage
> >> of the current Xen design as much as possible, right? ;-)
> >>
> >>> One disadvantage I see of having Dom0 not 1:1 is that
> >>> superpages are more difficult; we can't just use the guest's
> >>> superpages.
> >>
> >> Superpages are an optimization, and we still need to support
> >> discontiguous pages as a basic requirement. You can still add an
> >> option to allocate contiguous pages for a guest even with a PMT
> >> table, since paravirtualization is cooperative.
> >>
> >>> Also, are there paravirtualisation changes needed to support a
> >>> PMT? I'm concerned about not making the paravirtualisation
> >>> changes too complex (I think x86 Xen changes the OS too much).
> >>> Also, it should be possible to load Xen frontend drivers into
> >>> unmodified OSs (on VT).
> >>
> >> We need to balance new designs against maintenance effort.
> >> Currently Xiaofeng Ling from Intel is working on para-drivers
> >> for unmodified domains; both VBD and VNIF are already working
> >> for x86 VT domains and are being reviewed by Cambridge. This
> >> work is based on the PMT table.
> >>
> >> Kevin
> >>>
> >>> On Mon, Oct 31, 2005 at 01:28:43PM +0800, Tian, Kevin wrote:
> >>>> Hi, Matt,
> >>>>
> >>>> The point here is how to tell when a donated frame is done
> >>>> with, and where the "free" actually happens in domU. The Linux
> >>>> network driver currently uses zero-copy to pass a received
> >>>> packet up without any copying. In this case, the receive pages
> >>>> are allocated from the skbuff, but they are freed by the upper
> >>>> layer rather than by the vnif driver itself. To let dom0 know
> >>>> when the donated page is done with, you may either:
> >>>> - Copy the content from the donated page to a local skbuff
> >>>> page and notify dom0 immediately, at the cost of performance,
> >>>> or
> >>>> - Modify the upper-layer code to register a "free" hook that
> >>>> notifies dom0 when done, at the cost of more modification to
> >>>> common code and divergence from x86.
> >>>>
> >>>> There are certainly other ways to make this approach work, and
> >>>> even more alternatives. However, the point we really want to
> >>>> emphasize is that we can move towards the x86 solution by
> >>>> adding a PMT, with the best performance and less maintenance
> >>>> effort. That would minimize our future rebase effort as the
> >>>> para-drivers keep evolving. ;-)
> >>>>
> >>>> Thanks,
> >>>> Kevin
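Kevin's second option maps fairly naturally onto the destructor callback
that Linux sk_buffs already carry. A sketch of that "free hook", where
notify_dom0_page_done() is an invented name (only skb->destructor and
virt_to_page() are real kernel interfaces):

    #include <linux/skbuff.h>
    #include <linux/mm.h>

    /* Hypothetical notification to dom0 that a donated page is free. */
    void notify_dom0_page_done(struct page *pg);

    /* Runs when the upper layers release the skb, i.e. when the
     * donated buffer is really done with. */
    static void donated_skb_destructor(struct sk_buff *skb)
    {
        notify_dom0_page_done(virt_to_page(skb->head));
    }

    /* Called by the frontend when it hands a donated frame upward. */
    static void vnif_mark_donated(struct sk_buff *skb)
    {
        skb->destructor = donated_skb_destructor;
    }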
> >>>>> -----Original Message-----
> >>>>> From: Matt Chapman [mailto:matthewc@xxxxxxxxxxxxxxx]
> >>>>> Sent: Monday, October 31, 2005 13:09
> >>>>> To: Tian, Kevin
> >>>>> Cc: Dong, Eddie; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> Subject: Re: [Xen-ia64-devel] Re: PMT table for XEN/IA64
> >>>>> (was: RE: Transparent paravirtualization vs. xen
> >>>>> paravirtualization)
> >>>>>
> >>>>> Yes, I think I understand the problem now.
> >>>>>
> >>>>> The way I imagine this could work is that Dom0 would know
> >>>>> about all of the memory in the machine (i.e. it would be
> >>>>> passed the original EFI memmap, minus memory used by Xen).
> >>>>>
> >>>>> Then Dom0 would donate memory for other domains
> >>>>> (= ballooning). Dom0 can donate data frames to DomU in the
> >>>>> same way, by granting the frame and not freeing it. When DomU
> >>>>> donates a data frame to Dom0, Dom0 frees it when it is done,
> >>>>> and then the kernel can use it.
> >>>>>
> >>>>> What do you think of this approach?
> >>>>>
> >>>>> Matt
> >>>>>
> >>>>> On Mon, Oct 31, 2005 at 11:09:04AM +0800, Tian, Kevin wrote:
> >>>>>> Hi, Matt,
> >>>>>> It's not related to the mapped virtual address, only to the
> >>>>>> physical/machine pfn. The current vnif backend (on x86)
> >>>>>> works as follows:
> >>>>>>
> >>>>>> 1. Allocate a set of physical pfns from the kernel.
> >>>>>> 2. Chop up the mapping between each physical pfn and its old
> >>>>>> machine pfn.
> >>>>>> 3. Transfer ownership of the old machine pfn to the
> >>>>>> frontend.
> >>>>>> 4. Allocate a new machine pfn and bind it to that physical
> >>>>>> pfn. (In this case there is no ownership return from the
> >>>>>> frontend, for performance reasons.)
> >>>>>>
> >>>>>> Without a PMT table (assuming guest == machine for dom0),
> >>>>>> that means you would have to hotplug physical pfns out of
> >>>>>> the guest (at page granularity) under the current vnif
> >>>>>> model. Or maybe you have a better alternative that needs
> >>>>>> neither a PMT nor big changes to the existing vnif driver?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Kevin
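Expressed against the hypothetical PMT helpers sketched at the end of
Dan's message, steps 2 through 4 of Kevin's backend flow reduce to one
lookup and two table updates. transfer_mfn_to() and alloc_machine_frame()
are likewise invented names:

    /* Invented helpers, for illustration only. */
    void transfer_mfn_to(unsigned long mfn, domid_t dest);
    unsigned long alloc_machine_frame(void);

    /* Sketch of steps 2-4 for a single receive buffer; step 1
     * (allocating the physical pfns) is assumed to have happened
     * already. Uses the struct pmt helpers sketched earlier. */
    static void netback_flip_one(struct pmt *p, unsigned long pfn,
                                 domid_t frontend)
    {
        unsigned long old_mfn = pmt_translate(p, pfn);

        pmt_set(p, pfn, PMT_INVALID_MFN);        /* step 2: chop the
                                                  * pfn->mfn mapping   */
        transfer_mfn_to(old_mfn, frontend);      /* step 3: give the old
                                                  * machine frame away */
        pmt_set(p, pfn, alloc_machine_frame());  /* step 4: back the pfn
                                                  * with a fresh frame */
    }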
> >>>>>>> -----Original Message-----
> >>>>>>> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >>>>>>> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> >>>>>>> Behalf Of Matt Chapman
> >>>>>>> Sent: Monday, October 31, 2005 10:59
> >>>>>>> To: Dong, Eddie
> >>>>>>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>>> Subject: [Xen-ia64-devel] Re: PMT table for XEN/IA64 (was:
> >>>>>>> RE: Transparent paravirtualization vs. xen
> >>>>>>> paravirtualization)
> >>>>>>>
> >>>>>>> Hi Eddie,
> >>>>>>>
> >>>>>>> The way I did it was to make the address argument to grant
> >>>>>>> hypercalls in/out; that is, the hypervisor might possibly
> >>>>>>> return a different address than the one requested, like
> >>>>>>> mmap on UNIX.
> >>>>>>>
> >>>>>>> For DomU, the hypervisor would map the page at the
> >>>>>>> requested address. For Dom0, the hypervisor would instead
> >>>>>>> return the existing address of that page, since Dom0
> >>>>>>> already has access to the whole address space.
> >>>>>>>
> >>>>>>> (N.B. I'm referring to physical/machine mappings here,
> >>>>>>> unlike the x86 implementation, where the grant table ops
> >>>>>>> map pages directly into virtual address space.)
> >>>>>>>
> >>>>>>> Matt
> >>>>>>>
> >>>>>>> On Fri, Oct 28, 2005 at 10:28:08PM +0800, Dong, Eddie wrote:
> >>>>>>>>> Page flipping should work just fine
> >>>>>>>>> in the current design; Matt had it almost working (out of
> >>>>>>>>> tree) before he went back to school.
> >>>>>>>>>
> >>>>>>>> Matt:
> >>>>>>>> Dan mentioned that you had the VNIF work almost done
> >>>>>>>> without PMT table support for dom0. Can you share the idea
> >>>>>>>> with us? Usually VNIF swaps pages between dom0 and domU so
> >>>>>>>> that network packet copies (between the dom0 native driver
> >>>>>>>> and the domU frontend driver) can be avoided, achieving
> >>>>>>>> high performance. With this swap, we can no longer assume
> >>>>>>>> dom0 gpn=mfn. So how did you propose to port VNIF without
> >>>>>>>> a PMT table?
> >>>>>>>> Thanks a lot,
> >>>>>>>> eddie

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel