
RE: [Xen-ia64-devel] Re: PMT table for XEN/IA64 (was: RE: Transparent paravirtualization vs. xen paravirtualization)



I am open to considering a design change that exposes a
physical-to-machine translation table (PMT) shared between
domain0 and Xen.  Domain0 is:
- started once by Xen
- essentially in the same trust domain as Xen
- unlikely (outside of research projects) to ever be safely
  rebootable without a system/Xen reboot
- rarely going to run real customer apps, so it need not use
  a large portion of a system's physical memory
- not migratable

However, I agree with Matt that a PMT for other domains
(domU) is a bad idea as it creates many problems for migration,
save/restore, ballooning, and adding new domains to an already
loaded system.  Further, the grant-table abstraction is the primary
page-sharing mechanism for domU in Xen (on Xen/x86).
I think that if a domU has any knowledge of actual machine addresses,
the Xen team would consider it a bug to be fixed.
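
For reference, grant entries confine machine frame numbers to a
Xen-maintained table; each entry looks roughly like this (a
from-memory sketch of xen/include/public/grant_table.h, so treat
the field names as approximate):

    typedef struct {
        uint16_t flags;  /* GTF_* type and status bits              */
        domid_t  domid;  /* domain granted (or transferring) access */
        uint32_t frame;  /* machine frame number; only Xen and the
                            granting domain ever see this value     */
    } grant_entry_t;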

Some of the email discussion in this thread has referred to a PMT
for dom0 only, and some to a PMT for both dom0 and domU.
At this time, I am willing to consider a PMT for dom0 only.
If you would like to start proposing a design (and patches)
for dom0 PMT, please start a new thread and describe:

- what is the structure/size of the PMT and how is it allocated
  (e.g. is it a linear table)?  Does the table have other
  attributes (e.g. r/w permissions) or is it just a one-to-one
  map of physical-to-machine pages?  (One strawman sketch
  follows this list.)
- how do you deal with different page sizes?  (does dom0 need
  to be compiled with PAGE_SIZE=4K?)
- how is dom0 I/O handled (differently than it is now)?
- what is the impact on handling virtual translations (e.g.
  vcpu_translate())?
- what code in the Xen virtual drivers that is now different
  for ia64 would become the same as x86?**
- what code in the Xen virtual drivers will still differ
  between ia64 and x86?**
- what code (outside of Xen drivers) in xenlinux/ia64 would
  need to be changed and is it still possible to make the
  changes transparent?
- can dom0 and domU still use the same binary?
- what code in grant_table.c changes (can we merge back to
  using common/grant_table.c instead of a separate file?)
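
To anchor the first question, here is one strawman (a sketch for
discussion only, not a proposed ABI): a linear array indexed by
dom0 physical frame number, writable only by Xen and mapped
read-only into dom0:

    /* strawman dom0 PMT: pmt[pfn] == mfn, or PMT_INVALID */
    typedef unsigned long pmt_entry_t;
    #define PMT_INVALID (~0UL)

    /* hypothetical dom0-side lookup; costs 8 bytes per dom0 page */
    static inline unsigned long
    pmt_pfn_to_mfn(const pmt_entry_t *pmt, unsigned long pfn)
    {
        return pmt[pfn];
    }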

HOWEVER, unless there is a general consensus that this change
will be easy to implement and debug, and will make fixing
multiple-domain support and/or implementing virtual networking
much easier for 3.0, I see this as a post-3.0 implementation.

Thanks,
Dan

** it would be good to see the patches for the drivers, as
I think the whole point of this proposal is to make the code
closer to Xen/x86 to minimize differences/maintenance.  If
"before" we have 100 lines of difference, "after" we have
90 lines of difference, and there are other disadvantages,
adding a PMT might not be a very good tradeoff.


> -----Original Message-----
> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf 
> Of Dong, Eddie
> Sent: Tuesday, November 01, 2005 12:09 AM
> To: Matt Chapman; Tian, Kevin
> Cc: Ling, Xiaofeng; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-ia64-devel] Re: PMT table for XEN/IA64 
> (was: RE: Transparent paravirtualization vs. xen paravirtualization)
> 
> Matt:
>       Yes, as you mentioned, having domU or a VTI domain do only
> page flipping works under the assumption that the service domain
> owns all system pages (i.e. every other domain's pages come from
> the service domain).  But that eventually breaks for driver
> domains, since only one domain can own all system pages.  So we
> can either start with what you proposed and roll back to what x86
> does now at some later point (Xen 3.1, say), or start out aligned
> with Xen/x86 and save all the maintenance and rework effort.  I
> suggest we go with the design that will be right in the end.
>       Yes, supporting a PMT may require modification to
> XenLinux/ia64, but as you pointed out, domU in any case
> (migration, memory location, etc.) has to maintain a PMT table,
> so why not let dom0 work the same way?  Letting dom0 and domU
> share as much code as possible is the right approach, IMO.
>       The modification to XenLinux/ia64 is not that big; probably
> only PMT setup for now, and then the VBD/VNIF work may reference
> and modify the table.  It should be almost the same as the x86
> approach.
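>
> For reference, the x86 XenLinux side looks roughly like this
> (quoting from memory of the asm-xen headers of that era, so the
> details may differ):
>
>     extern unsigned long *phys_to_machine_mapping;
>     #define pfn_to_mfn(pfn) (phys_to_machine_mapping[(pfn)])
>     /* reverse map maintained by Xen for all domains */
>     #define mfn_to_pfn(mfn) (machine_to_phys_mapping[(mfn)])
>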
>       What specific question do you have about x86
> shadow_translate?  I can consult the experts here too if you
> need. :-)
> 
>       So, now it may be time for us to dig into the details of
> how to do PMTs... :-)  Dan?
> Eddie
> 
> 
> 
> 
> Matt Chapman wrote:
> > I'm still not clear about the details.  Could you outline the
> > changes that you want to make to Xen/ia64?
> > 
> > Would DomU have a PMT?  Surely DomU should not know about real
> > machine addresses; those should be hidden behind the grant table
> > interface.  Otherwise migration, save/restore, etc. are difficult
> > (as they have found on x86).
> > 
> > Do you know how x86 shadow_translate mode works?  Perhaps we should
> > use that as an example.
> > 
> > Matt
> > 
> > 
> > On Mon, Oct 31, 2005 at 05:11:09PM +0800, Tian, Kevin wrote:
> >> Matt Chapman wrote:
> >>> 1. Packet arrives in a Dom0 SKB.  Of course the buffer needs
> >>>    to be page-sized/aligned (this is true on x86 too).
> >>> 2. netback steals the buffer
> >>> 3. netback donates it to DomU *without freeing it*
> >>> 4. DomU receives the frame and passes it up its network stack
> >>> 5. DomU gives away other frame(s) to restore balance
> >>> 6. Dom0 eventually receives extra frames via its balloon driver
> >>> 
> >>> 5 and 6 can be done lazily in batches.  Alternatively, 4 and 5
> >>> could be a single "flip" operation.
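> >>>
> >>> A sketch of steps 2-3 in C (hypothetical helper names, not
> >>> the real netback code):
> >>>
> >>>     /* dom0 netback receive path, steps 2-3 above */
> >>>     unsigned long mfn = virt_to_mfn(skb->data); /* backing frame */
> >>>     skb_detach_page(skb);            /* 2: steal it; do NOT free */
> >>>     grant_transfer_to(domU, mfn);    /* 3: donate frame to DomU  */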
> >> 
> >> The solution will work with some tweaks.  But is there any obvious
> >> benefit over the PMT approach used on x86?  (If yes, you should
> >> suggest it to xen-devel. ;-)  Usually we want a different approach
> >> only for "can't be done on this architecture" or "far better
> >> performance than the existing one"; otherwise, why diverge from
> >> the Xen design at the cost of extra maintenance effort?  That
> >> extra effort has already cost us 2+ weeks getting VBD up to
> >> support DomU across the last 2 upstream merges.
> >> 
> >>> 
> >>> I think this is not significantly different from x86.
> >>> 
> >>> I'm not saying this is necessarily better than a PMT solution,
> >>> but I want to discuss the differences and trade-offs.  By PMT
> >>> I assume you mean to make Dom0 not 1:1 mapped, and then give
> >>> it access to the translation table?  Can you describe how the
> >>> above works differently with a PMT?
> >> 
> >> 
> >> In terms of work flow, the PMT approach is similar: the
> >> backend/frontend need to touch the PMT table on ownership changes.
> >> But have you evaluated how many tricky changes are required to
> >> support Domain0 with gpn=mfn on top of the existing code?  For
> >> example:
> >>    - Backend drivers are not bound to dom0; a domU can also act
> >> as a driver domain, and a 1:1 mapping makes no sense there.  There
> >> is already talk of domU servers doing driver I/O.
> >>    - You need to ensure that all available pages are granted to
> >> dom0, which means changing the current dom0 allocation code.
> >>    - You need to change the current vnif code with an unknown
> >> number of #ifdefs and workarounds, since you would be implementing
> >> new behavior on top of a different approach.
> >>    - ... (maintenance!)
> >> 
> >> So if you were implementing a VM from scratch, your approach
> >> would definitely be worth trying, since there would be no such
> >> limitations.  However, since we work on Xen, we should take as
> >> much advantage of the current Xen design as possible, right? ;-)
> >> 
> >>> 
> >>> One disadvantage I see of having Dom0 not 1:1 is that superpages
> >>> are more difficult; we can't just use the guest's superpages.
> >> 
> >> 
> >> Superpages are an optimization option, and we still need to
> >> support non-contiguous pages as a basic requirement.  You can
> >> still add an option to allocate contiguous pages for a guest even
> >> with a PMT table, since para-virtualization is cooperative.
> >> 
> >>> 
> >>> Also, are there paravirtualisation changes needed to support a
> >>> PMT?  I'm concerned about keeping the paravirtualisation changes
> >>> from becoming too complex (I think x86 Xen changes the OS too
> >>> much).  Finally, it should be possible to load Xen frontend
> >>> drivers into unmodified OSs (on VT).
> >> 
> >> 
> >> We need to balance new designs against maintenance effort.
> >> Currently Xiaofeng Ling from Intel is working on para-drivers for
> >> unmodified domains; both VBD & VNIF already work for x86 VT
> >> domains and are being reviewed by Cambridge.  This work is based
> >> on the PMT table.
> >> 
> >> Kevin
> >>> 
> >>> On Mon, Oct 31, 2005 at 01:28:43PM +0800, Tian, Kevin wrote:
> >>>> Hi, Matt,
> >>>> 
> >>>>  The point here is how to tell when a donated frame is done
> >>>> with, and where the "free" actually happens in domU.  Currently
> >>>> the Linux network driver uses zero-copy to pass a received
> >>>> packet up the stack without any copying; the receive pages are
> >>>> allocated from skbuffs, which are freed by the upper layers
> >>>> rather than by the vnif driver itself.  To let dom0 know when
> >>>> the donated page is done with, you may either:
> >>>>  - copy the contents of the donated page into a local skbuff
> >>>> page and then notify dom0 immediately, at a cost in
> >>>> performance, or
> >>>>  - modify the upper-layer code to register a "free" hook that
> >>>> notifies dom0 when done, at the cost of more modification to
> >>>> common code and divergence from x86.
> >>>> 
> >>>>  There are certainly other ways to make this approach "work",
> >>>> and even more alternatives.  However, the point we really want
> >>>> to emphasize is that by adding a PMT we can move towards the
> >>>> x86 solution, with the best performance and less maintenance
> >>>> effort.  That would minimize our future re-base effort as the
> >>>> para-drivers keep evolving. ;-)
> >>>> 
> >>>> Thanks,
> >>>> Kevin
> >>>> 
> >>>>> -----Original Message-----
> >>>>> From: Matt Chapman [mailto:matthewc@xxxxxxxxxxxxxxx]
> >>>>> Sent: 2005-10-31 13:09
> >>>>> To: Tian, Kevin
> >>>>> Cc: Dong, Eddie; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> Subject: Re: [Xen-ia64-devel] Re: PMT table for XEN/IA64
> >>>>> (was: RE: Transparent paravirtualization vs. xen
> >>>>> paravirtualization)
> >>>>> 
> >>>>> Yes, I think I understand the problem now.
> >>>>> 
> >>>>> The way I imagine this could work is that Dom0 would know about
> >>>>> all of the memory in the machine (i.e. it would be passed the
> >>>>> original EFI memmap, minus memory used by Xen).
> >>>>> 
> >>>>> Then Dom0 would donate memory for other domains (=ballooning).
> >>>>> Dom0 can donate data frames to DomU in the same way: by
> >>>>> granting the frame and not freeing it.  When DomU donates a
> >>>>> data frame to Dom0, Dom0 frees it when it is done, and now the
> >>>>> kernel can use it.
> >>>>> 
> >>>>> What do you think of this approach?
> >>>>> 
> >>>>> Matt
> >>>>> 
> >>>>> 
> >>>>> On Mon, Oct 31, 2005 at 11:09:04AM +0800, Tian, Kevin wrote:
> >>>>>> Hi, Matt,
> >>>>>>        It's not related to mapped virtual addresses, but only
> >>>>>> to physical/machine pfns.  The current vnif backend (on x86)
> >>>>>> works as follows:
> >>>>>> 
> >>>>>> 1. Allocate a set of physical pfns from the kernel
> >>>>>> 2. Chop up the mapping between each physical pfn and its old
> >>>>>>    machine pfn
> >>>>>> 3. Transfer ownership of the old machine pfn to the frontend
> >>>>>> 4. Allocate a new machine pfn and bind it to the physical pfn
> >>>>>> (In this case there's no ownership return from the frontend,
> >>>>>> for performance reasons)
> >>>>>> 
> >>>>>>        Without a PMT table (assuming guest==machine for
> >>>>>> dom0), that means you have to hotplug physical pfns in the
> >>>>>> guest (at page granularity) under the current vnif model.  Or
> >>>>>> maybe you have a better alternative that needs neither a PMT
> >>>>>> nor big changes to the existing vnif driver?
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> Kevin
> >>>>>> 
> >>>>>>> -----Original Message-----
> >>>>>>> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >>>>>>> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> >>>>>>> Behalf Of Matt Chapman
> >>>>>>> Sent: 2005-10-31 10:59
> >>>>>>> To: Dong, Eddie
> >>>>>>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>>> Subject: [Xen-ia64-devel] Re: PMT table for XEN/IA64 (was:
> >>>>>>> RE: Transparent paravirtualization vs. xen paravirtualization)
> >>>>>>> 
> >>>>>>> Hi Eddie,
> >>>>>>> 
> >>>>>>> The way I did it was to make the address argument to grant
> >>>>>>> hypercalls in/out; that is, the hypervisor might possibly
> >>>>>>> return a different address than the one requested, like mmap
> >>>>>>> on UNIX.
> >>>>>>> 
> >>>>>>> For DomU, the hypervisor would map the page at the requested
> >>>>>>> address.  For Dom0, the hypervisor would instead return the
> >>>>>>> existing address of that page, since Dom0 already has access
> >>>>>>> to the whole address space.
> >>>>>>> 
> >>>>>>> (N.B. I'm referring to physical/machine mappings here; unlike
> >>>>>>> the x86 implementation where the grant table ops map pages
> >>>>>>> directly into virtual address space).
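> >>>>>>>
> >>>>>>> In code terms, roughly (a sketch only; the op structure and
> >>>>>>> field names are approximate):
> >>>>>>>
> >>>>>>>     op.host_addr = preferred_paddr;  /* in: caller's hint */
> >>>>>>>     HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1);
> >>>>>>>     use(op.host_addr);               /* out: may differ   */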
> >>>>>>> 
> >>>>>>> Matt
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On Fri, Oct 28, 2005 at 10:28:08PM +0800, Dong, Eddie wrote:
> >>>>>>>>>  Page flipping should work just fine
> >>>>>>>>> in the current design; Matt had it almost working (out of
> >>>>>>>>> tree) before he went back to school.
> >>>>>>>>> 
> >>>>>>>> Matt:
> >>>>>>>>      Dan mentioned that you had VNIF almost working without
> >>>>>>>> PMT table support for dom0.  Can you share the idea with us?
> >>>>>>>>      Usually VNIF swaps pages between dom0 and domU so that
> >>>>>>>> copying network packets (between the dom0 native driver and
> >>>>>>>> the domU frontend driver) can be avoided, achieving high
> >>>>>>>> performance.  With this swap, we can no longer assume dom0
> >>>>>>>> gpn=mfn.  So how did you propose to port VNIF without a PMT
> >>>>>>>> table?  Thanks a lot,
> >>>>>>>> eddie
> >>>>>>> 
> >>> 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
