
Re: [Xen-devel] Re: Interdomain comms


  • To: Eric Van Hensbergen <ericvh@xxxxxxxxx>
  • From: Andrew Warfield <andrew.warfield@xxxxxxxxx>
  • Date: Sun, 8 May 2005 09:19:06 +0100
  • Cc: Eric Van Hensbergen <ericvh@xxxxxxxxxxxxxxxxxxxxx>, Mike Wray <mike.wray@xxxxxx>, Harry Butterworth <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, "Ronald G. Minnich" <rminnich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Sun, 08 May 2005 08:18:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Eric,

   Your thoughts on 9P are all really interesting -- I'd come across
the protocol years ago in looking into approaches to remote device/fs
access but had a hard time finding details.  It's quite interesting to
hear a bit more about the approach taken.

   Having a more accessible inter-domain comms API is clearly a good
thing, and extending device channels (in our terminology -- shared
memory + event notification) to work across a cluster is something
that we've talked about on several occasions at the lab.

   I do think though, that as mentioned above there are some concerns
with the VMM environment that make this a little trickier.  For the
general case of inefficient comms between VMs, using the regular IP
stack may be okay for many people.  The net drivers are being fixed up
to special-case local communications.

   For the more specific cases of FE/BE comms, I think the devil may
be in the details more than the current discussion is alluding to. 
Specifically:

> c) As long as the buffers in question (both *buf and the buffer cache
> entry) were page-aligned, etc. -- we could play clever VM games
> marking the page as shared RO between the two partitions and alias the
> virtual memory pointed to by *buf to the shared page.  This is very
> sketchy and high level and I need to delve into all sorts of details
> -- but the idea would be to use virtual memory as your friend for
> these sort of shared read-only buffer caches.  It would also require
> careful allocation of buffers of the right size on the right alignment
> -- but driver writers are used to that sort of thing.

   Most of the good performance that Xen gets from the block and net
split devices comes specifically from these clever VM games.
Block FEs pass page references down to be mapped directly for DMA. 
Net devices pass pages into a free pool, and actually exchange
physical pages under the feet of the VM as inbound packets are
demultiplexed.  The grant tables that have recently been added provide
separate mechanisms for the mapping and ownership transfer of pages
across domains.  In addition to these tricks, we time event
notifications carefully in order to batch messages.
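
To make the batching point concrete, here is a minimal toy sketch of a producer that publishes several requests and sends at most one event notification per batch. It is not Xen's ring code: the class, the `event_at` field, and the method names are all hypothetical, and the ring is left unbounded for simplicity.

```python
class BatchingRing:
    """Toy single-producer ring that suppresses redundant notifications.

    All names here are illustrative assumptions, not Xen interfaces.
    """

    def __init__(self):
        self.slots = []         # requests published so far (unbounded toy ring)
        self.prod = 0           # number of requests the producer has published
        self.event_at = 1       # consumer wants an event once prod reaches this
        self.notifications = 0  # event notifications actually sent

    def push_batch(self, requests):
        """Publish a batch, then send at most one notification for all of it."""
        old = self.prod
        self.slots.extend(requests)
        self.prod += len(requests)
        # Notify only if this batch crossed the index the consumer waits on.
        if old < self.event_at <= self.prod:
            self.notifications += 1
            return True
        return False

    def consume_all(self, cons):
        """Drain everything from index `cons`, then re-arm the notification."""
        items = self.slots[cons:self.prod]
        self.event_at = self.prod + 1  # wake the consumer on the next request
        return items, self.prod
```

The consumer re-arms `event_at` only after draining, so requests that arrive while it is still busy cost no extra events.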

   In the case of the buffer cache that has come up several times in
the thread, a cache across domains would potentially need to pass read
only page mappings as CoW in many situations, and a fault handler
somewhere would need to bring in a new page to the guest on a write. 
There is also a pile of complicating cases with regard to cache
eviction from a BE domain, migration, and so on that make the
accounting really tricky.  I think it would be quite good to have the
discussion of generalized interdomain comms address the current
drivers, as well as a hypothetical buffer cache, as potential cases.
Does 9P already have hooks that would allow you to handle this sort of
per-application special case?
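
As a rough illustration of the CoW case, the toy model below shares one read-only cache page between two guests and gives a guest a private copy on its first write. It only models the accounting (a reference count and a copy); the `SharedPage` and `GuestMapping` names are hypothetical, and eviction and migration are deliberately left out.

```python
class SharedPage:
    """A page of cache data plus a count of guests mapping it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 0


class GuestMapping:
    """One guest's view of a page: read-only and shared until first write."""

    def __init__(self, page):
        self.page = page
        self.writable = False
        page.refs += 1

    def read(self, offset):
        return self.page.data[offset]

    def write(self, offset, value):
        if not self.writable:
            self._cow_fault()  # stands in for the fault handler
        self.page.data[offset] = value

    def _cow_fault(self):
        """Bring in a private copy and drop the reference to the shared page."""
        shared = self.page
        private = SharedPage(shared.data)  # bytearray(...) copies the bytes
        private.refs = 1
        shared.refs -= 1
        self.page = private
        self.writable = True
```

Even in this toy form, every write changes the sharing state that the backend has to track, which is where the accounting gets difficult once eviction and migration are added.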

   Additionally, I think we get away with a lot in the current drivers
from a failure model that excludes transport.  The FE or BE can crash,
and the two drivers can be written defensively to handle that.  How
does 9P handle the strangenesses of real distribution?

Anyhow, very interesting discussion... looking forward to your thoughts.

a.



 

