
[Xen-devel] Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs.



* Konrad Rzeszutek Wilk (konrad.wilk@xxxxxxxxxx) wrote:
> > > +  * Is the DMA (Bus) address within our bounce buffer (start and end).
> > > +  */
> > > + int (*is_swiotlb_buffer)(struct swiotlb_engine *, dma_addr_t dev_addr,
> > > +                          phys_addr_t phys);
> > > +
> > 
> > Why is this implementation specific?
> 
> In the current implementation, they both use the physical address and
> do a simple check:
> 
>       return paddr >= virt_to_phys(io_tlb_start) &&
>               paddr < virt_to_phys(io_tlb_end);
> 
> That would work for virtualized environments where a PCI device is
> passed in, too.
> 
> Unfortunately the problem is when we provide a channel of communication
> with another domain and we end up doing DMA on behalf of another guest.
> The short description of the problem is that a page of memory is shared
> with another domain: the mapping in our domain is correct (bus->physical),
> but the other way (virt->physical->bus) is incorrect for the duration of
> this page being shared. Hence we need to verify that the page is local to
> our domain, and for that we need the bus address to verify that
> addr == physical->bus(bus->physical(addr)), where addr is the bus
> address (dma_addr_t). If it is not local (shared with another domain)
> we MUST not consider it a SWIOTLB buffer, as that can lead to
> panics and possible corruption. The trick here is that the phys->virt
> address can fall within the SWIOTLB buffer for pages that are
> shared with another domain, so we need the DMA address to do an extra check.
> 
> The long description of the problem is:
> 
> You are the domain doing some DMA on behalf of another domain. The
> simple example is that you are serving a block device to other guests.
> One way to implement this is to present a one-page ring buffer where
> both domains move the producer and consumer indexes around. Once you get
> a request (READ/WRITE), you use the virtualized channels to "share" that page
> into your domain. For this you have a buffer (2MB or bigger) wherein, for
> pages that are shared in to you, you over-write the phys->bus mapping.
> That means that the phys->bus translation is altered for the duration
> of this request being outstanding. Once it is completed, the phys->bus
> translation is restored.
> 
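> Conceptually, the lifecycle looks something like this (just a sketch;
> grant_map_page() is a made-up placeholder rather than the real
> grant-table API, and set_phys_to_machine() is used loosely here for
> "rewrite the phys->bus (p2m) mapping"):
>
>        /* Map the foreign page and point our phys->bus mapping at it. */
>        mfn = grant_map_page(remote_domid, grant_ref, pfn); /* hypothetical */
>        set_phys_to_machine(pfn, mfn);
>
>        /* ... service the READ/WRITE request, doing DMA as needed ... */
>
>        /* Restore the original mapping once the request completes. */
>        set_phys_to_machine(pfn, local_mfn);
>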
> Here is a little diagram of what happens when a page is shared (and let's
> assume that we have a situation where virt #1 == virt #2, which means
> that phys #1 == phys #2).
> 
> (domain 2) virt#1->phys#1---\
>                              +- bus #1
> (domain 3) virt#2->phys#2 ---/
> 
> (phys #1 points to bus #1, and phys #2 points to bus #1 too).
> 
> The converse of the above picture is not true:
> 
>       /---> phys #1-> virt #1. (domain 2).
> bus#1 +
>       \---> phys #2-> virt #2. (domain 3).
> 
> phys #1 != phys #2 and hence virt #1 != virt #2.
> 
> When a page is not shared:
> 
> (domain 2) virt #1->phys #1--> bus #1
> (domain 3) virt #2->phys #2--> bus #2
> 
> bus #1 -> phys #1 -> virt #1 (domain 2)
> bus #2 -> phys #2 -> virt #2 (domain 3)
> 
> The bus #1 != bus #2, but phys #1 could be the same as phys #2 (since
> they are just PFNs), and virt #1 could equal virt #2.
> 
> The reason for this is that each domain has its own memory layout, where
> the memory starts at pfn 0, not at some higher number. So each domain
> sees its physical addresses identically, but the bus addresses MUST point
> to different areas (except when sharing); otherwise one domain would
> over-write another domain, ouch.
> 
> Furthermore, when a domain is allocated, the pages for the domain are not
> guaranteed to be linearly contiguous, so we can't guarantee that phys == bus.
> 
> So to guard against the situation in which phys #1 -> virt comes out with
> an address that looks to be within our SWIOTLB buffer, we need to do the
> extra check:
> 
> addr == physical->bus(bus->physical(addr)), where addr is the bus
> address.
> 
> And for scenarios where this is not true (the page belongs to another
> domain), that page is not in the SWIOTLB (even though the virtual and
> physical addresses point into it).
> 
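> To make that concrete, the check ends up looking roughly like this (a
> sketch only; phys_to_bus() and bus_to_phys() stand in for the actual
> translation helpers, and the real names in the patch differ):
>
>        int xen_is_swiotlb_buffer(struct swiotlb_engine *eng,
>                                  dma_addr_t dev_addr, phys_addr_t paddr)
>        {
>                /*
>                 * A page granted in from another domain has a valid
>                 * bus->phys mapping, but translating that physical
>                 * address back to a bus address yields a different
>                 * result.  Only pages that survive the round trip are
>                 * local and may be bounce-buffer pages.
>                 */
>                if (phys_to_bus(bus_to_phys(dev_addr)) != dev_addr)
>                        return 0;
>
>                return paddr >= virt_to_phys(io_tlb_start) &&
>                        paddr < virt_to_phys(io_tlb_end);
>        }
>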
> > > + /*
> > > +  * Is the DMA (Bus) address reachable by the PCI device?
> > > +  */
> > > + bool (*dma_capable)(struct device *, dma_addr_t, phys_addr_t, size_t);
> 
> I mentioned in the previous explanation that when a domain is allocated,
> the pages are not guaranteed to be linearly contiguous.
> 
> For bare metal that is not the case, and 'dma_capable' just checks
> the device's DMA mask against the bus address.
> 
> For a virtualized environment we do need to check whether the pages are
> linearly contiguous for the requested size.
> 
> For that we need the physical address so we can iterate over the pages,
> doing the phys -> bus#1 translation for each one and checking that the
> next page, (phys + 1) -> bus#2, satisfies bus#2 == bus#1 + 1.
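> 
> Roughly, as a sketch (phys_to_bus() is again a placeholder for the
> engine's phys->bus translation helper, and the mask check at the end
> mirrors what bare-metal dma_capable does; none of these are the actual
> patch names):
>
>        bool xen_dma_capable(struct device *dev, dma_addr_t dev_addr,
>                             phys_addr_t paddr, size_t size)
>        {
>                unsigned long pfn = PFN_DOWN(paddr);
>                unsigned long last_pfn = PFN_DOWN(paddr + size - 1);
>                /* Bus address of the first page of the buffer. */
>                dma_addr_t bus = dev_addr - (paddr & ~PAGE_MASK);
>
>                /* Each subsequent page must translate to the next bus
>                 * frame, i.e. bus#2 == bus#1 + 1 in page-frame units. */
>                while (++pfn <= last_pfn) {
>                        bus += PAGE_SIZE;
>                        if (phys_to_bus((phys_addr_t)pfn << PAGE_SHIFT) != bus)
>                                return false;
>                }
>
>                /* Finally the usual bare-metal DMA mask check. */
>                return dev_addr + size - 1 <= dma_get_mask(dev);
>        }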

Right, for both of those cases I was thinking you could make that the
base logic and the existing address-translation helpers would be
enough.  But that makes more sense when compiling for a specific arch
(i.e., the checks would be no-ops and compile away when !xen) than for
a dynamic setup like this.
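
The usual compile-away pattern would be something like this (purely
illustrative; the function name is made up):

        #ifdef CONFIG_XEN
        bool arch_dma_needs_check(struct device *dev, dma_addr_t addr);
        #else
        static inline bool arch_dma_needs_check(struct device *dev,
                                                dma_addr_t addr)
        {
                return false;   /* the compiler drops the whole branch */
        }
        #endif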

thanks,
-chris
