
Re: [PATCH v3 4/7] swiotlb: if swiotlb is full, fall back to a transient memory pool



On Fri, 7 Jul 2023 10:29:00 +0100
Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Jul 06, 2023 at 02:22:50PM +0000, Michael Kelley (LINUX) wrote:
> > From: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> Sent: Thursday, July 6, 2023 1:07 AM
> > >
> > > On Thu, Jul 06, 2023 at 03:50:55AM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Petr Tesarik <petrtesarik@xxxxxxxxxxxxxxx> Sent: Tuesday, June 27, 2023 2:54 AM
> > > > >
> > > > > Try to allocate a transient memory pool if no suitable slots can be
> > > > > found, except when allocating from a restricted pool. The transient
> > > > > pool is just big enough for this one bounce buffer. It is inserted
> > > > > into a per-device list of transient memory pools, and it is freed
> > > > > again when the bounce buffer is unmapped.
> > > > >
> > > > > Transient memory pools are kept in an RCU list. A memory barrier is
> > > > > required after adding a new entry, because any address within a
> > > > > transient buffer must be immediately recognized as belonging to the
> > > > > SWIOTLB, even if it is passed to another CPU.
> > > > >
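A minimal sketch of that publication step, using hypothetical names (io_tlb_pool, dma_io_tlb_pools, dma_io_tlb_lock) rather than the identifiers actually used in the patch, might look like this:

#include <linux/device.h>
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct io_tlb_pool {
	struct list_head node;	/* entry in the per-device pool list */
	struct rcu_head rcu;	/* deferred freeing after unmap */
	phys_addr_t start;	/* first bounce buffer address in the pool */
	phys_addr_t end;	/* one past the last bounce buffer address */
};

/* Publish a freshly allocated transient pool on the per-device RCU list. */
static void swiotlb_publish_pool_sketch(struct device *dev,
					struct io_tlb_pool *pool)
{
	unsigned long flags;

	/* dma_io_tlb_lock / dma_io_tlb_pools: assumed per-device fields */
	spin_lock_irqsave(&dev->dma_io_tlb_lock, flags);
	list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
	spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);

	/*
	 * Make the new entry visible before the bounce buffer address can
	 * reach another CPU, so that an address lookup there already
	 * recognizes it as a SWIOTLB address.
	 */
	smp_mb();
}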
> > > > > Deletion does not require any synchronization beyond RCU ordering
> > > > > guarantees. After a buffer is unmapped, its physical addresses may no
> > > > > longer be passed to the DMA API, so the memory range of the
> > > > > corresponding stale entry in the RCU list never matches. If the
> > > > > memory range gets allocated again, then it happens only after an RCU
> > > > > quiescent state.
> > > > >
> > > > > Since bounce buffers can now be allocated from different pools, add a
> > > > > parameter to swiotlb_alloc_pool() to let the caller know which memory
> > > > > pool is used. Add swiotlb_find_pool() to find the memory pool
> > > > > corresponding to an address. This function is now also used by
> > > > > is_swiotlb_buffer(), because a simple boundary check is no longer
> > > > > sufficient.
> > > > >
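Building on the sketch above, the lookup that is_swiotlb_buffer() relies on, and the matching removal at unmap time, could look roughly like this (again illustrative, not the actual swiotlb_find_pool() implementation):

/* Does @paddr fall into any transient pool of @dev?  Lockless RCU walk. */
static bool is_swiotlb_buffer_sketch(struct device *dev, phys_addr_t paddr)
{
	struct io_tlb_pool *pool;
	bool found = false;

	rcu_read_lock();
	list_for_each_entry_rcu(pool, &dev->dma_io_tlb_pools, node) {
		if (paddr >= pool->start && paddr < pool->end) {
			found = true;
			break;
		}
	}
	rcu_read_unlock();

	/*
	 * A stale entry may still be on the list after unmapping, but its
	 * range can no longer match an address legitimately passed to the
	 * DMA API, so a miss really means "not a SWIOTLB address".
	 */
	return found;
}

/* Remove a transient pool when its bounce buffer is unmapped. */
static void swiotlb_remove_pool_sketch(struct device *dev,
				       struct io_tlb_pool *pool)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->dma_io_tlb_lock, flags);
	list_del_rcu(&pool->node);
	spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);

	/* Readers may still walk the entry; free it after a grace period. */
	kfree_rcu(pool, rcu);
}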
> > > > > The logic in swiotlb_alloc_tlb() is taken from
> > > > > __dma_direct_alloc_pages(), simplified and enhanced to use coherent
> > > > > memory pools if needed.
> > > > >
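Very roughly, that fallback order can be pictured like the following simplified sketch (an assumption-laden outline, not the actual swiotlb_alloc_tlb() code):

#include <linux/device.h>
#include <linux/gfp.h>
#include <linux/mm.h>

static struct page *swiotlb_alloc_tlb_sketch(struct device *dev,
					     size_t size, gfp_t gfp)
{
	struct page *page;

	/* First choice: plain pages from the page allocator. */
	page = alloc_pages(gfp, get_order(size));
	if (page)
		return page;

	/*
	 * When the caller cannot block (!gfpflags_allow_blocking(gfp)), a
	 * fallback to a pre-populated coherent/atomic pool, analogous to
	 * what __dma_direct_alloc_pages() does via dma_alloc_from_pool(),
	 * would go here.
	 */
	return NULL;
}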
> > > > > Note that this is not the most efficient way to provide a bounce
> > > > > buffer, but when a DMA buffer can't be mapped, something may (and
> > > > > will) actually break. At that point it is better to make an
> > > > > allocation, even if it may be an expensive operation.
> > > >
> > > > I continue to think about swiotlb memory management from the standpoint
> > > > of CoCo VMs that may be quite large with high network and storage loads.
> > > > These VMs are often running mission-critical workloads that can't
> > > > tolerate a bounce buffer allocation failure.  To prevent such failures,
> > > > the swiotlb memory size must be overly large, which wastes memory.
> > > 
> > > If "mission critical workloads" are in a vm that allows overcommit and
> > > no control over other vms in that same system, then you have worse
> > > problems, sorry.
> > > 
> > > Just don't do that.
> > >   
> > 
> > No, the cases I'm concerned about don't involve memory overcommit.
> > 
> > CoCo VMs must use swiotlb bounce buffers to do DMA I/O.  Current swiotlb
> > code in the Linux guest allocates a configurable, but fixed, amount of guest
> > memory at boot time for this purpose.  But it's hard to know how much
> > swiotlb bounce buffer memory will be needed to handle peak I/O loads.
> > This patch set does dynamic allocation of swiotlb bounce buffer memory,
> > which can help avoid needing to configure an overly large fixed size at 
> > boot.  
> 
> But, as you point out, memory allocation can fail at runtime, so how can
> you "guarantee" that this will work properly anymore if you are going to
> make it dynamic?

In general, there is no guarantee, of course, because bounce buffers
may be requested from interrupt context. I believe Michael is looking
for the SWIOTLB_MAY_SLEEP flag that was introduced in my v2 series, so
new pools can be allocated with GFP_KERNEL instead of GFP_NOWAIT if
possible, and then there is no need to dip into the coherent pool.
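To make that concrete, the effect of such a flag on the pool allocation could be as simple as the following illustrative helper (not code from the series; only the flag name SWIOTLB_MAY_SLEEP is taken from the v2 discussion):

#include <linux/gfp.h>
#include <linux/types.h>

/*
 * Illustrative only: choose the gfp mask for a new transient pool based
 * on whether the mapping call is allowed to sleep (SWIOTLB_MAY_SLEEP in
 * the v2 series).
 */
static gfp_t swiotlb_pool_gfp_sketch(bool may_sleep)
{
	/* GFP_KERNEL may block and reclaim; GFP_NOWAIT is safe in IRQ context. */
	return may_sleep ? GFP_KERNEL : GFP_NOWAIT;
}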

Well, I have deliberately removed all complexities from my v3 series,
but I have more WIP topic branches in my local repo:

- allow blocking allocations if possible
- allocate a new pool before existing pools are full
- free unused memory pools

I can make a bigger series, or I can send another series as RFC if this
is desired. ATM I don't feel confident enough that my v3 series will be
accepted without major changes, so I haven't invested time into
finalizing the other topic branches.

@Michael: Knowing that my plan is to introduce blocking allocations in a
follow-up patch series, is the present approach acceptable?

Petr T
