
Re: [Xen-devel] [PATCH] xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings



On Tue, 11 Jul 2017, Alexey G wrote:
> Under certain circumstances normal xen-mapcache functioning may be broken
> by a guest's actions. This may lead either to QEMU performing exit() due
> to a caught bad pointer (and with the QEMU process gone, the guest domain
> simply appears hung afterwards) or to actual use of the incorrect pointer
> inside the QEMU address space -- a write to unmapped memory is possible.
> The bug is hard to reproduce on an i440 machine, as multiple DMA sources
> are required (though it's possible in theory, using multiple emulated
> devices), but it can be reproduced somewhat easily on a Q35 machine using
> an emulated AHCI controller -- each NCQ queue command slot may be used as
> an independent DMA source, e.g. using the READ FPDMA QUEUED command, so a
> single storage device on the AHCI controller port is enough to produce
> multiple DMAs (up to 32). A detailed description of the issue follows.
> 
> Xen-mapcache provides the ability to map parts of guest memory into
> QEMU's own address space to work with.
> 
> There are two types of cache lookups:
>  - translating a guest physical address into a pointer in QEMU's address
>    space, mapping a part of guest domain memory if necessary (while trying
>    to keep the number of such (re)mappings to a minimum)
>  - translating a QEMU pointer back to its physical address in guest RAM
> 
> These lookups are managed via two linked lists of structures.
> MapCacheEntry is used for forward cache lookups, while MapCacheRev is used
> for reverse lookups.
> 
> Every guest physical address is broken down into 2 parts:
>     address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
>     address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
> 
> MCACHE_BUCKET_SHIFT depends on the host system (32-bit/64-bit) and is
> equal to 20 on a 64-bit system (which is assumed in the description
> below). Basically, this means that we deal with 1 MB chunks and offsets
> within those 1 MB chunks. All mappings are created with 1MB granularity,
> i.e. 1MB/2MB/3MB etc. Most DMA transfers are typically less than 1MB;
> however, if a transfer crosses any 1MB border(s), the nearest larger
> mapping size will be used, so e.g. a 512-byte DMA transfer with start
> address 700FFF80h will actually require a 2MB range.
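>
> As a minimal standalone sketch of this rounding (the mapping_size() helper
> is hypothetical, not part of the mapcache code):
>
>     #include <inttypes.h>
>     #include <stdio.h>
>
>     #define MCACHE_BUCKET_SHIFT 20
>     #define MCACHE_BUCKET_SIZE  (1UL << MCACHE_BUCKET_SHIFT)
>
>     /* mapping size needed to cover a transfer, rounded up to whole
>      * 1MB buckets */
>     static uint64_t mapping_size(uint64_t phys_addr, uint64_t len)
>     {
>         uint64_t offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
>         uint64_t buckets = (offset + len + MCACHE_BUCKET_SIZE - 1)
>                            >> MCACHE_BUCKET_SHIFT;
>         return buckets << MCACHE_BUCKET_SHIFT;
>     }
>
>     int main(void)
>     {
>         /* 512 bytes at 700FFF80h cross a 1MB border: prints 0x200000 */
>         printf("0x%" PRIx64 "\n", mapping_size(0x700FFF80ULL, 512));
>         return 0;
>     }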
> 
> The current implementation assumes that MapCacheEntries are unique for a
> given address_index and size pair and that a single MapCacheEntry may be
> reused by multiple requests -- in this case the 'lock' field will be
> larger than 1. On the other hand, each requested guest physical address
> (with the 'lock' flag) is described by its own MapCacheRev. So there may
> be multiple MapCacheRev entries corresponding to a single MapCacheEntry.
> The xen-mapcache code uses MapCacheRev entries to retrieve the
> address_index & size pair, which in turn is used to find the related
> MapCacheEntry. The 'lock' field within a MapCacheEntry structure is
> actually a reference counter which shows the number of corresponding
> MapCacheRev entries.
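>
> For reference, an abridged sketch of the two structures (field lists
> shortened; see hw/i386/xen/xen-mapcache.c for the full definitions):
>
>     typedef struct MapCacheEntry {
>         hwaddr paddr_index;           /* address_index of the mapping */
>         uint8_t *vaddr_base;          /* mapping base in QEMU's address space */
>         unsigned long *valid_mapping; /* bitmap of successfully mapped pages */
>         uint8_t lock;                 /* number of MapCacheRev entries using it */
>         hwaddr size;                  /* mapping size, a multiple of 1MB */
>         struct MapCacheEntry *next;   /* next entry in the same bucket */
>     } MapCacheEntry;
>
>     typedef struct MapCacheRev {
>         uint8_t *vaddr_req;           /* the pointer handed out to the caller */
>         hwaddr paddr_index;           /* address_index it translates back to */
>         hwaddr size;                  /* size of the owning MapCacheEntry */
>         bool dma;                     /* whether the mapping was for DMA */
>     } MapCacheRev;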
> 
> The bug lies in the ability of the guest to indirectly manipulate the
> xen-mapcache MapCacheEntry list via a special sequence of DMA operations,
> typically for storage devices. In order to trigger the bug, the guest
> needs to issue DMA operations in a specific order and timing. Although
> xen-mapcache is protected by a mutex, this doesn't help in this case, as
> the bug is not due to a race condition.
> 
> Suppose we have 3 DMA transfers, namely A, B and C, where
> - transfer A crosses a 1MB border and thus uses a 2MB mapping
> - transfers B and C are normal transfers within 1MB range
> - and all 3 transfers belong to the same address_index
> 
> In this case, if all these transfers are executed one by one (without
> overlaps), no special treatment is necessary -- each transfer's mapping
> lock will be set and then cleared on unmap before the next transfer
> starts.
> The situation changes when DMA transfers overlap in time, e.g. like this:
> 
>   |===== transfer A (2MB) =====|
> 
>               |===== transfer B (1MB) =====|
> 
>                           |===== transfer C (1MB) =====|
>  time --->
> 
> In this situation the following sequence of actions happens:
> 
> 1. transfer A creates a mapping to a 2MB area (lock=1)
> 2. transfer B (1MB) tries to find an available mapping but cannot find
>    one, because transfer A is still in progress and its entry has 2MB size
>    + a non-zero lock. So transfer B creates another mapping -- same
>    address_index, but 1MB size.
> 3. transfer A completes, making the 1st mapping entry available by setting
>    its lock to 0
> 4. transfer C starts, tries to find an available mapping entry, and sees
>    that the 1st entry has lock=0, so it uses this entry but remaps the
>    mapping to a 1MB size
> 5. transfer B completes, and by this time
>   - there are two locked entries in the MapCacheEntry list with the SAME
>     values for both address_index and size
>   - the entry for transfer B actually resides farther down the list, while
>     transfer C's entry comes first
> 6. xen_ram_addr_from_mapcache() for transfer B gets the correct
>    address_index and size pair from the corresponding MapCacheRev entry,
>    but then it starts looking for a MapCacheEntry with these values and
>    finds the first entry -- which belongs to transfer C (see the
>    illustration below).
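>
> Illustratively, the bucket's entry chain now looks like this (N denotes
> the shared address_index):
>
>     bucket head -> entry 1: paddr_index=N, size=1MB, lock=1
>                    (was A's 2MB entry, remapped and reused by C)
>                 -> entry 2: paddr_index=N, size=1MB, lock=1
>                    (created by B; the one B's pointer really belongs to)
>
> A forward search keyed on the (paddr_index, size) pair alone can no longer
> distinguish B's entry from C's.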
> 
> At this point the following (bad) consequences are possible:
> 
> 1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
>    in this statement:
> 
>    raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
>        ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
> 
> resulting in an incorrect raddr value being returned from the function.
> The (ptr - entry->vaddr_base) expression may produce both positive and
> negative numbers, and its actual value may vary greatly as many map/unmap
> operations take place. If the value is beyond the guest RAM limits, a
> "Bad RAM offset" error will be triggered and logged, followed by exit()
> in QEMU.
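>
> A minimal sketch of the miscalculation (all address values below are
> hypothetical):
>
>     #include <inttypes.h>
>     #include <stdio.h>
>
>     int main(void)
>     {
>         uint64_t paddr_index  = 0x700;          /* reventry->paddr_index */
>         uint64_t vaddr_base_B = 0x7f2000000000; /* entry B's vaddr_base  */
>         uint64_t vaddr_base_C = 0x7f2000300000; /* entry C's vaddr_base  */
>         uint64_t ptr = vaddr_base_B + 0x80;     /* pointer mapped via B  */
>
>         /* the lookup finds C's entry first, so its vaddr_base is used
>          * (the shift is MCACHE_BUCKET_SHIFT == 20): */
>         uint64_t raddr = (paddr_index << 20) + (ptr - vaddr_base_C);
>
>         /* prints 0x6fd00080 instead of the correct 0x70000080 -- off by
>          * (vaddr_base_B - vaddr_base_C); depending on where the two
>          * mappings landed, the error can have either sign and any size */
>         printf("raddr = 0x%" PRIx64 "\n", raddr);
>         return 0;
>     }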
> 
> 2. If the raddr value does not exceed the guest RAM boundaries, the same
> wrong-entry lookup will happen in xen_invalidate_map_cache_entry() on DMA
> unmap, resulting in a wrong MapCacheEntry being unmapped while the DMA
> operation which uses it is still active. The above example must be
> extended by one more DMA transfer in order to allow the unmapping, as the
> first mapping in the list is effectively resident (a bucket's head entry
> is never removed from the list).

Thanks for the well written description of the problem!


> The patch modifies the way MapCacheEntries are added to the list,
> avoiding duplicates.

I take it that the idea is to always go through the whole list to check for
duplicate locked entries, right?


> Signed-off-by: Alexey Gerasimenko <x1917x@xxxxxxxxx>
> ---
>  hw/i386/xen/xen-mapcache.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index e60156c..84f25ef 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -206,6 +206,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
>                                         uint8_t lock, bool dma)
>  {
>      MapCacheEntry *entry, *pentry = NULL;
> +    MapCacheEntry *avl_entry = NULL, *avl_entry_prev = NULL;
>      hwaddr address_index;
>      hwaddr address_offset;
>      hwaddr cache_size = size;
> @@ -251,14 +252,36 @@ tryagain:
>  
>      entry = &mapcache->entry[address_index % mapcache->nr_buckets];
>  
> -    while (entry && entry->lock && entry->vaddr_base &&
> -            (entry->paddr_index != address_index || entry->size != cache_size ||
> -             !test_bits(address_offset >> XC_PAGE_SHIFT,
> -                 test_bit_size >> XC_PAGE_SHIFT,
> -                 entry->valid_mapping))) {
> +    /* find a remappable entry. An existing locked entry which can be
> +     * reused has priority over all other entries (with lock=0, etc).
> +     * Normally there will be just a few entries for a given address_index
> +     * bucket, typically only 1-2 entries
> +     */
> +    while (entry) {
> +        if (entry->lock &&
> +            entry->paddr_index == address_index &&
> +            entry->size == cache_size &&
> +            test_bits(address_offset >> XC_PAGE_SHIFT,
> +                 test_bit_size >> XC_PAGE_SHIFT, entry->valid_mapping)) {
> +            break;
> +        }
> +        else if (!entry->lock || !entry->vaddr_base) {
> +            avl_entry = entry;
> +            avl_entry_prev = pentry;
> +        }
> +
>          pentry = entry;
>          entry = entry->next;
>      }
> +
> +    /* if a reusable entry was not found, use any available one.
> +     * Otherwise, a new entry will be created
> +     */
> +    if (avl_entry && !entry) {
> +        pentry = avl_entry_prev;
> +        entry = avl_entry;
> +    }

Yes, I think this would work, but we should make sure to scan the whole
list only when lock == true. Something like the following:

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 2a1fbd1..f964143 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -234,7 +234,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
                                        uint8_t lock, bool dma)
 {
-    MapCacheEntry *entry, *pentry = NULL;
+    MapCacheEntry *entry, *pentry = NULL,
+                  *free_entry = NULL, *free_pentry = NULL;
     hwaddr address_index;
     hwaddr address_offset;
     hwaddr cache_size = size;
@@ -281,14 +282,22 @@ tryagain:
 
     entry = &mapcache->entry[address_index % mapcache->nr_buckets];
 
-    while (entry && entry->lock && entry->vaddr_base &&
+    while (entry && (lock || entry->lock) && entry->vaddr_base &&
             (entry->paddr_index != address_index || entry->size != cache_size ||
              !test_bits(address_offset >> XC_PAGE_SHIFT,
                  test_bit_size >> XC_PAGE_SHIFT,
                  entry->valid_mapping))) {
+        if (!free_entry && !entry->lock) {
+            free_entry = entry;
+            free_pentry = pentry;
+        }
         pentry = entry;
         entry = entry->next;
     }
+    if (!entry && free_entry) {
+        entry = free_entry;
+        pentry = free_pentry;
+    }
     if (!entry) {
         entry = g_malloc0(sizeof (MapCacheEntry));
         pentry->next = entry;


Would this work?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

