
Re: [XEN PATCH v1 1/1] x86/domctl: add gva_to_gfn command





On Tue, Mar 21, 2023 at 3:49 AM Ковалёв Сергей <valor@xxxxxxx> wrote:
>
>
>
> 21.03.2023 2:34, Tamas K Lengyel wrote:
> >
> >
> > On Mon, Mar 20, 2023 at 3:23 PM Ковалёв Сергей <valor@xxxxxxx
> > <mailto:valor@xxxxxxx>> wrote:
> >  >
> >  >
> >  >
> >  > 21.03.2023 1:51, Tamas K Lengyel wrote:
> >  > >
> >  > >
> >  > > On Mon, Mar 20, 2023 at 12:32 PM Ковалёв Сергей <valor@xxxxxxx
> > <mailto:valor@xxxxxxx>
> >  > > <mailto:valor@xxxxxxx <mailto:valor@xxxxxxx>>> wrote:
> >  > >  >
> >  > >  > The gva_to_gfn command is used for fast address translation in
> >  > >  > the LibVMI project. With such a command it is possible to perform
> >  > >  > address translation in a single call instead of a series of
> >  > >  > queries to get every page table.
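For illustration, the "series of queries" here is one guest-physical read per paging level. A minimal sketch of such a walk for 4-level x86-64 paging (large pages, permission bits and read failures ignored for brevity; vmi_read_64_pa is a real LibVMI call, while walk_gva_to_gpa and PT_ADDR_MASK are illustrative names, not LibVMI or Xen API):

    #include <stdint.h>
    #include <libvmi/libvmi.h>

    /* Bits 51:12 of a page-table entry (or of CR3) hold the physical
     * address of the next-level table. */
    #define PT_ADDR_MASK 0x000ffffffffff000ULL

    /* One guest-physical read per paging level - each read is itself a
     * foreign-page access (or a page-cache hit) from the LibVMI side. */
    static addr_t walk_gva_to_gpa(vmi_instance_t vmi, addr_t dtb, addr_t gva)
    {
        uint64_t pml4e = 0, pdpte = 0, pde = 0, pte = 0;

        vmi_read_64_pa(vmi, (dtb   & PT_ADDR_MASK) + ((gva >> 39) & 0x1ff) * 8, &pml4e);
        vmi_read_64_pa(vmi, (pml4e & PT_ADDR_MASK) + ((gva >> 30) & 0x1ff) * 8, &pdpte);
        vmi_read_64_pa(vmi, (pdpte & PT_ADDR_MASK) + ((gva >> 21) & 0x1ff) * 8, &pde);
        vmi_read_64_pa(vmi, (pde   & PT_ADDR_MASK) + ((gva >> 12) & 0x1ff) * 8, &pte);

        return (pte & PT_ADDR_MASK) | (gva & 0xfff);
    }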
> >  > >
> >  > > You have a couple of assumptions here:
> >  > >   - Xen will always have a direct map of the entire guest memory -
> >  > > there are already plans to move away from that. Without that, this
> >  > > approach won't have any advantage over doing the same mapping in
> >  > > LibVMI.
> >  >
> >  > Thanks! I didn't know about that plan, though I use this patch
> >  > backported to 4.16.
> >  >
> >  > >   - LibVMI has to map every page of every page table for each lookup -
> >  > > you only have to do that for the first one; afterwards the pages the
> >  > > page tables reside on are kept in a cache, and subsequent lookups
> >  > > would actually be faster than doing this domctl, since you can stay
> >  > > in the same process instead of having to jump into Xen.
> >  >
> >  > Yes, I know about the page cache, but I have faced several issues
> >  > with the cache, like this one:
> >  > https://github.com/libvmi/libvmi/pull/1058
> >  > So I had to disable the cache.
> >
> > The issue you linked to is an issue with a stale v2p cache, which is a
> > virtual TLB. The cache I was talking about is the page cache, which
> > just maintains a list of the pages LibVMI has already accessed, for
> > future accesses. You can have one and not the other (i.e. ./configure
> > --disable-address-cache --enable-page-cache).
> >
> > Tamas
>
> Thanks. I know about the page cache, though I'm not familiar enough
> with it.
>
> As far as I understand, at the moment the page cache implementation in
> LibVMI looks like this:
> 1. Call sequence: vmi_read > vmi_read_page > driver_read_page >
>     xen_read_page > memory_cache_insert ..> get_memory_data >
>     xen_get_memory > xen_get_memory_pfn > xc_map_foreign_range
> 2. This is perfectly valid as long as the guest OS keeps the page
>     there. And physical pages are always there.
> 3. To renew the cache, the "age_limit" counter is used.
> 4. In the Xen driver implementation in LibVMI, "age_limit" is
>     disabled.
> 5. It is also possible to invalidate the cache with "xen_write" or
>     "vmi_pagecache_flush", but that is not used.
> 6. The other way to avoid an overly large cache is the cache size
>     limit: on every insert, half of the cache is dropped on size
>     overflow.
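A stripped-down illustration of the call chain in point 1: map a guest frame once via a foreign mapping, keep it, and hand it back on later accesses. Only xc_map_foreign_range, XC_PAGE_SIZE and munmap are real APIs as written; get_guest_page and the direct-mapped table are illustrative stand-ins for LibVMI's memory_cache_insert/get_memory_data machinery, and the "drop half the cache" eviction of point 6 is reduced to recycling a single slot:

    #include <sys/mman.h>
    #include <xenctrl.h>

    #define PAGE_SLOTS 4096
    struct page_slot { unsigned long gfn; void *map; };
    static struct page_slot page_cache[PAGE_SLOTS];

    static void *get_guest_page(xc_interface *xch, uint32_t domid, unsigned long gfn)
    {
        struct page_slot *s = &page_cache[gfn % PAGE_SLOTS];

        if (s->map && s->gfn == gfn)
            return s->map;                      /* hit: no new mapping needed */

        if (s->map)
            munmap(s->map, XC_PAGE_SIZE);       /* evict the previous occupant */

        /* Miss: the expensive path - one foreign mapping per new frame. */
        s->map = xc_map_foreign_range(xch, domid, XC_PAGE_SIZE, PROT_READ, gfn);
        s->gfn = gfn;
        return s->map;
    }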
>
> So the only thing we need to know is a valid mapping from guest
> virtual address to guest physical address.
>
> And the slow paths are:
> 1. The first traversal of a new page table set, e.g. for a new process.
> 2. Or a new subset of page tables for a known process.
> 3. Subsequent page accesses after the cache is cleared on size overflow.
>
> Am I right?
>
> The main idea behind the patch:
> 1. The very first translation would be done faster with a hypercall.
> 2. For subsequent calls, a v2p translation cache could be used (as in
>     my current work on LibVMI).
> 3. To avoid errors from a stale cache, the v2p cache could be
>     invalidated on every event (VMI_FLUSH_RATE = 1).
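Sketching points 1-3: a toy v2p translation cache whose miss path would be either the proposed gva_to_gfn hypercall or the manual walk sketched earlier. None of these names are LibVMI API; VMI_FLUSH_RATE = 1 corresponds to calling v2p_flush_all() on every event, which is what the reply below takes issue with:

    #include <stdbool.h>
    #include <string.h>
    #include <libvmi/libvmi.h>

    /* From the walk sketch above; with the patch this would instead be a
     * single gva_to_gfn hypercall. */
    addr_t walk_gva_to_gpa(vmi_instance_t vmi, addr_t dtb, addr_t gva);

    #define V2P_SLOTS 1024
    struct v2p_entry { addr_t dtb, gva_page, gpa_page; bool valid; };
    static struct v2p_entry v2p_cache[V2P_SLOTS];

    /* VMI_FLUSH_RATE = 1 amounts to calling this on every event, so
     * between events every lookup takes the slow path again. */
    static void v2p_flush_all(void)
    {
        memset(v2p_cache, 0, sizeof v2p_cache);
    }

    static addr_t v2p_lookup(vmi_instance_t vmi, addr_t dtb, addr_t gva)
    {
        struct v2p_entry *e = &v2p_cache[((gva >> 12) ^ dtb) % V2P_SLOTS];

        if (!e->valid || e->dtb != dtb || e->gva_page != (gva >> 12)) {
            e->dtb = dtb;
            e->gva_page = gva >> 12;
            e->gpa_page = walk_gva_to_gpa(vmi, dtb, gva) >> 12;  /* slow path */
            e->valid = true;
        }
        return (e->gpa_page << 12) | (gva & 0xfff);
    }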

If you set the flush rate to 1, then you effectively run without any caching between events. It will still be costly. Yes, you save the cost of mapping the pages because Xen already has them mapped, but overall this is still a subpar solution.

IMHO you are not addressing the real issue, which is the lack of hooks into the OS that would tell you when the v2p cache actually needs to be invalidated. The OS already does TLB maintenance when it updates its page tables. If you wrote logic to hook into that, you wouldn't have to disable the caches or run with a flush rate of 1. On the DRAKVUF side this has been a TODO for a long time: https://github.com/tklengyel/drakvuf/blob/df2d274dfe349bbdacdb121229707f6c91449b38/src/libdrakvuf/private.h#L140. If you had those hooks into the TLB-maintenance logic, you could just use the existing page cache and be done with it. Yes, the very first access may still be slower than with the hypercall, but I doubt it would be noticeable in the long run.
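For reference, the coarsest form of such a hook can already be expressed with LibVMI's public event API: CR3 reloads are architectural TLB-flush points, so flushing the v2p cache from a CR3-write callback keeps it consistent without disabling it entirely; finer-grained hooks on INVLPG or the kernel's TLB-shootdown paths are what the TODO above refers to. A minimal sketch under that assumption; note the dtb argument to vmi_v2pcache_flush exists only in newer LibVMI versions, older ones take just the vmi handle:

    #include <libvmi/libvmi.h>
    #include <libvmi/events.h>

    /* Flush the stale-prone v2p cache only at an architectural TLB-flush
     * point (a guest CR3 reload) instead of on every event or never. */
    static event_response_t cr3_write_cb(vmi_instance_t vmi, vmi_event_t *event)
    {
        (void)event;
        vmi_v2pcache_flush(vmi, ~0ull);   /* ~0 = drop all cached translations */
        return VMI_EVENT_RESPONSE_NONE;
    }

    static status_t watch_cr3_writes(vmi_instance_t vmi)
    {
        /* The event must stay alive while registered, hence static. */
        static vmi_event_t cr3_event;

        SETUP_REG_EVENT(&cr3_event, CR3, VMI_REGACCESS_W, 0, cr3_write_cb);
        return vmi_register_event(vmi, &cr3_event);
    }

    /* Afterwards drive the event loop as usual, e.g.:
     *     while (running) vmi_events_listen(vmi, 500);
     */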

Tamas

 

