
Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



>>> On 29.09.15 at 04:53, <quan.xu@xxxxxxxxx> wrote:
>>>> Monday, September 28, 2015 2:47 PM,<JBeulich@xxxxxxxx> wrote:
>> >>> On 28.09.15 at 05:08, <quan.xu@xxxxxxxxx> wrote:
>> >>>> Thursday, September 24, 2015 12:27 AM, Tim Deegan wrote:
> 
>> It would be a guest kernel bug, but all _we_ care about is that such a guest
>> kernel bug won't affect the hypervisor or other guests.
> 
> It won't affect the hypervisor or other guest domains.
> As the required Device-TLB flushes have not been applied, the hypercall does
> not complete. The page being freed is still owned by this buggy guest; it is
> not released back to Xen or reallocated to other guests.

Seems like you misunderstood the purpose of my reply: I wasn't
claiming that what your patch set currently does would constitute an
issue. I was simply stating a general rule to consider when thinking
about which solutions are viable and which aren't.

> For Tim's suggestion --"to make the IOMMU table take typed refcounts to
> anything it points to, and only drop those refcounts when the flush 
> completes."
> 
> From the IOMMU point of view, it could walk through the IOMMU table to find
> these pages and take typed refcounts on them.
> These pages may be owned by hardware_domain, dummy, an HVM guest, etc. Could
> I narrow it down to HVM guests? That is, not anything the table points to,
> but only pages related to HVM guests. This would simplify the design.

I don't follow. Why would you want to walk page tables? And why
would a HVM guest have pages other than those owned by itself or
granted access to by another guest mapped in its IOMMU page
tables? In any event - the ref-counting would need to happen as
you _create_ the mappings, not at some later point.
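
To illustrate the ordering, here is a minimal sketch in plain C. It is a
toy model, not Xen code: struct page, struct iommu_table, page_get(),
page_put(), iommu_map() and iommu_unmap() are all hypothetical names. The
point it shows is only that the reference is acquired in the same step
that installs the mapping, so no later table walk is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; this is not Xen's struct page_info. */
struct page {
    unsigned int refs;   /* general reference count */
    bool allocated;      /* cleared once the last reference is dropped */
};

#define NR_SLOTS 8

struct iommu_table {
    struct page *slot[NR_SLOTS];   /* NULL means "not mapped" */
};

static bool page_get(struct page *pg)
{
    if (!pg->allocated)
        return false;    /* cannot reference a page that is already free */
    pg->refs++;
    return true;
}

static void page_put(struct page *pg)
{
    assert(pg->refs > 0);
    if (--pg->refs == 0)
        pg->allocated = false;
}

/* The reference is taken at the moment the mapping is created. */
static bool iommu_map(struct iommu_table *t, size_t idx, struct page *pg)
{
    if (idx >= NR_SLOTS || t->slot[idx] != NULL || !page_get(pg))
        return false;
    t->slot[idx] = pg;
    return true;
}

/*
 * Clear the entry and hand the page back to the caller, who still holds
 * the mapping's reference and drops it only once the flush has completed.
 */
static struct page *iommu_unmap(struct iommu_table *t, size_t idx)
{
    struct page *pg;

    if (idx >= NR_SLOTS)
        return NULL;
    pg = t->slot[idx];
    t->slot[idx] = NULL;
    return pg;
}
```

In this model a page whose owner has already dropped its own reference
stays allocated for as long as an entry created by iommu_map() has not
been both unmapped and released with page_put().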

> From the HVM guest point of view, once the ATS device is assigned, we can:
> * pause the HVM guest domain.
> * scan the domain's xenpage_list, page_list and arch.relmem_list to find
>   these pages and take typed refcounts on them (PGT_dev_tlb_page, a new type).
> * unpause the HVM guest domain.
> 
> (We can ignore the domain's xenpage_list, because:
>    Actually, these pages may be mapped from the Xen heap for guest domains
>    in decrease_reservation() / xenmem_add_to_physmap_one() /
>    p2m_add_foreign(), but they are not mapped in the IOMMU table. The
>    following 4 uses map Xen heap pages for guest domains:
>           * shared page for Xen Oprofile.
>           * vLAPIC mapping.
>           * grant table shared page.
>           * domain shared_info page.
> )

None of which really has a need to be in the IOMMU page tables
afaics.

> Just to check, do typed refcounts refer to the following?
> 
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -183,6 +183,7 @@ struct page_info
>  #define PGT_seg_desc_page PG_mask(5, 4)  /* using this page in a GDT/LDT?  */
>  #define PGT_writable_page PG_mask(7, 4)  /* has writable mappings?         */
>  #define PGT_shared_page   PG_mask(8, 4)  /* CoW sharable page              */
> +#define PGT_dev_tlb_page  PG_mask(9, 4)  /* Maybe in Device-TLB mapping?   */
>  #define PGT_type_mask     PG_mask(15, 4) /* Bits 28-31 or 60-63.           */
> 
> * I define a new page type, PGT_dev_tlb_page.

Why? I.e. why won't a base ref for r/o pages and a writable type-ref
for r/w ones suffice, just like we do everywhere else?
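
As a rough illustration of that rule, the sketch below models a page with
a general ("base") reference count plus a type reference count. It is a
simplified stand-in, not Xen's implementation: the enum, the struct and
every function name here are hypothetical. A read-only IOMMU mapping
takes only a base reference; a read/write one additionally pins the page
as writable, which fails if the page is currently in use as some other
type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified page types for this sketch. */
enum page_type { TYPE_NONE, TYPE_WRITABLE, TYPE_PAGETABLE };

struct page {
    unsigned int refs;        /* general ("base") references */
    unsigned int type_refs;   /* references pinning the current type */
    enum page_type type;
};

static void get_ref(struct page *pg)
{
    pg->refs++;
}

static void put_ref(struct page *pg)
{
    assert(pg->refs > 0);
    pg->refs--;
}

/* The type may change only while no type references are held. */
static bool get_type_ref(struct page *pg, enum page_type type)
{
    if (pg->type_refs > 0 && pg->type != type)
        return false;
    pg->type = type;
    pg->type_refs++;
    return true;
}

static void put_type_ref(struct page *pg)
{
    assert(pg->type_refs > 0);
    if (--pg->type_refs == 0)
        pg->type = TYPE_NONE;
}

/* r/o mapping: base ref only. r/w mapping: base ref plus writable type ref. */
static bool iommu_ref_for_mapping(struct page *pg, bool writable)
{
    get_ref(pg);
    if (writable && !get_type_ref(pg, TYPE_WRITABLE)) {
        put_ref(pg);     /* undo the base ref on failure */
        return false;
    }
    return true;
}

static void iommu_unref_mapping(struct page *pg, bool writable)
{
    if (writable)
        put_type_ref(pg);
    put_ref(pg);
}
```

With this scheme no additional page type is needed: a page that is, say,
pinned as a page table can still be mapped read-only for DMA, while a
writable DMA mapping of it is refused.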

>> Once you do that, I
>> don't think there'll be a reason to pause the guest for the duration of the
>> flush.
>> And really (as pointed out before) pausing the guest would get us _far_ away
>> from how real hardware behaves.
>> 
> 
> Once I do that, I think the guest should still be paused if the Device-TLB
> flush has not completed.
> 
> As mentioned in a previous email, for example:
> Call the do_memory_op hypercall to free pageX (gfn1 <---> mfn1). gfn1 is in
> the freed portion of the GPA space.
> Assume that there is a mapping (gfn1 <---> mfn1) in the Device-TLB. If we
> return to guest mode before the Device-TLB flush has completed, the guest may
> call the do_memory_op hypercall to allocate a new pageY (mfn2) at gfn1.
> Then:
> the EPT mapping is (gfn1 <---> mfn2), while the Device-TLB mapping is
> (gfn1 <---> mfn1).
>
> While the Device-TLB flush has not completed, DMA associated with gfn1 may
> still write data to pageX (gfn1 <---> mfn1), but pageX will be released to
> Xen when the Device-TLB flush completes. The guest may then get incorrect
> data when it reads from gfn1 after the DMA (the page now associated with
> gfn1 is pageY).
> 
> Right?

No. The extra ref taken will prevent the page from getting freed. And
as long as the flush is in progress, DMA to/from the page is going to
produce undefined results (affecting only the guest). But note that
there may be reasons for an entity external to the guest, when invoking
the operation which ultimately leads to the flush, to do this on a paused
guest only. But that's not of concern to the hypervisor side implementation.
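
A small sketch of that lifetime rule, again as a toy model in plain C
rather than actual hypervisor code (struct flush_queue, flush_defer_put()
and flush_complete() are made-up names, and the asynchronous flush itself
is not modelled): the reference held on behalf of the IOMMU mapping is
parked when the flush is issued and dropped only when the flush is
reported complete, so the guest freeing the page in between cannot return
it to the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
    unsigned int refs;
    bool on_free_list;   /* true once returned to the allocator */
};

#define MAX_PENDING 4

/* References whose release is deferred until the flush completes. */
struct flush_queue {
    struct page *pending[MAX_PENDING];
    size_t count;
};

static void page_put(struct page *pg)
{
    assert(pg->refs > 0);
    if (--pg->refs == 0)
        pg->on_free_list = true;
}

/*
 * Unmap path: the asynchronous flush would be issued here (not modelled);
 * the mapping's reference is parked instead of being dropped right away.
 */
static bool flush_defer_put(struct flush_queue *q, struct page *pg)
{
    if (q->count == MAX_PENDING)
        return false;    /* caller would have to wait for a completion */
    q->pending[q->count++] = pg;
    return true;
}

/* Completion path: only now are the parked references dropped. */
static void flush_complete(struct flush_queue *q)
{
    while (q->count > 0)
        page_put(q->pending[--q->count]);
}
```

If the guest drops its own reference while the flush is pending, the
count stays above zero and the page cannot be handed to another domain;
any DMA hitting it in that window lands in memory that still belongs to
the same guest.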

Jan
