Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

>>> On 29.09.15 at 04:53, <quan.xu@xxxxxxxxx> wrote:
>>>> Monday, September 28, 2015 2:47 PM,<JBeulich@xxxxxxxx> wrote:
>> >>> On 28.09.15 at 05:08, <quan.xu@xxxxxxxxx> wrote:
>> >>>> Thursday, September 24, 2015 12:27 AM, Tim Deegan wrote:
>
>> It would be a guest kernel bug, but all _we_ care about is that such a guest
>> kernel bug won't affect the hypervisor or other guests.
>
> It won't affect the hypervisor or other guest domains.
> As the required Device-TLB flushes are not applied, the hypercall is not
> completed. The being freed page is still owned by this buggy
> Guest, not released back to xen or reallocated for other guests.

Seems like you misunderstood the purpose of my reply: I wasn't
claiming that what your patch set currently does would constitute
an issue. I was simply stating a general rule to consider when
thinking about which solutions are viable and which aren't.

> For Tim's suggestion --"to make the IOMMU table take typed refcounts to
> anything it points to, and only drop those refcounts when the flush
> completes."
>
> From IOMMU point of view, if it can walk through IOMMU table to get these
> pages and take typed refcounts.
> These pages are maybe owned by hardware_domain, dummy, HVM guest .etc. could
> I narrow it down to HVM guest? --- It is not for anything it points to, but
> just for HVM guest related. this will simplify the design.

I don't follow. Why would you want to walk page tables? And why
would an HVM guest have pages other than those owned by itself or
granted access to by another guest mapped in its IOMMU page
tables? In any event - the ref-counting would need to happen as
you _create_ the mappings, not at some later point.

> from HVM guest point of view, once the ATS device is assigned, we can:
> *pause the HVM guest domain.
> *scan domain's xenpage_list, page_list and arch.relmem_list to get these
> pages, which will be took typed refcounts ( PGT_dev_tlb_page -- a new type).
> *unpause the HVM guest domain.
>
> (we can ignore domain's xenpage_list) as:
> ((
> Actually, the previous pages are maybe mapped from Xen heap for guest
> domains in decrease_reservation() / xenmem_add_to_physmap_one()
> / p2m_add_foreign(), but they are not mapped to IOMMU table. The below 4
> functions will map xen heap page for guest domains:
> * share page for xen Oprofile.
> * vLAPIC mapping.
> * grant table shared page.
> * domain share_info page.
> ))

Neither of which really has a need to be in the IOMMU page tables
afaics.

> Just for check, do typed refcounts refer to the following?
>
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -183,6 +183,7 @@ struct page_info
>  #define PGT_seg_desc_page PG_mask(5, 4) /* using this page in a GDT/LDT? */
>  #define PGT_writable_page PG_mask(7, 4) /* has writable mappings? */
>  #define PGT_shared_page PG_mask(8, 4) /* CoW sharable page */
> +#define PGT_dev_tlb_page PG_mask(9, 4) /* Maybe in Device-TLB mapping? */
>  #define PGT_type_mask PG_mask(15, 4) /* Bits 28-31 or 60-63. */
>
> * I define a new typed refcounts PGT_dev_tlb_page.

Why? I.e. why won't a base ref for r/o pages and a writable type-ref
for r/w ones suffice, just like we do everywhere else?

>> Once you do that, I
>> don't think there'll be a reason to pause the guest for the duration of the
>> flush.
>> And really (as pointed out before) pausing the guest would get us _far_ away
>> from how real hardware behaves.
>>
>
> Once I do that, I think the guest should be still paused, if the Device-TLB
> flush is not completed.
>
> As mentioned in previous email, for example:
> Call do_memory_op HYPERCALL to free a pageX (gfn1 <---> mfn1). The gfn1 is
> the freed portion of GPA.
> assume that there is a mapping(gfn1<---> mfn1) in Device-TLB. If the
> Device-TLB flush is not completed and return to guest mode,
> the guest may call do_memory_op HYPERCALL to allocate a new pageY(mfn2) to
> gfn1..
> then:
> the EPT mapping is (gfn1--mfn2), the Device-TLB mapping is (gfn1<--->mfn1) .
>
> If the Device-TLB flush is not completed, DMA associated with gfn1 may still
> write some data with pageX(gfn1 <---> mfn1), but pageX will be
> Released to xen when the Device-TLB flush is completed. It is maybe not
> correct for guest to read data from gfn1 after DMA(now the page associated
> with gfn1 is pageY ).
>
> Right?

No. The extra ref taken will prevent the page from getting freed.
And as long as the flush is in progress, DMA to/from the page is
going to produce undefined results (affecting only the guest). But
note that an entity external to the guest, when invoking the
operation which ultimately leads to the flush, may have reasons to
do so on a paused guest only. But that's not of concern to the
hypervisor side implementation.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
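
For illustration only, the ref-counting argument made in this reply can be
modelled in a few lines of stand-alone C. This is a sketch under stated
assumptions, not Xen code: struct page, struct iommu_entry and every
function name below are hypothetical, a single plain counter stands in for
both the base ref and the writable type-ref, and the flush completion is
modelled as an ordinary function call. The reference is taken when the
mapping is created and dropped only when the flush completes, so a guest
that frees the page in between cannot cause it to be released early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; NOT Xen's struct page_info. */
struct page {
    unsigned int refs;  /* outstanding references (owner, mappings, ...) */
    bool freed;         /* set once the last reference has been dropped */
};

/* Take one reference on a live page. */
static void get_page_ref(struct page *pg)
{
    assert(!pg->freed);
    pg->refs++;
}

/* Drop one reference; the page is released only when none remain. */
static void put_page_ref(struct page *pg)
{
    assert(pg->refs > 0);
    if (--pg->refs == 0)
        pg->freed = true;
}

/* Hypothetical IOMMU mapping slot with an in-flight flush marker. */
struct iommu_entry {
    struct page *pg;     /* NULL while the slot is unused */
    bool flush_pending;  /* Device-TLB flush issued but not yet complete */
};

/* The reference is taken as the mapping is created, not later. */
static bool iommu_map(struct iommu_entry *e, struct page *pg)
{
    if (e->pg != NULL)
        return false;    /* slot already in use */
    get_page_ref(pg);
    e->pg = pg;
    e->flush_pending = false;
    return true;
}

/* Start tearing the mapping down; the reference is deliberately kept. */
static void iommu_unmap_begin(struct iommu_entry *e)
{
    assert(e->pg != NULL && !e->flush_pending);
    e->flush_pending = true;
}

/* Flush completion: only now is the mapping's reference dropped. */
static void iommu_flush_done(struct iommu_entry *e)
{
    struct page *pg = e->pg;

    assert(pg != NULL && e->flush_pending);
    e->pg = NULL;
    e->flush_pending = false;
    put_page_ref(pg);
}
```

In this model, a guest freeing pageX corresponds to put_page_ref() on the
owner's reference while flush_pending is still set: the count stays above
zero, so the page is not released until iommu_flush_done() runs. Any DMA
in that window touches a page that still belongs to the guest, which
matches the "undefined results (affecting only the guest)" outcome
described above.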