
Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info



> -----Original Message-----
> From: Julien Grall <julien@xxxxxxx>
> Sent: 26 February 2020 11:33
> To: Durrant, Paul <pdurrant@xxxxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; Volodymyr Babchuk
> <Volodymyr_Babchuk@xxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
> George Dunlap <george.dunlap@xxxxxxxxxx>; Ian Jackson
> <ian.jackson@xxxxxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Konrad
> Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> Hi Paul,
> 
> On 25/02/2020 09:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page.
> 
> AFAICT, a side-effect of this change is that shared_info is now going to
> be part of the migration stream. One issue with this is that, on the
> restore side, it will be counted against the number of pages allocated to
> the domain.
> 
> I am fairly certain dirty logging on pages with PGC_extra set would not
> work properly either.

True, those two do need fixing before expanding the use of PGC_extra. I guess 
this may already be a problem with the current VMX use case.

> 
> As the pages will be recreated on the restore side, I would suggest
> skipping them in XEN_DOMCTL_getpageframeinfo3.
>

At some point in the future I guess it may be useful to migrate PGC_extra pages, 
but they would need to be marked as such in the stream. Skipping sounds like 
the right approach for now.
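
For concreteness, I think the skip could sit alongside the existing xenheap 
check in the XEN_DOMCTL_getpageframeinfo3 handler, which already reports 
pages the tools should not touch as XEN_DOMCTL_PFINFO_XTAB. A minimal sketch 
(the surrounding code is from memory, so treat the detail as assumed):

    /* xen/arch/x86/domctl.c: XEN_DOMCTL_getpageframeinfo3 per-page loop */
    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);

    if ( unlikely(!page) ||
         unlikely(is_xen_heap_page(page)) ||
         unlikely(page->count_info & PGC_extra) ) /* skip extra pages */
        type = XEN_DOMCTL_PFINFO_XTAB;

The saving side already treats XTAB pages as "not present in the stream", so 
that should deal with the restore-side accounting problem too.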

> > Its
> > only purpose is to be mapped by the guest, so use a PGC_extra domheap
> > page instead. This does entail freeing shared_info during
> > domain_relinquish_resources() rather than domain_destroy(), so care is
> > needed to avoid dereferencing a NULL shared_info pointer, hence the
> > extra checks of 'is_dying'.
> > ASSERTions are added before apparently vulnerable dereferences in the
> > event channel code. These should not be hit because domain_kill() calls
> > evtchn_destroy() before calling domain_relinquish_resources() but are
> > added to catch any future re-ordering.
> 
> IMHO, the ASSERTions in the event channel pending/mask/... helpers will
> not protect against re-ordering. You may never hit them even if there is
> a re-ordering. It would be better to add an ASSERT()/BUG_ON() in
> evtchn_destroy() and possibly a comment in domain_kill() to mention
> ordering.

I'm not sure about that. Why would they not be hit?
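
For the avoidance of doubt, I read the suggestion as something along these 
lines (sketch only):

    int evtchn_destroy(struct domain *d)
    {
        ...
        /*
         * domain_kill() must call us before it frees shared_info in
         * domain_relinquish_resources(); catch any re-ordering here.
         */
        BUG_ON(!d->shared_info.virt);
        ...
    }

If the concern is that the pending/mask/... helpers may simply never run 
after a re-ordering then fair enough; a check here would fire 
deterministically.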

> 
> > For Arm, the call to free_shared_info() in arch_domain_destroy() is left
> > in place since it is called in the error path of arch_domain_create().
> >
> > NOTE: A modification in p2m_alloc_table() is required to avoid a false
> >        positive when checking for domain memory.
> >        A fix is also needed in dom0_construct_pv() to avoid
> >        automatically adding PGC_extra pages to the physmap.
> 
> I am not entirely sure how this is related to this patch. Was it a
> latent bug? If so, I think it would make sense to split it from this
> patch.
> 

It's related because the check relies on the fact that no PGC_extra pages are 
created before it is called, which is indeed true until this patch is applied. 
PGC_extra pages sit on the domain's page list but are excluded from 
domain_tot_pages(), so switching the check to the latter avoids the false 
positive.

> >
> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> > ---
> > Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Cc: Julien Grall <julien@xxxxxxx>
> > Cc: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
> > Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > Cc: Wei Liu <wl@xxxxxxx>
> > ---
> >   xen/arch/arm/domain.c        |  2 ++
> >   xen/arch/x86/domain.c        |  3 ++-
> >   xen/arch/x86/mm/p2m.c        |  3 +--
> >   xen/arch/x86/pv/dom0_build.c |  4 ++++
> >   xen/common/domain.c          | 25 +++++++++++++++++++------
> >   xen/common/event_2l.c        |  4 ++++
> >   xen/common/event_fifo.c      |  1 +
> >   xen/common/time.c            |  3 +++
> >   8 files changed, 36 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 2cbcdaac08..3904519256 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
> >           BUG();
> >       }
> >
> > +    free_shared_info(d);
> > +
> >       return 0;
> >   }
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index eb7b0fc51c..3ad532eccf 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
> >           pv_domain_destroy(d);
> >       free_perdomain_mappings(d);
> >
> > -    free_shared_info(d);
> >       cleanup_domain_irq_mapping(d);
> >
> >       psr_domain_free(d);
> > @@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
> >       if ( is_hvm_domain(d) )
> >           hvm_domain_relinquish_resources(d);
> >
> > +    free_shared_info(d);
> > +
> >       return 0;
> >   }
> >
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index c5f428d67c..787d97d85e 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> >
> >       p2m_lock(p2m);
> >
> > -    if ( p2m_is_hostp2m(p2m)
> > -         && !page_list_empty(&d->page_list) )
> > +    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
> >       {
> >           P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
> >           p2m_unlock(p2m);
> > diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> > index dc16ef2e79..f8f1bbe2f4 100644
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
> >       {
> >           mfn = mfn_x(page_to_mfn(page));
> >           BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
> > +
> > +        if ( page->count_info & PGC_extra )
> > +            continue;
> 
> I would add a comment explaining why we skip pages with PGC_extra set.
> 
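
Sure. Something like the following (sketch):

        /*
         * PGC_extra pages (e.g. shared_info) are allocated behind the
         * guest's back and mapped explicitly later, so they must not be
         * automatically added to the physmap here.
         */
        if ( page->count_info & PGC_extra )
            continue;
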
> > +
> >           if ( get_gpfn_from_mfn(mfn) >= count )
> >           {
> >               BUG_ON(is_pv_32bit_domain(d));
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index ba7a905258..1d42fbcc0f 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
> >   {
> >       struct domain *d = v->domain;
> >
> > -    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> > +    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> >                       ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
> > -                    : &dummy_vcpu_info);
> > +                    : &dummy_vcpu_info;
> 
> Without holding domain_lock(), I don't think there is a guarantee that
> is_dying (and therefore the shared_info) is not going to change behind
> your back. So v->vcpu_info may point to garbage.
> 
> Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.
> 

True, it does need locking in some way.
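
A minimal sketch, assuming domain_lock() is safe to take in all callers of 
vcpu_info_reset():

    static void vcpu_info_reset(struct vcpu *v)
    {
        struct domain *d = v->domain;

        /* Stop d->is_dying (and hence shared_info) changing under our feet. */
        domain_lock(d);

        v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
                       ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
                       : &dummy_vcpu_info;
        v->vcpu_info_mfn = INVALID_MFN;

        domain_unlock(d);
    }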

> As an aside, it would be good to explain in a comment why we are using
> dummy_vcpu_info when the domain is dying.
> 

Simply because it is guaranteed to exist, but I'll add a comment to that effect.
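
Something like (sketch):

    /*
     * Once the domain is dying, shared_info may already have been freed,
     * so fall back to dummy_vcpu_info: it is statically allocated and
     * hence guaranteed to exist.
     */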

> >       v->vcpu_info_mfn = INVALID_MFN;
> >   }
> >
> > @@ -1650,24 +1650,37 @@ int continue_hypercall_on_cpu(
> >
> >   int alloc_shared_info(struct domain *d, unsigned int memflags)
> >   {
> > -    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
> > +    struct page_info *pg;
> > +
> > +    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
> > +    if ( !pg )
> >           return -ENOMEM;
> >
> > -    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
> > +    if ( !get_page_and_type(pg, d, PGT_writable_page) )
> > +        return -ENODATA;
> 
> I think the page will never be freed if this fails.
>

Good spot.
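
One option is to mirror what hvm_alloc_ioreq_mfn() does when 
get_page_and_type() fails on a freshly allocated MEMF_no_refcount page, i.e. 
treat it as fatal for the domain rather than trying to unwind the allocation 
by hand. A sketch (the __map_domain_page_global() check is an addition on 
top of what was posted):

    if ( !get_page_and_type(pg, d, PGT_writable_page) )
    {
        /*
         * The domain cannot know about this page yet, so failure here
         * is a clear indication of something fishy going on.
         */
        domain_crash(d);
        return -ENODATA;
    }

    d->shared_info.mfn = page_to_mfn(pg);
    d->shared_info.virt = __map_domain_page_global(pg);
    if ( !d->shared_info.virt )
    {
        /* Drop both references taken above so the page gets freed. */
        put_page_alloc_ref(pg);
        put_page_and_type(pg);
        return -ENOMEM;
    }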

> > +
> > +    d->shared_info.mfn = page_to_mfn(pg);
> > +    d->shared_info.virt = __map_domain_page_global(pg);
> >
> >       clear_page(d->shared_info.virt);
> > -    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
> >
> >       return 0;
> >   }
> >
> >   void free_shared_info(struct domain *d)
> >   {
> > +    struct page_info *pg;
> > +
> >       if ( !d->shared_info.virt )
> >           return;
> >
> > -    free_xenheap_page(d->shared_info.virt);
> > +    unmap_domain_page_global(d->shared_info.virt);
> >       d->shared_info.virt = NULL;
> > +
> > +    pg = mfn_to_page(d->shared_info.mfn);
> > +
> > +    put_page_alloc_ref(pg);
> > +    put_page_and_type(pg);
> >   }
> >
> >   /*
> > diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
> > index e1dbb860f4..a72fe0232b 100644
> > --- a/xen/common/event_2l.c
> > +++ b/xen/common/event_2l.c
> > @@ -27,6 +27,7 @@ static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
> >        * others may require explicit memory barriers.
> >        */
> >
> > +    ASSERT(d->shared_info.virt);
> >       if ( guest_test_and_set_bit(d, port, &shared_info(d, evtchn_pending)) )
> >           return;
> >
> > @@ -54,6 +55,7 @@ static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
> >        * These operations must happen in strict order. Based on
> >        * evtchn_2l_set_pending() above.
> >        */
> > +    ASSERT(d->shared_info.virt);
> >       if ( guest_test_and_clear_bit(d, port, &shared_info(d, evtchn_mask)) &&
> >            guest_test_bit(d, port, &shared_info(d, evtchn_pending)) &&
> >            !guest_test_and_set_bit(d, port / BITS_PER_EVTCHN_WORD(d),
> > @@ -67,6 +69,7 @@ static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port)
> >   {
> >       unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
> >
> > +    ASSERT(d->shared_info.virt);
> >       ASSERT(port < max_ports);
> >       return (port < max_ports &&
> >               guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
> > @@ -76,6 +79,7 @@ static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port)
> >   {
> >       unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
> >
> > +    ASSERT(d->shared_info.virt);
> >       ASSERT(port < max_ports);
> >       return (port >= max_ports ||
> >               guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
> > diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> > index 230f440f14..e8c6045d72 100644
> > --- a/xen/common/event_fifo.c
> > +++ b/xen/common/event_fifo.c
> > @@ -497,6 +497,7 @@ static void setup_ports(struct domain *d)
> >
> >           evtchn = evtchn_from_port(d, port);
> >
> > +        ASSERT(d->shared_info.virt);
> >           if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
> >               evtchn->pending = 1;
> >
> > diff --git a/xen/common/time.c b/xen/common/time.c
> > index 58fa9abc40..938226c7b1 100644
> > --- a/xen/common/time.c
> > +++ b/xen/common/time.c
> > @@ -99,6 +99,9 @@ void update_domain_wallclock_time(struct domain *d)
> >       uint32_t *wc_version;
> >       uint64_t sec;
> >
> > +    if ( d->is_dying )
> > +        return;
> 
> This is another instance where I think the use of d->is_dying is not
> safe. I looked at how other places in Xen dealt with d->is_dying.
> 
> We don't seem to have a common way to deal with it:
>     1) It may be checked under domain_lock() -> No issue with that
>     2) It may be checked under d->page_alloc_lock (e.g. assign_pages()).
> The assign_pages() case is fine because it will act as a full barrier.
> So if the call happens after relinquish_memory() then we will surely
> have observed d->is_dying. I haven't checked the others.
> 
> Some of the instances use neither of the two locks above. We probably
> ought to investigate them (I will add a TODO to my list).
> 
> Regarding the two cases here, domain_lock() might be the solution. If
> we are concerned about the contention (it is a spinlock), then we could
> switch domain_lock() from a spinlock to an rwlock.
> 

I'll see if there is any other suitable lock guaranteed to be taken during 
domain_kill(), but it may indeed have to be domain_lock().
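
If it does end up being domain_lock() and contention proves to be a problem, 
the rwlock conversion you suggest would make readers cheap. E.g. (sketch, 
assuming the lock were converted to an rwlock):

    void update_domain_wallclock_time(struct domain *d)
    {
        ...
        read_lock(&d->domain_lock);

        if ( !d->is_dying )
        {
            /* shared_info cannot be freed while the lock is held. */
            ... update the wallclock fields in shared_info ...
        }

        read_unlock(&d->domain_lock);
        ...
    }

with domain_kill() taking the lock for writing around the point where it 
sets d->is_dying.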

  Paul

> Cheers,
> 
> --
> Julien Grall