Re: [Xen-devel] [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access
>> +void p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access)
>> +{
>> +
>> + /* Restrict with access permissions */
>> + switch (access)
>> + {
>> + case p2m_access_r:
>> + *flags &= ~(_PAGE_RW);
>> + *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT);
>> + break;
>> + case p2m_access_rx:
>> + case p2m_access_rx2rw:
>> + *flags &= ~(_PAGE_NX_BIT|_PAGE_RW);
>> + *flags |= _PAGE_PRESENT;
>> + break;
>> + case p2m_access_rw:
>> + *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT);
>> + break;
>> + case p2m_access_rwx:
>> + default:
>> + *flags &= ~(_PAGE_NX_BIT);
>> + *flags |= (_PAGE_RW|_PAGE_PRESENT);
>> + break;
>> + }
>
>Tim will have the final say here, but I'd suggest dropping all these
>needless parentheses (I see only one case where they're really
>needed).
OK, I will drop all the needless ones. Could you please point out the one case
where they are needed? I couldn't find anything in CODING_STYLE that dictates
what should be done.
>> +
>> + // Allow more restrictive guest flags to be propagated instead of access
>> + // permissions
>
>Coding style (there are more of these, which I'm not going to
>individually comment on).
Sorry, I let the C++ style comments slip in. I will convert all of them to C-style
comments.
>> +static int
>> +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>> + unsigned int page_order, p2m_type_t p2mt,
>> + p2m_access_t p2ma)
>> +{
>> + struct domain *d = p2m->domain;
>> + mfn_t *access_lookup_table = p2m->access_lookup_table;
>> + uint table_idx;
>> + uint page_idx;
>> + uint8_t *access_table_page;
>> +
>> + ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
>
>Better split into two ASSERT()s, so that if it triggers one would know
>which of the two conditions wasn't met?
You are right. I should have done that in the first place.
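Something like this, I assume (just a sketch):

    ASSERT(shadow_mode_mem_access(d));
    ASSERT(access_lookup_table != NULL);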
>> +
>> + /* For PV domains we only support rw, rx, rx2rw, rwx access permissions */
>> + if ( unlikely(p2ma != p2m_access_r &&
>> + p2ma != p2m_access_rw &&
>> + p2ma != p2m_access_rx &&
>> + p2ma != p2m_access_rwx &&
>> + p2ma != p2m_access_rx2rw) )
>> + return -EINVAL;
>
>Perhaps better done as a switch() statement.
Sure. That will definitely be cleaner.
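Roughly along these lines, I imagine (a sketch; same semantics as the if chain
above):

    switch ( p2ma )
    {
    case p2m_access_r:
    case p2m_access_rw:
    case p2m_access_rx:
    case p2m_access_rwx:
    case p2m_access_rx2rw:
        break;
    default:
        return -EINVAL;
    }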
>> +
>> + if ( page_get_owner(mfn_to_page(mfn)) != d )
>> + return -ENOENT;
>> +
>> + gfn = get_gpfn_from_mfn(mfn_x(mfn));
>
>You get "gfn" passed in and blindly overwrite it here? _If_ you need to
>do this lookup, shouldn't you instead check it matches the passed in one?
The "gfn" and "mfn" passed into the function should be the same since this is a PV
guest. That is why I am overwriting "gfn" with the "gpfn" obtained from the
machine_to_phys_mapping.
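If you would rather I cross-check against the passed-in value instead of
overwriting it, something like this sketch should work (keeping the existing
VALID_M2P() handling below intact):

    unsigned long gpfn = get_gpfn_from_mfn(mfn_x(mfn));

    /* For PV guests gfn == mfn; verify that where the M2P entry is valid */
    if ( VALID_M2P(gpfn) && gpfn != gfn )
        return -EINVAL;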
>> +
>> + /*
>> + * Values with the MSB set denote MFNs that aren't really part of the
>> + * domain's pseudo-physical memory map (e.g., the shared info frame).
>> + * Nothing to do here.
>> + */
>> + if ( unlikely(!VALID_M2P(gfn)) )
>> + return 0;
>> +
>> + if ( gfn > (d->tot_pages - 1) )
>> + return -EINVAL;
>
>Hardly - a PV domain can easily make its address space sparse, and
>iirc pv-ops Linux even does so by default to avoid PFN/MFN collisions
>on MMIO space. (And as a side note, this would better be "gfn >=
>d->tot_pages".)
I was using this as a guard against going out of bounds in the access lookup
table, which is created based on the domain's tot_pages. Please let me know what
I should be using instead of tot_pages when doing this.
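Or should the bound come from the maximum GPFN instead, with the table sized
accordingly? E.g. (a sketch, assuming domain_get_maximum_gpfn() is suitable
here for a sparse PV map):

    if ( gfn > domain_get_maximum_gpfn(d) )
        return -EINVAL;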
>> + paging_lock(d);
>> +
>> + table_idx = MEM_ACCESS_TABLE_IDX(gfn);
>> + page_idx = MEM_ACCESS_PAGE_IDX(gfn);
>> + access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
>> + access_table_page[page_idx] = p2ma;
>> + unmap_domain_page(access_table_page);
>> +
>> + if ( sh_remove_all_mappings(d->vcpu[0], mfn) )
>
>Is there anything guaranteeing d->vcpu and d->vcpu[0] being non-
>NULL?
I am not sure. I see shadow_track_dirty_vram() calling this without a check.
Maybe I should check and call it only if d->vcpu is not NULL.
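I.e. something like (sketch):

    /* Guard against d->vcpu / d->vcpu[0] not being set up yet */
    if ( d->vcpu != NULL && d->vcpu[0] != NULL &&
         sh_remove_all_mappings(d->vcpu[0], mfn) )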
>> +static mfn_t
>> +p2m_mem_access_get_entry(struct p2m_domain *p2m, unsigned long gfn,
>> + p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
>> + unsigned int *page_order)
>> +{
>> + struct domain *d = p2m->domain;
>> + mfn_t *access_lookup_table = p2m->access_lookup_table;
>> + uint table_idx;
>> + uint page_idx;
>> + uint8_t *access_table_page;
>> + mfn_t mfn = _mfn(gfn); // For PV guests mfn == gfn
>> +
>> + ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
>> +
>> + /* Not necessarily true, but for non-translated guests, we claim
>> + * it's the most generic kind of memory */
>
>I think you copied this comment verbatim from elsewhere, but I don't
>think it's correct as is.
Yes, indeed. I copied this from __get_gfn_type_access(). I will get rid of the
comment.
>> + *t = p2m_ram_rw;
>> +
>> + if ( page_get_owner(mfn_to_page(mfn)) != d )
>> + return _mfn(INVALID_MFN);
>> +
>> + gfn = get_gpfn_from_mfn(mfn_x(mfn));
>
>Same comment as earlier.
>
>> + if ( gfn > (d->tot_pages - 1) )
>
>Dito.
I have the same reasoning here as I did above.
>> + table_idx = MEM_ACCESS_TABLE_IDX(gfn);
>> + page_idx = MEM_ACCESS_PAGE_IDX(gfn);
>> +
>> + access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
>> +
>> + /* This is a hint to take the default permissions */
>> + if ( access_table_page[page_idx] == p2m_access_n )
>> + access_table_page[page_idx] = p2m->default_access;
>
>We're in "get" here - why does that modify any global state?
I do not initialize the access lookup table to the default value, but the table
is zeroed as part of initialization when each page is allocated using
shadow_alloc_p2m_page(). I then use the "p2m_access_n" / 0 value as a hint that
the default value should be returned, since p2m_access_n is not valid for PV
domains. If you prefer that I initialize the table to the default value, I will
gladly do so.
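Alternatively, if the zero-initialized table is acceptable, the get path could
return the default without writing it back, e.g. (sketch):

    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
    *a = access_table_page[page_idx];
    unmap_domain_page(access_table_page);

    /* Zero (p2m_access_n) is only a hint that the default applies */
    if ( *a == p2m_access_n )
        *a = p2m->default_access;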
>> +void p2m_mem_access_teardown(struct p2m_domain *p2m)
>> +{
>> + struct domain *d = p2m->domain;
>> + mfn_t *access_lookup_table = p2m->access_lookup_table;
>> + uint32_t nr_access_table_pages;
>> + uint32_t ctr;
>> +
>> + /* Reset the set_entry and get_entry function pointers */
>> + p2m_pt_init(p2m);
>> +
>> + if ( !access_lookup_table )
>> + return;
>> +
>> + nr_access_table_pages = get_domain_nr_access_table_pages(d);
>> +
>> + for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
>
>No new effectively unbounded loops please.
OK, I will add a check for preemption and add hypercall continuation logic
here.
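Something along these lines, I expect (a sketch; the exact continuation
mechanics are still to be worked out):

    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
    {
        shadow_free_p2m_page(d, mfn_to_page(access_lookup_table[ctr]));

        /* Back off and let the hypercall be continued rather than looping on */
        if ( hypercall_preempt_check() )
            break;
    }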
>> +int p2m_mem_access_init(struct p2m_domain *p2m)
>> +{
>> + struct domain *d = p2m->domain;
>> + mfn_t *access_lookup_table;
>> + uint32_t nr_access_table_pages;
>> + uint32_t ctr;
>> +
>> + nr_access_table_pages = get_domain_nr_access_table_pages(d);
>> + access_lookup_table = xzalloc_array(mfn_t, nr_access_table_pages);
>
>This surely is going to be an order > 0 allocation, and you even risk
>it being an order > MAX_ORDER one. The former is disallowed at
>runtime by convention/agreement, the latter is an outright bug.
Sorry, I was not aware of the convention. What should I be doing here instead?
>(And again just as a side note - you don't appear to need the zero
>initialization here.)
>
>> + if ( !access_lookup_table )
>> + return -ENOMEM;
>> +
>> + p2m->access_lookup_table = access_lookup_table;
>> +
>> + for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
>
>Same comment regarding the effective unboundedness of the loop.
OK, I will add a check for preemption and add hypercall continuation logic here
too.
>> @@ -1414,6 +1419,8 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
>> if ( pfn == ~0ul )
>> {
>> p2m->default_access = a;
>> + if ( is_pv_domain(d) )
>> + return p2m_mem_access_set_default(p2m);
>
>Is it really correct to set p2m->default_access _before_ calling that
>function? Perhaps it wouldn't be correct doing it the other way
>around either - I suppose you need to hold the necessary lock across
>both operations.
I do not see the problem here. p2m_mem_access_set_default() just blows the
shadows away after taking the paging_lock. There is no actual setting of
default permission for all pages in the PV case.
>> @@ -585,9 +585,17 @@ int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>> {
>>
>> case XEN_DOMCTL_SHADOW_OP_ENABLE:
>> + /*
>> + * Shadow mem_access mode should only be enabled when mem_access is
>> + * enabled in XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE.
>> + */
>> + if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_MEM_ACCESS )
>> + return -EINVAL;
>> +
>> if ( !(sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY) )
>> break;
>> /* Else fall through... */
>> +
>> case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
>
>Stray blank line - the fall through here makes it preferable to keep the
>two case blocks un-separated.
OK, I will keep them un-separated.
>> @@ -2443,7 +2443,8 @@ int sh_remove_all_mappings(struct vcpu *v, mfn_t gmfn)
>> if ( !(shadow_mode_external(v->domain)
>> && (page->count_info & PGC_count_mask) <= 3
>> && ((page->u.inuse.type_info & PGT_count_mask)
>> - == !!is_xen_heap_page(page))) )
>> + == !!is_xen_heap_page(page)))
>> + && !(shadow_mode_mem_access(v->domain)) )
>
>You're breaking indentation, there are pointless parentheses again,
>but most importantly - why?
Sorry, I meant to ask a question about this in my patch message and mark this
as a workaround. I am seeing the message on all sh_remove_all_mappings() calls
and was not able to figure out why it was happening, so I added this as a
workaround. I was hoping you or Tim could shed more light on this.
>> @@ -625,6 +627,20 @@ _sh_propagate(struct vcpu *v,
>> }
>> }
>>
>> + /* Propagate access permissions */
>> + if ( unlikely((level == 1) &&
>> + mem_event_check_ring(&d->mem_event->access) &&
>> + !sh_mfn_is_a_page_table(target_mfn)) )
>
>Just a general remark here - unlikely() used like this is pointless, it
>ought to be used on the individual constituents of && or ||.
OK, I will keep unlikely() just around the mem_event_check_ring() check.
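I.e. (sketch):

    if ( (level == 1) &&
         unlikely(mem_event_check_ring(&d->mem_event->access)) &&
         !sh_mfn_is_a_page_table(target_mfn) )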
>> + {
>> + struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> + p2m_access_t a;
>> + p2m_type_t t;
>> + mfn_t mfn;
>> + mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);
>
>Missing blank line between declarations and statements. Also this
>statement could instead be an initializer.
I will make it an initializer.
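I.e.:

    mfn_t mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);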
>> @@ -2822,6 +2838,8 @@ static int sh_page_fault(struct vcpu *v,
>> int r;
>> fetch_type_t ft = 0;
>> p2m_type_t p2mt;
>> + p2m_access_t p2ma;
>
>This variable ...
>
>> @@ -3009,7 +3027,80 @@ static int sh_page_fault(struct vcpu *v,
>>
>> /* What mfn is the guest trying to access? */
>> gfn = guest_l1e_get_gfn(gw.l1e);
>> - gmfn = get_gfn(d, gfn, &p2mt);
>> + if ( likely(!mem_event_check_ring(&d->mem_event->access)) )
>> + gmfn = get_gfn(d, gfn, &p2mt);
>> + /*
>> + * A mem_access listener is present, so we will first check if a
>> + * violation has occurred.
>> + */
>> + else
>> + {
>> + struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
>
>... seems to be used only in this scope, and with you already adding
>scope restricted variables it should go here.
OK, I will do that.
>> +
>> + gmfn = get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, 0, NULL);
>> + if ( mfn_valid(gmfn) && !sh_mfn_is_a_page_table(gmfn)
>> + && (regs->error_code & PFEC_page_present)
>> + && !(regs->error_code & PFEC_reserved_bit) )
>> + {
>> + int violation = 0;
>> + bool_t access_w = !!(regs->error_code & PFEC_write_access);
>> + bool_t access_x = !!(regs->error_code & PFEC_insn_fetch);
>> + bool_t access_r = access_x ? 0 : !(access_w);
>
>Stray parentheses again. I'm not going to repeat this again - please
>just check your code before submitting.
I am really sorry. I will fix this in the next patch series.
>> + /*
>> + * Do not police writes to guest memory emanating from the Xen
>> + * kernel. Trying to do so will cause the same pagefault to occur
>> + * over and over again with an event being sent to the access
>> + * listener for each fault. If the access listener's vcpu is not
>> + * scheduled during this time, the violation is never resolved and
>> + * will eventually end with the host crashing.
>> + */
>> + if ( (violation && access_w) &&
>> + (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
>> + {
>> + violation = 0;
>> + rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
>> + p2m_ram_rw, p2m_access_rw);
>> + }
>
>This looks more like a hack than a proper solution - shouldn't the
>listener be allowed to know of hypervisor side accesses?
Ideally the listener should know of the hypervisor side accesses, but I ran into
cases where the listener would not get to run and the pagefault would be
retried over and over, eventually causing the host to crash. I guess more detail
about the problem is needed. I will send out a separate email regarding this,
CCing you and Tim.
>> +/* Number of access table pages for a PV domain */
>> +#define get_domain_nr_access_table_pages(d) \
>> + DIV_ROUND_UP(P2M_ACCESS_SIZE * (d->tot_pages - 1), PAGE_SIZE)
>
>And once again. I wonder whether this code was tested on a suitably
>big pv-ops PV guest.
The largest PV domain I tried this with was 28GB.
Thanks,
Aravindh