[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op



On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>
> On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> 
> > wrote:
> > >
> > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > > > +static inline uint16_t
> > > > +argo_hash_fn(const struct argo_ring_id *id)
> > >
> > > No need for the argo_ prefix for static functions, this is already an
> > > argo specific file.
> >
> > Although the compiler could live without the prefix, I'm finding it helpful 
> > to
> > very easily determine that functions being used are not defined elsewhere
> > within Xen; so I've left the prefix as is for version two of this series.
>
> Why do you care whether they are defined elsewhere in Xen? The scope
> of static functions is limited to the translation unit anyway.

ok, I'll remove the prefixes - you're right that I shouldn't care
whether they are defined elsewhere in Xen, and Jan's points about the
string table expansion and serial line bandwidth are true - I had not
considered those. Would adding a note to describe this reasoning to
the CODING_STYLE document be welcome?

> > > > +#else
> > > > +    *mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > > > +#endif
> > > > +
> > > > +    if ( !mfn_valid(*mfn) )
> > > > +        ret = -EINVAL;
> > > > +#ifdef CONFIG_X86
> > > > +    else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > > > +        ret = -EAGAIN;
> > > > +#endif
> > > > +    else if ( (p2mt != p2m_ram_rw) ||
> > > > +              !get_page_and_type(mfn_to_page(*mfn), d, 
> > > > PGT_writable_page) )
> > > > +        ret = -EINVAL;
> > > > +
> > > > +#ifdef CONFIG_X86
> > > > +    put_gfn(d, pfn);
> > >
> > > If you do this put_gfn here, by the time you check that the gfn -> mfn
> > > matches your expectations the guest might have somehow changed the gfn
> > > -> mfn mapping already (for example by ballooning down memory?)
> >
> > If the guest does that, I think it only harms itself. If for some reason
> > a memory access is denied, then the op would just fail. I don't think
> > there's a more serious consequence to be worried about.
>
> Then I wonder why you need such check in any case if the code can
> handle such cases, the more than the check itself is racy.

OK, so at the root of the question here is: does it matter what the p2m
type of the memory is at these points:

1) when the gfn is translated to mfn, at the time of ring registration

2) when the hypervisor writes into guest memory:
    - where the tx_ptr index is initialized in the register op
    - where ringbuf data is written in sendv
    - where ring description data is written in notify

or is having PGT_writable_page type and ownership by the domain
sufficient?

For 1), I think there's some use in saying no to a guest that has
supplied a region that appears misconfigured.

For 2), input would be appreciated. It currently works under the
assumption that a p2m type check is unnecessary, which is why the
put_gfn is where it is.

For further background context, here's my understanding of this section:

When the guest invokes the hypercall operation to register a ring, it
identifies the memory that it owns and wants the hypervisor to use by
supplying an array of gfns (or in v2, addresses which are shifted to
extract their gfns).

The hypervisor translates from gfns to mfns, using the translation that
exists at that time, and then refers to that memory internally by mfn
from there on out. This find_ring_mfn function is where the gfn->mfn
translation happens.  (The variable name does needs renaming from pfn,
as you noted - thanks.)

To do the translation from gfn to mfn, (on x86) it's using
get_gfn_unshare. That's doing three things:
* returns the mfn, if there is one.
* returns the p2m type of that memory.
* acquires a reference to that gfn, which needs to be dropped at some
  point.

The p2m type type check on the gfn in find_ring_mfn at that time is
possibly conservative, rejecting more types than perhaps it needs to,
but the type that it accepts (p2m_ram_rw) is sane. It is a validation of
the p2m type at that instant, intended to detect if the guest has
supplied memory to the ring register op that does not make sense for it
to use as a ring, as indicated by the current p2m type, and if so, fail
early, or indicate that a retry later is needed.

Then the get_page_and_type call is where the memory identified by the
mfn that was just obtained, gets locked to PGT_writable_page type, and
ownership fixed to its current owner domain, by adding to its reference
count.

Then the gfn reference count is dropped with the put_gfn call. This
means that the guest can elect to change the p2m type afterwards, if it
wants; (any change needs to be consistent with its domain ownership and
PGT_writable_page type though -- not sure if that constrains possible types).

That memory can have guest-supplied data written into it, either by the
domain owning the page itself, or in response to argo sendv operations
by other domains that are authorized to transmit into the ring.

Your note that the "check itself is racy": ie. that a change of p2m type
could occur immediately afterwards is true.

So: Do you think that a check on the current p2m type of the pages in
the ring is needed at the points where the hypervisor issues writes into
that ring memory?


> > Above, if we're going to use the mfn, then we've just done a successful:
> >     get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page)
> >
> > which should hold it in a state that we're ok with until we're done
> > with it -- see put_page_and_type in argo_ring_remove_mfns.
> >
> > > > +        /* W(L2) protects all the elements of the domain's ring_info */
> > > > +        write_lock(&d->argo->lock);
> > >
> > > I don't understand this W(L2) nomenclature, is this explain somewhere?
> >
> > Yes, sort of. Lock "L2" is the per-domain argo lock, identified in a
> > comment near the top of the file. It's a read-write lock, so 'W' means:
> > take the write lock on it.
> >
> > > Also there's no such comment when you take the global argo_lock above.
> >
> > L2 covers more interesting work than L1, which is why there are more
> > comments pertaining to it than L1.
>
> I would add such comments about which locks protect what items to the
> declaration of the locks, rather than the usage place.

ack, and those comments are there, introduced earlier in the series.
There's a dedicated section with comments on locking near the top
(titled: "locking is organized as follows") and comments within the
argo_ring_info data structure definition for protection of each field.

> I don't see a
> lot of value in the comments there unless they maybe describe an
> exception or a corner case, but that might just be my taste.

ack. It's notable with that specific site that it's (intentionally,
necessarily) a write_lock that is being taken, rather than a
read_lock; maybe that's enough to fall under your exception
/ corner case condition. I can drop it if it's really unwanted.

> > > > +/*
> > > > + * Messages on the ring are padded to 128 bits
> > > > + * Len here refers to the exact length of the data not including the
> > > > + * 128 bit header. The message uses
> > > > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> > > > + * Using typeof(a) make clear that this does not truncate any 
> > > > high-order bits.
> > > > + */
> > > > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
> > >
> > > Why not just use ROUNDUP?
> > >
> > > And in any case this shouldn't be on the public header IMO, since it's
> > > not part of the interface AFAICT.
> >
> > Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
> > because it does need to be in the public header because it's used within the
> > Linux device driver, and items in that public Xen header need the 'xen' 
> > prefix
> > (so they now do).  Within the Linux code, it's used to choose a sensible 
> > ring
> > size, and also used when manipulating the rx_ptr on the guest side.
>
> I'm quite sure Linux (or any other OS) will have a roundup helper, or
> if there's indeed an OS without a roundup helper it should be added to
> the generic OS code. There's nothing Xen or ARGO specific in this
> roundup helper, hence I see no need to add it to the public header.
>
> I think you should instead:
>
> #define XEN_ARGO_MESSAGE_SIZE 0xf
>
> Or some such and use that value with the OS roundup helper.

OK, I'll go and look. Thanks for the suggestion.

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.