[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 4/5] xen/memory: Fix acquire_resource size semantics



On 30/07/2020 09:31, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Andrew 
>> Cooper
>> Sent: 28 July 2020 12:37
>> To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Hubert Jasudowicz <hubert.jasudowicz@xxxxxxx>; Stefano Stabellini 
>> <sstabellini@xxxxxxxxxx>; Julien
>> Grall <julien@xxxxxxx>; Wei Liu <wl@xxxxxxx>; Konrad Rzeszutek Wilk 
>> <konrad.wilk@xxxxxxxxxx>; George
>> Dunlap <George.Dunlap@xxxxxxxxxxxxx>; Andrew Cooper 
>> <andrew.cooper3@xxxxxxxxxx>; Paul Durrant
>> <paul@xxxxxxx>; Jan Beulich <JBeulich@xxxxxxxx>; Michał Leszczyński 
>> <michal.leszczynski@xxxxxxx>; Ian
>> Jackson <ian.jackson@xxxxxxxxxx>
>> Subject: [PATCH 4/5] xen/memory: Fix acquire_resource size semantics
>>
>> Calling XENMEM_acquire_resource with a NULL frame_list is a request for the
>> size of the resource, but the returned 32 is bogus.
>>
>> If someone tries to follow it for XENMEM_resource_ioreq_server, the acquire
>> call will fail as IOREQ servers currently top out at 2 frames, and it is only
>> half the size of the default grant table limit for guests.
>>
>> Also, no users actually request a resource size, because it was never wired 
>> up
>> in the sole implemenation of resource aquisition in Linux.
>>
>> Introduce a new resource_max_frames() to calculate the size of a resource, 
>> and
>> implement it the IOREQ and grant subsystems.
>>
>> It is impossible to guarentee that a mapping call following a successful size
> s/guarantee/guarantee
>
>> call will succedd (e.g. The target IOREQ server gets destroyed, or the domain
> s/succedd/succeed
>
>> switches from grant v2 to v1).  Document the restriction, and use the
>> flexibility to simplify the paths to be lockless.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
>> CC: Ian Jackson <ian.jackson@xxxxxxxxxx>
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>> CC: Wei Liu <wl@xxxxxxx>
>> CC: Julien Grall <julien@xxxxxxx>
>> CC: Paul Durrant <paul@xxxxxxx>
>> CC: Michał Leszczyński <michal.leszczynski@xxxxxxx>
>> CC: Hubert Jasudowicz <hubert.jasudowicz@xxxxxxx>
>> ---
>>  xen/arch/x86/mm.c             | 20 ++++++++++++++++
>>  xen/common/grant_table.c      | 19 +++++++++++++++
>>  xen/common/memory.c           | 55 
>> +++++++++++++++++++++++++++++++++----------
>>  xen/include/asm-x86/mm.h      |  3 +++
>>  xen/include/public/memory.h   | 16 +++++++++----
>>  xen/include/xen/grant_table.h |  8 +++++++
>>  xen/include/xen/mm.h          |  6 +++++
>>  7 files changed, 110 insertions(+), 17 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 82bc676553..f73a90a2ab 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4600,6 +4600,26 @@ int xenmem_add_to_physmap_one(
>>      return rc;
>>  }
>>
>> +unsigned int arch_resource_max_frames(
>> +    struct domain *d, unsigned int type, unsigned int id)
>> +{
>> +    unsigned int nr = 0;
>> +
>> +    switch ( type )
>> +    {
>> +#ifdef CONFIG_HVM
>> +    case XENMEM_resource_ioreq_server:
>> +        if ( !is_hvm_domain(d) )
>> +            break;
>> +        /* One frame for the buf-ioreq ring, and one frame per 128 vcpus. */
>> +        nr = 1 + DIV_ROUND_UP(d->max_vcpus * sizeof(struct ioreq), 
>> PAGE_SIZE);
>> +        break;
>> +#endif
>> +    }
>> +
>> +    return nr;
>> +}
>> +
>>  int arch_acquire_resource(struct domain *d, unsigned int type,
>>                            unsigned int id, unsigned long frame,
>>                            unsigned int nr_frames, xen_pfn_t mfn_list[])
>> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
>> index 122d1e7596..0962fc7169 100644
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4013,6 +4013,25 @@ static int gnttab_get_shared_frame_mfn(struct domain 
>> *d,
>>      return 0;
>>  }
>>
>> +unsigned int gnttab_resource_max_frames(struct domain *d, unsigned int id)
>> +{
>> +    unsigned int nr = 0;
>> +
>> +    /* Don't need the grant lock.  This limit is fixed at domain create 
>> time. */
>> +    switch ( id )
>> +    {
>> +    case XENMEM_resource_grant_table_id_shared:
>> +        nr = d->grant_table->max_grant_frames;
>> +        break;
>> +
>> +    case XENMEM_resource_grant_table_id_status:
>> +        nr = grant_to_status_frames(d->grant_table->max_grant_frames);
> Two uses of d->grant_table, so perhaps define a stack variable for it?

Can do.

>  Also, should you not make sure 0 is returned in the case of a v1 table?

This was the case specifically discussed in the commit message, but
perhaps it needs expanding.

Doing so would be buggy.

Some utility is going to query the resource size, and then try to map it
(if it doesn't blindly know the size and/or subset it cares about already).

In between these two hypercalls from the utility, the guest can do a
v1=>v2 or v2=>v1 switch and make the resource spontaneously appear or
disappear.

The only case where we can know for certain whether the resource is
available is when we're in the map hypercall.  Therefore, userspace has
to be able to get to the map call if there is potentially a resource
available.

The semantics of the size call are really "this resource might exist,
and if it does, this is how large it is".


As for the grant status frames specifically, I think making them a
mappable resource might have been a poor choice in hind sight.

Only the guest can switch between grant versions.  GNTTABOP_set_version
strictly operates on current, unlike most of the other grant hypercalls
which take a domid and let dom0 specify something other than DOMID_SELF.

There is GNTTABOP_get_version, but it is racy to use in the same way as
described above, and if some utility does successfully map the status
frames, what will happen in practice is that a guest attempting to
switch from v2 back to v1 will have the set_version hypercall fail due
to outstanding refs on the frames.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.