[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] ioreq: Assert with out of bounds vCPU ID



Le 14/11/2025 à 14:24, Andrew Cooper a écrit :
> On 14/11/2025 1:05 pm, Teddy Astie wrote:
>> A 4K page appears to be able to hold 128 ioreq entries, which luckly
>> matches the current vCPU limit. However, if we decide to increase the
>> domain vCPU limit, that doesn't hold anymore and this function would now
>> silently create a out of bounds pointer leading to confusing problems.
>>
>> All architectures have no more than 128 as vCPU limit on HVM guests,
>> and have pages that are at most 4 KB, so this case doesn't occurs in
>> with the current limits.
>>
>> Assert if the vCPU ID will lead to a out of bounds pointer.
>>
>> No functional change.
>>
>> Reported-by: Julian Vetter <julian.vetter@xxxxxxxxxx>
>> Signed-off-by: Teddy Astie <teddy.astie@xxxxxxxxxx>
>> ---
>> Not sure if this is the best approach, perhaps preventing compilation if the
>> vCPU limit is higher than what the ioreq page can hold is preferable ?
>>
>>   xen/common/ioreq.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
>> index f5fd30ce12..b2ef46ed7b 100644
>> --- a/xen/common/ioreq.c
>> +++ b/xen/common/ioreq.c
>> @@ -99,6 +99,7 @@ static ioreq_t *get_ioreq(struct ioreq_server *s, struct 
>> vcpu *v)
>>
>>       ASSERT((v == current) || !vcpu_runnable(v));
>>       ASSERT(p != NULL);
>> +    ASSERT(v->vcpu_id < (PAGE_SIZE / sizeof(struct ioreq)));
>>
>>       return &p->vcpu_ioreq[v->vcpu_id];
>>   }
>
> The 128 vCPU problem was known and IOERQ servers intentionally had
> multi-page capabilities from the outset to address this problem.
>
> See the calculation in ioreq_server_max_frames().
>
> It hasn't been exercised in anger because there's still the APIC ID
> limit at 128 still, but IOREQ servers are expected to work correctly
> above this limit.


Yes, although only the first ioreq page exists and can be used (even
though ioreq_server_max_frames may report more). It looks like the logic
is incomplete there.


> That said, this function is clearly buggy and in need
> of fixing.
>
> To the assert specifically, that's really not appropriate.  If it were
> to ever fire, we'd have an XSA.  Logic if this nature either needs to be
> fail safe (if returning NULL is ok), or be a BUG() so it doesn't become
> unsafe in release builds.
>

It looks like it is possible to somewhat gracefully handle this, by e.g
crashing the offending domain instead of stepping in a assert. I'm
trying this for v2.

> ~Andrew



--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.