|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v1 2/3] xen/domain: fix UBSAN null pointer dereference in vcpu_info_reset()
On 20.05.2026 15:40, Oleksii Kurochko wrote:
>
>
> On 5/20/26 2:03 PM, Jan Beulich wrote:
>> On 20.05.2026 13:33, Oleksii Kurochko wrote:
>>>
>>>
>>> On 5/19/26 1:53 PM, Jan Beulich wrote:
>>>> On 19.05.2026 13:22, Oleksii Kurochko wrote:
>>>>> On 5/19/26 12:55 PM, Oleksii Kurochko wrote:
>>>>>> On 5/19/26 11:37 AM, Jan Beulich wrote:
>>>>>>> On 19.05.2026 10:39, Oleksii Kurochko wrote:
>>>>>>>> vcpu_info_reset() maps v->vcpu_info_area.map to the per-vcpu slot
>>>>>>>> inside
>>>>>>>> the domain's shared_info page for vcpus with id < XEN_LEGACY_MAX_VCPUS,
>>>>>>>> and falls back to dummy_vcpu_info for vcpus beyond that limit.
>>>>>>>>
>>>>>>>> However, it does not guard against d->shared_info being NULL. The
>>>>>>>> shared_info() macro expands to a member access through d->shared_info,
>>>>>>>> so when an architecture does not allocate a shared_info page the
>>>>>>>> dereference triggers UBSAN:
>>>>>>>> UBSAN: Undefined behaviour in common/domain.c:325:10
>>>>>>>> member access within null pointer of type 'struct shared_info_t'
>>>>>>>>
>>>>>>>> Extend the existing fallback condition to also cover the case where no
>>>>>>>> shared_info page has been allocated, mapping the vcpu to
>>>>>>>> dummy_vcpu_info
>>>>>>>> instead. This is the correct behaviour: dummy_vcpu_info already serves
>>>>>>>> as the safe stand-in for vcpus that have no usable shared_info slot.
>>>>>>>>
>>>>>>>> Fixes: 295514ff75506 ("common: convert vCPU info area registration")
>>>>>>>
>>>>>>> I question this, largely (but not only) because I also ...
>>>>>>>
>>>>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
>>>>>>>> Reviewed-by: Baptiste Le Duc <baptiste.le-duc@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> RISC-V does not allocate a shared_info page at the momemnt because its
>>>>>>>> guests run in dom0less mode and do not use the Xen PV ABI, so
>>>>>>>> d->shared_info remains NULL throughout domain lifetime.
>>>>>>>
>>>>>>> ... question this mode of operation. Yes, you may (for now) be able to
>>>>>>> get
>>>>>>> away without, but e.g. event channels will want supporting at some
>>>>>>> point.
>>>>>>> Which will require a shared info page. Better put that in place right
>>>>>>> away,
>>>>>>> even if the guests you test with don't use it (yet). Certain other
>>>>>>> common
>>>>>>> code also assumes d->shared_info to never be NULL for an alive domain.
>>>>>>>
>>>>>>
>>>>>> Would it be fine than to allocate it in arch_domain_create() ... :
>>>>>>
>>>>>> if ( (d->shared_info = alloc_xenheap_pages(0, 0)) == NULL )
>>>>>> goto fail;
>>>>>>
>>>>>> clear_page(d->shared_info);
>>>>>>
>>>>>> ... but without calling share_xen_page_with_guest() after that
>>>>>> allocation as share_xen_page_with_guest() isn't implemented at the
>>>>>> moment?
>>>>
>>>> I would have said "yes" here, but ...
>>>>
>>>>> Or could it be an option for all arch-s move allocation of
>>>>> d->shared_info to domain_create() in common just after
>>>>> arch_domain_create()?
>>>>
>>>> ... Andrew's reply pretty much rules out not only this option, but the
>>>> shared-info-page concept as a whole (for RISC-V). See my reply there. In
>>>> the meantime, the change as suggested may then indeed be what we want to
>>>> go with, albeit (a) with a better description and (b) perhaps covering
>>>> all d->shared_info uses.
>>>
>>> Looking at guest kernel code (Linux), FIFO is tried first, so if RISC-V
>>> is going to support only FIFO, d->shared_info could legally be NULL.
>>>
>>> Looking at the Xen side, if an architecture decides to support only
>>> FIFO, d->shared_info is touched only in vcpu_info_reset(), which is
>>> called from vcpu_create().
>>>
>>> All other places where d->shared_info is accessed should not be
>>> reachable except for one case in event_fifo.c: when a guest issues the
>>> EVTCHNOP_init_control hypercall, setup_ports() reads from shared_info(d,
>>> evtchn_pending):
>>> static void setup_ports(struct domain *d, unsigned int prev_evtchns)
>>> {
>>> ...
>>> if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending))
>>> evtchn->pending = true;
>>> ...
>>> }
>>> }
>>>
>>> This looks like it handles the transition from the 2L ABI to the FIFO
>>> ABI: if a guest started with 2L and then switched to FIFO, any events
>>> already pending in shared_info(d, evtchn_pending) need to be migrated to
>>> FIFO's per-channel evtchn->pending flag. But it looks like I am missing
>>> something here as I mentioned at the start that Linux uses or FIFO or 2L.
>>>
>>> Am I missing something?
>>
>> Quite likely you aren't, but I didn't check. My earlier "covering all" may
>> well resolve to merely stating things accordingly in the patch description.
>
> If either FIFO or 2L can be used, shouldn't guest_test_bit(d, port,
> &shared_info(d, evtchn_pending)) in setup_ports() be dropped? If FIFO
> was chosen by Linux, there won't be any events in &shared_info(d,
> evtchn_pending), so it is essentially dead code that could just be
> dropped.
Why would it be dead code? Who said that a guest couldn't to 2L for a
while, then switch to FIFO? Think of boot loaders, for example.
Jan
> Or would it be better to leave it and skip only if
> d->shared_info is allocated: if ( d->shared_info && guest_test_bit(...)
> ) to cover the case when a guest wants to switch from 2L to FIFO (if
> that is even a possible case at all, since as I mentioned above, the
> guest (Linux) chooses the event ABI once and it stays for its lifetime)?
>
> ~ Oleksii
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |