|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 2/3] xen/domain: fix UBSAN null pointer dereference of d->shared_info
On 03.06.2026 10:07, Oleksii Kurochko wrote:
> On 6/3/26 7:54 AM, Jan Beulich wrote:
>> On 02.06.2026 18:11, Oleksii Kurochko wrote:
>>> On 6/2/26 1:19 PM, Jan Beulich wrote:
>>>> On 25.05.2026 15:20, Oleksii Kurochko wrote:
>>>>> It is legal to have d->shared_info equal to NULL for architectures which
>>>>> support only the FIFO ABI for event channel management.
>>>>>
>>>>> Having d->shared_info == NULL leads to a UBSAN issue on such
>>>>> architectures:
>>>>> UBSAN: Undefined behaviour in common/domain.c:325:10
>>>>> member access within null pointer of type 'struct
>>>>> shared_info_t'
>>>>>
>>>>> vcpu_info_reset() maps v->vcpu_info_area.map to the per-vcpu slot inside
>>>>> the domain's shared_info page for vcpus with id < XEN_LEGACY_MAX_VCPUS,
>>>>> and falls back to dummy_vcpu_info for vcpus beyond that limit.
>>>>> Extend the existing fallback condition to also cover the case where no
>>>>> shared_info page has been allocated, mapping the vcpu to dummy_vcpu_info
>>>>> instead. This is the correct behaviour: dummy_vcpu_info already serves
>>>>> as the safe stand-in for vcpus that have no usable shared_info slot.
>>>>>
>>>>> Additionally, if an architecture supports only the FIFO ABI, setup_ports()
>>>>> should be updated to avoid a NULL pointer dereference of d->shared_info,
>>>>> since in that case there will be no pending events in
>>>>> shared_info->evtchn_pending and the pending flag of the FIFO event channel
>>>>> does not need to be set to true.
>>>>> update_domain_wallclock_time() accesses d->shared_info via shared_info()
>>>>> macro. On architectures that do not allocate a shared_info page (currently
>>>>> RISC-V, which runs guests in dom0less mode without the PV ABI), this
>>>>> causes
>>>>> a NULL dereference. The early return is safe: if there is no shared_info
>>>>> page, there is nothing to update. For all existing architectures (x86,
>>>>> ARM)
>>>>> that do allocate it, the guard is never taken and behavior is unchanged.
>>>>>
>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
>>>>> ---
>>>>> Changes in v2:
>>>>> - Update commit message + subject.
>>>>> - Drop Fixes tag.
>>>>> - Handle migration of pending events from 2L and FIFO ABIs when arch
>>>>> support only FIFO ABI.
>>>>
>>>> What does this item describe? On an arch supporting only FIFO, how could
>>>> evtchn need migrating from 2L?
>>>
>>> Agree this item is inaccurate.
>>>
>>> evtchn_init() always calls evtchn_2l_init(d) first
>>> (event_channel.c:1627), so every domain starts with 2L regardless of
>>> arch (of course, it is just initialization of evtchn_port_ops which
>>> aren't really used when only FIFO is supported).
>>>
>>> setup_ports() is called during the guest-initiated 2L→FIFO transition
>>> (event_fifo.c:637), not at arch init time. There is no arch that
>>> supports "only FIFO" as a starting state and that is why it is needed to
>>> guard setup_ports() against NULL d->shared_info when migrating 2L
>>> pending state to FIFO even 2L wasn't really used by an arch with only
>>> FIFO support.
>>
>> Imo on arch-es not supporting 2L, domains shouldn't start in 2L mode.
>
> Agree but will it be easy to achieve now with the current code base?
>
> The best what could be done it is avoid calling evtchn_2l_init() now in
> event_channel.c and:
> 1. Add a new Kconfig symbol, CONFIG_HAS_EVTCHN_2L (or re-use
> HAS_SHARED_INFO suggested before), selected by x86 and ARM.
I'd stick to just HAS_SHARED_INFO as long as a separate control for 2-
level evtchn isn't strictly needed.
> 2. In evtchn_init() (event_channel.c:1627), guard the call:
> #ifdef CONFIG_HAS_EVTCHN_2L
> evtchn_2l_init(d);
> #else
> evtchn_none_init(d);
> #endif
> 3. Add a small stub ops table (probably in event_fifo.c or a new
> event_none.c) with no-op set_pending/clear_pending/unmask, is_pending
> returning false, is_masked returning true (valid until
> evtchn_fifo_init_control() replaces them).
That's one of the options (the stubs could then as well live in
event_channel.c). Another might be to put the FIFO ops in place right
away, making sure they can cope with evtchn_fifo_init_control() not
having been called yet.
>>>>> --- a/xen/common/domain.c
>>>>> +++ b/xen/common/domain.c
>>>>> @@ -320,7 +320,7 @@ void vcpu_info_reset(struct vcpu *v)
>>>>> struct domain *d = v->domain;
>>>>>
>>>>> v->vcpu_info_area.map =
>>>>> - ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>>>>> + ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS && d->shared_info)
>>>>> ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
>>>>> : &dummy_vcpu_info);
>>>>> }
>>>>
>>>> At the example of this: The extra conditionals are dead code on x86 and
>>>> Arm. While the status of the respective Misra rule is still uncertain
>>>> for Xen, imo we'd be better off avoiding the introduction of new dead
>>>> code. Which in turn means we may need some kind of abstraction to have
>>>> these extra conditionals in place only for arch-es not supporting
>>>> shared-info at all.
>>>
>>> What about then add config HAS_SHARED_INFO to xen/common/Kconfig and then:
>>
>> We're getting closer. Imo we want to go farther, though: shared_info() as a
>> construct should be unavailable when !HAS_SHARED_INFO. _That_ then will
>> make obvious (by causing build failures) that all respective use sites were
>> properly dealt with.
>
> I will add then:
>
> +#ifdef CONFIG_HAS_SHARED_INFO
> #define shared_info(d, field) __shared_info(d, (d)->shared_info,
> field)
> +#endif
>
> But with doing that we have only option of using #ifdef HAS_SHARED_INFO
> in the place where shared_info() is used. If it is fine then I will be
> happy to do in this way.
Well, I gave a suggestion to avoid such #ifdef-ary, ...
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -319,10 +319,14 @@ void vcpu_info_reset(struct vcpu *v)
>>> {
>>> struct domain *d = v->domain;
>>>
>>> +#ifdef CONFIG_HAS_SHARED_INFO
>>> v->vcpu_info_area.map =
>>> - ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS && d->shared_info)
>>> - ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
>>> - : &dummy_vcpu_info);
>>> + (v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>>> + ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
>>> + : &dummy_vcpu_info;
>>> +#else
>>> + v->vcpu_info_area.map = &dummy_vcpu_info;
>>> +#endif
>>> }
>>
>> I agree with #ifdef here.
>>
>>> --- a/xen/common/event_fifo.c
>>> +++ b/xen/common/event_fifo.c
>>> @@ -562,9 +562,10 @@ static void setup_ports(struct domain *d, unsigned
>>> int prev_evtchns)
>>>
>>> evtchn = evtchn_from_port(d, port);
>>>
>>> - if ( d->shared_info &&
>>> - guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
>>> +#ifdef CONFIG_HAS_SHARED_INFO
>>> + if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
>>> evtchn->pending = true;
>>> +#endif
>>
>> While as per above shared_info() would best not exist when !HAS_SHARED_INFO
>> (in which case #ifdef may be unavoidable here), an alternative where
>> IS_ENABLED() could be used here may want at least considering. E.g.
>> causing a link-time failure when shared_info() is used (and not compiled
>> out).
... here. There are downsides to this, so which route to go needs settling
on.
>>> --- a/xen/common/time.c
>>> +++ b/xen/common/time.c
>>> @@ -94,8 +94,9 @@ void update_domain_wallclock_time(struct domain *d)
>>> uint32_t *wc_version;
>>> uint64_t sec;
>>>
>>> - if ( !d->shared_info )
>>> - return;
>>> +#ifndef CONFIG_HAS_SHARED_INFO
>>> + return;
>>> +#endif
>>>
>>> spin_lock(&wc_lock);
>>
>> Constructs like this are imo somewhat ugly. Using IS_ENABLED() instead
>> would make things at least a little better (again imo).
>
> Considering mentioned above it would be better to #ifdef the whole buddy
> of the function:
>
> void update_domain_wallclock_time(struct domain *d)
> {
> +#ifdef CONFIG_HAS_SHARED_INFO
> +
> uint32_t *wc_version;
> uint64_t sec;
>
> -#ifndef CONFIG_HAS_SHARED_INFO
> - return;
> -#endif
> -
> spin_lock(&wc_lock);
>
> wc_version = &shared_info(d, wc_version);
> @@ -120,6 +118,8 @@ void update_domain_wallclock_time(struct domain *d)
> *wc_version = version_update_end(*wc_version);
>
> spin_unlock(&wc_lock);
> +
> +#endif /* CONFIG_HAS_SHARED_INFO */
> }
In which case there's no point having use sites actually call here. I.e.
we then may want to have an inline stub when !HAS_SHARED_INFO.
> Considering also that shared_info is expected to be used only for 2L
> then it would be better to introduce CONFIG_HAS_EVTCHN_2L instead.
How does the event channel model in use matter for
update_domain_wallclock_time()?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |