[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/3] xen/domain: fix UBSAN null pointer dereference of d->shared_info


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 3 Jun 2026 10:18:38 +0200
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=google header.d=suse.com header.i="@suse.com" header.h="Content-Transfer-Encoding:In-Reply-To:Autocrypt:From:Content-Language:References:Cc:To:Subject:User-Agent:MIME-Version:Date:Message-ID"
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 03 Jun 2026 08:18:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.06.2026 10:07, Oleksii Kurochko wrote:
> On 6/3/26 7:54 AM, Jan Beulich wrote:
>> On 02.06.2026 18:11, Oleksii Kurochko wrote:
>>> On 6/2/26 1:19 PM, Jan Beulich wrote:
>>>> On 25.05.2026 15:20, Oleksii Kurochko wrote:
>>>>> It is legal to have d->shared_info equal to NULL for architectures which
>>>>> support only the FIFO ABI for event channel management.
>>>>>
>>>>> Having d->shared_info == NULL leads to a UBSAN issue on such 
>>>>> architectures:
>>>>>     UBSAN: Undefined behaviour in common/domain.c:325:10
>>>>>            member access within null pointer of type 'struct 
>>>>> shared_info_t'
>>>>>
>>>>> vcpu_info_reset() maps v->vcpu_info_area.map to the per-vcpu slot inside
>>>>> the domain's shared_info page for vcpus with id < XEN_LEGACY_MAX_VCPUS,
>>>>> and falls back to dummy_vcpu_info for vcpus beyond that limit.
>>>>> Extend the existing fallback condition to also cover the case where no
>>>>> shared_info page has been allocated, mapping the vcpu to dummy_vcpu_info
>>>>> instead. This is the correct behaviour: dummy_vcpu_info already serves
>>>>> as the safe stand-in for vcpus that have no usable shared_info slot.
>>>>>
>>>>> Additionally, if an architecture supports only the FIFO ABI, setup_ports()
>>>>> should be updated to avoid a NULL pointer dereference of d->shared_info,
>>>>> since in that case there will be no pending events in
>>>>> shared_info->evtchn_pending and the pending flag of the FIFO event channel
>>>>> does not need to be set to true.
>>>>> update_domain_wallclock_time() accesses d->shared_info via shared_info()
>>>>> macro. On architectures that do not allocate a shared_info page (currently
>>>>> RISC-V, which runs guests in dom0less mode without the PV ABI), this 
>>>>> causes
>>>>> a NULL dereference. The early return is safe: if there is no shared_info
>>>>> page, there is nothing to update. For all existing architectures (x86, 
>>>>> ARM)
>>>>> that do allocate it, the guard is never taken and behavior is unchanged.
>>>>>
>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
>>>>> ---
>>>>> Changes in v2:
>>>>>    - Update commit message + subject.
>>>>>    - Drop Fixes tag.
>>>>>    - Handle migration of pending events from 2L and FIFO ABIs when arch
>>>>>      support only FIFO ABI.
>>>>
>>>> What does this item describe? On an arch supporting only FIFO, how could
>>>> evtchn need migrating from 2L?
>>>
>>> Agree this item is inaccurate.
>>>
>>> evtchn_init() always calls evtchn_2l_init(d) first
>>> (event_channel.c:1627), so every domain starts with 2L regardless of
>>> arch (of course, it is just initialization of evtchn_port_ops which
>>> aren't really used when only FIFO is supported).
>>>
>>> setup_ports() is called during the guest-initiated 2L→FIFO transition
>>> (event_fifo.c:637), not at arch init time. There is no arch that
>>> supports "only FIFO" as a starting state and that is why it is needed to
>>> guard setup_ports() against NULL d->shared_info when migrating 2L
>>> pending state to FIFO even 2L wasn't really used by an arch with only
>>> FIFO support.
>>
>> Imo on arch-es not supporting 2L, domains shouldn't start in 2L mode.
> 
> Agree but will it be easy to achieve now with the current code base?
> 
> The best what could be done it is avoid calling evtchn_2l_init() now in 
> event_channel.c and:
> 1. Add a new Kconfig symbol, CONFIG_HAS_EVTCHN_2L (or re-use 
> HAS_SHARED_INFO suggested before), selected by x86 and ARM.

I'd stick to just HAS_SHARED_INFO as long as a separate control for 2-
level evtchn isn't strictly needed.

> 2. In evtchn_init() (event_channel.c:1627), guard the call:
>     #ifdef CONFIG_HAS_EVTCHN_2L
>        evtchn_2l_init(d);
>     #else
>        evtchn_none_init(d);
>     #endif
> 3. Add a small stub ops table (probably in event_fifo.c or a new 
> event_none.c) with no-op set_pending/clear_pending/unmask, is_pending 
> returning false, is_masked returning true (valid until 
> evtchn_fifo_init_control() replaces them).

That's one of the options (the stubs could then as well live in
event_channel.c). Another might be to put the FIFO ops in place right
away, making sure they can cope with evtchn_fifo_init_control() not
having been called yet.

>>>>> --- a/xen/common/domain.c
>>>>> +++ b/xen/common/domain.c
>>>>> @@ -320,7 +320,7 @@ void vcpu_info_reset(struct vcpu *v)
>>>>>        struct domain *d = v->domain;
>>>>>    
>>>>>        v->vcpu_info_area.map =
>>>>> -        ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>>>>> +        ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS && d->shared_info)
>>>>>             ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
>>>>>             : &dummy_vcpu_info);
>>>>>    }
>>>>
>>>> At the example of this: The extra conditionals are dead code on x86 and
>>>> Arm. While the status of the respective Misra rule is still uncertain
>>>> for Xen, imo we'd be better off avoiding the introduction of new dead
>>>> code. Which in turn means we may need some kind of abstraction to have
>>>> these extra conditionals in place only for arch-es not supporting
>>>> shared-info at all.
>>>
>>> What about then add config HAS_SHARED_INFO to xen/common/Kconfig and then:
>>
>> We're getting closer. Imo we want to go farther, though: shared_info() as a
>> construct should be unavailable when !HAS_SHARED_INFO. _That_ then will
>> make obvious (by causing build failures) that all respective use sites were
>> properly dealt with.
> 
> I will add then:
> 
> +#ifdef CONFIG_HAS_SHARED_INFO
>   #define shared_info(d, field)      __shared_info(d, (d)->shared_info, 
> field)
> +#endif
> 
> But with doing that we have only option of using #ifdef HAS_SHARED_INFO 
> in the place where shared_info() is used. If it is fine then I will be 
> happy to do in this way.

Well, I gave a suggestion to avoid such #ifdef-ary, ...

>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -319,10 +319,14 @@ void vcpu_info_reset(struct vcpu *v)
>>>    {
>>>        struct domain *d = v->domain;
>>>
>>> +#ifdef CONFIG_HAS_SHARED_INFO
>>>        v->vcpu_info_area.map =
>>> -        ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS && d->shared_info)
>>> -         ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
>>> -         : &dummy_vcpu_info);
>>> +        (v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>>> +        ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
>>> +        : &dummy_vcpu_info;
>>> +#else
>>> +    v->vcpu_info_area.map = &dummy_vcpu_info;
>>> +#endif
>>>    }
>>
>> I agree with #ifdef here.
>>
>>> --- a/xen/common/event_fifo.c
>>> +++ b/xen/common/event_fifo.c
>>> @@ -562,9 +562,10 @@ static void setup_ports(struct domain *d, unsigned
>>> int prev_evtchns)
>>>
>>>            evtchn = evtchn_from_port(d, port);
>>>
>>> -        if ( d->shared_info &&
>>> -             guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
>>> +#ifdef CONFIG_HAS_SHARED_INFO
>>> +        if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
>>>                evtchn->pending = true;
>>> +#endif
>>
>> While as per above shared_info() would best not exist when !HAS_SHARED_INFO
>> (in which case #ifdef may be unavoidable here), an alternative where
>> IS_ENABLED() could be used here may want at least considering. E.g.
>> causing a link-time failure when shared_info() is used (and not compiled
>> out).

... here. There are downsides to this, so which route to go needs settling
on.

>>> --- a/xen/common/time.c
>>> +++ b/xen/common/time.c
>>> @@ -94,8 +94,9 @@ void update_domain_wallclock_time(struct domain *d)
>>>        uint32_t *wc_version;
>>>        uint64_t sec;
>>>
>>> -    if ( !d->shared_info )
>>> -        return;
>>> +#ifndef CONFIG_HAS_SHARED_INFO
>>> +    return;
>>> +#endif
>>>
>>>        spin_lock(&wc_lock);
>>
>> Constructs like this are imo somewhat ugly. Using IS_ENABLED() instead
>> would make things at least a little better (again imo).
> 
> Considering mentioned above it would be better to #ifdef the whole buddy 
> of the function:
> 
> void update_domain_wallclock_time(struct domain *d)
>   {
> +#ifdef CONFIG_HAS_SHARED_INFO
> +
>       uint32_t *wc_version;
>       uint64_t sec;
> 
> -#ifndef CONFIG_HAS_SHARED_INFO
> -    return;
> -#endif
> -
>       spin_lock(&wc_lock);
> 
>       wc_version = &shared_info(d, wc_version);
> @@ -120,6 +118,8 @@ void update_domain_wallclock_time(struct domain *d)
>       *wc_version = version_update_end(*wc_version);
> 
>       spin_unlock(&wc_lock);
> +
> +#endif /* CONFIG_HAS_SHARED_INFO */
>   }

In which case there's no point having use sites actually call here. I.e.
we then may want to have an inline stub when !HAS_SHARED_INFO.

> Considering also that shared_info is expected to be used only for 2L 
> then it would be better to introduce CONFIG_HAS_EVTCHN_2L instead.

How does the event channel model in use matter for
update_domain_wallclock_time()?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.