[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 05/10] hardware domain: convert to domain roles



On 14.05.2021 22:54, Daniel P. Smith wrote:
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -169,13 +169,14 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
>  static inline struct vcpu *choose_hwdom_vcpu(void)
>  {
>      unsigned idx;
> +    struct domain *hwdom = get_hardware_domain();

When introducing new pointer variables, please make them pointer-
to-const whenever possible.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4917,7 +4917,7 @@ mfn_t alloc_xen_pagetable_new(void)
>      {
>          void *ptr = alloc_xenheap_page();
>  
> -        BUG_ON(!hardware_domain && !ptr);
> +        BUG_ON(!ptr);

This loses an important aspect: We should not crash here anymore once
we've made it far enough to have started constructing Dom0. As you can
see ...

>          return ptr ? virt_to_mfn(ptr) : INVALID_MFN;

... here, the case does actually get handled.

If you make behavioral changes in, especially, an otherwise largely
(seeing its overall size) mechanical change, please make sure you call
them out in the description.

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -776,6 +776,9 @@ static struct domain *__init create_dom0(const module_t 
> *image,
>      if ( IS_ERR(d) || (alloc_dom0_vcpu0(d) == NULL) )
>          panic("Error creating domain 0\n");
>  
> +    /* Ensure the correct roles are assigned */
> +    d->xsm_roles = CLASSIC_DOM0_PRIVS;

Didn't an earlier change put this in place already? This shouldn't be
needed in arch-specific code. The cover letter also doesn't mention
that you're not touching Arm code in this RFC, so a similar change
would then be missing there.

> @@ -302,23 +303,50 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int 
> vcpu_id)
>      return NULL;
>  }
>  
> -static int late_hwdom_init(struct domain *d)
> +/* pivot_hw_ctl:
> + *  This is a one-way pivot from existing to new hardware domain. Upon 
> success
> + *  the domain *next_hwdom will be in control of the hardware and domain
> + *  *curr_hwdom will no longer have access.
> + */
> +static int pivot_hw_ctl(struct domain *next_hwdom)
>  {
>  #ifdef CONFIG_LATE_HWDOM
> -    struct domain *dom0;
> +    bool already_found = false;
> +    struct domain **pd = &domain_list, *curr_hwdom = NULL;
> +    domid_t dom0_id = 0;
>      int rv;
>  
> -    if ( d != hardware_domain || d->domain_id == 0 )
> +#ifdef CONFIG_PV_SHIM
> +    /* On PV shim dom0 != 0 */
> +    dom0_id = get_initial_domain_id();
> +#endif

The suddent need for shim specific logic here also wants explaining
in the description (or, if possible, splitting into a separate
change).

> @@ -559,17 +589,19 @@ struct domain *domain_create(domid_t domid,
>      /* Sort out our idea of is_control_domain(). */
>      d->is_privileged = is_priv;
>  
> -    if (is_priv)
> +    /* reality is that is_priv is only set when construction dom0 */
> +    if (is_priv) {
>          d->xsm_roles = CLASSIC_DOM0_PRIVS;
> +        hardware_domain = d;
> +    }
>  
>      /* Sort out our idea of is_hardware_domain(). */
> -    if ( domid == 0 || domid == hardware_domid )
> +    if ( domid == hardware_domid )

With this change it looks to me as if ...

>      {
>          if ( hardware_domid < 0 || hardware_domid >= DOMID_FIRST_RESERVED )
>              panic("The value of hardware_dom must be a valid domain ID\n");

... this was rendered dead code.

> -        old_hwdom = hardware_domain;
> -        hardware_domain = d;
> +        d->xsm_roles = CLASSIC_HWDOM_PRIVS;

Yet another place where this value gets stored. Ideally there would
be exactly one such place.

> @@ -682,12 +714,14 @@ struct domain *domain_create(domid_t domid,
>          if ( (err = sched_init_domain(d, 0)) != 0 )
>              goto fail;
>  
> -        if ( (err = late_hwdom_init(d)) != 0 )
> +        if ( (err = pivot_hw_ctl(d)) != 0 )
>              goto fail;
>  
>          /*
>           * Must not fail beyond this point, as our caller doesn't know 
> whether
> -         * the domain has been entered into domain_list or not.
> +         * the domain has been entered into domain_list or not. Additionally
> +         * if a hardware control pivot occurred then a failure will leave the
> +         * platform without access to hardware.
>           */

s/will/would/, considering the initial "Must not ..."?

> @@ -711,8 +745,6 @@ struct domain *domain_create(domid_t domid,
>      err = err ?: -EILSEQ; /* Release build safety. */
>  
>      d->is_dying = DOMDYING_dead;
> -    if ( hardware_domain == d )
> -        hardware_domain = old_hwdom;
>      atomic_set(&d->refcnt, DOMAIN_DESTROYED);
>  
>      sched_destroy_domain(d);

Isn't this dealing with earlier failures, and hence needs if not
retaining, then replacing?

> @@ -808,6 +840,42 @@ out:
>  }
>  
>  

I realize you've found a pair of blank lines here, but rather than ...

> +bool is_hardware_domain_started()
> +{
> +    bool exists = false;
> +    struct domain **pd = &domain_list;
> +
> +    if ( *pd != NULL) {
> +        rcu_read_lock(&domlist_read_lock);
> +
> +        for ( ; *pd != NULL; pd = &(*pd)->next_in_list )
> +            if ( (*pd)->xsm_roles & XSM_HW_CTRL )
> +                break;
> +
> +        rcu_read_unlock(&domlist_read_lock);
> +
> +        if ( *pd != NULL )
> +            exists = true;
> +    }
> +
> +    if (exists)
> +        ASSERT(*pd == hardware_domain);
> +
> +    return exists;
> +}
> +
> +

... adding more and ...

> +struct domain *get_hardware_domain()
> +{
> +    if (hardware_domain == NULL)
> +        return NULL;
> +
> +    ASSERT(hardware_domain->xsm_roles & XSM_HW_CTRL);
> +
> +    return hardware_domain;
> +}
> +
> +

... yet more, please insert in the middle of those original two
blank lines. Patch application (especially when larger offsets
are involved, e.g. during backporting activities) benefits from
meaningful context lines rather than many almost identical ones
(and then even relatively close to each other).

As to is_hardware_domain_started() - I'm afraid this is too much
overhead in case there are hundreds or thousands of guests.

> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -228,12 +228,12 @@ static void dump_hwdom_registers(unsigned char key)
>  {
>      struct vcpu *v;
>  
> -    if ( hardware_domain == NULL )
> +    if ( is_hardware_domain_started() )
>          return;

Aren't you inverting the original condition?

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -776,7 +776,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      ret = 0;
>      if ( !pdev->domain )
>      {
> -        pdev->domain = hardware_domain;
> +        pdev->domain = get_hardware_domain();
>          ret = iommu_add_device(pdev);
>          if ( ret )
>          {
> @@ -784,7 +784,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              goto out;
>          }
>  
> -        list_add(&pdev->domain_list, &hardware_domain->pdev_list);
> +        list_add(&pdev->domain_list, &pdev->domain->pdev_list);

It's not immediately obvious that pdev->domain couldn't have changed
by the time we make it here - did you check? I consider this possible
in principle, if e.g. in an error case the device got associated
with the quarantine domain.

> @@ -879,7 +879,7 @@ static int deassign_device(struct domain *d, uint16_t 
> seg, uint8_t bus,
>      if ( ret )
>          goto out;
>  
> -    if ( pdev->domain == hardware_domain  )
> +    if ( is_hardware_domain(pdev->domain) )
>          pdev->quarantine = false;
>  
>      pdev->fault.count = 0;
> @@ -1403,7 +1403,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
>       * domain or dom_io then it must be assigned to a guest, or be
>       * hidden (owned by dom_xen).
>       */
> -    else if ( pdev->domain != hardware_domain &&
> +    else if ( !is_hardware_domain(pdev->domain) &&
>                pdev->domain != dom_io )
>          rc = -EBUSY;

May I ask that you split out such cleaning up of cases of open-coded
helpers into a separate (prereq) patch, especially when (like here)
the containing patch is already a pretty big one?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -475,6 +475,7 @@ struct domain
>  #define XSM_XENSTORE  (1U<<31) /* Xenstore: domain that can do privileged 
> operations on xenstore */
>  #define CLASSIC_DOM0_PRIVS (XSM_PLAT_CTRL | XSM_DOM_BUILD | XSM_DOM_SUPER | \
>               XSM_DEV_EMUL | XSM_HW_CTRL | XSM_HW_SUPER | XSM_XENSTORE)
> +#define CLASSIC_HWDOM_PRIVS (XSM_HW_CTRL | XSM_DEV_EMUL)

Oh, maybe I was wrong with saying that the same value gets put in
place in multiple locations. The fact that you start distinguishing
Dom0 and hwdom needs calling out in the description. I'm not
convinced of the inclusion of XSM_DEV_EMUL.

I also think CLASSIC_DOM0_PRIVS then should use CLASSIC_HWDOM_PRIVS
instead of re-enumerating what the latter contains, unless there's
a definitive plan for the latter to include bits the former
shouldn't include.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.