
Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 22 May 2024 09:52:44 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 22 May 2024 07:52:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2009,6 +2009,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>          goto out_put_gfn;
>      }
>  
> +    if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present &&
> +         subpage_mmio_write_accept(mfn, gla) &&

Afaics subpage_mmio_write_accept() is then unreachable when CONFIG_HVM=n?
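If so, perhaps its declaration (and definition) want restricting to CONFIG_HVM=y
builds. Just a sketch of what I have in mind; where exactly the declaration
lives is merely my guess, so adjust as needed:

    #ifdef CONFIG_HVM
    /* Only reachable from hvm_hap_nested_page_fault(). */
    bool subpage_mmio_write_accept(mfn_t mfn, unsigned long gla);
    #endif

with the definition wrapped accordingly.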

> +         (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
> +    {
> +        rc = 1;
> +        goto out_put_gfn;
> +    }

Overall this new if() is pretty similar to the immediately preceding one.
So similar that I wonder whether the two shouldn't be folded. In fact
it looks as if the new one is needed only for the case where you'd pass
through (to a DomU) a device partially used by Xen. That could certainly
do with mentioning explicitly.
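I.e. something along the lines of the below (no more than a sketch, and
assuming the preceding if() is the is_hardware_domain() one dealing with
hwdom writes to mmio_ro_ranges pages):

    if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present &&
         (is_hardware_domain(currd) || subpage_mmio_write_accept(mfn, gla)) &&
         (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
    {
        rc = 1;
        goto out_put_gfn;
    }

together with a comment explaining the DomU pass-through case.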

> +static void __iomem *subpage_mmio_get_page(struct subpage_ro_range *entry)

Considering what the function does and what it returns, perhaps better
s/get/map/? The "get_page" part of the name generally has a different
meaning in Xen's memory management.

> +{
> +    void __iomem *mapped_page;
> +
> +    if ( entry->mapped )
> +        return entry->mapped;
> +
> +    mapped_page = ioremap(mfn_x(entry->mfn) << PAGE_SHIFT, PAGE_SIZE);
> +
> +    spin_lock(&subpage_ro_lock);
> +    /* Re-check under the lock */
> +    if ( entry->mapped )
> +    {
> +        spin_unlock(&subpage_ro_lock);
> +        iounmap(mapped_page);

The only unmap is on an error path here and on another error path elsewhere.
IOW it looks as if devices with such marked pages are meant to never be
hot-unplugged. I can see that being intentional for the XHCI console, but imo
such a restriction also needs calling out prominently in a comment next to
e.g. the function declaration.
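Just to illustrate the kind of comment I mean (function name and signature
merely taken from how I read the rest of the patch, so to be adjusted as
needed):

    /*
     * Mark only part of an MMIO page read-only (for guests), with writes to
     * the writable part handled via emulation.  Note: the mapping established
     * for emulating such writes is never torn down again, so the underlying
     * device must not be subject to hot-unplug.
     */
    int subpage_mmio_ro_add(paddr_t start, size_t size);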

> +        return entry->mapped;
> +    }
> +
> +    entry->mapped = mapped_page;
> +    spin_unlock(&subpage_ro_lock);
> +    return entry->mapped;
> +}
> +
> +static void subpage_mmio_write_emulate(
> +    mfn_t mfn,
> +    unsigned int offset,
> +    const void *data,
> +    unsigned int len)
> +{
> +    struct subpage_ro_range *entry;
> +    void __iomem *addr;

Wouldn't this better be pointer-to-volatile, with ...

> +    list_for_each_entry(entry, &subpage_ro_ranges, list)
> +    {
> +        if ( mfn_eq(entry->mfn, mfn) )
> +        {
> +            if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )
> +            {
> + write_ignored:
> +                gprintk(XENLOG_WARNING,
> +                        "ignoring write to R/O MMIO 0x%"PRI_mfn"%03x len %u\n",
> +                        mfn_x(mfn), offset, len);
> +                return;
> +            }
> +
> +            addr = subpage_mmio_get_page(entry);
> +            if ( !addr )
> +            {
> +                gprintk(XENLOG_ERR,
> +                        "Failed to map page for MMIO write at 0x%"PRI_mfn"%03x\n",
> +                        mfn_x(mfn), offset);
> +                return;
> +            }
> +
> +            switch ( len )
> +            {
> +            case 1:
> +                writeb(*(const uint8_t*)data, addr);
> +                break;
> +            case 2:
> +                writew(*(const uint16_t*)data, addr);
> +                break;
> +            case 4:
> +                writel(*(const uint32_t*)data, addr);
> +                break;
> +            case 8:
> +                writeq(*(const uint64_t*)data, addr);
> +                break;

... this being how it's written? (If so, volatile suitably carried through to
other places as well.)
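I.e. just a sketch of the declaration I have in mind:

    volatile void __iomem *addr;

with the helper's return type (and presumably the ->mapped field) then
volatile-qualified as well.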

> +            default:
> +                /* mmio_ro_emulated_write() already validated the size */
> +                ASSERT_UNREACHABLE();
> +                goto write_ignored;
> +            }
> +            return;
> +        }
> +    }
> +    /* Do not print message for pages without any writable parts. */
> +}
> +
> +bool subpage_mmio_write_accept(mfn_t mfn, unsigned long gla)
> +{
> +    unsigned int offset = PAGE_OFFSET(gla);
> +    const struct subpage_ro_range *entry;
> +
> +    list_for_each_entry_rcu(entry, &subpage_ro_ranges, list)

Considering the other remark about the respective devices never being able to
go away, is the RCU form here really needed? Its use gives the (false)
impression that entry removal is possible.

> +        if ( mfn_eq(entry->mfn, mfn) &&
> +             !test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )

Btw, "qwords" in the field name is kind of odd when SUBPAGE_MMIO_RO_ALIGN
in principle suggests that changing granularity ought to be possible by
simply adjusting that #define. Maybe "->ro_elems"?

> --- a/xen/arch/x86/pv/ro-page-fault.c
> +++ b/xen/arch/x86/pv/ro-page-fault.c
> @@ -330,6 +330,7 @@ static int mmio_ro_do_page_fault(struct x86_emulate_ctxt *ctxt,
>              return X86EMUL_UNHANDLEABLE;
>      }
>  
> +    mmio_ro_ctxt.mfn = mfn;
>      ctxt->data = &mmio_ro_ctxt;
>      if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
>          return x86_emulate(ctxt, &mmcfg_intercept_ops);

Wouldn't you better set .mfn only on the "else" path, just out of context?
Suggesting that the new field in the struct could actually overlay the
(seg,bdf) tuple (being of relevance only to MMCFG intercept handling).
This would be more for documentation purposes than to actually save space.
(If so, perhaps the "else" itself would also better be dropped while making
the adjustment.)
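Roughly like this, I would think (merely a sketch; struct layout and the ops
variable name written down from memory, so may not match exactly):

    struct mmio_ro_emulate_ctxt {
        unsigned long cr2;
        union {
            /* Used by MMCFG intercept handling only. */
            struct {
                unsigned int seg, bdf;
            };
            /* Used by subpage r/o MMIO handling only. */
            mfn_t mfn;
        };
    };

and then in mmio_ro_do_page_fault():

    ctxt->data = &mmio_ro_ctxt;
    if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
        return x86_emulate(ctxt, &mmcfg_intercept_ops);

    mmio_ro_ctxt.mfn = mfn;
    return x86_emulate(ctxt, &mmio_ro_emulate_ops);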

Jan



 

