[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/2] x86/mm: add API for marking only part of a MMIO page read only



On Wed, Jul 05, 2023 at 10:23:53AM +0200, Jan Beulich wrote:
> On 01.07.2023 00:28, Marek Marczykowski-Górecki wrote:
> > On Tue, May 30, 2023 at 01:56:34PM +0200, Jan Beulich wrote:
> >> On 05.05.2023 23:25, Marek Marczykowski-Górecki wrote:
> >>> --- a/xen/arch/x86/hvm/hvm.c
> >>> +++ b/xen/arch/x86/hvm/hvm.c
> >>> @@ -1990,6 +1990,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, 
> >>> unsigned long gla,
> >>>          goto out_put_gfn;
> >>>      }
> >>>  
> >>> +    if ( (p2mt == p2m_mmio_direct) && npfec.write_access && 
> >>> npfec.present &&
> >>> +         subpage_mmio_write_accept(mfn, gla) &&
> >>> +         (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
> >>> +    {
> >>> +        rc = 1;
> >>> +        goto out_put_gfn;
> >>> +    }
> >>
> >> But npfec.write_access set doesn't mean it was a write permission
> >> violation, does it? 
> > 
> > Doesn't it? IIUC it means it was a write attempt, to a mapped page
> > (npfec.present), and since we've got EPT violation, it got denied. 
> 
> But the denial may have been for reasons other than the W bit being
> clear, at least in principle? Abusing the bit now, even if in
> practice there was no other possible reason on existing hardware
> with the features we presently use, might lead to hard to locate
> issues in case a different reason appears down the road.

Ok, so how do you propose to check if it was a write violation?

(...)

> >> Since you mark the qwords which are to be protected, how is one to set
> >> up safely two discontiguous ranges on the same page? I think I had
> >> asked for v1 already why you don't do things the other way around:
> >> Initially the entire page is protected, and then writable regions are
> >> carved out.
> > 
> > Because that's not how the API is used. This API is for those how want
> > to write-protect some specific ranges (to be written exclusively by
> > Xen). They don't need to even know what is else is on the same page.
> > Take XHCI case as an example: it gets the range to write-protect by
> > enumerating XHCI extended capabilities, which is a linked list. The
> > driver gets info where XHCI console registers are.  Things before/after
> > them on that page may not even be XHCI extended caps at all.
> > This in fact is very similar approach to already existing
> > mmio_ro_ranges.
> 
> While I agree there's a similarity, multiple entities caring about the
> same MFN isn't an expected scenario there. Whereas here you explicitly
> add support for such.
> 
> Furthermore you sub-divide pages covered by mmio_ro_ranges here, so
> defaulting to "full page protected" and then carving out sub-regions
> would be the more natural approach imo.

But then the API would be awkward to use. Instead of "mark this (smaller
than a page) region as read-only" so Xen can use it safely, you would
(likely) need marking _two_ regions as writable, after marking a page as
read-only. So, either you'd need separate (3?) calls, array of ranges or
something similar.

> >> I guess I shouldn't further ask about overlapping r/o ranges and their
> >> cleaning up. But at least a comment towards the restriction would be
> >> nice. Perhaps even check upon registration that no part of the range
> >> is already marked r/o.
> > 
> > Yes, this is a good suggestion, I'll add that.
> 
> As long as all restrictions are properly spelled out, this may be
> sufficient. But please don't be surprised if I ask again on a
> subsequent version.

(...)

> >>> +static void subpage_mmio_write_emulate(
> >>> +    mfn_t mfn,
> >>> +    unsigned int offset,
> >>> +    const void *data,
> >>> +    unsigned int len)
> >>> +{
> >>> +    struct subpage_ro_range *entry;
> >>> +    void __iomem *addr;
> >>> +
> >>> +    rcu_read_lock(&subpage_ro_rcu);
> >>> +
> >>> +    list_for_each_entry_rcu(entry, &subpage_ro_ranges, list)
> >>> +    {
> >>> +        if ( mfn_eq(entry->mfn, mfn) )
> >>> +        {
> >>> +            if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, 
> >>> entry->ro_qwords) )
> >>> +                goto write_ignored;
> >>
> >> I think you can get away with just a single "goto" by putting the gprintk()
> >> (and its label) here.
> > 
> > write_ignored label is used also below in "default" case (which should
> > be unreachable, but still).
> 
> Right, which is why I said 'just a single "goto"' (implying a label would
> still be needed).
> 
> > Do you suggest jumping from default case into here?
> 
> Yes, that would be one way of structuring things.

IMO jumping into a middle of some nested conditional/loop is harder to
follow, I'd prefer to avoid such practice. Furthermore, if moving the
gprintk here, it still needs "goto out_unlock", so it isn't really
saving any "goto", just changing its target.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.