
Re: [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 23 Jan 2025 10:49:36 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Manuel Andreas <manuel.andreas@xxxxxx>
  • Delivery-date: Thu, 23 Jan 2025 09:49:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.01.2025 18:45, Roger Pau Monné wrote:
> On Tue, Oct 01, 2024 at 10:49:40AM +0200, Jan Beulich wrote:
>> The MMIO cache is intended to have one entry used per independent memory
>> access that an insn does. This, in particular, is supposed to be
>> ignoring any page boundary crossing. Therefore when looking up a cache
>> entry, the access'es starting (linear) address is relevant, not the one
>> possibly advanced past a page boundary.
>>
>> In order for the same offset-into-buffer variable to be usable in
>> hvmemul_phys_mmio_access() for both the caller's buffer and the cache
>> entry's it is further necessary to have the un-adjusted caller buffer
>> passed into there.
>>
>> Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
>> Reported-by: Manuel Andreas <manuel.andreas@xxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> This way problematic overlaps are only reduced (to ones starting at the
>> same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
>> go further - if a subsequent access is larger than an earlier one, but
>> the splitting results in a chunk to cross the end "boundary" of the
>> earlier access, an assertion will still trigger. Explicit memory
>> accesses (ones encoded in an insn by explicit or implicit memory
>> operands) match the assumption afaict (i.e. all those accesses are of
>> uniform size, and hence they either fully overlap or are mapped to
>> distinct cache entries).
>>
>> Insns accessing descriptor tables, otoh, don't fulfill these
>> expectations: The selector read (if coming from memory) will always be
>> smaller than the descriptor being read, and if both (insanely) start at
>> the same linear address (in turn mapping MMIO), said assertion will kick
>> in. (The same would be true for an insn trying to access itself as data,
>> as long as certain size "restrictions" between insn and memory operand
>> are met. Except that linear_read() disallows insn fetches from MMIO.) To
>> deal with such, I expect we will need to further qualify (tag) cache
>> entries, such that reads/writes won't use insn fetch entries, and
>> implicit-supervisor-mode accesses won't use entries of ordinary
>> accesses. (Page table accesses don't need considering here for now, as
>> our page walking code demands page tables to be mappable, implying
>> they're in guest RAM; such accesses also don't take the path here.)
>> Thoughts anyone, before I get to making another patch?
>>
>> Considering the insn fetch aspect mentioned above I'm having trouble
>> following why the cache has 3 entries. With insn fetches permitted,
>> descriptor table accesses where the accessed bit needs setting may also
>> fail because of that limited capacity of the cache, due to the way the
>> accesses are done. The read and write (cmpxchg) are independent accesses
>> from the cache's perspective, and hence we'd need another entry there.
>> If, otoh, the 3 entries are there to account for precisely this (which
>> seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
>> mapped accesses by buffer offset"] not saying anything at all), then we
>> should be fine in this regard. If we were to permit insn fetches, which
>> way to overcome this (possibly by allowing the write to re-use the
>> earlier read's entry in this special situation) would remain to be
>> determined.
> 
> I'm not that familiar with the emulator logic for memory accesses, but
> it seems like we are adding more and more complexity and special
> casing.  Maybe it's the only way to go forward, but I wonder if there
> could be some other way to solve this.  However, I don't think I
> will have time to look into it, and hence I'm not going to oppose
> your proposal.

I'll see what I can do; it's been quite a while, so I'll first need to
swap context back in.
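
For illustration, the tagging mentioned above could take roughly this
shape (the enum and constant names here are made up for the sake of the
example, not actual code):

    /* Purely illustrative sketch; names invented. */
    enum hvmemul_cache_kind {
        HVMEMUL_CACHE_DATA, /* explicit/implicit memory operands */
        HVMEMUL_CACHE_INSN, /* insn fetches (not reaching MMIO right now) */
        HVMEMUL_CACHE_IMPL, /* implicit supervisor-mode accesses, e.g.
                             * descriptor table reads/writes */
    };

A lookup would then match on (gla, dir, kind), so e.g. a descriptor read
starting at the same linear address as the selector read would get its
own cache entry instead of tripping the assertion in
hvmemul_phys_mmio_access().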

> Are there however some tests, possibly XTF, that we could use to
> ensure the behavior of accesses is as we expect?

Manuel's report included an XTF test, which I expect will become part
of XTF once this fix has gone in. I fear, though, that there is an issue
Andrew has been pointing out, which may prevent this from happening
right away (even if, with osstest having disappeared, that's now only a
latent issue until GitLab CI starts exercising XTF): with the issue
unfixed on older trees (i.e. those remaining after this series was
backported as appropriate), the new test would fail there.

>> @@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
>>              return cache;
>>      }
>>  
>> -    if ( !create )
>> +    /*
>> +     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
>                                                       ^rely on ->space having 
> ...
> Would be easier to read IMO.

Changed; I'm not overly fussed, although at the same time I don't really
agree with your comment.

>> @@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
>>  
>>  static int hvmemul_linear_mmio_access(
>>      unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
>> -    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
>> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
>> +    unsigned long start, bool known_gpfn)
> 
> I think start is a bit ambiguous, start_gla might be clearer (same
> below for the start parameter).

Fine with me - changed for all three hvmemul_linear_mmio_*(). It wasn't
clear to me whether you also meant the local variables in
linear_{read,write}(); since you said "parameter" I assumed you didn't.
If you did, I fear I'd be less happy to make the change there too, since
"addr" would then preferably also want renaming to "gla", and that would
cause undue extra churn.
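
To be concrete, going from the hunk quoted above the prototype would
then read about like this (a sketch, not necessarily the final form):

    static int hvmemul_linear_mmio_access(
        unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
        uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
        unsigned long start_gla, bool known_gpfn);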

>> @@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
>>       * an access that was previously handled as MMIO. Thus it is imperative that
>>       * we handle this access in the same way to guarantee completion and hence
>>       * clean up any interim state.
>> +     *
>> +     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
>> +     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
>> +     * by the starting linear address), we need to continue issuing (i.e. also
>> +     * upon replay) the RAM access for anything that's ahead of or past MMIO,
>> +     * i.e. in RAM.
>>       */
>> -    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
>> +    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
>> +    if ( !cache ||
>> +         addr + bytes <= start + cache->skip ||
>> +         addr >= start + cache->size )
> 
> Seeing as this bound checks is also used below, could it be a macro or
> inline function?
> 
> is_cached() or similar?

Hmm. Yes, it's twice the same expression, yet such a helper would require
four parameters. That's a little too much for my taste; I'd prefer to
keep things as they are. After all, there is far more redundancy between
the two functions than just this check.
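
Just to spell out why I don't think it buys us much - such a helper
would end up along these lines (illustrative only; note the four inputs
it needs):

    /* Illustrative only; mirrors the open-coded check above. */
    static bool is_cached(const struct hvm_mmio_cache *cache,
                          unsigned long start, unsigned long addr,
                          unsigned int bytes)
    {
        return cache &&
               addr + bytes > start + cache->skip &&
               addr < start + cache->size;
    }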

Jan



 

