
Re: [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 23 Jan 2025 10:49:36 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Manuel Andreas <manuel.andreas@xxxxxx>
  • Delivery-date: Thu, 23 Jan 2025 09:49:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.01.2025 18:45, Roger Pau Monné wrote:
> On Tue, Oct 01, 2024 at 10:49:40AM +0200, Jan Beulich wrote:
>> The MMIO cache is intended to have one entry used per independent memory
>> access that an insn does. This, in particular, is supposed to be
>> ignoring any page boundary crossing. Therefore when looking up a cache
>> entry, the access'es starting (linear) address is relevant, not the one
>> possibly advanced past a page boundary.
>>
>> In order for the same offset-into-buffer variable to be usable in
>> hvmemul_phys_mmio_access() for both the caller's buffer and the cache
>> entry's it is further necessary to have the un-adjusted caller buffer
>> passed into there.
>>
>> Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
>> Reported-by: Manuel Andreas <manuel.andreas@xxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> This way problematic overlaps are only reduced (to ones starting at the
>> same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
>> go further - if a subsequent access is larger than an earlier one, but
>> the splitting results in a chunk to cross the end "boundary" of the
>> earlier access, an assertion will still trigger. Explicit memory
>> accesses (ones encoded in an insn by explicit or implicit memory
>> operands) match the assumption afaict (i.e. all those accesses are of
>> uniform size, and hence they either fully overlap or are mapped to
>> distinct cache entries).
>>
>> Insns accessing descriptor tables, otoh, don't fulfill these
>> expectations: The selector read (if coming from memory) will always be
>> smaller than the descriptor being read, and if both (insanely) start at
>> the same linear address (in turn mapping MMIO), said assertion will kick
>> in. (The same would be true for an insn trying to access itself as data,
>> as long as certain size "restrictions" between insn and memory operand
>> are met. Except that linear_read() disallows insn fetches from MMIO.) To
>> deal with such, I expect we will need to further qualify (tag) cache
>> entries, such that reads/writes won't use insn fetch entries, and
>> implicit-supervisor-mode accesses won't use entries of ordinary
>> accesses. (Page table accesses don't need considering here for now, as
>> our page walking code demands page tables to be mappable, implying
>> they're in guest RAM; such accesses also don't take the path here.)
>> Thoughts anyone, before I get to making another patch?
>>
>> Considering the insn fetch aspect mentioned above I'm having trouble
>> following why the cache has 3 entries. With insn fetches permitted,
>> descriptor table accesses where the accessed bit needs setting may also
>> fail because of that limited capacity of the cache, due to the way the
>> accesses are done. The read and write (cmpxchg) are independent accesses
>> from the cache's perspective, and hence we'd need another entry there.
>> If, otoh, the 3 entries are there to account for precisely this (which
>> seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
>> mapped accesses by buffer offset"] not saying anything at all), then we
>> should be fine in this regard. If we were to permit insn fetches, which
>> way to overcome this (possibly by allowing the write to re-use the
>> earlier read's entry in this special situation) would remain to be
>> determined.
> 
> I'm not that familiar with the emulator logic for memory accesses, but
> it seems like we are adding more and more complexity and special
> casing.  Maybe it's the only way to go forward, but I wonder if there
> could be some other way to solve this.  However, I don't think I
> will have time to look into it, and hence I'm not going to oppose
> your proposal.

I'll see what I can do; it's been quite a while, so I'll first need to
swap context back in.
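
For illustration, the tagging mentioned above could take roughly this
shape (the enum and constant names here are made up for the sake of the
example, not actual code):

    /* Purely illustrative sketch; names invented. */
    enum hvmemul_cache_kind {
        HVMEMUL_CACHE_DATA, /* explicit/implicit memory operands */
        HVMEMUL_CACHE_INSN, /* insn fetches (not reaching MMIO right now) */
        HVMEMUL_CACHE_IMPL, /* implicit supervisor-mode accesses, e.g.
                             * descriptor table reads/writes */
    };

A lookup would then match on (gla, dir, kind), so e.g. a descriptor read
starting at the same linear address as the selector read would get its
own cache entry instead of tripping the assertion in
hvmemul_phys_mmio_access().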

> Are there however some tests, possibly XTF, that we could use to
> ensure the behavior of accesses is as we expect?

Manuel's report included an XTF test, which I expect will become part
of XTF once this fix has gone in. I fear, though, that there is an issue
Andrew has been pointing out, which may prevent this from happening
right away (even if, with osstest having disappeared, that's now only a
latent issue until GitLab CI starts exercising XTF): with the issue
unfixed on older trees (i.e. those remaining after this series was
backported as appropriate), the new test would fail there.

>> @@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
>>              return cache;
>>      }
>>  
>> -    if ( !create )
>> +    /*
>> +     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
>                                                       ^rely on ->space having 
> ...
> Would be easier to read IMO.

Changed; I'm not overly fussed, although at the same time I don't really
agree with your comment.

>> @@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
>>  
>>  static int hvmemul_linear_mmio_access(
>>      unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
>> -    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
>> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
>> +    unsigned long start, bool known_gpfn)
> 
> I think start is a bit ambiguous, start_gla might be clearer (same
> below for the start parameter).

Fine with me - changed for all three hvmemul_linear_mmio_*(). It wasn't
clear to me whether you also meant the local variables in
linear_{read,write}(); since you said "parameter" I assumed you didn't.
If you did, I fear I'd be less happy to make the change there too, since
"addr" would then preferably also want renaming to "gla", and that would
cause undue extra churn.
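
To be concrete, going from the hunk quoted above the prototype would
then read about like this (a sketch, not necessarily the final form):

    static int hvmemul_linear_mmio_access(
        unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
        uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
        unsigned long start_gla, bool known_gpfn);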

>> @@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
>>       * an access that was previously handled as MMIO. Thus it is imperative that
>>       * we handle this access in the same way to guarantee completion and hence
>>       * clean up any interim state.
>> +     *
>> +     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
>> +     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
>> +     * by the starting linear address), we need to continue issuing (i.e. also
>> +     * upon replay) the RAM access for anything that's ahead of or past MMIO,
>> +     * i.e. in RAM.
>>       */
>> -    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
>> +    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
>> +    if ( !cache ||
>> +         addr + bytes <= start + cache->skip ||
>> +         addr >= start + cache->size )
> 
> Seeing as this bound checks is also used below, could it be a macro or
> inline function?
> 
> is_cached() or similar?

Hmm. Yes, it's twice the same expression, yet such a helper would require
four parameters. That's a little too much for my taste; I'd prefer to
keep things as they are. After all, there is far more redundancy between
the two functions than just this check.
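
Just to spell out why I don't think it buys us much - such a helper
would end up along these lines (illustrative only; note the four inputs
it needs):

    /* Illustrative only; mirrors the open-coded check above. */
    static bool is_cached(const struct hvm_mmio_cache *cache,
                          unsigned long start, unsigned long addr,
                          unsigned int bytes)
    {
        return cache &&
               addr + bytes > start + cache->skip &&
               addr < start + cache->size;
    }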

Jan



 

