Xen project Mailing List

Re: [PATCH 2/5] x86/HVM: allocate emulation cache entries dynamically

To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Fri, 6 Sep 2024 20:20:09 +0100

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Fri, 06 Sep 2024 19:20:20 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04/09/2024 2:29 pm, Jan Beulich wrote:
> Both caches may need higher capacity, and the upper bound will need to
> be determined dynamically based on CPUID policy (for AMX at least).

Is this to cope with TILE{LOAD,STORE}, or something else?

It's not exactly clear, even when looking at prior AMX series.

> While touching the check in hvmemul_phys_mmio_access() anyway, also
> tighten it: To avoid overrunning the internal buffer we need to take the
> offset into the buffer into account.

Does this really want to be mixed with a prep patch?

>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> This is a patch taken from the AMX series, which was part of the v3
> submission. All I did is strip out the actual AMX bits (from
> hvmemul_cache_init()), plus of course change the description. As a
> result some local variables there may look unnecessary, but this way
> it's going to be less churn when the AMX bits are added. The next patch
> pretty strongly depends on the changed approach (contextually, not so
> much functionally), and I'd really like to avoid rebasing that one ahead
> of this one, and then this one on top of that.

Fine by me.

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -26,6 +26,18 @@
>  #include <asm/iocap.h>
>  #include <asm/vm_event.h>
>
> +/*
> + * We may read or write up to m512 or up to a tile row as a number of
> + * device-model transactions.
> + */
> +struct hvm_mmio_cache {
> +    unsigned long gla;
> +    unsigned int size;
> +    unsigned int space:31;
> +    unsigned int dir:1;
> +    uint8_t buffer[] __aligned(sizeof(long));

I know this is a minor tangent, but you are turning a regular struct
into a flexible one.

Could we introduce __counted_by() and start using it here?

At the toolchain level, it lets the compiler understand the real size of
the object, so e.g. the sanitisers can spot out-of-bounds accesses
through the flexible member.

But, even in the short term, having

    /* TODO */
    # define __counted_by(member)

in compiler.h still leaves us with better code, because

    struct hvm_mmio_cache {
        unsigned long gla;
        unsigned int size;
        unsigned int space:31;
        unsigned int dir:1;
        uint8_t buffer[] __aligned(sizeof(long)) __counted_by(size);
    };

is explicitly clear in a case where the "space" field creates some
ambiguity.

> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
>  int hvmemul_cache_init(struct vcpu *v)
>  {
>      /*
> -     * No insn can access more than 16 independent linear addresses (AVX512F
> -     * scatters/gathers being the worst). Each such linear range can span a
> -     * page boundary, i.e. may require two page walks. Account for each insn
> -     * byte individually, for simplicity.
> +     * AVX512F scatter/gather insns can access up to 16 independent linear
> +     * addresses, up to 8 bytes size. Each such linear range can span a page
> +     * boundary, i.e. may require two page walks.
> +     */
> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
> +    unsigned int i, max_bytes = 64;
> +    struct hvmemul_cache *cache;
> +
> +    /*
> +     * Account for each insn byte individually, both for simplicity and to
> +     * leave some slack space.
>       */

Hang on. Do we seriously use a separate cache entry for each
instruction byte?

~Andrew
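
For illustration only, a minimal standalone C sketch of the ideas discussed above: a __counted_by() wrapper that falls back to an empty stub when the compiler lacks the attribute, an offset-aware range check on a flexible array member, and the nents arithmetic from the quoted hunk. Everything named demo_* or DEMO_* is hypothetical and not taken from the Xen tree. The 4-level paging value is an assumption, and the struct is deliberately simpler than hvm_mmio_cache, so it takes no position on whether "size" or "space" is the right count field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Use the real attribute where the compiler advertises it; otherwise fall
 * back to an empty stub, which still documents the intent in the source.
 */
#ifndef __counted_by
# if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#   define __counted_by(member) __attribute__((__counted_by__(member)))
#  endif
# endif
#endif
#ifndef __counted_by
# define __counted_by(member) /* TODO: no compiler support */
#endif

/* Hypothetical stand-in for a structure ending in a flexible array. */
struct demo_buf {
    unsigned int nr;                  /* number of bytes in data[] */
    uint8_t data[] __counted_by(nr);
};

/* Allocate a zeroed buffer with room for nr bytes in the flexible member. */
static inline struct demo_buf *demo_alloc(unsigned int nr)
{
    struct demo_buf *b = calloc(1, sizeof(*b) + nr);

    if ( b )
        b->nr = nr;   /* set the count before data[] is ever touched */

    return b;
}

/*
 * Offset-aware check: does [offset, offset + len) fit inside data[]?
 * Written to avoid overflow in offset + len.
 */
static inline bool demo_range_ok(const struct demo_buf *b,
                                 unsigned int offset, unsigned int len)
{
    return offset <= b->nr && len <= b->nr - offset;
}

/* Assumption for this sketch: 4-level paging. */
#define DEMO_PAGING_LEVELS 4

/* 16 linear addresses, 2 page walks each, (levels + 1) entries per walk. */
static inline unsigned int demo_nents(void)
{
    return 16 * 2 * (DEMO_PAGING_LEVELS + 1);
}
```

With a 64-byte buffer from demo_alloc(64), demo_range_ok(b, 56, 8) holds while demo_range_ok(b, 60, 8) does not, which is the kind of overrun a check ignoring the offset would miss. Under the 4-level assumption demo_nents() evaluates to 160.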
