[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/7] x86: Introduce x86_decode_lite()


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 8 Oct 2024 10:16:09 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 08 Oct 2024 08:16:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.10.2024 19:17, Andrew Cooper wrote:
> On 07/10/2024 1:56 pm, Jan Beulich wrote:
>> On 02.10.2024 17:27, Andrew Cooper wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/x86/x86_emulate/decode-lite.c
>>> @@ -0,0 +1,311 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +
>>> +#include "private.h"
>>> +
>>> +#define Imm8   (1 << 0)
>>> +#define Imm    (1 << 1)
>>> +#define Moffs  (1 << 2)
>>> +#define Branch (1 << 5) /* ... that we care about */
>>> +/*      ModRM  (1 << 6) */
>>> +#define Known  (1 << 7)
>>> +
>>> +#define ALU_OPS                                 \
>>> +    (Known|ModRM),                              \
>>> +    (Known|ModRM),                              \
>>> +    (Known|ModRM),                              \
>>> +    (Known|ModRM),                              \
>>> +    (Known|Imm8),                               \
>>> +    (Known|Imm)
>>> +
>>> +static const uint8_t init_or_livepatch_const onebyte[256] = {
>>> +    [0x00] = ALU_OPS, /* ADD */ [0x08] = ALU_OPS, /* OR  */
>>> +    [0x10] = ALU_OPS, /* ADC */ [0x18] = ALU_OPS, /* SBB */
>>> +    [0x20] = ALU_OPS, /* AND */ [0x28] = ALU_OPS, /* SUB */
>>> +    [0x30] = ALU_OPS, /* XOR */ [0x38] = ALU_OPS, /* CMP */
>>> +/*  [0x40 ... 0x4f] = REX prefixes */
>> For these and other aspects further down, may I ask for a comment at the
>> top of the file setting the scope for this new function (and its
>> associated data) as being strictly 64-bit only? And perhaps even strictly
>> covering only what Xen may legitimately use (largely excluding APX for
>> the foreseeable future, i.e. until such time that we might decide to
>> allow Xen itself to use APX throughout its code).
>>
>> Besides APX, with more immediate uses in mind, I wonder about e.g.
>> BMI/BMI2 insns, which live outside the one/two-byte maps.
> 
> They're not needed yet, and it would require extra decode complexity.
> 
>> I would further appreciate if we could be consistent with the mentioning
>> (or not) of prefixes: The REX ones here are the only ones that the table
>> mentions right now. In fact I wonder whether a Prefix attribute wouldn't
>> be nice to have, so you don't need to open-code all of them in the
>> function itself.
> 
> The comment about REX prefixes is only here because you insisted on it.

Did I? Looking back at my v1 replies I can't spot such a comment. But anyway,
what I'd like to ask for is consistency: Mention all (relevant) prefixes, or
none of them.

>>> +    [0x6c ... 0x6f] = (Known),             /* INS/OUTS */
>>> +
>>> +    [0x70 ... 0x7f] = (Known|Branch|Imm8), /* Jcc disp8 */
>>> +    [0x80]          = (Known|ModRM|Imm8),  /* Grp1 */
>>> +    [0x81]          = (Known|ModRM|Imm),   /* Grp1 */
>>> +
>>> +    [0x83]          = (Known|ModRM|Imm8),  /* Grp1 */
>>> +    [0x84 ... 0x8e] = (Known|ModRM),       /* TEST/XCHG/MOV/MOV-SREG/LEA 
>>> r/rm */
>>> +
>>> +    [0x90 ... 0x99] = (Known),             /* NOP/XCHG rAX/CLTQ/CQTO */
>> Omitting PUSH (at 0x8f) is somewhat odd, considering that it's a pretty
>> "normal" insn.
> 
> Except it's not.  It's the XOP prefix too, and would require extra
> decode complexity.

Well, just like you don't decode VEX/EVEX, you could opt not to decode XOP
despite decoding this PUSH form. Otherwise I'm inclined to ask that no PUSH
or POP should be decoded.

>>> +    [0xc6]          = (Known|ModRM|Imm8),  /* Grp11, Further ModRM decode 
>>> */
>>> +    [0xc7]          = (Known|ModRM|Imm),   /* Grp11, Further ModRM decode 
>>> */
>>> +
>>> +    [0xcb ... 0xcc] = (Known),             /* LRET/INT3 */
>>> +    [0xcd]          = (Known|Imm8),        /* INT $imm8 */
>> No IRET, when you have things like e.g. ICEBP and CLTS?
> 
> The absence of IRET is intentional, because it can't be used safely. 
> SYSRET/EXIT are excluded too for consistency.
> 
> IRET can be added if/when it is needed, and someone has figured out a
> way of adjusting the exception table entry.

If it has an attached exception table entry. If it doesn't, I think its
use would in principle be fine. And an attached exception table entry could
similarly be a problem for any other insn, couldn't it? The same argument
would in particular apply to LRET, I think.

Anyway, once again my main concern here are apparent inconsistencies. Any
of them will make it more difficult to judge what needs / doesn't need
adding here going forward.

>>> +};
>>> +
>>> +/*
>>> + * Bare minimum x86 instruction decoder to parse the alternative 
>>> replacement
>>> + * instructions and locate the IP-relative references that may need 
>>> updating.
>>> + *
>>> + * These are:
>>> + *  - disp8/32 from near branches
>>> + *  - RIP-relative memory references
>>> + *
>>> + * The following simplifications are used:
>>> + *  - All code is 64bit, and the instruction stream is safe to read.
>>> + *  - The 67 prefix is not implemented, so the address size is only 64bit.
>> As to the earlier remark - maybe this part of the comment could simply
>> move to the top of the file?
> 
> I really don't want to split the comment.  It needs to all live together.
> 
> Given this is a single-function file, I also don't see the need for this
> comment to move away from here.  You can't interpret the decode tables
> without reading the function.

I'm torn; I think having some fundamental information at the top of the file
would help. Yet keeping commentary together is also a valid aspect.

>>> + * Inputs:
>>> + *  @ip  The position to start decoding from.
>>> + *  @end End of the replacement block.  Exceeding this is considered an 
>>> error.
>>> + *
>>> + * Returns: x86_decode_lite_t
>>> + *  - On failure, length of 0.
>>> + *  - On success, length > 0.  For rel_sz > 0, rel points at the relative
>>> + *    field in the instruction stream.
>>> + */
>>> +x86_decode_lite_t init_or_livepatch x86_decode_lite(void *ip, void *end)
>> Imo both pointers would be nice to be to-const.
> 
> In v1, you also identified why that wouldn't compile.

Hmm, right. It's been a while. Would it be too much to ask that you add a
sentence to that effect to the description?

>>> +{
>>> +    void *start = ip, *rel = NULL;
>>> +    unsigned int opc, rel_sz = 0;
>>> +    uint8_t b, d, rex = 0, osize = 4;
>>> +
>>> +#define OPC_TWOBYTE (1 << 8)
>>> +
>>> +    /* Mutates IP, uses END. */
>>> +#define FETCH(ty)                                       \
>>> +    ({                                                  \
>>> +        ty _val;                                        \
>>> +                                                        \
>>> +        if ( (ip + sizeof(ty)) > end )                  \
>>> +            goto overrun;                               \
>>> +        _val = *(ty *)ip;                               \
>>> +        ip += sizeof(ty);                               \
>>> +        _val;                                           \
>>> +    })
>>> +
>>> +    for ( ;; ) /* Prefixes */
>>> +    {
>>> +        switch ( b = FETCH(uint8_t) )
>>> +        {
>>> +        case 0x26: /* ES override */
>>> +        case 0x2e: /* CS override */
>>> +        case 0x36: /* DS override */
>>> +        case 0x3e: /* SS override */
>>> +        case 0x64: /* FS override */
>>> +        case 0x65: /* GS override */
>> If you don't like the idea of a Prefix attribute
> 
> I don't like the idea of making this intentionally dissimilar to the
> main decoder, just to safe a few lines of source code.
> 
> GCC optimises it into a bitmap anyway.

Well, okay then.

>>  I wonder in how far we
>> actually need all of the above, when you already ...
>>
>>> +        case 0xf0: /* LOCK */
>>> +        case 0xf2: /* REPNE */
>>> +        case 0xf3: /* REP */
>>> +            break;
>>> +
>>> +        case 0x66: /* Operand size override */
>>> +            osize = 2;
>>> +            break;
>>> +
>>> +        /* case 0x67: Address size override, not implemented */
>> ... deliberately leave of this one.
> 
> Excluding 67 is intentional because it a) has no business being used,
> and b) adds a huge amount of decode complexity.

I largely agree with a), but why b)? The only difference in decode is
how to treat moffs. In any event - the comments here again were because
things looks somewhat inconsistent the way they are right now. As that's
deliberate, it's perhaps tolerable.

> Whereas two of segment prefixes are already necessary to decode the
> alternatives we have today.
>>> +        case 0x40 ... 0x4f: /* REX */
>>> +            rex = b;
>>> +            continue;
>>> +
>>> +        default:
>>> +            goto prefixes_done;
>>> +        }
>>> +        rex = 0; /* REX cancelled by subsequent legacy prefix. */
>>> +    }
>>> + prefixes_done:
>>> +
>>> +    if ( rex & 0x08 ) /* REX.W */
>> Can you please use REX_W here?
> 
> Oh, it is available.  Ok.
> 
>>
>>> +        osize = 8;
>>> +
>>> +    /* Fetch the main opcode byte(s) */
>>> +    if ( b == 0x0f )
>>> +    {
>>> +        b = FETCH(uint8_t);
>>> +        opc = OPC_TWOBYTE | b;
>>> +
>>> +        d = twobyte[b];
>>> +    }
>>> +    else
>>> +    {
>>> +        opc = b;
>>> +        d = onebyte[b];
>>> +    }
>>> +
>>> +    if ( unlikely(!(d & Known)) )
>>> +        goto unknown;
>>> +
>>> +    if ( d & ModRM )
>>> +    {
>>> +        uint8_t modrm = FETCH(uint8_t);
>>> +        uint8_t mod = modrm >> 6;
>>> +        uint8_t reg = (modrm >> 3) & 7;
>>> +        uint8_t rm = modrm & 7;
>>> +
>>> +        /* ModRM/SIB decode */
>>> +        if ( mod == 0 && rm == 5 ) /* RIP relative */
>>> +        {
>>> +            rel = ip;
>>> +            rel_sz = 4;
>>> +            FETCH(uint32_t);
>>> +        }
>>> +        else if ( mod != 3 && rm == 4 ) /* SIB */
>>> +        {
>>> +            uint8_t sib = FETCH(uint8_t);
>>> +            uint8_t base = sib & 7;
>>> +
>>> +            if ( mod == 0 && base == 5 )
>>> +                goto disp32;
>>> +        }
>>> +
>>> +        if ( mod == 1 ) /* disp8 */
>>> +            FETCH(uint8_t);
>>> +        else if ( mod == 2 ) /* disp32 */
>>> +        {
>>> +        disp32:
>>> +            FETCH(uint32_t);
>> The values aren't used, so the types don't matter overly much, yet int8_t
>> and int32_t would be a more accurate representation of what's being
>> fetched.
> 
> Why does that matter?  I'm skipping a number of bytes, not interpreting
> the result.

It matters from a doc pov only at this point. When one simply reads this
code, one may wonder "why unsigned". Just like I did.

>>> +        }
>>> +
>>> +        /* ModRM based decode adjustements */
>>> +        switch ( opc )
>>> +        {
>>> +        case 0xc7: /* Grp11 XBEGIN is a branch. */
>>> +            if ( modrm == 0xf8 )
>>> +                d |= Branch;
>>> +            break;
>>> +        case 0xf6: /* Grp3 TEST(s) have extra Imm8 */
>>> +            if ( reg == 0 || reg == 1 )
>>> +                d |= Imm8;
>>> +            break;
>>> +        case 0xf7: /* Grp3 TEST(s) have extra Imm */
>>> +            if ( reg == 0 || reg == 1 )
>>> +                d |= Imm;
>>> +            break;
>>> +        }
>> In this switch() you don't distinguish 1- and 2-byte maps at all.
> 
> See OPC_TWOBYTE.  They are distinguished here.

Oh, I've overlooked that, sorry.

>>> +    }
>>> +
>>> +    if ( d & Branch )
>>> +    {
>>> +        /*
>>> +         * We don't tolerate 66-prefixed call/jmp in alternatives.  Some 
>>> are
>>> +         * genuinely decoded differently between Intel and AMD CPUs.
>>> +         *
>>> +         * We also don't support APX instructions, so don't have to cope 
>>> with
>>> +         * JMPABS which is the first branch to have an 8-byte immediate.
>>> +         */
>>> +        if ( osize < 4 )
>>> +            goto bad_osize;
>>> +
>>> +        rel = ip;
>>> +        rel_sz = (d & Imm8) ? 1 : 4;
>>> +    }
>>> +
>>> +    if ( d & (Imm | Imm8 | Moffs) )
>>> +    {
>>> +        if ( d & Imm8 )
>>> +            osize = 1;
>>> +        else if ( d & Moffs )
>>> +            osize = 8;
>>> +        else if ( osize == 8 && !(opc >= 0xb8 && opc <= 0xbf) )
>> Again want to also take the opcode map into account, even if - by luck -
>> this would work as is for now.
>>
>> Also could I talk you into converting the two comparisons into one, along
>> the lines of "(opc | 7) != 0xbf"?
> 
> That's the kind of obfuscation which should be left to the compiler.

If only it actually did. I didn't check recently, but last I checked they
still didn't take opportunities like this.

>>> --- a/xen/arch/x86/x86_emulate/private.h
>>> +++ b/xen/arch/x86/x86_emulate/private.h
>>> @@ -9,7 +9,9 @@
>>>  #ifdef __XEN__
>>>  
>>>  # include <xen/bug.h>
>>> +# include <xen/init.h>
>>>  # include <xen/kernel.h>
>>> +# include <xen/livepatch.h>
>>>  # include <asm/endbr.h>
>>>  # include <asm/msr-index.h>
>>>  # include <asm/x86-vendors.h>
>> It's only the new file that needs these - can we limit the dependencies
>> to just that one by putting these new #include-s there?
> 
> Not if you want the userspace harness in patch 2 to compile.

This is inside #ifdef __XEN__, so how can the userspace harness'es build
matter? Of course the #include-s, once moved, would need to be inside a
similar #ifdef.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.