[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2-ish] x86/boot: Factor move_xen() out of __start_xen()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Tue, 20 Dec 2022 20:56:39 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=LVDBNLaEpEcfsn73oF+/ncq9ZWaheeiOylu8e+AjMNc=; b=aVMC42npUcAJv4cI5lKPYhEqtzDkXiBRnfNOizACh+WokFwbbU2TzCe9JjAVi0MTwYUvkctFGgakwiP7pmqMCNofKIY5vqs3nGI7fX/wQ48SS8AvSlY1uHAmXxTV1iHTyeQ3NezdoKrw0yBhFZd88Zogmsabu9REAP+HGGySs6mygo21PDEfU/UHuvkbysJv79YGTOKGvJiFr3J2x5CKzdOo2ZvcrfTjUC4AuebWTmERA4Xqszn+B6VIAdX3fiGm49yzseNW2lxqCSc17RwB7xiLX7323Ovh3HCoaBOJxFXp4rcqOCeiynZ3IrqyuLuCzCf8p2su5hKIo01YIVteCQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=GkIK8IYkqIsPl0ki1xdZkHS2sF4tIQ26uqQmIhNIVNi4+RcdHuz2S27nOBim7daD/dM3xomF2KNujmwK+xTVdWBGr+6L9itdOWE0sQ1Q2aNpHLieZ1oNvH+QFGWEDPE5grIvSCA+04BEcaqpP3jUNHV5V2Uwik9dYcIQyG11AkQmppUTgSQWHpHYebzqM5AYQHjdG4PmjVJN6FNrlJ3Upu/7zjYK26F3HaGOupJRQmhM1g8Dn2EqfXHqBVTNLXFR711O6OQ/P4mvRevTm/jP5IAcMgup82y27+EuKbeTAR9XuGpUTDhUYL6M0V5UZ06ud39MDir0LdES+h/v0F50bw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 20 Dec 2022 20:57:04 +0000
  • Ironport-data: A9a23:sS49GKMhEvJfSbDvrR2HlsFynXyQoLVcMsEvi/4bfWQNrUok0GRVn GdNXTqFbv2PY2D9KY92aY2z9kMP75DUnNNgGgto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CU6jufQA+KmU4YoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGj9SuvzrRC9H5qyo4mpC5ARmPpingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0s1HHXly3 qYVFHNXUTKknvKpx6vgceY506zPLOGzVG8ekldJ6GiBSNwAHtXESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+vJxujCPpOBy+OGF3N79U9qGX8hK2G2fo XrL5T/RCRAGLt2PjzGC9xpAg8efzXOlBtJKSNVU8NZKkQPDlz1JESETWFiXrqKXo1SuS8xQf hl8Fi0G6PJaGFaQZtv3UgC8oXWElgUBQNcWGOo/gCmSzoLE7gDfAXILJhZRZdpjuMIoSDgC0 l6Sg8ivFTFpqKeSS3+W6vGTtzzaBMQOBWoLZCtBRw1a5dDm+dk3lkiWFoolF7OphNroHz222 yqNsCU1m7QUi4gMyrm/+lfExTmro/AlUzII2+keZUr9hisRWWJvT9XABYTzhRqYELukcw==
  • Ironport-hdrordr: A9a23:rCvgTq+vVJJF2mSF68Buk+AuI+orL9Y04lQ7vn2ZKSY5TiVXra CTdZUgpHnJYVMqMk3I9uruBEDtex3hHNtOkOss1NSZLW7bUQmTXeJfBOLZqlWNJ8S9zJ856U 4JScND4bbLfDxHZKjBgTVRE7wbsaa6GKLDv5ah85+6JzsaGp2J7G1Ce3am+lUdfng+OXKgfq Dsm/auoVCbCAwqR/X+PFYpdc7ZqebGkZr3CCR2eyLOuGG1/EiVAKeRKWnj4isj
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHZEYuAyVgzwEEad0Cj70OyJmEML6520VMAgAB2soA=
  • Thread-topic: [PATCH v2-ish] x86/boot: Factor move_xen() out of __start_xen()

On 20/12/2022 1:51 pm, Jan Beulich wrote:
> On 16.12.2022 21:17, Andrew Cooper wrote:
>> Partly for clarity because there is a lot of subtle magic at work here.
>> Expand the commentary of what's going on.
>>
>> Also, because there is no need to double copy the stack (32kB) to retrieve
>> local values spilled to the stack under the old alias, when all of the
>> aforementioned local variables are going out of scope anyway.
>>
>> There is also a logic change when walking l2_xenmap[].  The old logic would
>> skip recursing into 4k mappings; this would corrupt Xen if it were used.
>> There are no 4k mappings in l2_xenmap[] this early on boot, so assert the
>> property instead of leaving a latent breakage.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> CC: Wei Liu <wl@xxxxxxx>
>>
>> We probably want to drop PGE from XEN_MINIMAL_CR4.  It will simplify several
>> boot paths which don't need to care about global pages, and PGE is 
>> conditional
>> anyway now with the PCID support.
> Perhaps, especially if - as you say - this allows for simplification.
>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -467,6 +467,113 @@ static void __init move_memory(
>>      }
>>  }
>>  
>> +static void __init noinline move_xen(void)
>> +{
>> +    l4_pgentry_t *pl4e;
>> +    l3_pgentry_t *pl3e;
>> +    l2_pgentry_t *pl2e;
>> +    unsigned long tmp;
>> +    int i, j, k;
> Can these become "unsigned int" please at this occasion?

Fixed.

> (Perhaps as
> another reason to make the change, mention that the old instances of i and
> j did shadow outer scope variables in the same function?)

Oh, so it does.  I'm slightly surprised that we haven't seen a compiler
objection from that.

>
>> +    /*
>> +     * The caller has selected xen_phys_start, ensuring that the old and new
>> +     * locations do not overlap.  Make it so.
> As a non-native speaker I'm struggling with what "Make it so" is supposed
> to tell me here.

"Actually do it"

I'll drop it.  It's not overly important to the understanding here.

>
>> +     * Prevent the compiler from reordering anything across this point.  
>> Such
>> +     * things will end badly.
>> +     */
>> +    barrier();
>> +
>> +    /*
>> +     * Copy out of the current alias, into the directmap at the new 
>> location.
>> +     * This includes a snapshot of the current stack.
>> +     */
>> +    memcpy(__va(__pa(_start)), _start, _end - _start);
>> +
>> +    /*
>> +     * We are now in a critical region.  Any write (modifying a global,
>> +     * spilling a local to the stack, etc) via the current alias will get 
>> lost
>> +     * when we switch to the new pagetables.  This even means no printk()'s
>> +     * debugging purposes.
> Nit: Missing "for"?

Fixed.

>
>> +     * Walk the soon-to-be-used pagetables in the new location, relocating 
>> all
>> +     * intermediate (non-leaf) entries to point to their new-location
>> +     * equivalents.  All writes are via the directmap alias.
>> +     */
>> +    pl4e = __va(__pa(idle_pg_table));
>> +    for ( i = 0 ; i < L4_PAGETABLE_ENTRIES; i++, pl4e++ )
>> +    {
>> +        if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
>> +            continue;
>> +
>> +        *pl4e = l4e_from_intpte(l4e_get_intpte(*pl4e) + xen_phys_start);
>> +        pl3e = __va(l4e_get_paddr(*pl4e));
>> +        for ( j = 0; j < L3_PAGETABLE_ENTRIES; j++, pl3e++ )
>> +        {
>> +            if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ||
>> +                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
>> +                continue;
>> +
>> +            *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) + xen_phys_start);
>> +            pl2e = __va(l3e_get_paddr(*pl3e));
>> +            for ( k = 0; k < L2_PAGETABLE_ENTRIES; k++, pl2e++ )
>> +            {
>> +                if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
>> +                     (l2e_get_flags(*pl2e) & _PAGE_PSE) )
>> +                    continue;
>> +
>> +                *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + 
>> xen_phys_start);
>> +            }
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Walk the soon-to-be-used l2_xenmap[], relocating all the leaf 
>> mappings
>> +     * so text/data/bss etc refer to the new location in memory.
>> +     */
>> +    pl2e = __va(__pa(l2_xenmap));
>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, pl2e++ )
>> +    {
>> +        if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
>> +            continue;
>> +
>> +        /*
>> +         * We don't have 4k mappings in l2_xenmap[] at this point in boot, 
>> nor
>> +         * is this anticipated to change.  Simply assert the fact, rather 
>> than
>> +         * introducing dead logic to decend into l1 tables.
> Nit: "descend"?

Fixed.

>
>> +         */
>> +        ASSERT(l2e_get_flags(*pl2e) & _PAGE_PSE);
> The warning about the use of printk() around here could make one think
> that using ASSERT() (or BUG()) is similarly bad. However, aiui using
> printk() isn't by itself a problem, just that the log message would be
> lost as far as the circular buffer goes. The message would be observable
> on the serial con sole though, at least with "sync_console". It is on
> this basis that using ASSERT() here makes sense. Perhaps the printk()
> related comment could be slightly adjusted to express this?

I did try to describe it, and it gets complicated, so I opted for the
simple approach.

Before the pagetable switch, it will operate "properly" and emit text
onto the screen, serial, etc.

After the pagetable switch, all Xen data will be lost, which includes
the main console buffer, but also things like the local state for Xen's
UART driver.  I gave up trying to reason what would happen at that point.


The ASSERT() is very much a best-effort attempt.  Anyone liable to trip
the assert is already working on the boot pagetable and ought to be aware.

>
>> +        *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start);
>> +    }
>> +
>> +    /*
>> +     * Switch to relocated pagetables, shooting down global mappings as we 
>> go.
>> +     */
>> +    asm volatile (
>> +        "mov    %%cr4, %[cr4]\n\t"
>> +        "andb   $~%c[pge], %b[cr4]\n\t"
>> +        "mov    %[cr4], %%cr4\n\t"     /* CR4.PGE = 0 */
>> +        "mov    %[cr3], %%cr3\n\t"     /* CR3 = new pagetables */
>> +        "orb    $%c[pge], %b[cr4]\n\t"
> I understand you need %c to apply the ~ in the operand of ANDB above.
> But could I talk you into keeping things as simple as possible here
> by using plain %[pge]?

The $ is still useful in the second case, for consistency if nothing else.

Both of these instances disappear if PGE gets cleaned up on boot.

>
>> +        "mov    %[cr4], %%cr4\n\t"     /* CR4.PGE = 1 */
>> +        : [cr4] "=&a" (tmp) /* Could be "r", but "a" makes better asm */
>> +        : [cr3] "r" (__pa(idle_pg_table)),
>> +          [pge] "i" (X86_CR4_PGE)
>> +        : "memory" );
> The removed stack copying worries me, to be honest. Yes, for local
> variables of ours it doesn't matter because they are about to go out
> of scope. But what if the compiler instantiates any for its own use,
> e.g. ...
>
>> +    /*
>> +     * End of the critical region.  Updates to locals and globals now work 
>> as
>> +     * expected.
>> +     *
>> +     * Updates to local variables which have been spilled to the stack since
>> +     * the memcpy() have been lost.  But we don't care, because they're all
>> +     * going out of scope imminently.
>> +     */
>> +
>> +    printk("New Xen image base address: %#lx\n", xen_phys_start);
> ... the result of the address calculation for the string literal
> here? Such auxiliary calculations can happen at any point in the
> function, and won't necessarily (hence the example chosen) become
> impossible for the compiler to do with the memory clobber in the
> asm(). And while the string literal's address is likely cheap
> enough to calculate right in the function invocation sequence (and
> an option would also be to retain the printk() in the caller),
> other instrumentation options could be undermined by this as well.

Right now, the compiler is free to calculate the address of the string
literal in a register, and move it the other side of the TLB flush. 
This will work just fine.

However, the compiler cannot now, or ever in the future, spill such a
calculation to the stack.

Whether it's spelt "memory", or something else in the future, OSes
really do genuinely need a way of telling the compiler "you literally
cannot move anything the other side of this asm()", because otherwise
malfunctions will occur.

It's not the locals which worry me - we really do lose things like
coverage data in here.  Short of writing it fully in asm (which would be
irritating to maintain), I don't see any other option.

~Andrew

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.