
Re: [PATCH] x86/PV32: restore PAE-extended-CR3 logic


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 4 Apr 2023 12:31:31 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 04 Apr 2023 10:31:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04.04.2023 12:12, Roger Pau Monné wrote:
> On Wed, Feb 15, 2023 at 03:54:11PM +0100, Jan Beulich wrote:
>> While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still
>> applies to guests also when run on a 64-bit hypervisor: The "extended
>> CR3" format has to be used there as well, to fit the address in the only
>> 32-bit wide register there. As a result it was a mistake that the check
>> was never enabled for that case, and was then mistakenly deleted in the
>> course of removal of 32-bit-Xen code (218adf199e68 ["x86: We can assume
>> CONFIG_PAGING_LEVELS==4"]).
>>
>> Similarly during Dom0 construction kernel awareness needs to be taken
>> into account, and respective code was again mistakenly never enabled for
>> 32-bit Dom0 when running on 64-bit Xen (and thus wrongly deleted by
>> 5d1181a5ea5e ["xen: Remove x86_32 build target"]).
>>
>> At the same time restrict enabling of the assist for Dom0 to just the
>> 32-bit case. Furthermore there's no need for an atomic update there.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> I was uncertain whether to add a check to the CR3 guest read path,
>> raising e.g. #GP(0) when the value read wouldn't fit but also may not
>> be converted to "extended" format (overflow is possible there in
>> principle because of the control tools "slack" in promote_l3_table()).
>>
>> In that context I was puzzled to find no check on the CR3 guest write
>> path even in 4.2: A guest (bogusly) setting the PCD or PWT bits (or any
>> of the low reserved ones) could observe anomalous behavior rather than
>> plain failure.
>>
>> As to a Fixes: tag - it's pretty unclear which of the many original
>> 32-on-64 changes to blame. I don't think the two cited commits should
>> be referenced there, as they didn't break anything that wasn't already
>> broken.
>>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_
>>      unsigned int   partial_flags = page->partial_flags;
>>      l3_pgentry_t   l3e = l3e_empty();
>>  
>> +    /*
>> +     * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not
>> +     * understand the weird 'extended cr3' format for dealing with high-order
>> +     * address bits. We cut some slack for control tools (before vcpu0 is
>> +     * initialised).
> 
> Don't we then need some check in the vCPU init path to assure that the
> cr3 is < 32bits if we allow those to initially be set?
> 
> Or will the initialization unconditionally overwrite any previous cr3
> value?

That's not how I understand this "cut some slack". Instead I read it as
covering for the VM-assist bit not being set yet. Beyond that, it is
assumed to be the tool stack's responsibility to constrain addresses
suitably. If it doesn't, it'll simply break the guest. (There is some
guesswork on my part here, as the original introduction of that code
didn't explain things further.)
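
To illustrate that reading, here is a minimal, self-contained sketch of
the acceptance rule described by the quoted comment. It is not the
patch's code: struct guest_state, its fields, PFN_4G and
l3_frame_acceptable() are hypothetical names made up for this example,
standing in for the domain's 32-bit property, the PAE-extended-CR3 VM
assist and vCPU 0's initialisation state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-domain state (not Xen's types). */
struct guest_state {
    bool is_32bit_pv;        /* guest is a 32-bit PV guest */
    bool pae_extended_cr3;   /* PAE-extended-CR3 VM assist has been enabled */
    bool vcpu0_initialised;  /* vCPU 0 has been initialised */
};

/* First frame number at or above 4GB, assuming 4KiB pages. */
#define PFN_4G (UINT64_C(1) << 20)

/*
 * May the frame 'mfn' be used as a PAE top-level (L3) table?
 * Only 32-bit guests are restricted, only for frames at or above 4GB,
 * and only once the VM assist could have been set, i.e. after vCPU 0
 * has been initialised (the "slack" for control tools).
 */
static bool l3_frame_acceptable(const struct guest_state *g, uint64_t mfn)
{
    if (!g->is_32bit_pv)
        return true;   /* 64-bit guests have a full-width CR3 */
    if (mfn < PFN_4G)
        return true;   /* address fits a plain 32-bit CR3 */
    if (g->pae_extended_cr3)
        return true;   /* guest understands the extended format */
    return !g->vcpu0_initialised;  /* slack: assist may simply not be set yet */
}
```

Under these assumptions a frame at or above 4GB is refused only for a
32-bit guest which has not enabled the assist and whose vCPU 0 is
already initialised. Nothing in this rule re-validates tables promoted
during the "slack" window, which is the gap being asked about.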

Nevertheless going beyond what was there originally might be desirable.
Yet it's not really clear to me when / how to carry out such further
checking. For example I don't fancy walking all of the domain's pages
when it's about to be unpaused for the first time.
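
For readers less familiar with the "extended CR3" format referred to in
the quoted patch description, the following sketch shows how a frame
number above 4GB can be packed into a 32-bit register value. To my
understanding it mirrors the xen_pfn_to_cr3() / xen_cr3_to_pfn() macros
of the 32-bit public headers; the function names used here are made up
for the example.

```c
#include <stdint.h>

/*
 * A page-aligned top-level table leaves the low 12 bits of CR3 unused.
 * The extended format rotates the 32-bit frame number left by 12, so
 * frame-number bits 20..31 (address bits 32..43) land in those low bits.
 */
static uint32_t pfn_to_ext_cr3(uint32_t pfn)
{
    return (pfn << 12) | (pfn >> 20);
}

/* Inverse operation: rotate right by 12 to recover the frame number. */
static uint32_t ext_cr3_to_pfn(uint32_t cr3)
{
    return (cr3 >> 12) | (cr3 << 20);
}
```

For a frame below 4GB the result equals the ordinary CR3 value (frame
number shifted left by 12). A guest unaware of the format would treat
the low bits of any other value as ordinary CR3 bits, which is why such
frames need to be kept away from it.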

Jan



 

