
Re: [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32


  • To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
  • Date: Sun, 24 Oct 2021 22:20:34 -0700
  • Cc: linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, jgross@xxxxxxxx, sstabellini@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, mingo@xxxxxxxxxx, bp@xxxxxxxxx, hpa@xxxxxxxxx, andrew.cooper3@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, iwj@xxxxxxxxxxxxxx, jbeulich@xxxxxxxx, julien@xxxxxxx, wl@xxxxxxx, joe.jin@xxxxxxxxxx
  • Delivery-date: Mon, 25 Oct 2021 05:21:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Boris,

On 10/12/21 10:17 AM, Boris Ostrovsky wrote:
> 
> On 10/12/21 3:24 AM, Dongli Zhang wrote:
>> The sched_clock() can be used very early since upstream
>> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
>> with upstream commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock
>> time from 0"), kdump kernel in Xen HVM guest may panic at very early stage
>> when accessing &__this_cpu_read(xen_vcpu)->time as in below:
> 
> 
> Please drop "upstream". It's always upstream here.
> 
> 
>> +
>> +    /*
>> +     * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
>> +     * and the VM would use them until xen_vcpu_setup() is used to
>> +     * allocate/relocate them at arbitrary address.
>> +     *
>> +     * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
>> +     * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
>> +     * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
>> +     *
>> +     * Therefore we delay xen_hvm_init_time_ops() to
>> +     * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
>> +     */
>> +    if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
> 
> 
> What about always deferring this when panicing? Would that work?
> 
> 
> Deciding whether to defer based on cpu number feels a bit awkward.
> 
> 
> -boris
> 

I ran some tests, and always deferring does not work well. I would prefer to
delay the initialization only when the boot VCPU is >= 32.

Below is the syslog when xen_hvm_init_time_ops() is always delayed, regardless
of whether the boot VCPU is >= 32.

[    0.032372] Booting paravirtualized kernel on Xen HVM
[    0.032376] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.037683] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:64 nr_node_ids:2
[    0.041876] percpu: Embedded 49 pages/cpu s162968 r8192 d29544 u262144

--> The clock goes backwards here, from 0.041876 to 0.000010.

[    0.000010] Built 2 zonelists, mobility grouping on.  Total pages: 2015744
[    0.000012] Policy zone: Normal
[    0.000014] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-rc6xen+ root=UUID=2a5975ab-a059-4697-9aee-7a53ddfeea21 ro text console=ttyS0,115200n8 console=tty1 crashkernel=512M-:192M


This is because the initial pv_sched_clock is native_sched_clock(), and it is
switched to xen_sched_clock() in xen_hvm_init_time_ops(). Is it acceptable for
the clock to always go backwards on a non-kdump kernel?
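
To make the jump concrete, here is a toy user-space model, not kernel code.
All names in it (toy_sched_clock, early_clock_read, late_clock_read, and so
on) are hypothetical. It assumes the early clock reports nanoseconds since
boot, and that the late clock reports time from 0 at the moment it is
installed, which is my reading of commit 38669ba205d1.

```c
#include <stdint.h>

/* Toy model only: which sched_clock implementation is currently active. */
enum active_clock { EARLY_CLOCK, LATE_CLOCK };

struct toy_sched_clock {
    enum active_clock active;
    uint64_t late_offset_ns; /* raw time captured when the late clock is installed */
};

/* Stand-in for the early clock: nanoseconds since the start of boot. */
static uint64_t early_clock_read(uint64_t raw_ns)
{
    return raw_ns;
}

/* Stand-in for a clock that reports time starting from 0 at its own init. */
static uint64_t late_clock_read(const struct toy_sched_clock *c, uint64_t raw_ns)
{
    return raw_ns - c->late_offset_ns;
}

/* Switch to the late clock; from now on its output restarts at 0. */
static void toy_install_late_clock(struct toy_sched_clock *c, uint64_t raw_ns)
{
    c->late_offset_ns = raw_ns;
    c->active = LATE_CLOCK;
}

/* Read whichever clock is active at raw time raw_ns. */
static uint64_t toy_sched_clock(const struct toy_sched_clock *c, uint64_t raw_ns)
{
    if (c->active == LATE_CLOCK)
        return late_clock_read(c, raw_ns);
    return early_clock_read(raw_ns);
}

/*
 * Size of the backward step observed when the switch happens at switch_ns
 * and the next timestamp is taken delta_ns later. Returns 0 if the
 * timestamp does not go backwards.
 */
static uint64_t toy_backward_step_ns(uint64_t switch_ns, uint64_t delta_ns)
{
    struct toy_sched_clock c = { EARLY_CLOCK, 0 };
    uint64_t before = toy_sched_clock(&c, switch_ns);

    toy_install_late_clock(&c, switch_ns);

    uint64_t after = toy_sched_clock(&c, switch_ns + delta_ns);

    return before > after ? before - after : 0;
}
```

With the numbers from the log above, toy_backward_step_ns(41876000, 10000)
models the step from 0.041876 to 0.000010. When the switch happens at raw
time 0, the model reports no backward step, which is why the later the
switch happens, the larger the jump.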

To avoid the backward jump, we could register a dummy clocksource that always
returns 0 until xen_hvm_init_time_ops() runs, but I do not think that is
reasonable.

Thank you very much!

Dongli Zhang