[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3 of 4] KEXEC: Allocate crash structures in low memory



On 09/03/12 15:52, Keir Fraser wrote:
> On 09/03/2012 14:42, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such
>> as the CPU crash notes will go into the xenheap, which tends to be in
>> upper memory.  This causes problems on machines with more than 64GB
>> (or 4GB if no PAE support) of ram as the crashkernel physically cant
>> access the crash notes.
> Why wouldn't you run a 64-bit crashkernel?
>
>  -- Keir

There is certainly an argument to be made along those lines, and I
personally lean to that side of the argument.  However, as our dom0
kernel is 32bits, there is no substantial testing of the drivers built
for 64bit.

Given the number of driver errors we have to fix in 32bit mode, and the
number of errors which only show in the kdump environment, the
predominant view is that using a 64bit kdump kernel with 32bit dom0 is
just asking for extra trouble in an area which is not easy to test in
the first place.

As a result, we will not be updating to a 64bit kdump kernel until we
update to a 64bit dom0 kernel.  Along with that comes many performance
considerations, meaning that it is not going to happen soon.

~Andrew

>> The solution is to force Xen to allocate certain structures in lower
>> memory.  This is achieved by introducing two new command line
>> parameters; low_crashinfo and crashinfo_maxaddr.  Because of the
>> potential impact on 32bit PV guests, and that this problem does not
>> exist for 64bit dom0 on 64bit Xen, this new functionality defaults to
>> the codebase's previous behavior, requiring the user to explicitly
>> add extra command line parameters to change the behavior.
>>
>> This patch consists of 3 logically distinct but closely related
>> changes.
>>  1) Add the two new command line parameters.
>>  2) Change crash note allocation to use lower memory when instructed.
>>  3) Change the conring buffer to use lower memory when instructed.
>>
>> There result is that the crash notes and console ring will be placed
>> in lower memory so useful information can be recovered in the case of
>> a crash.
>>
>> Changes since v1:
>>  -  Patch xen-command-line.markdown to document new options
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>
>> diff -r 1e79fb200722 -r 3669bd0ac6b9 docs/misc/xen-command-line.markdown
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -188,6 +188,13 @@ The optional trailing `x` indicates that
>>  ### cpuid\_mask\_xsave\_eax
>>  ### cpuidle
>>  ### cpuinfo
>> +### crashinfo_maxaddr
>> +> `= <size>`
>> +
>> +> Default: `4G`
>> +
>> +Specify the maximum address to allocate certain strucutres, if used in
>> combination with the `low_crashinfo` command line option.
>> +
>>  ### crashkernel
>>  ### credit2\_balance\_over
>>  ### credit2\_balance\_under
>> @@ -273,6 +280,13 @@ Set the logging level for Xen.  Any log
>>
>>  The optional `<rate-limited level>` options instructs which severities 
>> should
>> be rate limited.
>>
>> +### low\_crashinfo
>> +> `= none | min | all`
>> +
>> +> Default: `none` if not specified at all, or to `min` if `low\_crashinfo` 
>> is
>> present without qualification.
>> +
>> +This option is only useful for hosts with a 32bit dom0 kernel, wishing to 
>> use
>> kexec functionality in the case of a crash.  It represents which data
>> structures should be deliberatly allocated in low memory, so the crash kernel
>> may find find them.  Should be used in combination with `crashinfo_maxaddr`.
>> +
>>  ### max\_cstate
>>  ### max\_gsi\_irqs
>>  ### maxcpus
>> diff -r 1e79fb200722 -r 3669bd0ac6b9 xen/arch/x86/setup.c
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -586,6 +586,10 @@ void __init __start_xen(unsigned long mb
>>      }
>>      cmdline_parse(cmdline);
>>
>> +    /* Must be after command line argument parsing and before
>> +     * allocing any xenheap structures wanted in lower memory. */
>> +    kexec_early_calculations();
>> +
>>      parse_video_info();
>>
>>      set_current((struct vcpu *)0xfffff000); /* debug sanity */
>> diff -r 1e79fb200722 -r 3669bd0ac6b9 xen/common/kexec.c
>> --- a/xen/common/kexec.c
>> +++ b/xen/common/kexec.c
>> @@ -36,7 +36,9 @@ bool_t kexecing = FALSE;
>>  typedef struct { Elf_Note * start; size_t size; } crash_note_range_t;
>>  static crash_note_range_t * crash_notes;
>>
>> -/* Lock to prevent race conditions when allocating the crash note buffers. 
>> */
>> +/* Lock to prevent race conditions when allocating the crash note buffers.
>> + * It also serves to protect calls to alloc_from_crash_heap when allocating
>> + * crash note buffers in lower memory. */
>>  static DEFINE_SPINLOCK(crash_notes_lock);
>>
>>  static Elf_Note *xen_crash_note;
>> @@ -62,6 +64,24 @@ static struct {
>>      unsigned long size;
>>  } ranges[16] __initdata;
>>
>> +/* Low crashinfo mode.  Start as INVALID so serveral codepaths can set up
>> + * defaults without needing to know the state of the others. */
>> +enum low_crashinfo low_crashinfo_mode = LOW_CRASHINFO_INVALID;
>> +
>> +/* This value is only considered if low_crash_mode is set to MIN or ALL, so
>> + * setting a default here is safe. Default to 4GB.  This is because the
>> current
>> + * KEXEC_CMD_get_range compat hypercall trucates 64bit pointers to 32 bits.
>> The
>> + * typical usecase for crashinfo_maxaddr will be for 64bit Xen with 32bit
>> dom0
>> + * and 32bit crash kernel. */
>> +static paddr_t __initdata crashinfo_maxaddr = 4ULL << 30;
>> +
>> +/* = log base 2 of crashinfo_maxaddr after checking for sanity. Default to
>> + * larger than the entire physical address space. */
>> +paddr_t crashinfo_maxaddr_bits = 64;
>> +
>> +/* Pointers to keep track of the crash heap region. */
>> +static void *crash_heap_current = NULL, *crash_heap_end = NULL;
>> +
>>  /*
>>   * Parse command lines in the format
>>   *
>> @@ -140,6 +160,58 @@ static void __init parse_crashkernel(con
>>  }
>>  custom_param("crashkernel", parse_crashkernel);
>>
>> +/* Parse command lines in the format:
>> + *
>> + *   low_crashinfo=[none,min,all]
>> + *
>> + * - none disables the low allocation of crash info.
>> + * - min will allocate enough low information for the crash kernel to be 
>> able
>> + *       to extract the hypervisor and dom0 message ring buffers.
>> + * - all will allocate additional structures such as domain and vcpu structs
>> + *       low so the crash kernel can perform an extended analysis of state.
>> + */
>> +static void __init parse_low_crashinfo(const char * str)
>> +{
>> +
>> +    if ( !strlen(str) )
>> +        /* default to min if user just specifies "low_crashinfo" */
>> +        low_crashinfo_mode = LOW_CRASHINFO_MIN;
>> +    else if ( !strcmp(str, "none" ) )
>> +        low_crashinfo_mode = LOW_CRASHINFO_NONE;
>> +    else if ( !strcmp(str, "min" ) )
>> +        low_crashinfo_mode = LOW_CRASHINFO_MIN;
>> +    else if ( !strcmp(str, "all" ) )
>> +        low_crashinfo_mode = LOW_CRASHINFO_ALL;
>> +    else
>> +    {
>> +        printk("Unknown low_crashinfo parameter '%s'.  Defaulting to 
>> min.\n",
>> str);
>> +        low_crashinfo_mode = LOW_CRASHINFO_MIN;
>> +    }
>> +}
>> +custom_param("low_crashinfo", parse_low_crashinfo);
>> +
>> +/* Parse command lines in the format:
>> + *
>> + *   crashinfo_maxaddr=<addr>
>> + *
>> + * <addr> will be rounded down to the nearest power of two.  Defaults to 64G
>> + */
>> +static void __init parse_crashinfo_maxaddr(const char * str)
>> +{
>> +    u64 addr;
>> +
>> +    /* if low_crashinfo_mode is unset, default to min. */
>> +    if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID )
>> +        low_crashinfo_mode = LOW_CRASHINFO_MIN;
>> +
>> +    if ( (addr = parse_size_and_unit(str, NULL)) )
>> +        crashinfo_maxaddr = addr;
>> +    else
>> +        printk("Unable to parse crashinfo_maxaddr. Defaulting to %p\n",
>> +               (void*)crashinfo_maxaddr);
>> +}
>> +custom_param("crashinfo_maxaddr", parse_crashinfo_maxaddr);
>> +
>>  void __init set_kexec_crash_area_size(u64 system_ram)
>>  {
>>      unsigned int idx;
>> @@ -298,6 +370,20 @@ static size_t sizeof_cpu_notes(const uns
>>      return bytes;
>>  }
>>
>> +/* Allocate size_t bytes of space from the previously allocated
>> + * crash heap if the user has requested that crash notes be allocated
>> + * in lower memory.  There is currently no case where the crash notes
>> + * should be free()'d. */
>> +static void * alloc_from_crash_heap(const size_t bytes)
>> +{
>> +    void * ret;
>> +    if ( crash_heap_current + bytes > crash_heap_end )
>> +        return NULL;
>> +    ret = (void*)crash_heap_current;
>> +    crash_heap_current += bytes;
>> +    return ret;
>> +}
>> +
>>  /* Allocate a crash note buffer for a newly onlined cpu. */
>>  static int kexec_init_cpu_notes(const unsigned long cpu)
>>  {
>> @@ -312,7 +398,10 @@ static int kexec_init_cpu_notes(const un
>>          return ret;
>>
>>      nr_bytes = sizeof_cpu_notes(cpu);
>> -    note = xmalloc_bytes(nr_bytes);
>> +
>> +    /* If we dont care about the position of allocation, malloc. */
>> +    if ( low_crashinfo_mode == LOW_CRASHINFO_NONE )
>> +        note = xmalloc_bytes(nr_bytes);
>>
>>      /* Protect the write into crash_notes[] with a spinlock, as this 
>> function
>>       * is on a hotplug path and a hypercall path. */
>> @@ -330,6 +419,11 @@ static int kexec_init_cpu_notes(const un
>>      }
>>      else
>>      {
>> +        /* If we care about memory possition, alloc from the crash heap,
>> +         * also protected by the crash_notes_lock. */
>> +        if ( low_crashinfo_mode > LOW_CRASHINFO_NONE )
>> +            note = alloc_from_crash_heap(nr_bytes);
>> +
>>          crash_notes[cpu].start = note;
>>          crash_notes[cpu].size = nr_bytes;
>>          spin_unlock(&crash_notes_lock);
>> @@ -389,6 +483,18 @@ static struct notifier_block cpu_nfb = {
>>      .notifier_call = cpu_callback
>>  };
>>
>> +void __init kexec_early_calculations(void)
>> +{
>> +    /* If low_crashinfo_mode is still INVALID, neither "low_crashinfo" nor
>> +     * "crashinfo_maxaddr" have been specified on the command line, so
>> +     * explicitly set to NONE. */
>> +    if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID )
>> +        low_crashinfo_mode = LOW_CRASHINFO_NONE;
>> +
>> +    if ( low_crashinfo_mode > LOW_CRASHINFO_NONE )
>> +        crashinfo_maxaddr_bits = fls(crashinfo_maxaddr) - 1;
>> +}
>> +
>>  static int __init kexec_init(void)
>>  {
>>      void *cpu = (void *)(unsigned long)smp_processor_id();
>> @@ -399,6 +505,29 @@ static int __init kexec_init(void)
>>
>>      register_keyhandler('C', &crashdump_trigger_keyhandler);
>>
>> +    if ( low_crashinfo_mode > LOW_CRASHINFO_NONE )
>> +    {
>> +        size_t crash_heap_size;
>> +
>> +        /* This calculation is safe even if the machine is booted in
>> +         * uniprocessor mode. */
>> +        crash_heap_size = sizeof_cpu_notes(0) +
>> +            sizeof_cpu_notes(1) * (nr_cpu_ids - 1);
>> +        crash_heap_size = PAGE_ALIGN(crash_heap_size);
>> +
>> +        crash_heap_current = alloc_xenheap_pages(
>> +            get_order_from_bytes(crash_heap_size),
>> +            MEMF_bits(crashinfo_maxaddr_bits) );
>> +
>> +        if ( ! crash_heap_current )
>> +            return -ENOMEM;
>> +
>> +        crash_heap_end = crash_heap_current + crash_heap_size;
>> +    }
>> +
>> +    /* crash_notes may be allocated anywhere Xen can reach in memory.
>> +       Only the individual CPU crash notes themselves must be allocated
>> +       in lower memory if requested. */
>>      crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids);
>>      if ( ! crash_notes )
>>          return -ENOMEM;
>> diff -r 1e79fb200722 -r 3669bd0ac6b9 xen/drivers/char/console.c
>> --- a/xen/drivers/char/console.c
>> +++ b/xen/drivers/char/console.c
>> @@ -596,7 +596,7 @@ void __init console_init_postirq(void)
>>          opt_conring_size = num_present_cpus() << (9 + xenlog_lower_thresh);
>>
>>      order = get_order_from_bytes(max(opt_conring_size, conring_size));
>> -    while ( (ring = alloc_xenheap_pages(order, 0)) == NULL )
>> +    while ( (ring = alloc_xenheap_pages(order,
>> MEMF_bits(crashinfo_maxaddr_bits))) == NULL )
>>      {
>>          BUG_ON(order == 0);
>>          order--;
>> diff -r 1e79fb200722 -r 3669bd0ac6b9 xen/include/xen/kexec.h
>> --- a/xen/include/xen/kexec.h
>> +++ b/xen/include/xen/kexec.h
>> @@ -25,6 +25,19 @@ void set_kexec_crash_area_size(u64 syste
>>  #define KEXEC_IMAGE_CRASH_BASE   2
>>  #define KEXEC_IMAGE_NR           4
>>
>> +enum low_crashinfo {
>> +    LOW_CRASHINFO_INVALID = 0,
>> +    LOW_CRASHINFO_NONE = 1,
>> +    LOW_CRASHINFO_MIN = 2,
>> +    LOW_CRASHINFO_ALL = 3
>> +};
>> +
>> +/* Low crashinfo mode.  Start as INVALID so serveral codepaths can set up
>> + * defaults without needing to know the state of the others. */
>> +extern enum low_crashinfo low_crashinfo_mode;
>> +extern paddr_t crashinfo_maxaddr_bits;
>> +void kexec_early_calculations(void);
>> +
>>  int machine_kexec_load(int type, int slot, xen_kexec_image_t *image);
>>  void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image);
>>  void machine_kexec_reserved(xen_kexec_reserve_t *reservation);
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.