
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion



A few more data points: I also tested Xen 4.6 on VMware ESXi 5.5, and
it yields similar results. Not surprising, since Fusion uses basically
the same virtualization engine.

However, ESXi offers many more configuration choices (number of
processors, cores per socket, hyperthreading, etc.). The weird processor
ID assignment (0,
2, 4, 6, ...) occurs only with 4 or 8 processors, 1 core per socket,
and no hyperthreading. If I change any of these parameters, the
processor IDs become sequential.

It appears that in the 4- and 8-processor cases, VMware is emulating
something like a Xeon E7340:
https://github.com/deater/test_proc/blob/master/x86_64/x86_64.intel.6.15.11.xeon_e7340

In fact someone asked a question about running Xen on this platform
way back when: 
http://lists.xenproject.org/archives/html/xen-users/2008-05/msg00691.html

Others of similar vintage assign processor IDs 0 and 3 on a
2-processor system:
https://www.centos.org/forums/viewtopic.php?t=30255

or even 0 and 6: http://serverfault.com/questions/302429/interpreting-cpuinfo

So there are real hardware platforms with non-sequential processor
IDs. They are quite ancient and don't support CAT, but that doesn't
rule out the possibility of a newer or future platform behaving
similarly.

At least there is no evidence yet of a platform assigning extremely
large processor IDs; until there is, we are safe using arrays and
bitmaps. The issue is sizing these data structures appropriately.
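The following minimal C sketch makes the sizing problem concrete. It compares an ID-indexed array sized by the number of CPUs with one sized by the largest observed ID plus one, using the sparse IDs 0, 2, 4, 6 seen on ESXi. All function names are hypothetical and are not Xen code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: size an ID-indexed array by how many CPUs were counted.
 * This breaks as soon as the IDs are sparse. */
static unsigned int size_by_count(unsigned int num_cpus)
{
    return num_cpus;
}

/* Hypothetical: size an ID-indexed array by the largest ID seen, plus one,
 * so every observed ID is a valid index (some slots may stay unused). */
static unsigned int size_by_max_id(const unsigned int *ids, unsigned int n)
{
    assert(n > 0);
    unsigned int max_id = ids[0];
    for (unsigned int i = 1; i < n; i++)
        if (ids[i] > max_id)
            max_id = ids[i];
    return max_id + 1;
}

/* True if every ID can safely index an array (or bitmap) of nr entries. */
static bool all_ids_fit(const unsigned int *ids, unsigned int n, unsigned int nr)
{
    for (unsigned int i = 0; i < n; i++)
        if (ids[i] >= nr)
            return false;
    return true;
}
```

With IDs 0, 2, 4, 6, size_by_count() yields 4, so ID 6 would index past the end. size_by_max_id() yields 7, at the cost of a few unused slots. With sequential IDs, both approaches agree.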

--Ed


On Wed, Nov 25, 2015 at 1:04 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 25.11.15 at 08:48, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
>> On Tue, Nov 24, 2015 at 03:34:45AM -0700, Jan Beulich wrote:
>>> Chao, could you - inside Intel - please check whether there are
>>> any assumptions on the respective CPUID leaf output that aren't
>>> explicitly stated in the SDM right now (like resulting in contiguous
>>> socket numbers), and ask for them getting made explicit (if there
>>> are any), or it being made explicit that no assumptions at all are
>>> to be made on the presented values
>>
>> Actually, there is already such a statement in the SDM (Vol. 3, Section 8.9.1):
>>
>> "The value of valid APIC_IDs need not be contiguous across package
>> boundary or core boundaries".
>
> That's a statement on APIC ID space (which necessarily can't be
> contiguous on systems with a non-power-of-2 core count), but I
> was asking about the socket ID space.
>
>>> (in which case we'd
>>> have to consume MADT parsing data in set_nr_sockets(), e.g.
>>> by replacing num_processors there with one more than the
>>> maximum APIC ID of any non-disabled CPU)?
>>
>> Even with this, we still have a problem in the hotplug case: the inserted
>> CPU may have an APIC_ID bigger than the maximum APIC_ID here.
>>
>> But let's get back to the real world. Most machines that support CAT should
>> have contiguous SOCKET_IDs, so it's not a problem. Given that CAT is the
>> only feature that uses this, I guess this suggestion might be better than
>> other solutions in practice.
>
> And we could actually cater for that by extrapolating the value
> added to cover disabled_cpus.
>
> Jan
>
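Jan's suggestion in the quoted reply has two parts. First, size from one more than the maximum APIC ID of any non-disabled CPU instead of num_processors. Second, extrapolate the amount added to cover disabled_cpus. The following C sketch shows one possible reading of that suggestion. It is not the actual set_nr_sockets() code; estimate_nr_sockets() and its parameters are hypothetical. threads_per_socket stands in for cores per socket times threads per core. The disabled-CPU term assumes hot-pluggable CPUs are spread as sparsely as the present ones. As Chao points out, a hot-plugged CPU can still carry a larger APIC ID, so the result is an estimate, not a guarantee.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: estimate how many socket slots to allocate.
 *   apic_ids:           APIC IDs of the enabled CPUs (e.g. from MADT parsing)
 *   num_enabled:        number of entries in apic_ids
 *   disabled_cpus:      number of disabled (possibly hot-pluggable) entries
 *   threads_per_socket: cores per socket * threads per core
 */
static unsigned int estimate_nr_sockets(const unsigned int *apic_ids,
                                        unsigned int num_enabled,
                                        unsigned int disabled_cpus,
                                        unsigned int threads_per_socket)
{
    assert(num_enabled > 0 && threads_per_socket > 0);

    unsigned int max_id = apic_ids[0];
    for (unsigned int i = 1; i < num_enabled; i++)
        if (apic_ids[i] > max_id)
            max_id = apic_ids[i];

    /* One more than the largest enabled APIC ID replaces a plain CPU count. */
    unsigned int span = max_id + 1;

    /* Assumption: disabled CPUs are as sparse as the enabled ones, so scale
     * their count by span / num_enabled, rounding up. */
    unsigned int extra = DIV_ROUND_UP(disabled_cpus * span, num_enabled);

    return DIV_ROUND_UP(span + extra, threads_per_socket);
}
```

Suppose the APIC IDs equal the socket IDs 0, 2, 4, 6 in the one-thread-per-socket ESXi configuration. With no disabled CPUs this yields 7 instead of 4, and with two disabled entries it reserves 11. If a socket ID is the APIC ID shifted right by a field width w with 2^w >= threads_per_socket, then rounding (max APIC ID + 1) / threads_per_socket up never undercounts the sockets that are present.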

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
