
Re: [Xen-devel] HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2

>>> On 13.11.15 at 00:00, <ariel.atom2@xxxxxxxxxx> wrote:
> Am 12.11.15 um 17:43 schrieb Andrew Cooper:
>> On 12/11/15 14:29, Atom2 wrote:
>>> Hi Andrew,
>>> thanks for your reply. Answers are inline further down.
>>> Am 12.11.15 um 14:01 schrieb Andrew Cooper:
>>>> On 12/11/15 12:52, Jan Beulich wrote:
>>>>>>>> On 12.11.15 at 02:08, <ariel.atom2@xxxxxxxxxx> wrote:
>>>>>> After the upgrade, HVM domUs no longer appear to work, regardless of the
>>>>>> dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0 kernel); PV
>>>>>> domUs, however, work just as before on both dom0 kernels.
>>>>>> xl dmesg shows the following after the first HVM domU, which is started
>>>>>> as part of the machine booting up, crashes:
>>>>>> [...]
>>>>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
>>>>>> (XEN) ************* VMCS Area **************
>>>>>> (XEN) *** Guest State ***
>>>>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011, gh_mask=ffffffffffffffff
>>>>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffffff
>>>>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>>>>> (XEN)      target0=0000000000000000, target1=0000000000000000
>>>>>> (XEN)      target2=0000000000000000, target3=0000000000000000
>>>>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc)  RIP = 0x0000000100000000 (0x0000000100000000)
>>>>> Other than RIP looking odd for a guest still in non-paged protected
>>>>> mode, I can't seem to spot anything wrong with the guest state.
>>>> Odd? That will be the source of the failure.
>>>> Outside of long mode, the upper 32 bits of %rip should all be zero, and it
>>>> should not be possible to set any of them.
>>>> I suspect that the guest has exited for emulation, and there has been a
>>>> bad update to %rip. The alternative (which I hope is not the case) is
>>>> that there is a hardware erratum which allows the guest to accidentally
>>>> get itself into this condition.
>>>> Are you able to rerun with a debug build of the hypervisor?
>>> [snip]
>>> Another question is whether prior to enabling the debug USE flag it
>>> might make sense to re-compile with gcc-4.8.5 (please see my previous
>>> list reply) to rule out any compiler related issues. Jan, Andrew -
>>> what are your thoughts?
>> First of all, check whether the compiler makes a difference on 4.5.2.
> Hi Andrew,
> I changed the compiler and there was no change for the better:
> unfortunately the HVM domU still crashes with a similar error
> message as soon as it is started.
>> If both compiles result in a guest crashing in that manner, test a debug
>> Xen to see if any assertions/errors are encountered just before the
>> guest crashes.
> As the compiler did not make any difference, I enabled the debug USE
> flag, re-compiled (using gcc-4.9.3), and rebooted using a serial console
> to capture output. Unfortunately I did not get very far, and things
> became even stranger: this time the system did not even finish the boot
> process, but hard-stopped fairly early with the message
> "Panic on CPU 3: DOUBLE FAULT -- system shutdown". The captured log file
> is attached as "serial log.txt".
> As this happened immediately after the CPU microcode update, I thought
> there might be a connection and disabled the microcode update. After the
> next reboot the boot process seemed to get a bit further, as
> evidenced by a few more lines in the log file (lines 136 to 197
> in the second log file, "serial log no ucode.txt"), but in
> the end it finished with an identical error message (only the CPU number
> differed, but that number seems to change between boots anyway).
> I hope that makes some sense to you.

Not really, except that I now suspect even more strongly either bad
hardware or something fundamentally wrong with your build. Did you retry
with a freshly built 4.5.1? Alternatively, could you try a known-good
build of 4.5.2 (e.g. from osstest)?


Xen-devel mailing list


