
Re: [Xen-devel] Kernel panic with 2.6.32-30 under network activity



It happened again a few minutes ago. It is the same kernel stack each time (alignment check: 0000 [#1] SMP, etc.).

The dom0 hosting all the faulty domUs is a dual Xeon 5420, so 8 real cores are available.
20 domUs are running on it and 35 vcpus are set up. Is that too many? The bug happens randomly across the domUs.
I was running the same config with Xen 3.2 without any issue.


It may be kernel-related: there is no issue with 2.6.24, but the issue appears with 2.6.32.


2011/3/16 Jan Beulich <JBeulich@xxxxxxxxxx>
>>> On 16.03.11 at 11:11, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Wed, 2011-03-16 at 09:34 +0000, Jan Beulich wrote:
>> >>> On 16.03.11 at 04:20, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
>> > On Thu, Mar 10, 2011 at 12:25:55PM +0100, Olivier Hanesse wrote:
>> >> [469390.126691] alignment check: 0000 [#1] SMP
>> >
>> > alignment check? Was there anything else in the log before this? Was there
>> > anything in the Dom0 log?
>>
>> This together with
>>
>> >> [469390.126795] RSP: e02b:ffff88001ec3f9b8  EFLAGS: 00050286
>>
>> makes me wonder if either eflags got restored from a corrupted
>> stack slot somewhere, or whether something in the kernel or one
>> of the modules intentionally played with EFLAGS.AC.
>
> Can a PV kernel running in ring-3 change AC?

Yes. We had this problem until we cleared the flag in
create_bounce_frame().
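
As a cross-check, the EFLAGS value from the oops quoted above (00050286)
can be decoded bit by bit. The following is only a minimal Python sketch:
the bit positions are the architectural x86 EFLAGS layout, while the
helper name decode_eflags is made up for illustration and is not part of
any kernel or Xen tool.

```python
# Architectural x86 EFLAGS bit positions (bit 1 is reserved and always set).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}


def decode_eflags(value):
    """Return (list of set flag names, IOPL) for a raw EFLAGS value.

    Hypothetical helper, for illustration only.
    """
    flags = [name for bit, name in sorted(EFLAGS_BITS.items())
             if value & (1 << bit)]
    iopl = (value >> 12) & 0x3  # IOPL is a two-bit field in bits 12-13
    return flags, iopl


flags, iopl = decode_eflags(0x00050286)
print(flags, "IOPL =", iopl)
# prints: ['PF', 'SF', 'IF', 'RF', 'AC'] IOPL = 0
```

Bit 18 (AC) is indeed set in the reported value, which is consistent with
an alignment check fault being raised.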

> The Intel manual says "They should not be modified by application
> programs" over a list including AC but the list also includes e.g. IOPL
> and IF so I suspect it meant "can not" rather than "should not"? In
> which case it can't happen by accident.

No, afaik "should not" is the correct term.

> The hypervisor appears to clear the guest's EFLAGS.AC on context switch
> to a guest and failsafe bounce but not in e.g. do_iret so it's not
> entirely clear what its policy is...

do_iret() isn't increasing privilege, and hence restoring whatever
the outer context of iret had in place is correct. The important
thing is that on the transition to kernel mode the flag must always
get cleared (which I think has been the case since the problem
in create_bounce_frame() was fixed).
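
The rule stated above can be modelled in a few lines. This is a toy Python
sketch and not Xen's actual code: the function names enter_kernel_mode and
iret_restore are hypothetical, and only the AC bit position (bit 18) is
taken from the x86 architecture.

```python
X86_EFLAGS_AC = 1 << 18  # Alignment Check flag, bit 18 of EFLAGS


def enter_kernel_mode(guest_eflags):
    """Model of a transition into the guest kernel (hypothetical name).

    The flag is always cleared here, so the kernel never starts running
    with an AC value that less privileged code left behind.
    """
    return guest_eflags & ~X86_EFLAGS_AC


def iret_restore(saved_eflags):
    """Model of an iret-style return (hypothetical name).

    No privilege is gained, so the outer context's flags are restored
    unchanged, including AC if it was set there.
    """
    return saved_eflags


print(hex(enter_kernel_mode(0x00050286)))  # prints: 0x10286
print(hex(iret_restore(0x00050286)))       # prints: 0x50286
```

Under this model, a kernel that still ends up running with AC set must
have picked the flag up after entry, for example from a corrupted saved
EFLAGS slot or from code that sets AC deliberately.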

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

