
Re: [Xen-devel] Xen4.2 S3 regression?



Thanks for taking the time to reply.

I'm out of the office today, so I don't have direct access to the
machine in question until tomorrow, but I'll do my best to answer
(inline below) and I'll follow up tomorrow with concrete answers.

On Wed, Aug 8, 2012 at 4:35 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 07.08.12 at 22:14, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> Any suggestions on how best to chase this down?
>>
>> The first S3 suspend/resume cycle works, but the second does not.
>>
>> On the second try, I never get any interrupts delivered to ahci.
>> (at least according to /proc/interrupts)
>>
>>
>> syslog traces from the first (good) and the second (bad) are attached,
>> as well as the output from the "*" debug Ctrl+a handler in both cases.
>
> You should have provided this also for the state before the
> first suspend. The state after the first resume already looks
> corrupted (presumably just not as badly):

I'll be able to send this tomorrow.

>
> (XEN) PCI-MSI interrupt information:
> (XEN)  MSI    26 vec=71 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
> (XEN)  MSI    27 vec=00  fixed  edge deassert phys lowest dest=00000001 mask=0/1/-1
>                      ^^
> (XEN)  MSI    28 vec=29 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
> (XEN)  MSI    29 vec=79 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
> (XEN)  MSI    30 vec=81 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
> (XEN)  MSI    31 vec=99 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
>
> so this is likely the reason for things falling apart on the second
> iteration:
>
> (XEN)   Interrupt Remapping: supported and enabled.
> (XEN)   Interrupt remapping table (nr_entry=0x10000. Only dump P=1 entries here):
> (XEN)        SVT  SQ   SID      DST  V  AVL DLM TM RH DM FPD P
> (XEN)   0000:  1   0  f0f8 00000001 38    0   1  0  1  1   0 1
> ...
> (XEN)   0014:  1   0  00d8 00000001 a1    0   1  0  1  1   0 1
> (XEN)   0015:  1   0  00fa 00000001 00    0   0  0  0  0   0 1
>                                               ^     ^  ^
> (XEN)   0016:  1   0  f0f8 00000001 31    0   1  1  1  1   0 1
> (XEN)   0017:  1   0  00a0 00000001 a9    0   1  0  1  1   0 1
> (XEN)   0018:  1   0  0200 00000001 b1    0   1  0  1  1   0 1
> (XEN)   0019:  1   0  00c8 00000001 c9    0   1  0  1  1   0 1
>
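To double-check my reading of entry 0015, here is a minimal sketch that packs and unpacks the dump's columns. It assumes the remapped-format IRTE layout from the Intel VT-d specification (P bit 0, FPD bit 1, DM bit 2, RH bit 3, TM bit 4, DLM bits 7:5, AVL bits 11:8, V bits 23:16, DST bits 63:32; SID, SQ and SVT in the high word); decode_irte() and encode_irte() are hypothetical helpers, not Xen code. Compared with its neighbour 0014, entry 0015 has V, DLM, RH and DM all zero while P and DST still look sane.

```python
# Illustrative only: pack/unpack a VT-d remapped-format IRTE, ASSUMING the
# field layout from the Intel VT-d spec. Hypothetical helpers, not Xen code.

def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_irte(low: int, high: int) -> dict:
    """Split the two 64-bit IRTE words into the columns shown in the dump."""
    return {
        "SVT": bits(high, 18, 19), "SQ": bits(high, 16, 17),
        "SID": bits(high, 0, 15), "DST": bits(low, 32, 63),
        "V": bits(low, 16, 23), "AVL": bits(low, 8, 11),
        "DLM": bits(low, 5, 7), "TM": bits(low, 4, 4),
        "RH": bits(low, 3, 3), "DM": bits(low, 2, 2),
        "FPD": bits(low, 1, 1), "P": bits(low, 0, 0),
    }


def encode_irte(f: dict) -> tuple:
    """Inverse of decode_irte(): build (low, high) from one dump line."""
    low = (f["P"] | f["FPD"] << 1 | f["DM"] << 2 | f["RH"] << 3
           | f["TM"] << 4 | f["DLM"] << 5 | f["AVL"] << 8
           | f["V"] << 16 | f["DST"] << 32)
    high = f["SID"] | f["SQ"] << 16 | f["SVT"] << 18
    return low, high


# Columns transcribed from the dump above (entries 0014 and 0015).
entry_0014 = dict(SVT=1, SQ=0, SID=0x00D8, DST=0x1, V=0xA1, AVL=0,
                  DLM=1, TM=0, RH=1, DM=1, FPD=0, P=1)
entry_0015 = dict(SVT=1, SQ=0, SID=0x00FA, DST=0x1, V=0x00, AVL=0,
                  DLM=0, TM=0, RH=0, DM=0, FPD=0, P=1)

good = decode_irte(*encode_irte(entry_0014))
bad = decode_irte(*encode_irte(entry_0015))
for field in ("V", "DLM", "RH", "DM", "DST", "P"):
    print(f"{field:>3}: 0014={good[field]:#x} 0015={bad[field]:#x}")
```
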
> Surprisingly in both cases we get (with the other vector fields varying
> accordingly)
>
> (XEN)    IRQ:  26 affinity:0001 vec:71 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:279(-S--),
> (XEN)    IRQ:  27 affinity:0001 vec:21 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:278(-S--),
>                                     ^^
> (XEN)    IRQ:  28 affinity:0001 vec:29 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:277(-S--),
> (XEN)    IRQ:  29 affinity:0001 vec:79 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:276(-S--),
> (XEN)    IRQ:  30 affinity:0001 vec:81 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(PS--),
> (XEN)    IRQ:  31 affinity:0001 vec:99 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:274(PS--),
>
> The interrupt in question belongs to 0000:00:1f.2, i.e. the
> AHCI controller.

This would be consistent with what I've observed.
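If the MSI dump's fields follow the standard x86 MSI data-register layout (vector in bits 7:0, delivery mode in bits 10:8, level/assert in bit 14, trigger mode in bit 15, per the Intel SDM), then an all-zero data word decodes to exactly the "vec=00 fixed edge deassert" shown for MSI 27, which would fit the theory that the saved msi_msg got cleared. A minimal sketch under that assumption; decode_msi_data() and describe() are hypothetical names, not Xen's own dump code:

```python
# Illustrative only: decode an x86 MSI data word, ASSUMING the Intel SDM
# layout. Hypothetical helpers, not Xen's dump code.

DELIVERY_MODES = {0b000: "fixed", 0b001: "lowest"}


def decode_msi_data(data: int) -> dict:
    """Split a 32-bit MSI data value into the fields shown in the MSI dump."""
    mode = (data >> 8) & 0x7  # bits 10:8, delivery mode
    return {
        "vec": data & 0xFF,  # bits 7:0, vector
        "delivery": DELIVERY_MODES.get(mode, f"mode{mode}"),
        "trigger": "level" if data & (1 << 15) else "edge",  # bit 15
        "level": "assert" if data & (1 << 14) else "deassert",  # bit 14
    }


def describe(data: int) -> str:
    f = decode_msi_data(data)
    return f"vec={f['vec']:02x} {f['delivery']} {f['trigger']} {f['level']}"


# A healthy-looking value like MSI 26: vector 0x71, lowest priority, edge, assert.
print(describe(0x71 | (0b001 << 8) | (1 << 14)))  # vec=71 lowest edge assert
# An all-zero data word, i.e. what a cleared msi_msg would contain.
print(describe(0))  # vec=00 fixed edge deassert
```
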

>
> Unfortunately I can't make sense of the kernel side config space
> restore messages - an offset of 1 gets reported for the device in
> question (and various other odd offsets exist), yet 3.5's
> drivers/pci/pci.c:pci_restore_config_space_range() calls
> pci_restore_config_dword() with an offset that's always divisible
> by 4. Could you clarify which kernel version you were using here?
> We first need to determine whether the kernel corrupts something
> (after all, config space isn't protected from Dom0 modifications) -
> if that's the case, we may need to understand why older Xen was
> immune to that. If that's not the case, adding some extra
> logging to Xen's pci_restore_msi_state() would seem the best
> first step, plus (maybe) logging of Dom0 post-resume config space
> accesses to the device in question.

This particular failure is with linux-3.2.23 plus some of Konrad's
branches that haven't been merged into mainline (the s3 branches are
probably the most relevant here).
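On the odd "offset 1" messages: I haven't checked this against the 3.2 source yet, but one possible explanation is that the 3.2-era restore loop prints the dword index i rather than the byte offset i * 4 it actually writes. In that case "offset 1" would mean byte offset 0x04 (Command/Status). A minimal sketch of that mapping, under that assumption; dword_index_to_register() is a hypothetical helper and the names follow the standard PCI type-0 header:

```python
# Illustrative only: map a logged "offset" to a config-space register,
# ASSUMING the log reports the dword index i and the write goes to i * 4.

TYPE0_HEADER = {
    0x00: "Vendor ID / Device ID",
    0x04: "Command / Status",
    0x08: "Revision ID / Class Code",
    0x0C: "Cache Line / Latency / Header Type / BIST",
    0x10: "BAR0", 0x14: "BAR1", 0x18: "BAR2",
    0x1C: "BAR3", 0x20: "BAR4", 0x24: "BAR5",
    0x28: "CardBus CIS Pointer",
    0x2C: "Subsystem Vendor ID / Subsystem ID",
    0x30: "Expansion ROM Base Address",
    0x34: "Capabilities Pointer",
    0x38: "Reserved",
    0x3C: "Interrupt Line / Pin / Min_Gnt / Max_Lat",
}


def dword_index_to_register(index: int) -> tuple:
    """Return (byte_offset, register_name) for dword index 0..15."""
    if not 0 <= index <= 15:
        raise ValueError("standard header has 16 dwords (0..15)")
    offset = index * 4
    return offset, TYPE0_HEADER[offset]


for i in (1, 15):
    off, name = dword_index_to_register(i)
    print(f"logged offset {i:#x} -> byte offset {off:#04x} ({name})")
```
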

>
> The most likely thing happening (though unclear where) is that
> the corresponding struct msi_msg instance gets cleared in the
> course of the first resume (but after the corresponding interrupt
> remapping entry already got restored).
>
> Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

