[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Yet another S3 issue in Xen 4.14


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 1 Oct 2020 13:43:52 +0100
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 01 Oct 2020 12:44:04 +0000
  • Ironport-sdr: WOn/9cbOVMCFAfKu4pPAN5eUuxTIy+sxxlKrXa63dfN4I4+El8Xopr9kqIsWw3VWVNPGfa5f/4 KVsmg80Zt6i3+ECiXvIHraqirBMxs4fGDfTIK/rb3hZ1TUZGIKStIgvpmOv0S4upMuUhtc6wqn NeXYkgsEMyH2jisoeE/WKAYUoPxlZL9AzfCHKSzb3qgMXk9+PpRIrxpNzkuzEOgmn1cTyz2QFc lJtyBgMXfmhBfuI55I7keFkNecEDh1c+P8Tsh+VpQS4LOTNbw5YAZskL2vhlyc5OofrWdjoftU h7Q=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01/10/2020 13:31, Marek Marczykowski-Górecki wrote:
> On Thu, Oct 01, 2020 at 01:59:32PM +0200, Jan Beulich wrote:
>> On 01.10.2020 03:12, Marek Marczykowski-Górecki wrote:
>>> After patching the previous issue ("x86/S3: Fix Shadow Stack resume
>>> path") I still encounter issues resume from S3.
>>> Since I had it working on Xen 4.13 on this particular hardware (Thinkpad
>>> P52), I bisected it and got this:
>>>
>>> commit 4304ff420e51b973ec9eb9dafd64a917dd9c0fb1
>>> Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Date:   Wed Dec 11 20:59:19 2019 +0000
>>>
>>>     x86/S3: Drop {save,restore}_rest_processor_state() completely
>>>     
>>>     There is no need to save/restore FS/GS/XCR0 state.  It will be handled
>>>     suitably on the context switch away from the idle.
>>>     
>>>     The CR4 restoration in restore_rest_processor_state() was actually 
>>> fighting
>>>     later code in enter_state() which tried to keep CR4.MCE clear until 
>>> everything
>>>     was set up.  Delete the intermediate restoration, and defer final 
>>> restoration
>>>     until after MCE is reconfigured.
>>>     
>>>     Restoring PAT can be done earlier, and ideally before paging is 
>>> enabled.  By
>>>     moving it into the trampoline during the setup for 64bit, the call can 
>>> be
>>>     dropped from cpu_init().  The EFI path boot path doesn't disable 
>>> paging, so
>>>     make the adjustment when switching onto Xen's pagetables.
>>>     
>>>     The only remaing piece of restoration is load_system_tables(), so 
>>> suspend.c
>>>     can be deleted in its entirety.
>>>     
>>>     Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>     Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>
>>> Parent of this commit suspends and resumes just fine. With this commit
>>> applied, it (I think) it panics, at least I get reboot after 5s. Sadly, I
>>> don't have serial console there.
>>>
>>> I tried also master and stable-4.14 with this commit reverted (and also
>>> the other fix applied), but it doesn't work. In this case I get a hang on
>>> resume (power led still flashing, but fan woke up). There are probably
>>> some other dependencies.
>> Since bisection may also point you at some intermediate breakage, which
>> these last results of yours seem to support, could you check whether
>> 55f8c389d434 put immediately on top of the above commit makes a difference,
>> and if so resume bisecting from there?
> Nope, 4304ff420e51b973ec9eb9dafd64a917dd9c0fb1 with 55f8c389d434 on top
> it still hangs on resume.

Ok.  I'll see about breaking the change apart so we can bisect which
specific bit of code movement broke things.

~Andrew




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.