[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Yet another S3 issue in Xen 4.14

  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 1 Oct 2020 13:43:52 +0100
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 01 Oct 2020 12:44:04 +0000
  • Ironport-sdr: WOn/9cbOVMCFAfKu4pPAN5eUuxTIy+sxxlKrXa63dfN4I4+El8Xopr9kqIsWw3VWVNPGfa5f/4 KVsmg80Zt6i3+ECiXvIHraqirBMxs4fGDfTIK/rb3hZ1TUZGIKStIgvpmOv0S4upMuUhtc6wqn NeXYkgsEMyH2jisoeE/WKAYUoPxlZL9AzfCHKSzb3qgMXk9+PpRIrxpNzkuzEOgmn1cTyz2QFc lJtyBgMXfmhBfuI55I7keFkNecEDh1c+P8Tsh+VpQS4LOTNbw5YAZskL2vhlyc5OofrWdjoftU h7Q=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01/10/2020 13:31, Marek Marczykowski-Górecki wrote:
> On Thu, Oct 01, 2020 at 01:59:32PM +0200, Jan Beulich wrote:
>> On 01.10.2020 03:12, Marek Marczykowski-Górecki wrote:
>>> After patching the previous issue ("x86/S3: Fix Shadow Stack resume
>>> path") I still encounter issues resume from S3.
>>> Since I had it working on Xen 4.13 on this particular hardware (Thinkpad
>>> P52), I bisected it and got this:
>>> commit 4304ff420e51b973ec9eb9dafd64a917dd9c0fb1
>>> Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Date:   Wed Dec 11 20:59:19 2019 +0000
>>>     x86/S3: Drop {save,restore}_rest_processor_state() completely
>>>     There is no need to save/restore FS/GS/XCR0 state.  It will be handled
>>>     suitably on the context switch away from the idle.
>>>     The CR4 restoration in restore_rest_processor_state() was actually 
>>> fighting
>>>     later code in enter_state() which tried to keep CR4.MCE clear until 
>>> everything
>>>     was set up.  Delete the intermediate restoration, and defer final 
>>> restoration
>>>     until after MCE is reconfigured.
>>>     Restoring PAT can be done earlier, and ideally before paging is 
>>> enabled.  By
>>>     moving it into the trampoline during the setup for 64bit, the call can 
>>> be
>>>     dropped from cpu_init().  The EFI path boot path doesn't disable 
>>> paging, so
>>>     make the adjustment when switching onto Xen's pagetables.
>>>     The only remaing piece of restoration is load_system_tables(), so 
>>> suspend.c
>>>     can be deleted in its entirety.
>>>     Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>     Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> Parent of this commit suspends and resumes just fine. With this commit
>>> applied, it (I think) it panics, at least I get reboot after 5s. Sadly, I
>>> don't have serial console there.
>>> I tried also master and stable-4.14 with this commit reverted (and also
>>> the other fix applied), but it doesn't work. In this case I get a hang on
>>> resume (power led still flashing, but fan woke up). There are probably
>>> some other dependencies.
>> Since bisection may also point you at some intermediate breakage, which
>> these last results of yours seem to support, could you check whether
>> 55f8c389d434 put immediately on top of the above commit makes a difference,
>> and if so resume bisecting from there?
> Nope, 4304ff420e51b973ec9eb9dafd64a917dd9c0fb1 with 55f8c389d434 on top
> it still hangs on resume.

Ok.  I'll see about breaking the change apart so we can bisect which
specific bit of code movement broke things.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.