[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] pstore does not work under xen


  • To: James Dingwall <james@xxxxxxxxxxxxxx>, Kees Cook <keescook@xxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>
  • From: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • Date: Wed, 25 Sep 2019 11:38:54 -0400
  • Autocrypt: addr=boris.ostrovsky@xxxxxxxxxx; prefer-encrypt=mutual; keydata= mQINBFH8CgsBEAC0KiOi9siOvlXatK2xX99e/J3OvApoYWjieVQ9232Eb7GzCWrItCzP8FUV PQg8rMsSd0OzIvvjbEAvaWLlbs8wa3MtVLysHY/DfqRK9Zvr/RgrsYC6ukOB7igy2PGqZd+M MDnSmVzik0sPvB6xPV7QyFsykEgpnHbvdZAUy/vyys8xgT0PVYR5hyvhyf6VIfGuvqIsvJw5 C8+P71CHI+U/IhsKrLrsiYHpAhQkw+Zvyeml6XSi5w4LXDbF+3oholKYCkPwxmGdK8MUIdkM d7iYdKqiP4W6FKQou/lC3jvOceGupEoDV9botSWEIIlKdtm6C4GfL45RD8V4B9iy24JHPlom woVWc0xBZboQguhauQqrBFooHO3roEeM1pxXjLUbDtH4t3SAI3gt4dpSyT3EvzhyNQVVIxj2 FXnIChrYxR6S0ijSqUKO0cAduenhBrpYbz9qFcB/GyxD+ZWY7OgQKHUZMWapx5bHGQ8bUZz2 SfjZwK+GETGhfkvNMf6zXbZkDq4kKB/ywaKvVPodS1Poa44+B9sxbUp1jMfFtlOJ3AYB0WDS Op3d7F2ry20CIf1Ifh0nIxkQPkTX7aX5rI92oZeu5u038dHUu/dO2EcuCjl1eDMGm5PLHDSP 0QUw5xzk1Y8MG1JQ56PtqReO33inBXG63yTIikJmUXFTw6lLJwARAQABtDNCb3JpcyBPc3Ry b3Zza3kgKFdvcmspIDxib3Jpcy5vc3Ryb3Zza3lAb3JhY2xlLmNvbT6JAjgEEwECACIFAlH8 CgsCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEIredpCGysGyasEP/j5xApopUf4g 9Fl3UxZuBx+oduuw3JHqgbGZ2siA3EA4bKwtKq8eT7ekpApn4c0HA8TWTDtgZtLSV5IdH+9z JimBDrhLkDI3Zsx2CafL4pMJvpUavhc5mEU8myp4dWCuIylHiWG65agvUeFZYK4P33fGqoaS VGx3tsQIAr7MsQxilMfRiTEoYH0WWthhE0YVQzV6kx4wj4yLGYPPBtFqnrapKKC8yFTpgjaK jImqWhU9CSUAXdNEs/oKVR1XlkDpMCFDl88vKAuJwugnixjbPFTVPyoC7+4Bm/FnL3iwlJVE qIGQRspt09r+datFzPqSbp5Fo/9m4JSvgtPp2X2+gIGgLPWp2ft1NXHHVWP19sPgEsEJXSr9 tskM8ScxEkqAUuDs6+x/ISX8wa5Pvmo65drN+JWA8EqKOHQG6LUsUdJolFM2i4Z0k40BnFU/ kjTARjrXW94LwokVy4x+ZYgImrnKWeKac6fMfMwH2aKpCQLlVxdO4qvJkv92SzZz4538az1T m+3ekJAimou89cXwXHCFb5WqJcyjDfdQF857vTn1z4qu7udYCuuV/4xDEhslUq1+GcNDjAhB nNYPzD+SvhWEsrjuXv+fDONdJtmLUpKs4Jtak3smGGhZsqpcNv8nQzUGDQZjuCSmDqW8vn2o hWwveNeRTkxh+2x1Qb3GT46uuQINBFH8CgsBEADGC/yx5ctcLQlB9hbq7KNqCDyZNoYu1HAB Hal3MuxPfoGKObEktawQPQaSTB5vNlDxKihezLnlT/PKjcXC2R1OjSDinlu5XNGc6mnky03q yymUPyiMtWhBBftezTRxWRslPaFWlg/h/Y1iDuOcklhpr7K1h1jRPCrf1yIoxbIpDbffnuyz kuto4AahRvBU4Js4sU7f/btU+h+e0AcLVzIhTVPIz7PM+Gk2LNzZ3/on4dnEc/qd+ZZFlOQ4 KDN/hPqlwA/YJsKzAPX51L6Vv344pqTm6Z0f9M7YALB/11FO2nBB7zw7HAUYqJeHutCwxm7i BDNt0g9fhviNcJzagqJ1R7aPjtjBoYvKkbwNu5sWDpQ4idnsnck4YT6ctzN4I+6lfkU8zMzC gM2R4qqUXmxFIS4Bee+gnJi0Pc3KcBYBZsDK44FtM//5Cp9DrxRQOh19kNHBlxkmEb8kL/pw XIDcEq8MXzPBbxwHKJ3QRWRe5jPNpf8HCjnZz0XyJV0/4M1JvOua7IZftOttQ6KnM4m6WNIZ 2ydg7dBhDa6iv1oKdL7wdp/rCulVWn8R7+3cRK95SnWiJ0qKDlMbIN8oGMhHdin8cSRYdmHK kTnvSGJNlkis5a+048o0C6jI3LozQYD/W9wq7MvgChgVQw1iEOB4u/3FXDEGulRVko6xCBU4 SQARAQABiQIfBBgBAgAJBQJR/AoLAhsMAAoJEIredpCGysGyfvMQAIywR6jTqix6/fL0Ip8G jpt3uk//QNxGJE3ZkUNLX6N786vnEJvc1beCu6EwqD1ezG9fJKMl7F3SEgpYaiKEcHfoKGdh 30B3Hsq44vOoxR6zxw2B/giADjhmWTP5tWQ9548N4VhIZMYQMQCkdqaueSL+8asp8tBNP+TJ PAIIANYvJaD8xA7sYUXGTzOXDh2THWSvmEWWmzok8er/u6ZKdS1YmZkUy8cfzrll/9hiGCTj u3qcaOM6i/m4hqtvsI1cOORMVwjJF4+IkC5ZBoeRs/xW5zIBdSUoC8L+OCyj5JETWTt40+lu qoqAF/AEGsNZTrwHJYu9rbHH260C0KYCNqmxDdcROUqIzJdzDKOrDmebkEVnxVeLJBIhYZUd t3Iq9hdjpU50TA6sQ3mZxzBdfRgg+vaj2DsJqI5Xla9QGKD+xNT6v14cZuIMZzO7w0DoojM4 ByrabFsOQxGvE0w9Dch2BDSI2Xyk1zjPKxG1VNBQVx3flH37QDWpL2zlJikW29Ws86PHdthh Fm5PY8YtX576DchSP6qJC57/eAAe/9ztZdVAdesQwGb9hZHJc75B+VNm4xrh/PJO6c1THqdQ 19WVJ+7rDx3PhVncGlbAOiiiE3NOFPJ1OQYxPKtpBUukAlOTnkKE6QcA4zckFepUkfmBV1wM Jg6OxFYd01z+a+oL
  • Cc: Juergen Gross <jgross@xxxxxxxx>, "Luck, Tony" <tony.luck@xxxxxxxxx>, Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>, Anton Vorontsov <anton@xxxxxxxxxx>, Matthias Kaehlcke <mka@xxxxxxxxxxxx>, Colin Cross <ccross@xxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 25 Sep 2019 15:39:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 9/25/19 7:01 AM, James Dingwall wrote:
> On Mon, Sep 23, 2019 at 08:41:05PM -0400, Boris Ostrovsky wrote:
>> On 9/23/19 6:59 PM, Kees Cook wrote:
>>> On Mon, Sep 23, 2019 at 03:42:27PM +0000, James Dingwall wrote:
>>>> On Thu, Sep 19, 2019 at 12:37:40PM -0400, Boris Ostrovsky wrote:
>>>>> On 9/19/19 12:14 PM, James Dingwall wrote:
>>>>>> On Thu, Sep 19, 2019 at 03:51:33PM +0000, Luck, Tony wrote:
>>>>>>>> I have been investigating a regression in our environment where pstore 
>>>>>>>> (efi-pstore specifically but I suspect this would affect all 
>>>>>>>> implementations) no longer works after upgrading from a 4.4 to 5.0 
>>>>>>>> kernel when running under xen.  (This is an Ubuntu kernel but I don't 
>>>>>>>> think there are patches which affect this area.)
>>>>>>> I don't have any answer for this ... but want to throw out the idea that
>>>>>>> VMM systems could provide some hypercalls to guests to save/return
>>>>>>> some blob of memory (perhaps the "save" triggers automagically if the
>>>>>>> guest crashes?).
>>>>>>>
>>>>>>> That would provide a much better pstore back end than relying on 
>>>>>>> emulation
>>>>>>> of EFI persistent variables (which have severe contraints on size, and 
>>>>>>> don't
>>>>>>> support some pstore modes because you can't dynamically update EFI 
>>>>>>> variables
>>>>>>> hundreds of times per second).
>>>>>>>
>>>>>> For clarification this is a dom0 crash rather than an HVM guest with 
>>>>>> EFI.  I
>>>>>> should probably have also mentioned the xen verion has changed from 
>>>>>> 4.8.4 to
>>>>>> 4.11.2 in case its behaviour on detection of crashed domain has changed.
>>>>>>
>>>>>> (For capturing guest crashes we have enabled xenconsole logging so the
>>>>>> hvc0 log is available in dom0.)
>>>>> Do you only see this difference between 4.4 and 5.0 when you crash via
>>>>> sysrq?
>>>>>
>>>>> Because that's where things changed. On 4.4 we seem to be forcing an
>>>>> oops, which eventually calls kmsg_dump() and then panic. On 5.0 we call
>>>>> panic() directly from sysrq handler. And because Xen's panic notifier
>>>>> doesn't return we never get a chance to call kmsg_dump().
>>>>>
>>>> Ok, I see that change in 8341f2f222d729688014ce8306727fdb9798d37e.  I 
>>>> hadn't tested it any other way before.  Using the null pointer 
>>>> de-reference module code at [1] a pstore record is generated as expected 
>>>> when the module is loaded (panic_on_oops=1).
>>> This change looks correct -- it just gets us directly to the panic()
>>> state instead of exercising the various exception handlers.
>>>
>>>> I have also tested swapping the kmsg_dump() / 
>>>> atomic_notifier_call_chain() around in panic.c and this also results in 
>>>> a pstore record being created with sysrq-c.  I don't know if that would 
>>>> be an acceptable solution though since it may break behaviour that other 
>>>> things depend on.
>>> I don't think reordering these is a good idea: as the comments say,
>>> there might be work done in the notifier chain that kmsg_dump() will
>>> want to capture (e.g. the KASLR base offset).
>>>
>>> The situation seems to be that notifier callbacks must return -- I think
>>> Xen needs fixing here.
>>>
>> I only had one quick sanity test with a PV guest so this needs more
>> testing. James, can you give it a try?
> I have tested the patch in a pv domU and in dom0.  The kernel was built 
> with a default Ubuntu .config which sets CONFIG_PANIC_TIMEOUT=0. uname 
> -r = 5.0.0-27-generic.
>
> In the domU (no custom kernel parameters):
>
> - sysrq-c: I saw my debug printk messages about kmsg_dump() being 
> invoked after the traceback and the domain became listed as crashed in 
> the xl status output.
>
> - halt -p: clean shutdown
>
> - shutdown -r now: clean reboot
>
>
> In the domU with panic=15 in kernel parameters:
>
> - sysrq-c: as without panic=15 then final message "Rebooting in 15 
> seconds.." printed but domain never rebooted.  Without this patch the 
> domain doesn't reboot or print the Rebooting message presumably because 
> of the non-returning panic handler.  If that message is reached then I 
> think I would expect a reboot.  (In our Linux 4.4 / Xen 4.8.4 
> environment no value of panic resulted in reboot.)

(+xen-devel which I just noticed has not been copied on this thread)

Whether a guest is rebooted is controlled by on_crash parameter in the
config file. By default it's 'destroy'.


>
> - halt -p: clean shutdown
>
> - shutdown -r now: clean reboot
>
>
> In dom0 with oops=panic panic=15 in the kernel parameters:
>
> - sysrq-c: kmsg_dump() debug messages printed, last linux message 
> "Rebooting in 15 seconds..", after 15s "(XEN) Hardware Dom0 crashed: 
> rebooting machine in 5 seconds.", after 5s system rebooted.  On the 
> next start a pstore record is present as expected.
>
> - halt -p: clean shutdown, no pstore record present on next start.
>
> - shutdown -r now: clean reboot, no pstore record present on next 
> start.
>
>
> In dom0 with panic=0:
>
> - sysrq-c: dom0 crashes, no reboot messages printed from Xen or kernel 
> and system hangs.  A pstore record is present on next start.
>
> - halt -p: clean shutdown, no pstore record present on next start.
>
> - shutdown -r now: clean reboot, no pstore record present on next 
> start.
>
>
> In my opinion this patch:
>  - fixes the original issue of no pstore record being generated for a 
> dom0 panic.
>  - respects the dom0 panic=... value.  This is a change in behaviour in 
> how xen handles the crashed dom0 from always rebooting to only rebooting 
> if panic > 0.

Even thought this is now correct behavior it's a change from what we've
always had. And I think we should preserve that.

Meaning that we have to ignore if panic_timeout is set to zero either on
boot line or via sysfs.


-boris

>  - causes a Rebooting message to be printed in a crashed pv domU when 
> panic > 0 but the domain does not reboot when it should.
>
> James
>
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 750f46ad018a..d88f118028b4 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -269,16 +269,17 @@ void xen_reboot(int reason)
>>                 BUG();
>>  }
>>  
>> +static int reboot_reason = SHUTDOWN_reboot;
>>  void xen_emergency_restart(void)
>>  {
>> -       xen_reboot(SHUTDOWN_reboot);
>> +       xen_reboot(reboot_reason);
>>  }
>>  
>>  static int
>>  xen_panic_event(struct notifier_block *this, unsigned long event, void
>> *ptr)
>>  {
>>         if (!kexec_crash_loaded())
>> -               xen_reboot(SHUTDOWN_crash);
>> +               reboot_reason = SHUTDOWN_crash;
>>         return NOTIFY_DONE;
>>  }
>>  
>> @@ -290,6 +291,8 @@ static struct notifier_block xen_panic_block = {
>>  int xen_panic_handler_init(void)
>>  {
>>         atomic_notifier_chain_register(&panic_notifier_list,
>> &xen_panic_block);
>> +       if (panic_timeout == 0)
>> +               set_arch_panic_timeout(-1, CONFIG_PANIC_TIMEOUT);
>>         return 0;
>>  }
>>
>>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.