
Re: [Xen-devel] [PATCH V5] x86/altp2m: Fix crash with INVALID_ALTP2M EPTP index


  • To: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>
  • From: George Dunlap <george.dunlap@xxxxxxxxxx>
  • Date: Fri, 20 Jul 2018 18:18:18 +0100
  • Cc: george.dunlap@xxxxxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, kevin.tian@xxxxxxxxx, jbeulich@xxxxxxxx, jun.nakajima@xxxxxxxxx
  • Delivery-date: Fri, 20 Jul 2018 17:18:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 07/20/2018 05:29 PM, Razvan Cojocaru wrote:
> On 07/20/2018 06:07 PM, George Dunlap wrote:
>> On 06/28/2018 03:35 PM, Razvan Cojocaru wrote:
>>> A VM exit handler executed immediately after enabling #VE might
>>> find a stale __vmsave()d EPTP_INDEX, stored by calling
>>> altp2m_vcpu_destroy() when SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
>>> had been enabled by altp2m_vcpu_update_vmfunc_ve().
>>>
>>> vmx_vmexit_handler() __vmread()s EPTP_INDEX as soon as
>>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, so if an
>>> application enables altp2m on a domain, successfully calls
>>> xc_altp2m_set_vcpu_enable_notify(), then disables altp2m and
>>> exits, a second run of said application will likely read the
>>> INVALID_ALTP2M EPTP_INDEX set when disabling altp2m in the first
>>> run, and crash the host with the BUG_ON(idx >= MAX_ALTP2M),
>>> between xc_altp2m_set_vcpu_enable_notify() and
>>> xc_altp2m_set_domain_state(..., false).
>>>
>>> The problem is not restricted to an INVALID_ALTP2M EPTP_INDEX
>>> (which can only sanely happen on altp2m uninit), but applies
>>> to any stale index previously saved - which means that all
>>> altp2m_vcpu_update_vmfunc_ve() calls must also call
>>> altp2m_vcpu_update_p2m() after setting
>>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS, in order to make sure
>>> that the stored EPTP_INDEX is always valid at
>>> vmx_vmexit_handler() time.
>>
>> I'm sorry, this description still makes hardly any sense to me,
>> nor the solution, even after reading all the previous threads on the
>> issue.  The description doesn't, for instance, mention vcpu_pause() at
>> all, in spite of the fact that it seems (from the previous discussion)
>> that this is a critical part of why this solution works; nor is there
>> any comment in the code about the required discipline regarding
>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS,  making it fairly likely that
>> someone will re-introduce a bug like this in the future.
>>
>> My normal template for something like this is
>> 1. Explain what the current situation is
>> 2. Explain why that's a problem
>> 3. Describe what you're changing and how it fixes it.
>>
>> I can't help but think the right thing to do here is in vmx.c somewhere
>> -- it is, after all, code in vmx.c that:
>> 1. Sets and clears SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
>> 2. Writes EPTP_INDEX
>> 3. Assumes that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS => EPTP_INDEX is
>> valid.
>>
>> What about something like the attached, instead (compile-tested only)?
> George, thanks for the review, comments and new patch! You're the third
> person telling me that the patch description is hard to parse - I'll
> definitely work on that skill in the future (and sorry for the
> inconvenience).

No worries -- everything here is a bit of a tangled mess.

> The vcpu_pause() lead was a red herring in my initial investigation of
> the issue, and that is the reason why it didn't make it into the patch
> description. The pausing already done is fine.
> 
> I've tested your patch on my system (where I can reproduce the crash
> with a 100% reproduction rate without it), and I've had no crashes - so
> it does seem to have fixed the problem. Thinking about the crash path,
> it also makes sense that it would fix the problem - I can't think of any
> objections to it.
> 
> Let me try the explanation again:
> 
> The current situation: when we run an altp2m client application that
> uses altp2m_vcpu_update_vmfunc_ve(), and we run it twice (it _has_ to
> be twice), the following happens: after the first run of the application,
> altp2m_vcpu_destroy() gets called as part of the cleanup process, and
> this stores INVALID_ALTP2M EPTP_INDEX in the VMCS.

Right, I meant, the current situation in terms of the way the code in
Xen / the processor currently behaves / what it expects.

I tried to follow that pattern in my own patch.  The key to the whole
bug is this:

* vmx_vmexit_handler() assumes that if VIRT_EXCEPTION is set,
EPTP_INDEX is valid

Once you state it that way, you realize, OK that's false.  But why is it
false?

* Because VIRT_EXCEPTION is enabled without touching EPTP_INDEX

That's the core problem.  That description by itself should make anyone
go, "Yeah, that will be a problem."  The details of how that can go
wrong are just icing on the cake / grep fodder for people looking for how
to fix their own problem.
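To make the violated assumption concrete, here is a small, hypothetical C model of it (not Xen code). struct vcpu_model and the model_* functions are invented for illustration: ve_enabled stands in for SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS, eptp_index for the EPTP_INDEX VMCS field, and the numeric values of MAX_ALTP2M and INVALID_ALTP2M are assumptions. It replays the two-run sequence from the patch description: the first run's teardown leaves INVALID_ALTP2M behind, and the second run turns #VE on without writing EPTP_INDEX.

```c
#include <stdbool.h>

/* Values assumed for this model only; not Xen's definitions. */
#define MAX_ALTP2M     10u
#define INVALID_ALTP2M 0xffffu

struct vcpu_model {
    bool ve_enabled;         /* models SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */
    unsigned int eptp_index; /* models the EPTP_INDEX VMCS field */
    unsigned int p2midx;     /* altp2m view Xen believes is active */
};

/* Teardown at the end of the first run leaves INVALID_ALTP2M behind. */
static void model_altp2m_destroy(struct vcpu_model *v)
{
    v->ve_enabled = false;
    v->eptp_index = INVALID_ALTP2M;
}

/* Re-enabling altp2m resets the view but, with #VE off, does not
 * write EPTP_INDEX (an assumption of this model). */
static void model_altp2m_init(struct vcpu_model *v)
{
    v->p2midx = 0;
}

/* The flaw: #VE is turned on without touching EPTP_INDEX. */
static void model_enable_ve_buggy(struct vcpu_model *v)
{
    v->ve_enabled = true;
}

/* Exit handler: trusts EPTP_INDEX whenever #VE is enabled (with #VE off
 * the model simply uses p2midx). Returns -1 where the real handler
 * would hit BUG_ON(idx >= MAX_ALTP2M). */
static int model_vmexit_index(const struct vcpu_model *v)
{
    unsigned int idx = v->ve_enabled ? v->eptp_index : v->p2midx;

    return idx >= MAX_ALTP2M ? -1 : (int)idx;
}
```

Had the leftover value happened to be in range, the same check would have passed while trusting a wrong index; the out-of-range value is merely what makes the problem loud.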

The reason this ever worked, AFAICT, is that EPTP_INDEX was accidentally
correct.  If we'd initialized EPTP_INDEX with 0xDEADBEEF on VMCS
creation, then you also would have hit the bug.  (In fact, that might
not be a bad idea.)
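A sketch of the poisoning idea, as a hypothetical model rather than Xen's VMCS construction code: model_vmcs_create(), EPTP_INDEX_POISON and the other names are invented, and the field is modelled as a uint32_t so that 0xDEADBEEF fits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed for this model only; not Xen's definitions. */
#define MAX_ALTP2M        10u
#define EPTP_INDEX_POISON 0xDEADBEEFu

struct vcpu_model {
    bool ve_enabled;     /* models SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */
    uint32_t eptp_index; /* models the EPTP_INDEX VMCS field */
    uint32_t p2midx;     /* view Xen believes is active */
};

/* Poison EPTP_INDEX at creation so nothing can rely on it being
 * "accidentally correct" (0) before anyone has written it. */
static void model_vmcs_create(struct vcpu_model *v)
{
    v->ve_enabled = false;
    v->eptp_index = EPTP_INDEX_POISON;
    v->p2midx = 0;
}

/* Flawed enable path: EPTP_INDEX is not written. */
static void model_enable_ve_buggy(struct vcpu_model *v)
{
    v->ve_enabled = true;
}

/* True if the exit handler's assumption (#VE on => EPTP_INDEX valid) holds. */
static bool model_index_valid(const struct vcpu_model *v)
{
    return !v->ve_enabled || v->eptp_index < MAX_ALTP2M;
}
```

With the poison in place, the very first enable of #VE violates the assumption, rather than only a second run of a test tool after a teardown.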

Furthermore, imagine the following scenario:

* dom0 enables altp2m on domain A
* dom0 switches altp2m to view 1 on domain A
* dom0 enables #VE on domain A
* domain A has a vmexit
  -> At this point, EPTP_INDEX is 0, so the vmexit code will drop a
reference on altp2m index 1 and increase the reference count on altp2m
index 0.
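The bookkeeping in this scenario can be traced with a simplified, hypothetical model. All names are invented: active_vcpus[] stands in for the per-view reference counts, hw_view records the view the hardware EPTP really points at, and the rule that EPTP_INDEX is written only while #VE is enabled is an assumption of the model, not a statement about Xen's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ALTP2M 10u  /* value assumed for this model */

struct dom_model {
    int active_vcpus[MAX_ALTP2M]; /* vCPUs Xen counts against each view */
};

struct vcpu_model {
    bool ve_enabled;         /* models SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */
    unsigned int eptp_index; /* models EPTP_INDEX; 0 after VMCS creation */
    unsigned int p2midx;     /* view Xen believes the vCPU is on */
    unsigned int hw_view;    /* view the hardware EPTP points at */
};

static void model_enable_altp2m(struct dom_model *d, struct vcpu_model *v)
{
    v->p2midx = v->hw_view = 0;
    d->active_vcpus[0]++;
}

/* Switching views moves the reference and the EPTP, but EPTP_INDEX is
 * only written while #VE is enabled. */
static void model_switch_view(struct dom_model *d, struct vcpu_model *v,
                              unsigned int idx)
{
    assert(idx < MAX_ALTP2M);
    d->active_vcpus[v->p2midx]--;
    v->p2midx = v->hw_view = idx;
    d->active_vcpus[idx]++;
    if ( v->ve_enabled )
        v->eptp_index = idx;
}

/* The flaw: #VE is enabled without touching EPTP_INDEX. */
static void model_enable_ve_buggy(struct vcpu_model *v)
{
    v->ve_enabled = true;
}

/* Exit handler: with #VE on it trusts EPTP_INDEX and moves the
 * reference whenever that disagrees with p2midx. */
static void model_vmexit(struct dom_model *d, struct vcpu_model *v)
{
    unsigned int idx = v->ve_enabled ? v->eptp_index : v->hw_view;

    if ( idx >= MAX_ALTP2M )
        return;                       /* real code would BUG() here */
    if ( idx != v->p2midx )
    {
        d->active_vcpus[v->p2midx]--;
        v->p2midx = idx;
        d->active_vcpus[idx]++;
    }
}
```

After the exit the vCPU is still running on view 1, but view 1's count has dropped to 0 and view 0's has gone up: silent corruption rather than a crash.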

My patch fixes the above issue, but your patch doesn't (AFAICT).  What
altp2m_vcpu_destroy() did wasn't fundamentally buggy; it just
highlighted the issue by doing the equivalent of putting 0xDEADBEEF in
EPTP_INDEX; and what your patch did was to reverse that, by making
EPTP_INDEX accidentally correct again the next time you ran your test.

(Let me know if I'm wrong about that!)
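The shape of a fix that puts the responsibility in the code that sets the enable bit can be sketched with a small hypothetical model. This is not the attached patch (its contents are not reproduced in this message), and model_enable_ve_fixed() and the other names are invented. EPTP_INDEX is written from the current view at the moment #VE is turned on, so in the dom0 / domain A scenario the exit sees EPTP_INDEX equal to p2midx, and a stale INVALID_ALTP2M never reaches the exit handler either.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ALTP2M 10u  /* value assumed for this model */

struct dom_model {
    int active_vcpus[MAX_ALTP2M]; /* vCPUs Xen counts against each view */
};

struct vcpu_model {
    bool ve_enabled;         /* models SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */
    unsigned int eptp_index; /* models the EPTP_INDEX VMCS field */
    unsigned int p2midx;     /* view Xen believes the vCPU is on */
    unsigned int hw_view;    /* view the hardware EPTP points at */
};

/* Fixed pattern: the code that sets the #VE enable bit also writes
 * EPTP_INDEX from the current view, so "#VE on => EPTP_INDEX valid"
 * holds by construction instead of by accident. */
static void model_enable_ve_fixed(struct vcpu_model *v)
{
    assert(v->p2midx < MAX_ALTP2M);
    v->eptp_index = v->p2midx;
    v->ve_enabled = true;
}

/* Exit handler: with #VE on it trusts EPTP_INDEX and moves the
 * reference whenever that disagrees with p2midx. */
static void model_vmexit(struct dom_model *d, struct vcpu_model *v)
{
    unsigned int idx = v->ve_enabled ? v->eptp_index : v->hw_view;

    if ( idx >= MAX_ALTP2M )
        return;                       /* real code would BUG() here */
    if ( idx != v->p2midx )
    {
        d->active_vcpus[v->p2midx]--;
        v->p2midx = idx;
        d->active_vcpus[idx]++;
    }
}
```

The design choice is that the invariant is established where it is relied upon (the VMX side), rather than depending on whatever value a previous user left in the field.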

Stating the problem like that -- saying what the assumption is that's
being violated and why -- is not only more clear, but it also leads to a
more robust solution.

> I just
> thought that I should change the code that's _not_ VMX-specific in case
> altp2m is extended to SVM.

Right, but that assumes the internals are going to be similar somehow.
It's better if you don't have to make assumptions about the internals of
an interface you're calling.
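One way to picture that separation is a per-vendor hook table, sketched below with invented names: struct altp2m_ops, common_enable_notify() and both backends are hypothetical, not Xen's interfaces. Common code makes a single call; the VMX-like backend keeps EPTP_INDEX consistent internally, and a backend without #VE support (standing in for a possible future SVM port) is free to do nothing, without common code having to know either way.

```c
#include <stdbool.h>
#include <stddef.h>

struct vcpu_model {
    bool ve_enabled;         /* models SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */
    unsigned int eptp_index; /* models the EPTP_INDEX VMCS field */
    unsigned int p2midx;     /* view Xen believes the vCPU is on */
};

/* Hypothetical per-vendor hooks; names invented for this sketch. */
struct altp2m_ops {
    void (*update_vmfunc_ve)(struct vcpu_model *v);
};

/* VMX-like backend: maintains its own invariant internally. */
static void vmx_model_update_vmfunc_ve(struct vcpu_model *v)
{
    v->eptp_index = v->p2midx;   /* done here, not by the caller */
    v->ve_enabled = true;
}

/* Backend without #VE support: nothing to do. */
static void noop_model_update_vmfunc_ve(struct vcpu_model *v)
{
    (void)v;
}

static const struct altp2m_ops vmx_model_ops = { vmx_model_update_vmfunc_ve };
static const struct altp2m_ops other_model_ops = { noop_model_update_vmfunc_ve };

/* Common code: one call, and no follow-up that depends on how a
 * particular backend stores its state. */
static void common_enable_notify(const struct altp2m_ops *ops,
                                 struct vcpu_model *v)
{
    if ( ops->update_vmfunc_ve != NULL )
        ops->update_vmfunc_ve(v);
}
```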

> Reviewed-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
> Tested-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>

Thanks. :-)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

