[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 8/9] common/grant_table: block speculative out-of-bound accesses


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Norbert Manthey <nmanthey@xxxxxxxxx>
  • Date: Tue, 19 Feb 2019 22:47:13 +0100
  • Autocrypt: addr=nmanthey@xxxxxxxxx; prefer-encrypt=mutual; keydata= mQINBFoJQc0BEADM8Z7hB7AnW6ErbSMsYkKh4HLAPfoM+wt7Fd7axHurcOgFJEBOY2gz0isR /EDiGxYyTgxt5PZHJIfra0OqXRbWuLltbjhJACbu35eaAo8UM4/awgtYx3O1UCbIlvHGsYDg kXjF8bBrVjPu0+g55XizX6ot/YPAgmWTdH8qXoLYVZVWJilKlTqpYEVvarSn/BVgCbIsQIps K93sOTN9eJKDSqHvbkgKl9XG3WsZ703431egIpIZpfN0zZtzumdZONb7LiodcFHJ717vvd89 3Hv2bYv8QLSfYsZcSnyU0NVzbPhb1WtaduwXwNmnX1qHJuExzr8EnRT1pyhVSqouxt+xkKbV QD9r+cWLChumg3g9bDLzyrOTlEfAUNxIqbzSt03CRR43dWgfgGiLDcrqC2b1QR886WDpz4ok xX3fdLaqN492s/3c59qCGNG30ebAj8AbV+v551rsfEba+IWTvvoQnbstc6vKJCc2uG8rom5o eHG/bP1Ug2ht6m/0uWRyFq9C27fpU9+FDhb0ZsT4UwOCbthe35/wBZUg72zDpT/h5lm64G6C 0TRqYRgYcltlP705BJafsymmAXOZ1nTCuXnYAB9G9LzZcKKq5q0rP0kp7KRDbniirCUfp7jK VpPCOUEc3tS1RdCCSeWNuVgzLnJdR8W2h9StuEbb7hW4aFhwRQARAQABtCROb3JiZXJ0IE1h bnRoZXkgPG5tYW50aGV5QGFtYXpvbi5kZT6JAj0EEwEIACcFAloJQc0CGyMFCQlmAYAFCwkI BwIGFQgJCgsCBBYCAwECHgECF4AACgkQZ+8yS8zN62ajmQ/6AlChoY5UlnUaH/jgcabyAfUC XayHgCcpL1SoMKvc2rCA8PF0fza3Ep2Sw0idLqC/LyAYbI6gMYavSZsLcsvY6KYAZKeaEriG 7R6cSdrbmRcKpPjwvv4iY6G0DBTeaqfNjGe1ECY8u522LprDQVquysJIf3YaEyxoK/cLSb0c kjzpqI1P9Vh+8BQb5H9gWpakbhFIwbRGHdAF1roT7tezmEshFS0IURJ2ZFEI+ZgWgtl1MBwN sBt65im7x5gDo25h8A5xC9gLXTc4j3tk+3huaZjUJ9mCbtI12djVtspjNvDyUPQ5Mxw2Jwar C3/ZC+Nkb+VlymmErpnEUZNltcq8gsdYND4TlNbZ2JhD0ibiYFQPkyuCVUiVtimXfh6po9Yt OkE0DIgEngxMYfTTx01Zf6iwrbi49eHd/eQQw3zG5nn+yZsEG8UcP1SCrUma8p93KiKOedoL n43kTg4RscdZMjj4v6JkISBcGTR4uotMYP4M0zwjklnFXPmrZ6/E5huzUpH9B7ZIe/SUu8Ur xww/4dN6rfqbNzMxmya8VGlEQZgUMWcck+cPrRLB09ZOk4zq9i/yaHDEZA1HNOfQ9UCevXV5 7seXSX7PCY6WDAdsT3+FuaoQ7UoWN3rdpb+064QKZ0FsHeGzUd7MZtlgU4EKrh25mTSNZYRs nTz2zT/J33e5Ag0EWglBzQEQAKioD1gSELj3Y47NE11oPkzWWdxKZdVr8B8VMu6nVAAGFRSf Dms4ZmwGY27skMmOH2srnZyTfm9FaTKr8RI+71Fh9nfB9PMmwzA7OIY9nD73/HqPywzTTleG MlALmnuY6xFRSDmqmvxDHgWyzB4TgPWt8+hW3+TJKCx2RgLAdSuULZla4lia+NlS8WNRUDGK sFJCCB3BW5I/cocfpBEUqLbbmnPuD9UKpEnFcYWD9YaDNcBTjSc7iDsvtpdrBXg5VETOz/TQ /CmVs9h/5zug8O4bXxHEEJpCAxs4cGKxowBqx/XJfkwdWeo/LdaeR+LRbXvq4A32HSkyj9sV vygwt2OFEk493JGik8qtAA/oPvuqVPJGacxmZ7zKR12c0mnKCHiexFJzFbC7MSiUhhe8nNiM p6Sl6EZmsTUXhV2bd2M12Bqcss3TTJ1AcW04T4HYHVCSxwl0dVfcf3TIaH0BSPiwFxz0FjMk 10umoRvUhYYoYpPFCz8dujXBlfB8q2tnHltEfoi/EIptt1BMNzTYkHKArj8Fwjf6K+nQ3a8p 1cWfkYpA5bRqbhbplzpa0u1Ex0hZk6pka0qcVgqmH31O2OcSsqeKfUfHkzj3Q6dmuwm1je/f HWH9N1gDPEp1RB5bIxPnOG1Z4SNl9oVQJhc4qoJiqbvkciivYcH7u2CBkboFABEBAAGJAiUE GAEIAA8FAloJQc0CGwwFCQlmAYAACgkQZ+8yS8zN62YU9Q//WTnN28aBX1EhDidVho80Ql2b tV1cDRh/vWTcM4qoM8vzW4+F/Ive6wDVAJ7zwAv8F8WPzy+acxtHLkyYk14M6VZ1eSy0kV0+ RZQdQ+nPtlb1MoDKw2N5zhvs8A+WD8xjDIA9i21hQ/BNILUBINuYKyR19448/41szmYIEhuJ R2fHoLzNdXNKWQnN3/NPTuvpjcrkXKJm2k32qfiys9KBcZX8/GpuMCc9hMuymzOr+YlXo4z4 1xarEJoPOQOXnrmxN4Y30/qmf70KHLZ0GQccIm/o/XSOvNGluaYv0ZVJXHoCnYvTbi0eYvz5 OfOcndqLOfboq9kVHC6Yye1DLNGjIVoShJGSsphxOx2ryGjHwhzqDrLiRkV82gh6dUHKxBWd DXfirT8a4Gz/tY9PMxan67aSxQ5ONpXe7g7FrfrAMe91XRTf50G3rHb8+AqZfxZJFrBn+06i p1cthq7rJSlYCqna2FedTUT+tK1hU9O0aK4ZYYcRzuTRxjd4gKAWDzJ1F/MQ12ftrfCAvs7U sVbXv2TndGIleMnheYv1pIrXEm0+sdz5v91l2/TmvkyyWT8s2ksuZis9luh+OubeLxHq090C hfavI9WxhitfYVsfo2kr3EotGG1MnW+cOkCIX68w+3ZS4nixZyJ/TBa7RcTDNr+gjbiGMtd9 pEddsOqYwOs=
  • Cc: Juergen Gross <jgross@xxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Martin Pohlack <mpohlack@xxxxxxxxx>, wipawel@xxxxxxxxx, Julien Grall <julien.grall@xxxxxxx>, David Woodhouse <dwmw@xxxxxxxxxxxx>, "Martin Mazein\(amazein\)" <amazein@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Julian Stecklina <jsteckli@xxxxxxxxx>, Bjoern Doebel <doebel@xxxxxxxxx>
  • Delivery-date: Tue, 19 Feb 2019 21:47:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 2/18/19 17:08, Jan Beulich wrote:
>>>> On 18.02.19 at 14:49, <nmanthey@xxxxxxxxx> wrote:
>> On 2/15/19 11:34, Jan Beulich wrote:
>>>>>> On 15.02.19 at 10:55, <nmanthey@xxxxxxxxx> wrote:
>>>> On 2/13/19 12:50, Jan Beulich wrote:
>>>>>>>> On 08.02.19 at 14:44, <nmanthey@xxxxxxxxx> wrote:
>>>>>> Guests can issue grant table operations and provide guest controlled
>>>>>> data to them. This data is also used for memory loads. To avoid
>>>>>> speculative out-of-bound accesses, we use the array_index_nospec macro
>>>>>> where applicable. However, there are also memory accesses that cannot
>>>>>> be protected by a single array protection, or multiple accesses in a
>>>>>> row. To protect these, a nospec barrier is placed between the actual
>>>>>> range check and the access via the block_speculation macro.
>>>>>>
>>>>>> As different versions of grant tables use structures of different size,
>>>>>> and the status is encoded in an array for version 2, speculative
>>>>>> execution might touch zero-initialized structures of version 2 while
>>>>>> the table is actually using version 1.
>>>>> Why zero-initialized? Did I still not succeed demonstrating to you
>>>>> that speculation along a v2 path can actually overrun v1 arrays,
>>>>> not just access parts with may still be zero-initialized?
>>>> I believe a speculative v2 access can touch data that has been written
>>>> by valid v1 accesses before, zero initialized data, or touch the NULL
>>>> page. Given the macros for the access I do not believe that a v2 access
>>>> can touch a page that is located behind a page holding valid v1 data.
>>> I've given examples before of how I see this to be possible. Would
>>> you mind going back to one of the instances, and explaining to me
>>> how you do _not_ see any room for an overrun there? Having
>>> given examples, I simply don't know how else I can explain this to
>>> you without knowing at what specific part of the explanation we
>>> diverge. (And no, I'm not excluding that I'm making up an issue
>>> where there is none.)
>> What we want to real out is that the actually use version1, while
>> speculation might use version2, right? I hope you refer to this example
>> of your earlier email.
>>
>> On 1/29/19 16:11, Jan Beulich wrote:
>>> Let's look at an example: gref 256 points into the middle of
>>> the first page when using v1 calculations, but at the start
>>> of the second page when using v2 calculations. Hence, if the
>>> maximum number of grant frames was 1, we'd overrun the
>>> array, consisting of just a single element (256 is valid as a
>>> v1 gref in that case, but just out of bounds as a v2 one).
>> From how I read your example and my explanation, the key difference is
>> in the size of the shared_raw array. In case gref 256 is a valid v1
>> handle, then the shared_raw array has space for at least 256 entries, as
>> shared_raw was allocated for the number of requested entries. The access
>> to shared_raw is controlled with the macro shared_entry_v2:
>>  222 #define SHGNT_PER_PAGE_V2 (PAGE_SIZE / sizeof(grant_entry_v2_t))
>>  223 #define shared_entry_v2(t, e) \
>>  224     ((t)->shared_v2[(e)/SHGNT_PER_PAGE_V2][(e)%SHGNT_PER_PAGE_V2])
>> Since the direct access to the shared_v2 array depends on the
>> SHGNT_PER_PAGE_V2 value, this has to be less than the size of that
>> array. Hence, shared_raw will not be overrun (neither for version 1 nor
>> version 2). However, this division might result in accessing an element
>> of shared_raw that has not been initialized by version1 before. However,
>> right after allocation, shared_raw is zero initialized. Hence, this
>> might result in an access of the NULL page.
> The question is: How much of shared_raw[] will be zero-initialized?
> The example I've given uses relatively small grant reference values,
> so for the purpose here let's assume gt->max_grant_frames is 1.
> In this case shared_raw[] is exactly one entry in size. Hence the
> speculative access you describe will not necessarily access the NULL
> page.
>
> Obviously the same issue exists with higher limits and higher grant
> reference numbers.
The solution to this problem is really simple, I mixed up grant frames
and grant entries. I agree that shared_raw can be accessed out-of-bounds
and should be protected. I will adapt the commit message accordingly,
and revise the modifications I added to the code base.
>
>>>>>> @@ -1321,7 +1327,8 @@ unmap_common(
>>>>>>          goto unlock_out;
>>>>>>      }
>>>>>>  
>>>>>> -    act = active_entry_acquire(rgt, op->ref);
>>>>>> +    act = active_entry_acquire(rgt, array_index_nospec(op->ref,
>>>>>> +                                                       
>>>> nr_grant_entries(rgt)));
>>>>> ... you add a use e.g. here to _guard_ against speculation.
>>>> The adjustment you propose is to exchange the switch statement in
>>>> nr_grant_entries with a if( evaluate_nospec( gt->gt_version == 1 ), so
>>>> that the returned values are not speculated?
>>> At this point I'm not proposing a particular solution. I'm just
>>> putting on the table an issue left un-addressed. I certainly
>>> wouldn't welcome converting the switch() to an if(), even if
>>> right now there's no v3 on the horizon. (It's actually quite
>>> the inverse: If someone came and submitted a patch to change
>>> the various if()-s on gt_version to switch()-es, I'd welcome this.)
>> I am happy to add block_speculation() macros into each case of the
>> switch statement.
> Ugly, but perhaps the only possible solution at this point.
>
>>>> Do you want me to
>>>> cache the value in functions that call this method regularly to avoid
>>>> the penalty of the introduced lfence for each call?
>>> That would go back to the question of what good it does to
>>> latch value into a local variable when you don't know whether
>>> the compiler will put that variable in a register or in memory.
>>> IOW I'm afraid that to be on the safe side there's no way
>>> around the repeated LFENCEs.
>> The difference here would be that in case the value is stored into a
>> local variable (independently of memory or register) and an lfence was
>> executed, this value can be trusted and does not have to be checked
>> again, as it's no longer guest controlled.
> Ah, yes, you're right (it just wasn't clear to me that you implied
> adding a fence together with the caching of the value). So perhaps
> that's then also the way to go for the hunks under discussion in
> patch 3?

Well, yes. That should effectively bound the a.index values in the other
hunks in patch 3 as well. I will adapt that patch accordingly. Until
now, I stepped back from this solution, as I want to avoid using the
lfence instruction as much as possible.

Best,
Norbert





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich
Ust-ID: DE 289 237 879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.