
Re: [Xen-devel] Upstream Dom0 DRM problems regarding swiotlb


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Michael Labriola <michael.d.labriola@xxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Thu, 14 Feb 2019 07:03:38 +0100
  • Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Paul Durrant <Paul.Durrant@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxx
  • Delivery-date: Thu, 14 Feb 2019 06:03:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 14/02/2019 01:11, Andrew Cooper wrote:
> On 13/02/2019 21:08, Michael Labriola wrote:
>> On Wed, Feb 13, 2019 at 3:21 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 13/02/2019 20:15, Michael Labriola wrote:
>>>> On Wed, Feb 13, 2019 at 2:16 PM Konrad Rzeszutek Wilk
>>>> <konrad.wilk@xxxxxxxxxx> wrote:
>>>>> On Wed, Feb 13, 2019 at 01:38:21PM -0500, Michael Labriola wrote:
>>>>>> On Wed, Feb 13, 2019 at 1:16 PM Michael Labriola
>>>>>> <michael.d.labriola@xxxxxxxxx> wrote:
>>>>>>> On Wed, Feb 13, 2019 at 11:57 AM Konrad Rzeszutek Wilk
>>>>>>> <konrad.wilk@xxxxxxxxxx> wrote:
>>>>>>>> On Wed, Feb 13, 2019 at 09:09:32AM -0700, Jan Beulich wrote:
>>>>>>>>>>>> On 13.02.19 at 17:00, <michael.d.labriola@xxxxxxxxx> wrote:
>>>>>>>>>> On Wed, Feb 13, 2019 at 9:28 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>>>>>>>> On 13.02.19 at 15:10, <michael.d.labriola@xxxxxxxxx> wrote:
>>>>>>>>>>>> Ah, so this isn't necessarily Xen-specific but rather any
>>>>>>>>>>>> paravirtual guest?  That hadn't crossed my mind.  Is there an
>>>>>>>>>>>> easy way to find out if we're a pv guest in the need_swiotlb
>>>>>>>>>>>> conditionals?
>>>>>>>>>>> There's xen_pv_domain(), but I think xen_swiotlb would be more to
>>>>>>>>>>> the point if the check is already to be Xen-specific. There's no
>>>>>>>>>>> generic "is PV" predicate that I'm aware of.
>>>>>>>>>> Well, that makes doing conditional code right more difficult.  I
>>>>>>>>>> assume since there isn't a generic predicate, and PV isn't new, that
>>>>>>>>>> its absence is by design?  To rein in the temptation to sprinkle
>>>>>>>>>> conditional code all over the kernel?  ;-)
>>>>>>>>> Well, with lguest gone, Xen is the only PV environment the kernel
>>>>>>>>> can run in, afaik at least. I guess to decide between the suggested
>>>>>>>>> options or the need for some abstracting macro (or yet something
>>>>>>>>> else), you'll really need to ask the driver maintainers. Or simply
>>>>>>>>> send a patch their way implementing one of them, and see what
>>>>>>>>> their reaction is.
>>>>>>>> Could you try this out and see if it works and I will send it out:
>>>>>>>>
>>>>>> *snip*
>>>>>>> Yes, that works for me.  However, I feel like the conditional should
>>>>>>> be in drm_get_max_iomem() instead of directly after it everywhere it's
>>>>>>> used...  and is just checking xen_pv_domain() enough?  Jan made it
>>>>>>> sound like there were possibly other PV cases that would also still
>>>>>>> need swiotlb.
>>>>>> How about this?  It strcmp's pv_info to see if we're bare metal, does
>>>>>> the comparison in a single place, moves the bit shifting comparison
>>>>>> into the function (simplifying the drm driver code), and renames the
>>>>>> function to more aptly describe what's going on.
>>>>> <nods> That looks much better.
>>>> Great!  Now the only question left is:  KVM, VMware, Xen PVH, Xen HVM,
>>>> and Xen PV all populate pv_info.  Do any of those other than Xen PV
>>>> *really* need swiotlb?  That's slightly over my head.  As written, my
>>>> patch would require swiotlb for all of them because I was attempting
>>>> to not be Xen-specific.
>>> It's far more complicated than "Xen PV requires swiotlb".
>>>
>>> I presume the underlying problem here is DRM being special and not
>>> DMA-mapping its buffers, and trying to DMA to a buffer crossing a 4k
>>> boundary?
>> Well, I don't 100% understand how all these things work...  but here's
>> what I do know.  There are a series of commits in v4.17 that try to
>> optimize the radeon and amdgpu drivers by skipping calls to
>> ttm_dma_populate() and ttm_dma_unpopulate() unless they're "really
>> needed".  The original commit determines if swiotlb is needed by
>> checking to see if the max io mapping address of system memory is over
>> the video card's accessing range.  I can no longer log into Gnome on a
>> Xen dom0 after upgrading my kernel to v4.20 because the code that's no
>> longer happening was actually needed in a paravirtualized environment.
> 
> But from the log you provided, your bug was space exhaustion in the
> swiotlb, no?
> 
>> So, I'm trying to get all my details straight so I can submit a patch
>> to fix it w/out saying anything factually incorrect.
> 
> The thing which is different between Xen PV guests and most others (all
> others(?), now that Lguest and UML have been dropped) is that what Linux
> thinks of as PFN $N isn't necessarily adjacent to PFN $N+1 in system
> physical address space.
> 
> Therefore, code which has a buffer spanning a page boundary can't just
> convert a pointer to the buffer into a physical address, and hand that
> address to a device.  You generally end up with either memory corruption
> (DMA hitting the wrong page allocated to the guest), or an IOMMU fault
> (DMA hitting a page which isn't allocated to the guest).
> 
> Xen PV is very good at finding DMA bugs in drivers.  The way to resolve
> this is to fix the driver to use the proper DMA APIs - not to add even
> more magic corner cases.
> 
> In general, a lot of devices can do 4k scatter/gather, or end up making
> requests to buffers which fit within a single page, but the SWIOTLB does
> act as a mechanism of last resort.  It has a massive performance penalty
> (due to double buffering), and does have a tendency to fragment (due to
> asymmetric size requests).
> 
> However, there is one DMA mode (in the process of getting properly
> upstream, but has been used for several years by various downstreams)
> where IOVA == Linux's idea of contiguous PFN space, so you can do odd
> sized DMAs which cross page boundaries.
> 
> The point is that the DMA ops (and *only* the DMA ops, from a
> correctness standpoint) know how to convert PFNs into IO-virtual
> addresses for devices, because it may not be a 1:1 mapping.  Nothing
> else in the kernel can legitimately be making decisions like this.

Correct. Adding Christoph who might want to add something.


Juergen
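
[Editorial illustration, not part of the original message.]  The point
quoted above, that PFN $N and PFN $N+1 need not be adjacent in system
physical address space under Xen PV, can be modelled in a few lines.
This is only a rough userspace sketch in Python, not kernel code: the
P2M table contents and the helper names machine_addr(),
is_machine_contiguous() and sg_segments() are all hypothetical and
chosen for illustration.  In the kernel this translation belongs to the
DMA ops.

```python
PAGE_SIZE = 4096

# Hypothetical PFN -> machine frame table for a PV guest.  The guest sees
# PFNs 0..3 as contiguous, but the backing machine frames are scattered.
P2M = {0: 0x1A3, 1: 0x07C, 2: 0x07D, 3: 0x2F0}


def machine_addr(guest_phys):
    """Translate a guest pseudo-physical address to a machine address."""
    pfn, offset = divmod(guest_phys, PAGE_SIZE)
    return P2M[pfn] * PAGE_SIZE + offset


def is_machine_contiguous(guest_phys, length):
    """True if the buffer occupies one contiguous run of machine memory."""
    first_pfn = guest_phys // PAGE_SIZE
    last_pfn = (guest_phys + length - 1) // PAGE_SIZE
    # Every pair of neighbouring PFNs must map to neighbouring machine frames.
    return all(P2M[pfn + 1] == P2M[pfn] + 1 for pfn in range(first_pfn, last_pfn))


def sg_segments(guest_phys, length):
    """Split a buffer into per-page (machine_addr, length) segments,
    roughly what a page-granular scatter/gather list would carry."""
    segments = []
    while length > 0:
        # Never let a segment run past the end of the current page.
        chunk = min(length, PAGE_SIZE - guest_phys % PAGE_SIZE)
        segments.append((machine_addr(guest_phys), chunk))
        guest_phys += chunk
        length -= chunk
    return segments


if __name__ == "__main__":
    # A 512-byte buffer that straddles the PFN 0 / PFN 1 boundary.
    buf, size = 0xF00, 0x200
    print("contiguous for the device:", is_machine_contiguous(buf, size))
    for addr, seg_len in sg_segments(buf, size):
        print(f"segment at {addr:#x}, {seg_len} bytes")
```

In this model, handing the device machine_addr(buf) together with the
full length would run into whatever machine frame happens to follow
0x1A3, which is exactly the corruption or IOMMU-fault case described in
the quoted text.  Splitting per page, or bouncing through a buffer that
is known to be machine-contiguous, avoids that.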


 

