
Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 17 Mar 2023 11:19:02 +0100
  • Cc: Juergen Gross <jgross@xxxxxxxx>, Alex Deucher <alexdeucher@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Honglei Huang <honglei1.huang@xxxxxxx>, amd-gfx@xxxxxxxxxxxxxxxxxxxxx, dri-devel@xxxxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, Stewart Hildebrand <Stewart.Hildebrand@xxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Huang Rui <ray.huang@xxxxxxx>, Chen Jiqian <Jiqian.Chen@xxxxxxx>, Xenia Ragiadakou <burzalodowa@xxxxxxxxx>, Alex Deucher <alexander.deucher@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Julia Zhang <julia.zhang@xxxxxxx>, Christian König <christian.koenig@xxxxxxx>
  • Delivery-date: Fri, 17 Mar 2023 10:19:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Mar 16, 2023 at 04:09:44PM -0700, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Juergen Gross wrote:
> > On 16.03.23 14:53, Alex Deucher wrote:
> > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@xxxxxxxx> wrote:
> > > > 
> > > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> > > > > > 
> > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > > > > > > > > > > > virtualization support when possible. It will use the hardware IOMMU
> > > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if the current domain is
> > > > > > > > > > > > Xen PVH.
> > > > > > > > > > 
> > > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can it get
> > > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O to an
> > > > > > > > > > > address-restricted device)?
> > > > > > > > > 
> > > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> > > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> > > > > > > > > > so we can use guest physical addresses instead of machine addresses for
> > > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> > > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> > > > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > > 
> > > > > > > > > But how does Xen using an IOMMU help with, as said, address-restricted
> > > > > > > > > devices? They may still need e.g. a 32-bit address to be programmed in,
> > > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O buffers
> > > > > > > > > may fulfill this requirement.
> > > > > > > 
> > > > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > > > addresses (not machine addresses, those could be anything) lower than
> > > > > > > > 4GB.
> > > > > > > > 
> > > > > > > > If the address-restricted device does DMA via an IOMMU, then the device
> > > > > > > > gets programmed by Linux using its guest physical addresses (not machine
> > > > > > > > addresses).
> > > > > > > > 
> > > > > > > > The 32-bit restriction would be applied by Linux to its choice of guest
> > > > > > > > physical address to use to program the device, the same way it does on
> > > > > > > > native. The device would be fine as it always uses Linux-provided <4GB
> > > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > > > > > > > could get any address, including >4GB addresses, and that is expected to
> > > > > > > > work.
> > > > > > 
> > > > > > > I understand that's the "normal" way of working. But whatever the swiotlb
> > > > > > > is used for in baremetal Linux, that would similarly require its use in
> > > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
> > > > > > > me like an incomplete attempt to disable its use altogether on x86. What
> > > > > > > difference of PVH vs baremetal am I missing here?
> > > > > 
> > > > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > > > hundreds of megs or even gigs of memory mapped on the device at any
> > > > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > > > the chip family).
> > > > 
> > > > But the swiotlb isn't per device, but system global.
> > > 
> > > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > > So you get to pick one.
> > 
> > The swiotlb is used only for buffers which are not within the DMA mask of a
> > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> > won't use the swiotlb unless you have a buffer above guest physical address of
> > 16TB (so basically never).
> > 
> > Disabling swiotlb in such a guest would OTOH mean, that a device with only
> > 32 bit DMA mask passed through to this guest couldn't work with buffers
> > above 4GB.
> > 
> > I don't think this is acceptable.
> 
> From the Xen subsystem in Linux point of view, the only thing we need to
> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
> the global swiotlb) on PVH because it is not needed anyway.

But this is already the case on PVH, swiotlb_xen won't be enabled.
swiotlb_xen is only enabled for PV domains, other domain types don't
enable it under any circumstance on x86.

> I think we should leave the global "swiotlb" setting alone. The global
> swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
> have a way to deal with swiotlb/GPU incompatibilities.
> 
> We just have to avoid making things worse on Xen, and for that we just
> need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
> doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
> swiotlb, then we have a good Linux configuration capable of handling the
> GPU properly.

Given that this patch is basically a non-functional change (because
the modified functions are only called for PV domains) I think we all
agree that swiotlb_xen should never be used on PVH, and native swiotlb
might be required depending on the DMA address restrictions of the
devices on the system.  So no change required.

Thanks, Roger.
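
The DMA-mask arithmetic discussed in the thread can be sketched as
follows. This is a minimal Python model, not kernel code: `mask_limit`
and `needs_bounce` are hypothetical names introduced for illustration,
and the check only approximates the reachability test the kernel's
direct-mapping path applies before falling back to a bounce buffer.

```python
def mask_limit(bits: int) -> int:
    """Highest address a device with a `bits`-wide DMA mask can reach."""
    return (1 << bits) - 1


def needs_bounce(addr: int, size: int, mask_bits: int) -> bool:
    """True if any byte of [addr, addr + size) lies above the DMA mask."""
    return addr + size - 1 > mask_limit(mask_bits)


GIB = 1 << 30
TIB = 1 << 40

# A 44-bit mask covers 16 TiB, a 32-bit mask covers 4 GiB.
assert mask_limit(44) + 1 == 16 * TIB
assert mask_limit(32) + 1 == 4 * GIB

# Buffer at 5 GiB guest physical: reachable by a device with a 44-bit
# mask, out of reach for a 32-bit device unless it is bounced.
print(needs_bounce(5 * GIB, 4096, 44))  # False
print(needs_bounce(5 * GIB, 4096, 32))  # True
```

Under these assumptions, a guest with memory above 4GB and a
passed-through device limited to 32-bit DMA still depends on the native
swiotlb, while a device with a 44-bit mask effectively never does.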



 

