
Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document



On Thu, Feb 02, 2017 at 03:12:52PM -0800, Stefano Stabellini wrote:
> On Thu, 2 Feb 2017, Edgar E. Iglesias wrote:
> > On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> > > Hi Edgar,
> > > 
> > > On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> > > >On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > > >>On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > > >>>On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > >>>>On 24/01/17 20:07, Stefano Stabellini wrote:
> > > >>>>>On Tue, 24 Jan 2017, Julien Grall wrote:
> > > >>For a generic host bridge, no initialization is needed. However, some
> > > >>host bridges (e.g. xgene, xilinx) may require specific setup such as
> > > >>configuring clocks. Given that Xen only needs to access the
> > > >>configuration space, I was thinking of letting DOM0 initialize the
> > > >>host bridge. This would avoid importing a lot of code into Xen;
> > > >>however, it means we need to know when the host bridge has been
> > > >>initialized before accessing the configuration space.
> > > >>>
> > > >>>
> > > >>>Yes, that's correct.
> > > >>>There's a sequence on the ZynqMP that involves assigning Gigabit
> > > >>>Transceivers
> > > >>>to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > > >>>enabling clocks and configuring a few registers to enable ECAM and MSI.
> > > >>>
> > > >>>I'm not sure if this could be done prior to starting Xen. Perhaps.
> > > >>>If so, bootloaders would have to know ahead of time what devices
> > > >>>the GTs are supposed to be configured for.
> > > >>
> > > >>I've got further questions regarding the Gigabit Transceivers. You
> > > >>mention they are shared; do you mean that multiple devices can use a
> > > >>GT at the same time? Or does the software decide at startup which
> > > >>device will use a given GT? If so, how does the software make this
> > > >>decision?
> > > >
> > > >Software will decide at startup. AFAIK, the allocation is normally done
> > > >once but I guess that in theory you could design boards that could switch
> > > >at runtime. I'm not sure we need to worry about that use-case though.
> > > >
> > > >The details can be found here:
> > > >https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > >
> > > >I suggest looking at pages 672 and 733.
> > > 
> > > Thank you for the documentation. I am trying to understand if we could
> > > move the initialization into Xen as suggested by Stefano. I looked at
> > > the driver in Linux and the code looks simple, with few dependencies.
> > > However, I was not able to find where the Gigabit Transceivers are
> > > configured. Do you have any link to the code for that?
> > 
> > Hi Julien,
> > 
> > I suspect that this setup has previously been done by the initial bootloader
> > auto-generated from design configuration tools.
> > 
> > Now, this is moving into Linux.
> > There's a specific driver that does that but AFAICS, it has not been 
> > upstreamed yet.
> > You can see it here:
> > https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> > 
> > DTS nodes that need a PHY can then just refer to it, here's an example from 
> > SATA:
> > &sata {
> >         phy-names = "sata-phy";
> >         phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> > };
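For illustration, a PCIe node would presumably reference the PHY lanes in the same way. The snippet below is a guess rather than a working ZCU102 DTS fragment; the lane cell arguments just follow the same <phandle type instance lane frequency> pattern as the SATA example above:

```dts
&pcie {
        phy-names = "pcie-phy0";
        phys = <&lane0 PHY_TYPE_PCIE 0 0 100000000>;
};
```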
> > 
> > I'll see if I can find working examples for PCIe on the ZCU102. Then I'll 
> > share
> > DTS, Kernel etc.
> > 
> > If you are looking for a platform to get started, an option could be for
> > me to get you a build of our QEMU that includes models for the PCIe
> > controller, MSI and SMMU connections. These models are friendly wrt. PHY
> > configs and initialization sequences: they will accept pretty much any
> > sequence and still work. This would allow you to focus on architectural
> > issues rather than the exact details of init sequences (which we can deal
> > with later).
> > 
> > 
> > 
> > > 
> > > This would also mean that the MSI interrupt controller will be moved
> > > into Xen.
> > > Which I think is a more sensible design (see more below).
> > > 
> > > >>
> > > >>>>      - For all other host bridges => I don't know if there are host 
> > > >>>> bridges
> > > >>>>falling under this category. I also don't have any idea how to handle 
> > > >>>>this.
> > > >>>>
> > > >>>>>
> > > >>>>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> > > >>>>>and Xen is the one to provide the emulated host bridge, how are DomU 
> > > >>>>>PCI
> > > >>>>>config reads and writes supposed to work in details?
> > > >>>>
> > > >>>>I think I have answered this question with my explanation above.
> > > >>>>Let me know if it is not the case.
> > > >>>>
> > > >>>>>How is MSI configuration supposed to work?
> > > >>>>
> > > >>>>For GICv3 ITS, the MSI will be configured with the eventID (it is
> > > >>>>unique per device) and the address of the doorbell. The linkage
> > > >>>>between the LPI and "MSI" will be done through the ITS.
> > > >>>>
> > > >>>>For GICv2m, the MSI will be configured with an SPI (or offset on
> > > >>>>some GICv2m) and the address of the doorbell. Note that for DOM0,
> > > >>>>SPIs are mapped 1:1.
> > > >>>>
> > > >>>>So in both cases, I don't think it is necessary to trap MSI
> > > >>>>configuration for DOM0. This may not be true if we want to handle
> > > >>>>other MSI controllers.
> > > >>>>
> > > >>>>I have in mind the xilinx MSI controller (embedded in the host 
> > > >>>>bridge? [4])
> > > >>>>and xgene MSI controller ([5]). But I have no idea how they work and 
> > > >>>>if we
> > > >>>>need to support them. Maybe Edgar could share details on the Xilinx 
> > > >>>>one?
> > > >>>
> > > >>>
> > > >>>The Xilinx controller has 2 dedicated SPIs and pages for MSIs.
> > > >>>AFAIK, there's no way to protect the MSI doorbells from
> > > >>>misconfigured end-points raising malicious EventIDs. So perhaps
> > > >>>trapping config accesses from domUs can help by adding this
> > > >>>protection as drivers configure the device.
> > > >>>
> > > >>>On Linux, once MSIs hit, the kernel takes the SPI interrupt, reads
> > > >>>out the EventID from a FIFO in the controller and injects a new IRQ
> > > >>>into the kernel.
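To make that flow concrete, here is a minimal C sketch of the drain loop: the SPI fires, and the handler pops queued EventIDs out of the FIFO one at a time so each can be dispatched as a distinct interrupt. The FIFO layout and all names here are invented for illustration; the real Xilinx driver differs:

```c
#include <stdint.h>

/* Hypothetical model of the controller's MSI FIFO: a pending count plus a
 * ring of queued EventIDs.  Real hardware exposes this through registers. */
struct msi_fifo {
    uint32_t count;    /* number of pending MSIs */
    uint32_t head;     /* read position */
    uint32_t ids[32];  /* queued EventIDs */
};

/* What the SPI handler would do: drain every pending EventID into 'out'
 * (up to 'max'), one FIFO read per MSI.  Returns how many were handled. */
static int msi_drain(struct msi_fifo *f, uint32_t *out, int max)
{
    int n = 0;

    while (f->count && n < max) {
        out[n++] = f->ids[f->head++ % 32];  /* read EventID from FIFO */
        f->count--;                          /* each read pops one entry */
    }
    return n;  /* each entry would then be injected as its own IRQ */
}
```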
> > > >>
> > > >>It might be early to ask, but how do you expect MSI to work with
> > > >>DOMU on your hardware? Does your MSI controller support
> > > >>virtualization? Or are you looking for a different way to inject
> > > >>MSIs?
> > > >
> > > >HW support for MSIs is quite limited with regard to domUs and will
> > > >require SW hacks :-(
> > > >
> > > >Anyway, something along the lines of this might work:
> > > >
> > > >* Trap domU CPU writes to MSI descriptors in config space.
> > > >  Force real MSI descriptors to the address of the doorbell area.
> > > >  Force real MSI descriptors to use a device-unique EventID allocated
> > > >  by Xen.
> > > >  Remember what EventID domU requested per device and descriptor.
> > > >
> > > >* Xen or Dom0 takes the real SPI generated when the device writes
> > > >  into the doorbell area.
> > > >  At this point, we can read out the EventID from the MSI FIFO and
> > > >  map it to the one requested by domU.
> > > >  Xen or Dom0 injects the expected EventID into domU.
> > > >
> > > >Do you have any good ideas? :-)
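One way to sketch the remapping idea above: keep a small per-device table pairing the Xen-allocated real EventID with the EventID the domU programmed, fill it on the trapped config write, and translate back on the doorbell path before injecting into the domU. All names here are hypothetical, not an existing Xen API:

```c
#include <stdint.h>

#define MAX_EVENTS 64

/* One entry per trapped MSI descriptor: the real EventID Xen programs into
 * the physical descriptor vs. the EventID the domU actually wrote. */
struct evt_map {
    uint32_t real_id[MAX_EVENTS];   /* allocated by Xen, device-unique */
    uint32_t guest_id[MAX_EVENTS];  /* requested by domU */
    int used;
};

/* Trap path: domU writes guest_id; Xen picks a free real EventID, records
 * the pair, and returns the real ID to program into the real descriptor. */
static uint32_t evt_alloc(struct evt_map *m, uint32_t guest_id)
{
    uint32_t real = (uint32_t)m->used;  /* deliberately simplistic allocator */

    m->real_id[m->used] = real;
    m->guest_id[m->used] = guest_id;
    m->used++;
    return real;
}

/* Doorbell path: the MSI FIFO yields a real EventID; translate it back to
 * the domU's expected EventID before injection.  Returns -1 if unknown. */
static int64_t evt_translate(const struct evt_map *m, uint32_t real)
{
    for (int i = 0; i < m->used; i++)
        if (m->real_id[i] == real)
            return m->guest_id[i];
    return -1;
}
```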
> > > 
> > > From my understanding your MSI controller is embedded in the hostbridge,
> > > right? If so, the MSIs would need to be handled where the host bridge will
> > > be initialized (e.g either Xen or DOM0).
> > 
> > Yes, it is.
> > 
> > > 
> > > From a design point of view, it would make more sense to have the MSI
> > > controller driver in Xen, as the hostbridge emulation for guests will
> > > also live there.
> > > 
> > > So if we receive MSIs in Xen, we need to figure out a way for DOM0 and
> > > guests to receive them. Using the same mechanism for both would be
> > > best, and I guess non-PV if possible. I know you are looking to boot an
> > > unmodified OS in a VM. This would mean we need to emulate the MSI
> > > controller and potentially the xilinx PCI controller. How much are you
> > > willing to modify the OS?
> > 
> > Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> > things are very open and we could design with pretty much anything in mind.
> > 
> > Yes, we could perhaps include a very small model with most registers 
> > dummied.
> > Implementing the MSI read FIFO would allow us to:
> > 
> > 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
> >    IRQ as on real HW.
> > 
> > 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> > 
> > 
> > 
> > > Regarding the MSI doorbell, I have seen it is configured by the
> > > software using the physical address of a page allocated in RAM. When a
> > > PCI device writes into the doorbell, does the access go through the
> > > SMMU?
> > 
> > That's a good question. On our QEMU model it does, but I'll have to dig
> > a little to see if that is the case on real HW as well.
> > 
> > > Regardless of the answer, I think we would need to map the MSI
> > > doorbell page in the guest. Meaning that even if we trap MSI
> > > configuration accesses, a guest could DMA into the page. So if I am
> > > not mistaken, MSI would be insecure in this case :/.
> > > 
> > > Or maybe we could avoid mapping the doorbell in the guest and let Xen
> > > receive an SMMU abort. When receiving the SMMU abort, Xen could
> > > sanitize the value and write into the real MSI doorbell. Not sure if
> > > it would work, though.
> > 
> > Yeah, this is a problem.
> > I'm not sure if SMMU aborts would work because I don't think we know the 
> > value of the data written when we take the abort.
> > Without the data, I'm not sure how we would distinguish between different 
> > MSI's from the same device.
> > 
> > Also, even if the MSI doorbell would be protected by the SMMU, all PCI 
> > devices are presented with the same AXI Master ID.
> 
> Does that mean that from the SMMU perspective you can only assign them
> all or none?

Unfortunately yes.


> > BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't 
> > it?
> > Or do you have ideas around that? Perhaps some PV way to request mappings 
> > for DMA?
> 
> No, we don't have anything like that. There are too many device specific
> ways to request DMAs to do that. For devices that cannot be effectively
> protected by IOMMU, (on x86) we support assignment but only in an
> insecure fashion.

OK, I see.

A possible hack could be to allocate a chunk of DDR dedicated to PCI DMA.
PCI DMA devices could be locked down to only be able to access this memory
plus the MSI doorbell.
Guests could still screw each other up, but at least it becomes harder to
read/write directly from each other's OS memory.
It may not be worth the effort though....
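As a rough sketch of that hack: the hypervisor would only accept DMA addresses falling wholly inside the dedicated chunk or the doorbell page. The addresses below are made up for illustration, not real ZynqMP addresses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative ranges only: the dedicated DDR chunk for PCI DMA and the
 * MSI doorbell page, the only windows the SMMU would leave open. */
#define DMA_POOL_BASE   0x60000000ULL
#define DMA_POOL_SIZE   0x10000000ULL   /* 256 MiB */
#define DOORBELL_BASE   0xfe440000ULL
#define DOORBELL_SIZE   0x1000ULL

/* True iff [addr, addr+len) lies entirely within [base, base+size),
 * written to avoid overflow on addr + len. */
static bool in_range(uint64_t addr, uint64_t len, uint64_t base, uint64_t size)
{
    return addr >= base && len <= size && addr - base <= size - len;
}

/* A guest-programmed DMA target is acceptable only inside the pool or the
 * doorbell page; everything else would be rejected. */
static bool dma_allowed(uint64_t addr, uint64_t len)
{
    return in_range(addr, len, DMA_POOL_BASE, DMA_POOL_SIZE) ||
           in_range(addr, len, DOORBELL_BASE, DOORBELL_SIZE);
}
```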

Cheers,
Edgar




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

