
Re: [Xen-devel] [RFC + Queries] Flow of PCI passthrough in ARM



On Wed, 1 Oct 2014, manish jaggi wrote:
> On 25 September 2014 15:57, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Thu, 25 Sep 2014, manish jaggi wrote:
> >> On 24 September 2014 19:40, Stefano Stabellini
> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> > CC'ing Matt and Dave at ARM for an opinions about device tree, SMMUs and
> >> > stream ids. See below.
> >> >
> >> > On Wed, 24 Sep 2014, manish jaggi wrote:
> >> >> On 22 September 2014 16:15, Stefano Stabellini
> >> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> >> > On Thu, 18 Sep 2014, manish jaggi wrote:
> >> >> >> Hi,
> >> >> >> Below is the flow I am working on. Please provide your comments;
> >> >> >> I have a couple of queries as well.
> >> >> >>
> >> >> >> a) The device tree has smmu nodes, and each smmu node has the
> >> >> >> mmu-masters property. In our SoC DT the mmu-master is a pcie node.
> >> >> >
> >> >> > Do you mean that both the smmu nodes and the pcie node have the
> >> >> > mmu-master property? The pcie node is the pcie root complex, right?
> >> >> >
> >> >> The pcie node is the pcie root complex, and it is listed as the
> >> >> mmu master in the smmu node.
> >> >>
> >> >> smmu1@0x8310,00000000 {
> >> >>         ...
> >> >>         mmu-masters = <&pcie1 0x100>;
> >> >> };
> >> >>
> >> >> >> b) Xen parses the device tree and prepares a list which stores the
> >> >> >> pci device tree node pointers. The order in the device tree is mapped
> >> >> >> to the segment number in subsequent calls: e.g. the 1st pci node found
> >> >> >> is segment 0, the 2nd is segment 1.
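> >> >> >>
> >> >> >> As a rough sketch (the names below are illustrative, not the actual
> >> >> >> code), the bookkeeping amounts to something like:
> >> >> >>
> >> >> >> /* Illustrative only: the Nth pcie node found becomes segment N. */
> >> >> >> #define MAX_PCI_SEGMENTS 16
> >> >> >>
> >> >> >> static const struct dt_device_node *pci_segment[MAX_PCI_SEGMENTS];
> >> >> >> static unsigned int nr_pci_segments;
> >> >> >>
> >> >> >> static void register_pci_host_bridge(const struct dt_device_node *node)
> >> >> >> {
> >> >> >>     if ( nr_pci_segments < MAX_PCI_SEGMENTS )
> >> >> >>         pci_segment[nr_pci_segments++] = node;
> >> >> >> }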
> >> >> >
> >> >> > What's a segment number? Something from the PCI spec?
> >> >> > If you have several pci nodes on device tree, does that mean that you
> >> >> > have several different pcie root complexes?
> >> >> >
> >> >> Yes.
> >> >> The segment is the pci root complex (rc) number.
> >> >> >
> >> >> >> c) During SMMU init the pcie nodes in DT are saved as smmu masters.
> >> >> >
> >> >> > At this point you should also be able to find via DT the stream-id
> >> >> > ranges supported by each SMMU and program the SMMUs with them,
> >> >> > assigning everything to dom0.
> >> >> Currently pcie enumeration is not done in Xen; it is done in dom0.
> >> >
> >> > Yes, but we don't really need to walk any PCIe busses in order to
> >> > program the SMMU, right? We only need the requestor id and the stream id
> >> > ranges. We should be able to get them via device tree.
> >> >
> >> Yes, but I have a doubt here.
> >> Before booting dom0, for each smmu the mask in the SMR can be set to
> >> enable stream ids for dom0. This mask can be fixed or read from the
> >> device tree.
> >> There are 2 points here:
> >> a) PCI bus enumeration
> >> b) Programming the SMMU for dom0
> >> For (b) the enumeration is not required, provided we set the mask.
> >> So are you also saying that (a) should be done in Xen and not in dom0?
> >> If yes, how would dom0 get to know about PCIe EPs, from its device tree?
> >
> > No, I think that doing (a) via PHYSDEVOP_pci_device_add is OK.
> > I am saying that we should consider doing (b) in Xen before booting
> > dom0.
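> >
> > For reference, a rough sketch of what (b) could look like, with register
> > offsets and field layout as in the Linux arm-smmu (SMMUv1/v2) driver; the
> > helper itself is made up for illustration:
> >
> > #define SMR_VALID            (1U << 31)
> > #define SMR_MASK_SHIFT       16
> > #define S2CR_TYPE_TRANS      (0 << 16)
> > #define ARM_SMMU_GR0_SMR(n)  (0x800 + ((n) << 2))
> > #define ARM_SMMU_GR0_S2CR(n) (0xc00 + ((n) << 2))
> >
> > static void smmu_assign_to_dom0(void __iomem *gr0_base, unsigned int idx,
> >                                 uint16_t stream_id, uint16_t mask,
> >                                 uint8_t dom0_cb)
> > {
> >     /* Match stream_id (under mask) in stream-match register idx... */
> >     writel(SMR_VALID | ((uint32_t)mask << SMR_MASK_SHIFT) | stream_id,
> >            gr0_base + ARM_SMMU_GR0_SMR(idx));
> >     /* ...and route the match to dom0's context bank for translation. */
> >     writel(S2CR_TYPE_TRANS | dom0_cb, gr0_base + ARM_SMMU_GR0_S2CR(idx));
> > }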
> >
> >
> >> >> >> d) Dom0 enumerates PCI devices and calls the hypercall
> >> >> >> PHYSDEVOP_pci_device_add.
> >> >> >> - In Xen the SMMU iommu_ops add_device is called. I have implemented
> >> >> >>   the add_device function.
> >> >> >> - In the add_device function the segment number is used to locate the
> >> >> >>   device tree node pointer of the pcie node, which helps to find the
> >> >> >>   corresponding smmu.
> >> >> >> - In the same PHYSDEVOP the BAR regions are mapped to Dom0.
> >> >> >>
> >> >> >> Note: the current SMMU driver maps the domain's complete address
> >> >> >> space for the device in the SMMU hardware.
> >> >> >>
> >> >> >> The above flow works currently for us.
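> >> >> >>
> >> >> >> For reference, the dom0 side of (d) is roughly the following (a
> >> >> >> simplified sketch of what drivers/xen/pci.c does; error handling
> >> >> >> omitted):
> >> >> >>
> >> >> >> #include <asm/xen/hypercall.h>
> >> >> >> #include <xen/interface/physdev.h>
> >> >> >>
> >> >> >> static int report_pci_device_to_xen(struct pci_dev *dev)
> >> >> >> {
> >> >> >>     struct physdev_pci_device_add add = {
> >> >> >>         .seg   = pci_domain_nr(dev->bus), /* segment = RC number */
> >> >> >>         .bus   = dev->bus->number,
> >> >> >>         .devfn = dev->devfn,
> >> >> >>     };
> >> >> >>
> >> >> >>     return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
> >> >> >> }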
> >> >> >
> >> >> > It would be nice to be able to skip d): in a system where all dma 
> >> >> > capable
> >> >> > devices are behind smmus, we should be capable of booting dom0 without
> >> >> > the 1:1 mapping hack. If we do that, it would be better to program the
> >> >> > smmus before booting dom0. Otherwise there is a risk that dom0 is 
> >> >> > going
> >> >> > to start using these devices and doing dma before we manage to secure
> >> >> > the devices via smmus.
> >> >> >
> >> >> In our current case we are programming the smmu in the
> >> >> PHYSDEVOP_pci_device_add flow, so the device is mapped before dom0
> >> >> accesses it; otherwise Xen gets an SMMU fault.
> >> >
> >> > Good.
> >> >
> >> >
> >> >> > Of course we can do that if there are no alternatives. But in our case
> >> >> > we should be able to extract the stream-ids from device tree and 
> >> >> > program
> >> >> > the smmus right away, right?  Do we really need to wait for dom0 to 
> >> >> > call
> >> >> > PHYSDEVOP_pci_device_add? We could just assign everything to dom0 for 
> >> >> > a
> >> >> > start.
> >> >> >
> >> >> We cannot get the streamid from the device tree, as it is generated
> >> >> by enumeration.
> >> >
> >> > I am not sure what the current state of the device tree spec is, but I
> >> > am pretty sure that the intention is to express stream id and requestor
> >> > id ranges directly in the dts so that the SMMU can be programmed right
> >> > away without walking the PCI bus.
> >> >
> >> >
> >> >> > I would like to know from the x86 guys, if this is really how it is
> >> >> > supposed to work on PVH too. Do we rely on PHYSDEVOP_pci_device_add to
> >> >> > program the IOMMU?
> >> >> >
> >> >> >
> >> >> I was waiting but no one has commented
> >> >
> >> > Me too. Everybody is very busy at the moment with the 4.5 release.
> >> >
> >> >
> >> >> >> Now when I call pci-assignable-add I see that the iommu_ops
> >> >> >> remove_device in the smmu driver is not called. If that is not called,
> >> >> >> the SMMU would still have the dom0 address space mappings for that
> >> >> >> device.
> >> >> >>
> >> >> >> Can you please suggest the best place (kernel / xl tools) to put the
> >> >> >> code which would call the remove_device in iommu_ops in the control
> >> >> >> flow from pci-assignable-add?
> >> >> >>
> >> >> >> One way I see is to introduce a DOMCTL_iommu_remove_device in
> >> >> >> pci-assignable-add / pci-detach and a DOMCTL_iommu_add_device in
> >> >> >> pci-attach. Is that a valid approach?
> >> >> >
> >> >> > I am not 100% sure, but I think that before assigning a PCI device to
> >> >> > another guest, you are supposed to bind the device to xen-pciback (see
> >> >> > drivers/xen/xen-pciback, also see
> >> >> > http://wiki.xen.org/wiki/Xen_PCI_Passthrough). The pciback driver is
> >> >> > going to hide the device from dom0 and, as a consequence,
> >> >> > drivers/xen/pci.c:xen_remove_device ends up being called, which issues
> >> >> > a PHYSDEVOP_pci_device_remove hypercall.
> >> >>
> >> >> xen_remove_device is not called at all; in pci-attach
> >> >> iommu_ops->assign_device is called.
> >> >> In Xen the nomenclature is confusing and there are no comments in
> >> >> iommu.h:
> >> >> iommu_ops.add_device is called when dom0 issues the hypercall
> >> >> iommu_ops.assign_dt_device is called when xen attaches a device tree
> >> >> device to dom0
> >> >> iommu_ops.assign_device is called when xl pci-attach is called
> >> >> iommu_ops.reassign_device is called when xl pci-detach is called
> >> >>
> >> >> As of now we are able to assign devices to domU and the driver in domU
> >> >> is running. We did some hacks, like:
> >> >> a) in the xen pcifront driver, bus->msi is assigned to the ITS msi_chip
> >> >>
> >> >> ---- pcifront_scan_root()
> >> >> ...
> >> >>     b = pci_scan_bus_parented(&pdev->xdev->dev, bus,
> >> >>                               &pcifront_bus_ops, sd);
> >> >>     if (!b) {
> >> >>         dev_err(&pdev->xdev->dev,
> >> >>             "Error creating PCI Frontend Bus!\n");
> >> >>         err = -ENOMEM;
> >> >>         pci_unlock_rescan_remove();
> >> >>         goto err_out;
> >> >>     }
> >> >>
> >> >>     bus_entry->bus = b;
> >> >> +   /* Hack: attach the GICv3 ITS msi_chip to the pcifront bus so
> >> >> +    * that MSI setup goes through the ITS rather than pcifront. */
> >> >> +   msi_node = of_find_compatible_node(NULL, NULL, "arm,gic-v3-its");
> >> >> +   if (msi_node) {
> >> >> +       b->msi = of_pci_find_msi_chip_by_node(msi_node);
> >> >> +       if (!b->msi) {
> >> >> +           printk(KERN_ERR "Unable to find bus->msi node\n");
> >> >> +           goto err_out;
> >> >> +       }
> >> >> +   } else {
> >> >> +       printk(KERN_ERR "Unable to find arm,gic-v3-its compatible node\n");
> >> >> +       goto err_out;
> >> >> +   }
> >> >
> >> > It seems to me that of_pci_find_msi_chip_by_node should be called by
> >> > common code somewhere else. Maybe people at linux-arm would know where
> >> > to suggest this initialization should go.
> >> >
> >> This is a workaround to attach an msi-controller to the xen pcifront bus.
> >> We are avoiding the xen frontend ops for msi.
> >
> > I think I would need to see a proper patch series to really evaluate this 
> > change.
> >
> >
> >> >
> >> >> ----
> >> >>
> >> >> Using this, the ITS emulation code in xen is able to trap ITS command
> >> >> writes by the driver.
> >> >> But we are facing a problem now, where your help is needed.
> >> >>
> >> >> The StreamID is generated from segment:bus:device:function, which is
> >> >> fed as the DevID in ITS commands. In Dom0 the streamID is correctly
> >> >> generated, but in domU the StreamID for a passthrough device is
> >> >> 0:0:0:0. When emulating this in Xen it is a problem, as Xen does not
> >> >> know how to get the physical stream id.
> >> >>
> >> >> (E.g. after "xl pci-attach 1 0001:00:05.0" DomU has the device, but in
> >> >> DomU the id is 0000:00:00.0.)
> >> >>
> >> >> Could you suggest how to go about this.
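> >> >>
> >> >> (For reference, the usual encoding, assuming the DevID is just the PCI
> >> >> requester ID with the segment folded in; how the segment is actually
> >> >> combined is platform-specific, so treat this as a sketch:
> >> >>
> >> >> static inline uint32_t bdf_to_devid(uint16_t seg, uint8_t bus,
> >> >>                                     uint8_t devfn)
> >> >> {
> >> >>     return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
> >> >> }
> >> >>
> >> >> The question is which (seg, bus, devfn) the domU driver plugs in.)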
> >> >
> >> > I don't think that the ITS patches have been posted yet, so it is
> >> > difficult for me to understand the problem and propose a solution.
> >>
> >> Put more simply, it is about which StreamID a driver running in domU
> >> sees, since that is what gets programmed in the ITS commands, and about
> >> how to map the domU streamID to the actual streamID in Xen when the ITS
> >> command write traps.
> >
> > Wouldn't it be possible to pass the correct StreamID to DomU via device
> > tree? Does it really need to match the PCI BDF?
> Device Tree provides a static mapping; runtime attaching of a device
> (using the xl tools) to a domU is what I am working on.

As I wrote before it is difficult to answer without the patches and/or a
design document.

You should be able to specify StreamID ranges in Device Tree to cover a
bus. So you should be able to say that the virtual PCI bus in the guest
has StreamID [0-8] for slots [0-8]. Then in your example below you need
to make sure to insert the passthrough device in virtual slot 1 instead
of virtual slot 0.

I don't know if you were aware of this, but you can already specify the
virtual slot number to pci-attach; see xl pci-attach --help.
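(If I remember the PCI_SPEC_STRING syntax correctly it is something like
"xl pci-attach 1 0002:01:00.1@1", where @1 requests virtual slot 1, but
please double-check against the help output.)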

Otherwise you could let the frontend know the StreamID via xenbus: the
backend should know the correct StreamID for the device, it could just
add it to xenstore as a new parameter for the frontend.
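A very rough sketch of that idea (the "stream-id-0" key name and the two
helpers are made up for illustration):

#include <xen/xenbus.h>

/* Backend (xen-pciback): publish the host StreamID for a virtual slot
 * next to the other vpci keys. */
static int publish_stream_id(struct xenbus_device *xdev, u32 stream_id)
{
        return xenbus_printf(XBT_NIL, xdev->nodename,
                             "stream-id-0", "%u", stream_id);
}

/* Frontend (xen-pcifront): read it back from the backend's directory. */
static int read_stream_id(struct xenbus_device *xdev, u32 *stream_id)
{
        return xenbus_scanf(XBT_NIL, xdev->otherend,
                            "stream-id-0", "%u", stream_id);
}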

Either way you should be able to tell the frontend what is the right
StreamID for the device.


> > Otherwise, if the command traps into Xen, couldn't Xen do the translation?
> Xen does not know how to map the BDF in domU to the actual streamID.
> 
> I had thought of adding a hypercall (PHYSDEVOP_map_streamid), issued when
> xl pci-attach is called, with an argument structure along the lines of:
>
> struct physdev_map_streamid {
>     domid_t  dom_id;
>     uint32_t phys_streamid;  /* derived from the physical BDF */
>     uint32_t guest_streamid;
> };
>
> But I am not able to get the correct BDF of domU.

I don't think that a hypercall is a good way to solve this.


> For instance, the logs at 2 different places give different BDFs:
> 
> #xl pci-attach 1 '0002:01:00.1,permissive=1'
> 
> xen-pciback pci-1-0: xen_pcibk_export_device exporting dom 2 bus 1 slot 0 func 1
> xen_pciback: vpci: 0002:01:00.1: assign to virtual slot 1
> xen_pcibk_publish_pci_dev 0000:00:01.00
> 
> Code that generated the print:
> 
> static int xen_pcibk_publish_pci_dev(struct xen_pcibk_device *pdev,
>                                      unsigned int domain, unsigned int bus,
>                                      unsigned int devfn, unsigned int devid)
> {
>     ...
>         printk(KERN_ERR "%s %04x:%02x:%02x.%02x", __func__, domain, bus,
>                PCI_SLOT(devfn), PCI_FUNC(devfn));
> 
> 
> While the print in xen_pcibk_do_op is:
> 
> xen_pcibk_do_op Guest SBDF=0:0:1.1 (this is the output of lspci in domU)
> 
> Code that generated the print:
> 
> void xen_pcibk_do_op(struct work_struct *data)
> {
>      ...
>         if (dev == NULL)
>                 op->err = XEN_PCI_ERR_dev_not_found;
>         else {
>                 printk(KERN_ERR "%s Guest SBDF=%d:%d:%d.%d\n", __func__,
>                        op->domain, op->bus, op->devfn >> 3, op->devfn & 0x7);
> 
> 
> Stefano, I need your help with this.
