[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support



On Tue, Jul 01, 2014 at 05:46:58PM +0800, Chen, Tiejun wrote:
> On 2014/7/1 17:12, Michael S. Tsirkin wrote:
> >On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote:
> >>On 2014/6/30 19:28, Michael S. Tsirkin wrote:
> >>>On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote:
> >>>>On 2014/6/30 17:55, Michael S. Tsirkin wrote:
> >>>>>On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
> >>>>>>On 2014/6/30 17:05, Michael S. Tsirkin wrote:
> >>>>>>>On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
> >>>>>>>>On 2014/6/30 14:48, Michael S. Tsirkin wrote:
> >>>>>>>>>On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
> >>>>>>>>>>On 2014/6/26 18:03, Paolo Bonzini wrote:
> >>>>>>>>>>>Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>- offsets 0x0000..0x0fff map to configuration space of the host 
> >>>>>>>>>>>>>MCH
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>Are you saying the config space in the video device?
> >>>>>>>>>>>
> >>>>>>>>>>>No, I am saying in a new BAR, or at some magic offset of an 
> >>>>>>>>>>>existing
> >>>>>>>>>>>MMIO BAR.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>As I mentioned previously, the IGD guy told me we have no any 
> >>>>>>>>>>unused a
> >>>>>>>>>>offset or BAR in the config space.
> >>>>>>>>>>
> >>>>>>>>>>And guy who are responsible for the native driver seems not be 
> >>>>>>>>>>accept to
> >>>>>>>>>>extend some magic offset of an existing MMIO BAR.
> >>>>>>>>>>
> >>>>>>>>>>In addition I think in a short time its not possible to migrate 
> >>>>>>>>>>i440fx to
> >>>>>>>>>>q35 as a PCIe machine of xen.
> >>>>>>>>>
> >>>>>>>>>That seems like a weak motivation.  I don't see a need to get 
> >>>>>>>>>something
> >>>>>>>>>merged upstream in a short time: this seems sure to miss 2.1,
> >>>>>>>>>so you have the time to make it architecturally sound.
> >>>>>>>>>"Making existing guests work" would be a better motivation.
> >>>>>>>>
> >>>>>>>>Yes.
> >>>>>>>
> >>>>>>>So focus on this then. Existing guests will probably work
> >>>>>>>fine on a newer chipset - likely better than on i440fx.
> >>>>>>>xen management tools need to do some work to support this?
> >>>>>>>That will just give everyone more long term benefits.
> >>>>>>>
> >>>>>>>If instead we create a hack that does not resemble
> >>>>>>>any existing hardware even remotely, what's the
> >>>>>>>chance that it will not break with any future
> >>>>>>>guest modification?
> >>>>>>>
> >>>>>>>
> >>>>>>>>>Isn't this possible with an mch chipset?
> >>>>>>>>
> >>>>>>>>If you're saying q35, I mean AFAIK we have no any plan to migrate to 
> >>>>>>>>this
> >>>>>>>>MCH in xen case.
> >>>>>>>
> >>>>>>>q35 or a newer chipset that's closer to what guests expect.
> >>>>>>>
> >>>>>>>>Additionally, I think I should follow this feature after
> >>>>>>>>q35 can work for xen scenario.
> >>>>>>>
> >>>>>>>What's stopping you?
> >>>>>>
> >>>>>>I mean if you want create an new machine based on q35, actually this is
> >>>>>>equal to start making xen to migrate to q35 now. Right? I can't image 
> >>>>>>how
> >>>>>>much effort should be done.
> >>>>>
> >>>>>I don't see why you don't try. Sounds like a more robust solution to me.
> >>>>
> >>>>As I think this is another requirement to us. I'm not sure if I have 
> >>>>enough
> >>>>time to touch this currently.
> >>>>
> >>>>>
> >>>>>>So this is a reason why I'm saying I'd like to follow this feature 
> >>>>>>after q35
> >>>>>>can work with xen completely.
> >>>>>
> >>>>>Then we'll end up with more configurations to support, and to what end?
> >>>>>
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>So could we do this step by step:
> >>>>>>>>>>
> >>>>>>>>>>#1 phase: We just cover current qemu-xen implementation based on 
> >>>>>>>>>>i44fx, so
> >>>>>>>>>>still provide that pseudo ISA bridge at 00:1f.0 as we already did.
> >>>>>>>>>
> >>>>>>>>>By the way there is no reason to put it at 00:1f.0 specifically I 
> >>>>>>>>>think.
> >>>>>>>>>So it seems simple: create a dummy device that gets device and
> >>>>>>>>>vendor id as properties. If you really like, add an option to get 
> >>>>>>>>>values
> >>>>>>>>
> >>>>>>>>Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
> >>>>>>>>passthrough: create pseudo intel isa bridge. There, we fake this 
> >>>>>>>>device just
> >>>>>>>>at 00:1f.0.
> >>>>>>>>But you guys don't like this, and shouldn't this be just this point we
> >>>>>>>>discussing now?
> >>>>>>>>
> >>>>>>>>If you guys agree that , everything is fine.
> >>>>>>>
> >>>>>>>Actually, this isn't what you did.
> >>>>>>>Don't tie it to xen, and don't tie it to 1f.
> >>>>>>>Just make it a simple stub pci device.
> >>>>>>>Whoever wants it, creates it.
> >>>>>>>
> >>>>>>>The thing I worry about, is the chance this will break going forward.
> >>>>>>>So you created a system with 2 isa bridges.
> >>>>>>
> >>>>>>I don't set class type to claim this is an ISA bridge, just a normal PCI
> >>>>>>device.
> >>>>>>+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
> >>>>>>*hdev)
> >>>>>>+{
> >>>>>>+    struct PCIDevice *dev;
> >>>>>>+
> >>>>>>+    char rid;
> >>>>>>+
> >>>>>>+    /* We havt to use a simple PCI device to fake this ISA bridge
> >>>>>>+     * to avoid making some confusion to BIOS and ACPI.
> >>>>>>+     */
> >>>>>>+    dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
> >>>>>>"pseudo-intel-pch-isa-bridge");
> >>>>>>+
> >>>>>>+    qdev_init_nofail(&dev->qdev);
> >>>>>>+
> >>>>>>+    pci_config_set_vendor_id(dev->config, hdev->vendor_id);
> >>>>>>+    pci_config_set_device_id(dev->config, hdev->device_id);
> >>>>>>+
> >>>>>>+    xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1);
> >>>>>>+
> >>>>>>+    pci_config_set_revision(dev->config, rid);
> >>>>>>+
> >>>>>>+    XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n");
> >>>>>>+    return 0;
> >>>>>>+}
> >>>>>
> >>>>>Then I don't see how it's supposed to work.
> >>>>>Doesn't i915 look for an isa bridge?
> >>>>>
> >>>>>         /*
> >>>>>          * The reason to probe ISA bridge instead of Dev31:Fun0 is to
> >>>>>          * make graphics device passthrough work easy for VMM, that only
> >>>>>          * need to expose ISA bridge to let driver know the real 
> >>>>> hardware
> >>>>>          * underneath. This is a requirement from virtualization team.
> >>>>>          *
> >>>>>          * In some virtualized environments (e.g. XEN), there is 
> >>>>> irrelevant
> >>>>>          * ISA bridge in the system. To work reliably, we should scan 
> >>>>> trhough
> >>>>>          * all the ISA bridge devices and check for the first match, 
> >>>>> instead
> >>>>>          * of only checking the first one.
> >>>>>          */
> >>>>>         while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
> >>>>>                 if (pch->vendor == PCI_VENDOR_ID_INTEL) {
> >>>>>                         unsigned short id = pch->device & 
> >>>>> INTEL_PCH_DEVICE_ID_MASK;
> >>>>>                         dev_priv->pch_id = id;
> >>>>>
> >>>>>                         if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {
> >>>>>                                 dev_priv->pch_type = PCH_IBX;
> >>>>>
> >>>>>etc
> >>>>>
> >>>>
> >>>>I already post this to mainline to change as follows:
> >>>>
> >>>>- while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
> >>>>+ pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
> >>>>+ if (pch) {
> >>>
> >>>I see - responded on that mail - but I thought the point is to make
> >>>existing guests run? In fact you said so explicitly.
> >>>
> >>>
> >>>>Please refer to this,
> >>>>
> >>>>[RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of
> >>>>check class type
> >>>>
> >>>>Linux Native guys would like to accept this. And actually Windows always 
> >>>>use
> >>>>devfn to detect this.
> >>>
> >>>In fact I see this:
> >>>
> >>>     linux 2.6.35-3.9 probes the 1st IA bridge
> >>>
> >>>         no idea how would you fix this.
> >>>         try changing default class for the main bridge?
> >>>
> >>>     linux 3.10 probes all ISA bridges
> >>>
> >>>         requires your stub to be the isa bridge?
> >>>
> >>>
> >>>I don't see why are driver guys so willing to do crazy things.  They
> >>>want to match specific device/vendor id pairs, why don't they do just
> >>>that? Why poke at class, random slot numbers etc etc?
> >>
> >>AFAIK what they did is from our previous incorrect requirement as I
> >>understand. So we need to correct this.
> >>
> >>Thanks
> >>Tiejun
> >
> >Since we can't fix existing guests, I would say
> 
> This shouldn't be a fix existing guest, and this is why I can send this
> before you guys accept GFX passthrough for qemu ML.
> 
> I think you can re-read that patch head description. 1f.0 can work under all
> scenarios including qemu-xen-traditonal.
> 
> And this is also expected by native Linux organically, and especially
> Windows always use devfn to detect PCH, this is not like current Linux. So
> here I just sync this to make sense.
> 
> Unless you'd like to make Linux specific to this point.
> 
> Tiejun

Why don't both windows and linux drivers look device up
by device and vendor id?
Same applies to MCH really.


> >get things working first, find a clean way for
> >new driver to work next.
> >
> >>>
> >>>
> >>>
> >>>>>
> >>>>>
> >>>>>>>This is already not something that exists on real hardware.
> >>>>>>>So it might break some guests that will get confused.
> >>>>>>>Maybe we are lucky and most guests see an unfamiliar device
> >>>>>>>and ignore it. It seems believable.
> >>>>>>>
> >>>>>>>But your MCH hack overrides registers in the pci host.
> >>>>>>
> >>>>>>We just try to write *one* register we already confirm this is safe 
> >>>>>>enough.
> >>>>>
> >>>>>This should go in code in form of comments:
> >>>>>document what this register does on 440fx
> >>>>>and why it's safe to override.
> >>>>>We don't see what you
> >>>>>confirmed off-line.
> >>>>
> >>>>That offset is one specific to IGD usage, not for i440fx common. This is 
> >>>>why
> >>>>we need to expose something in the host bridge. They're just introduced to
> >>>>support IGD.
> >>>>
> >>>>>
> >>>>>>Other register are read-only.
> >>>>>
> >>>>>Doesn't matter, need to document these as well.
> >>>>
> >>>>I think everything are covered in igd_pci_write()/igd_pci_write().
> >>>>
> >>>>>
> >>>>>>>Are we lucky and there's nothing in these registers
> >>>>>>>of interest to guests? This seems much more fragile.
> >>>>>>>So please poke at the spec, and compile the list
> >>>>>>>of registers you want to touch, figure out why they are
> >>>>>>>safe to override, and put this all in code comments.
> >>>>>>>
> >>>>>>>And the same thing that applies to the isa bridge
> >>>>>>
> >>>>>>We just expose its own vendor/device ids here. We don't access to change
> >>>>>>anything in the isa bridge.
> >>>>>>
> >>>>>>>applies here too. It should work without QEMU touching
> >>>>>>>hosts' hardware.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>from sysfs: device and vendor id are world readable, so just get them
> >>>>>>>>>directly and not through xen wrappers, this way you can open the 
> >>>>>>>>>files
> >>>>>>>>>RO and not RW.
> >>>>>>>>>You seem to poke at revision as well, I don't see
> >>>>>>>>>driver looking at that - strictly necessary?
> >>>>>>>>>If yes please patch host kernel to expose that info in sysfs,
> >>>>>>>>>though we can fall back on pci config if not there.
> >>>>>>>>>
> >>>>>>>>>MCH (bridge_dev) hacks in i915 are nastier.
> >>>>>>>>>To clean them up, we really have to have an explicit driver for this
> >>>>>>>>>bridge, not a pass-through device.  Long term, the right thing to do 
> >>>>>>>>>is
> >>>>>>>>>likely to extend host driver and expose the necessary information in
> >>>>>>>>>sysfs on host kernel.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>I'm a bit confused. Any sysfs should be filled by the associated PCIe
> >>>>>>>>device, shouldn't it? So qemu still need to emulate this PCIe device
> >>>>>>>>firstly, then set properties into sysfs.
> >>>>>>>
> >>>>>>>I am talking about getting host properties into qemu.
> >>>>>>>You don't want to give qemu R/W root access to host sysfs system
> >>>>>>>of the root bridge, that's not secure.
> >>>>>>
> >>>>>>igd_pci_read()
> >>>>>>        |
> >>>>>>        + xen_host_pci_get_block()
> >>>>>>                |
> >>>>>>                + xen_host_pci_config_read(()
> >>>>>>                        |
> >>>>>>                        +  pread()
> >>>>>>
> >>>>>>So shouldn't we already expose these information via sysfs?
> >>>>>
> >>>>>That's poking at sysfs of a real device, and after opening it RW.
> >>>>
> >>>>I don't think we can really write anything to those read-only sysfs
> >>>>interface even we open them as RW.
> >>>>
> >>>>Tiejun
> >>>>
> >>>>>
> >>>>>>>Avoiding read only access to filesystem is a good idea too, so it
> >>>>>>>should be possible to pass all parameters in as
> >>>>>>>device properties, and let whoever starts qemu
> >>>>>>>figure out what are reasonable values.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>#2 phase: Now, we will choose a capability ID that won't be 
> >>>>>>>>>>conflicting with
> >>>>>>>>>>others. To do this properly, we need to get one from PCI SIG group. 
> >>>>>>>>>>To have
> >>>>>>>>>>this workable and consistently validated, this method shouldn't be 
> >>>>>>>>>>virt
> >>>>>>>>>>specific. Then native driver should use the same method.
> >>>>>>>>>
> >>>>>>>>>You mean you will be able to talk sense into hardware guys?
> >>>>>>>>>I doubt that. If they could be convinced to make e.g. i915 base a
> >>>>>>>>
> >>>>>>>>We're negotiating this, so this is just our long term solution we 
> >>>>>>>>figure out
> >>>>>>>>currently.
> >>>>>>>>
> >>>>>>>>>proper BAR, why didn't they?
> >>>>>>>>
> >>>>>>>>We already have no any free BAR as we mentioned previously.
> >>>>>>>
> >>>>>>>I thought you were talking about modifying hardware?
> >>>>>>
> >>>>>>Yes.
> >>>>>>
> >>>>>>Tiejun
> >>>>>
> >>>>>And they can't figure out how to stick extra info in an existing BAR?
> >>>>>Oh well, that's hardware for you.
> >>>>>
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>So when xen work on
> >>>>>>>>>>q35 PCIe machine, we can walk this way.
> >>>>>>>>>
> >>>>>>>>>If you are emulating MCH anyway, pick one that is close
> >>>>>>>>>to what i915 driver expects. It would then work with existing
> >>>>>>>>
> >>>>>>>>Looks you guys prefer we create a new MCH anyway, right? But is it 
> >>>>>>>>necessary
> >>>>>>>>to create a new based on i440fx just for a little change?
> >>>>>>>>
> >>>>>>>>Thanks
> >>>>>>>>Tiejun
> >>>>>>>
> >>>>>>>You can inherit it. Maybe you are lucky and this happens to
> >>>>>>>work without conflicting with whatever other guests want to do.
> >>>>>>>But if you ask me, you are really just piling up hacks.
> >>>>>>>If some guest does not work on i440, you should just work on
> >>>>>>>emulating whatever it does work on.
> >>>>>>>That would have real value.
> >>>>>>>
> >>>>>>>>>devices, without new capability IDs.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>Anthony,
> >>>>>>>>>>
> >>>>>>>>>>Any comments to address this in xen case?
> >>>>>>>>>>
> >>>>>>>>>>Thanks
> >>>>>>>>>>Tiejun
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Xen-devel mailing list
> >>>>>Xen-devel@xxxxxxxxxxxxx
> >>>>>http://lists.xen.org/xen-devel
> >>>>>
> >>>>>
> >>>
> >
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.