[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support



On 2014/7/1 20:33, Michael S. Tsirkin wrote:
On Tue, Jul 01, 2014 at 05:46:58PM +0800, Chen, Tiejun wrote:
On 2014/7/1 17:12, Michael S. Tsirkin wrote:
On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote:
On 2014/6/30 19:28, Michael S. Tsirkin wrote:
On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote:
On 2014/6/30 17:55, Michael S. Tsirkin wrote:
On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
On 2014/6/30 17:05, Michael S. Tsirkin wrote:
On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
On 2014/6/30 14:48, Michael S. Tsirkin wrote:
On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
On 2014/6/26 18:03, Paolo Bonzini wrote:
Il 26/06/2014 11:18, Chen, Tiejun ha scritto:


- offsets 0x0000..0x0fff map to configuration space of the host MCH


Are you saying the config space in the video device?

No, I am saying in a new BAR, or at some magic offset of an existing
MMIO BAR.


As I mentioned previously, the IGD guy told me we have no any unused a
offset or BAR in the config space.

And guy who are responsible for the native driver seems not be accept to
extend some magic offset of an existing MMIO BAR.

In addition I think in a short time its not possible to migrate i440fx to
q35 as a PCIe machine of xen.

That seems like a weak motivation.  I don't see a need to get something
merged upstream in a short time: this seems sure to miss 2.1,
so you have the time to make it architecturally sound.
"Making existing guests work" would be a better motivation.

Yes.

So focus on this then. Existing guests will probably work
fine on a newer chipset - likely better than on i440fx.
xen management tools need to do some work to support this?
That will just give everyone more long term benefits.

If instead we create a hack that does not resemble
any existing hardware even remotely, what's the
chance that it will not break with any future
guest modification?


Isn't this possible with an mch chipset?

If you're saying q35, I mean AFAIK we have no any plan to migrate to this
MCH in xen case.

q35 or a newer chipset that's closer to what guests expect.

Additionally, I think I should follow this feature after
q35 can work for xen scenario.

What's stopping you?

I mean if you want create an new machine based on q35, actually this is
equal to start making xen to migrate to q35 now. Right? I can't image how
much effort should be done.

I don't see why you don't try. Sounds like a more robust solution to me.

As I think this is another requirement to us. I'm not sure if I have enough
time to touch this currently.


So this is a reason why I'm saying I'd like to follow this feature after q35
can work with xen completely.

Then we'll end up with more configurations to support, and to what end?




So could we do this step by step:

#1 phase: We just cover current qemu-xen implementation based on i44fx, so
still provide that pseudo ISA bridge at 00:1f.0 as we already did.

By the way there is no reason to put it at 00:1f.0 specifically I think.
So it seems simple: create a dummy device that gets device and
vendor id as properties. If you really like, add an option to get values

Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
passthrough: create pseudo intel isa bridge. There, we fake this device just
at 00:1f.0.
But you guys don't like this, and shouldn't this be just this point we
discussing now?

If you guys agree that , everything is fine.

Actually, this isn't what you did.
Don't tie it to xen, and don't tie it to 1f.
Just make it a simple stub pci device.
Whoever wants it, creates it.

The thing I worry about, is the chance this will break going forward.
So you created a system with 2 isa bridges.

I don't set class type to claim this is an ISA bridge, just a normal PCI
device.
+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
*hdev)
+{
+    struct PCIDevice *dev;
+
+    char rid;
+
+    /* We havt to use a simple PCI device to fake this ISA bridge
+     * to avoid making some confusion to BIOS and ACPI.
+     */
+    dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
"pseudo-intel-pch-isa-bridge");
+
+    qdev_init_nofail(&dev->qdev);
+
+    pci_config_set_vendor_id(dev->config, hdev->vendor_id);
+    pci_config_set_device_id(dev->config, hdev->device_id);
+
+    xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1);
+
+    pci_config_set_revision(dev->config, rid);
+
+    XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n");
+    return 0;
+}

Then I don't see how it's supposed to work.
Doesn't i915 look for an isa bridge?

         /*
          * The reason to probe ISA bridge instead of Dev31:Fun0 is to
          * make graphics device passthrough work easy for VMM, that only
          * need to expose ISA bridge to let driver know the real hardware
          * underneath. This is a requirement from virtualization team.
          *
          * In some virtualized environments (e.g. XEN), there is irrelevant
          * ISA bridge in the system. To work reliably, we should scan trhough
          * all the ISA bridge devices and check for the first match, instead
          * of only checking the first one.
          */
         while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
                 if (pch->vendor == PCI_VENDOR_ID_INTEL) {
                         unsigned short id = pch->device & 
INTEL_PCH_DEVICE_ID_MASK;
                         dev_priv->pch_id = id;

                         if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {
                                 dev_priv->pch_type = PCH_IBX;

etc


I already post this to mainline to change as follows:

-       while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
+       pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
+       if (pch) {

I see - responded on that mail - but I thought the point is to make
existing guests run? In fact you said so explicitly.


Please refer to this,

[RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of
check class type

Linux Native guys would like to accept this. And actually Windows always use
devfn to detect this.

In fact I see this:

     linux 2.6.35-3.9 probes the 1st IA bridge

         no idea how would you fix this.
         try changing default class for the main bridge?

     linux 3.10 probes all ISA bridges

         requires your stub to be the isa bridge?


I don't see why are driver guys so willing to do crazy things.  They
want to match specific device/vendor id pairs, why don't they do just
that? Why poke at class, random slot numbers etc etc?

AFAIK what they did is from our previous incorrect requirement as I
understand. So we need to correct this.

Thanks
Tiejun

Since we can't fix existing guests, I would say

This shouldn't be a fix existing guest, and this is why I can send this
before you guys accept GFX passthrough for qemu ML.

I think you can re-read that patch head description. 1f.0 can work under all
scenarios including qemu-xen-traditonal.

And this is also expected by native Linux organically, and especially
Windows always use devfn to detect PCH, this is not like current Linux. So
here I just sync this to make sense.

Unless you'd like to make Linux specific to this point.

Tiejun

Why don't both windows and linux drivers look device up
by device and vendor id?

Its easy to understand because the same video device can work under different PCH, but we need to do different initialization based on different PCH.

Thanks
Tiejun

Same applies to MCH really.


get things working first, find a clean way for
new driver to work next.






This is already not something that exists on real hardware.
So it might break some guests that will get confused.
Maybe we are lucky and most guests see an unfamiliar device
and ignore it. It seems believable.

But your MCH hack overrides registers in the pci host.

We just try to write *one* register we already confirm this is safe enough.

This should go in code in form of comments:
document what this register does on 440fx
and why it's safe to override.
We don't see what you
confirmed off-line.

That offset is one specific to IGD usage, not for i440fx common. This is why
we need to expose something in the host bridge. They're just introduced to
support IGD.


Other register are read-only.

Doesn't matter, need to document these as well.

I think everything are covered in igd_pci_write()/igd_pci_write().


Are we lucky and there's nothing in these registers
of interest to guests? This seems much more fragile.
So please poke at the spec, and compile the list
of registers you want to touch, figure out why they are
safe to override, and put this all in code comments.

And the same thing that applies to the isa bridge

We just expose its own vendor/device ids here. We don't access to change
anything in the isa bridge.

applies here too. It should work without QEMU touching
hosts' hardware.



>from sysfs: device and vendor id are world readable, so just get them
directly and not through xen wrappers, this way you can open the files
RO and not RW.
You seem to poke at revision as well, I don't see
driver looking at that - strictly necessary?
If yes please patch host kernel to expose that info in sysfs,
though we can fall back on pci config if not there.

MCH (bridge_dev) hacks in i915 are nastier.
To clean them up, we really have to have an explicit driver for this
bridge, not a pass-through device.  Long term, the right thing to do is
likely to extend host driver and expose the necessary information in
sysfs on host kernel.



I'm a bit confused. Any sysfs should be filled by the associated PCIe
device, shouldn't it? So qemu still need to emulate this PCIe device
firstly, then set properties into sysfs.

I am talking about getting host properties into qemu.
You don't want to give qemu R/W root access to host sysfs system
of the root bridge, that's not secure.

igd_pci_read()
        |
        + xen_host_pci_get_block()
                |
                + xen_host_pci_config_read(()
                        |
                        +  pread()

So shouldn't we already expose these information via sysfs?

That's poking at sysfs of a real device, and after opening it RW.

I don't think we can really write anything to those read-only sysfs
interface even we open them as RW.

Tiejun


Avoiding read only access to filesystem is a good idea too, so it
should be possible to pass all parameters in as
device properties, and let whoever starts qemu
figure out what are reasonable values.



#2 phase: Now, we will choose a capability ID that won't be conflicting with
others. To do this properly, we need to get one from PCI SIG group. To have
this workable and consistently validated, this method shouldn't be virt
specific. Then native driver should use the same method.

You mean you will be able to talk sense into hardware guys?
I doubt that. If they could be convinced to make e.g. i915 base a

We're negotiating this, so this is just our long term solution we figure out
currently.

proper BAR, why didn't they?

We already have no any free BAR as we mentioned previously.

I thought you were talking about modifying hardware?

Yes.

Tiejun

And they can't figure out how to stick extra info in an existing BAR?
Oh well, that's hardware for you.




So when xen work on
q35 PCIe machine, we can walk this way.

If you are emulating MCH anyway, pick one that is close
to what i915 driver expects. It would then work with existing

Looks you guys prefer we create a new MCH anyway, right? But is it necessary
to create a new based on i440fx just for a little change?

Thanks
Tiejun

You can inherit it. Maybe you are lucky and this happens to
work without conflicting with whatever other guests want to do.
But if you ask me, you are really just piling up hacks.
If some guest does not work on i440, you should just work on
emulating whatever it does work on.
That would have real value.

devices, without new capability IDs.


Anthony,

Any comments to address this in xen case?

Thanks
Tiejun





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel







_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.