[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
On 2024/6/19 17:49, Jan Beulich wrote: > On 19.06.2024 10:51, Chen, Jiqian wrote: >> On 2024/6/19 16:06, Jan Beulich wrote: >>> On 19.06.2024 09:53, Chen, Jiqian wrote: >>>> On 2024/6/18 16:55, Jan Beulich wrote: >>>>> On 18.06.2024 08:57, Chen, Jiqian wrote: >>>>>> On 2024/6/17 22:52, Jan Beulich wrote: >>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote: >>>>>>>> The gsi of a passthrough device must be configured for it to be >>>>>>>> able to be mapped into a hvm domU. >>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes >>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list, >>>>>>>> and the handler of irq_desc is not set, then when passthrough a >>>>>>>> device, setting ioapic affinity and vector will fail. >>>>>>>> >>>>>>>> To fix above problem, on Linux kernel side, a new code will >>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to >>>>>>>> register gsi when dom0 is PVH. >>>>>>>> >>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above >>>>>>>> purpose. >>>>>>>> >>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@xxxxxxx> >>>>>>>> Signed-off-by: Huang Rui <ray.huang@xxxxxxx> >>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@xxxxxxx> >>>>>>>> --- >>>>>>>> The code link that will call this hypercall on linux kernel side is as >>>>>>>> follows: >>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@xxxxxxx/ >>>>>>> >>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, >>>>>>> unaddressed >>>>>>> one here: >>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own >>>>>>> operation, I think it'll also want/need explaining why what is >>>>>>> sufficient for >>>>>>> Dom0 alone isn't sufficient when pass-through comes into play." >>>>>> I have modified the commit message to describe why GSIs are not >>>>>> registered can cause passthrough not work, according to this v9 comment. >>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin >>>>>> list, and the handler of irq_desc is not set, then when passthrough a >>>>>> device, setting ioapic affinity and vector will fail." >>>>>> What description do you want me to add? >>>>> >>>>> What I'd first like to have clarification on (i.e. before putting it in >>>>> the description one way or another): How come Dom0 alone gets away fine >>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it >>>>> perhaps that it just so happened that for Dom0 things have been working >>>>> on systems where it was tested, but the call should in principle have been >>>>> there in this case, too [1]? That (to me at least) would make quite a >>>>> difference for both this patch's description and us accepting it. >>>> Oh, I think I know what's your concern now. Thanks. >>>> First question, why gsi of device can work on PVH dom0: >>>> Because when probe a driver to a normal device, it will call linux kernel >>>> side:pci_device_probe-> request_threaded_irq-> irq_startup-> >>>> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> >>>> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> >>>> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered. >>>> Second question, why gsi of passthrough can't work on PVH dom0: >>>> Because when assign a device to be passthrough, it uses pciback to probe >>>> the device, and it calls pcistub_probe, but in all callstack of >>>> pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the >>>> function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when >>>> the gsi is unmasked, so that the gsi can't work for passthrough device. >>> >>> And why exactly would the fake IRQ handler not be set up by pciback? Its >>> setting up ought to lead to those same IO-APIC RTE writes that Xen >>> intercepts. >> Because isr_on is not set, when xen_pcibk_control_isr is called, it will >> return due to " !dev_data->isr_on". So that fake IRQ handler aren't >> installed. > > I'm afraid I don't follow you here. Quoting from the function: > > enable = dev_data->enable_intx; > > /* Asked to disable, but ISR isn't runnig */ > if (!enable && !dev_data->isr_on) > return; > > I.e. we bail if the request was to _disable_ and there is no ISR. I mean, after debugging the pcistub_probe callstack: pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> xen_pcibk_control_isr(dev, 1 /*reset device*/) and in xen_pcibk_control_isr code: if (reset) { dev_data->enable_intx = 0; dev_data->ack_intr = 0; } enable = dev_data->enable_intx; /* Asked to disable, but ISR isn't runnig */ if (!enable && !dev_data->isr_on) return; "reset" is 1, so "dev_date->enable_intx" is set 0, then "enable" is 0, and then arrive the second "if" check, "enable" is 0 and "dev_date->isr_on" is also 0, so it return here. It can't reach following code to install irq handle. > > I can't exclude though that command_write()'s logic to set ->enable_intx > is insufficient. But in the common case one would surely expect at least > one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest. > IOW at some point I'd expect xen_pcibk_control_isr() to be called with > the second argument 0 and with ->enable_intx set. I only see xen_pcibk_control_isr(dev, 0) is called in xen_pcibk_do_one_op, but xen_pcibk_do_one_op is not called during assigning a device to be passthrough. > >> And it seems isr_on is set through driver sysfs " irq_handler_state" for a >> level device that is to be shared with guest and the IRQ is shared with the >> initial domain. > > The sysfs interface is, according to my reading of the description > of the commit introducing it, merely for debugging / recovery purposes. > (It also looks to me as if this was partly broken: If one would use it, > thus clearing ->isr_on, a subsequent disable request would take exactly > that early bailing path quoted above, with nothing removing the IRQ > handler.) > > That description also talks about both an edge vs level distinction in > behavior and one for shared vs non-shared, but neither in that commit > nor in present code I can spot any respective checks. Otherwise I could > understand that there are cases where the necessary information isn't > propagated to Xen. > > Jan -- Best regards, Jiqian Chen.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |