
Re: [Xen-users] Issues Booting DomU on TI DRA72 Chip



On 07/09/2015 11:29 AM, Julien Grall wrote:
Hi,

On 09/07/2015 15:55, Brandon Perez wrote:
On 07/09/2015 03:34 AM, Julien Grall wrote:
 > These rangesets list the interrupts and I/O memory assigned to the
 > guest (i.e. passthrough).
 >
 > The timer interrupts (physical and virtual) are owned by Xen, and
 > Xen will emulate them for each guest.
 >
 > From the logs, the LRs are empty (see VCPU_LR) and no interrupts are
 > currently pending.
 >
 > I would add some printk in function gic_handle_irq in Linux
 > (drivers/irqchip/irq-gic.c, assuming you are using GICv2) to see which
 > IRQ is coming. FYI, the virt timer PPI is 27 and phys timer 30.
 >

    I added that printk statement, and I never see any physical or
virtual timer interrupts being delivered to the guest. I believe the
guest is using the virtual timer, because when I do a fuller register
dump with trace32, I see that CNTV_CVAL is programmed to a non-zero
value, but CNTP_CVAL is 0.

Do you receive other interrupts in the guest?
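
For anyone following along, here is a minimal user-space sketch of the decoding such a debug print in gic_handle_irq could do. It assumes GICv2, where the interrupt ID sits in bits [9:0] of the acknowledge register, and it uses the guest timer PPIs quoted above (virt 27, phys 30). The helper names are hypothetical and are not taken from the Linux or Xen sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* GICv2 interrupt ID ranges: SGIs 0-15, PPIs 16-31, SPIs 32-1019. */
enum irq_class { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_SPECIAL };

/* The acknowledge register holds the interrupt ID in bits [9:0]. */
static uint32_t iar_to_irq(uint32_t iar)
{
    return iar & 0x3ffu;
}

static enum irq_class irq_classify(uint32_t irq)
{
    if (irq < 16)
        return IRQ_SGI;
    if (irq < 32)
        return IRQ_PPI;
    if (irq < 1020)
        return IRQ_SPI;
    return IRQ_SPECIAL; /* 1020-1023 are special IDs; 1023 is spurious */
}

/* Guest timer PPIs as quoted in this thread: virt = 27, phys = 30. */
static const char *guest_timer_name(uint32_t irq)
{
    switch (irq) {
    case 27: return "virt timer";
    case 30: return "phys timer";
    default: return NULL;
    }
}

/* Builds the line a debug print might emit; returns snprintf's result. */
static int format_irq_debug(char *buf, size_t len, uint32_t iar)
{
    static const char *const names[] = { "SGI", "PPI", "SPI", "special" };
    uint32_t irq = iar_to_irq(iar);
    const char *timer = guest_timer_name(irq);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "irq %u (%s)%s%s", (unsigned)irq,
                    names[irq_classify(irq)],
                    timer ? " " : "", timer ? timer : "");
}
```

In the kernel the formatted string would go through printk rather than snprintf. If neither 27 nor 30 ever shows up in the guest, the interrupt is being lost before it reaches the guest.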


    I also noticed the following line during the Dom0 guest boot:

(XEN) Generic Timer IRQ: phys=35 hyp=31 virt=32 Freq: 6147 KHz

    Is this expected to be different from the usual interrupt IDs
for ARM core timers (these numbers are all offset by +5)?

The timer IRQs for DOM0 and the guest may be different. DOM0 will use
the same ones as the hardware (i.e. the ones in your log), while guests
have their interrupts defined in the guest layout (see
xen/include/public/arch-arm.h).

    Correct me if I'm wrong, but it seems like gic_handle_irq() is not
the control path taken by the PPI timers. It looks like the functions
"vtimer_interrupt()" and "timer_interrupt()" (xen/arch/arm/time.c)
handle those interrupts.

I think you are mixing two different things here:
     - gic_handle_irq is the Linux interrupt handler for GICv2, through
which every IRQ arrives (PPIs, SPIs...)
     - vtimer_interrupt and timer_interrupt are Xen handlers which
take care of the hardware timer IRQs.

When the hardware fires a virtual timer interrupt, it will be:
     1) Received by Xen via gic_interrupt
     2) do_IRQ will dispatch the IRQ
     3) vtimer_interrupt will be called, as it has been registered as
the callback for the virtual timer
     4) The interrupt will be injected into the guest using the guest
virtual interrupt number, i.e. the LRs will be set up
     5) The guest will receive the interrupt
     6) gic_handle_irq will dispatch the interrupt
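
The steps above can be illustrated with a toy model. This is only a sketch and not Xen code: the number of list registers, the structure layout, and every toy_* name are assumptions made for illustration. The only values taken from the thread are the guest virtual timer PPI (27) and the idea that Xen injects the guest's interrupt number by filling an LR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_LRS       4      /* assumption: LR count in this model */
#define TOY_GUEST_VTIMER 27u    /* guest virt timer PPI, per the thread */
#define TOY_SPURIOUS     1023u  /* GIC spurious interrupt ID */

struct toy_lr { bool pending; uint32_t virq; };
struct toy_vcpu { struct toy_lr lr[TOY_NR_LRS]; };

/* Step 4: put the guest interrupt number into the first free LR. */
static bool toy_inject(struct toy_vcpu *v, uint32_t virq)
{
    assert(v != NULL);
    for (size_t i = 0; i < TOY_NR_LRS; i++) {
        if (!v->lr[i].pending) {
            v->lr[i].pending = true;
            v->lr[i].virq = virq;
            return true;
        }
    }
    return false; /* all LRs busy */
}

/* Steps 1-3: Xen receives a hardware IRQ and, if it is the hardware
 * virtual timer, injects the guest's own timer number instead. */
static bool toy_xen_irq(struct toy_vcpu *v, uint32_t hw_irq,
                        uint32_t hw_vtimer_irq)
{
    if (hw_irq != hw_vtimer_irq)
        return false; /* not the virtual timer; ignored by this model */
    return toy_inject(v, TOY_GUEST_VTIMER);
}

/* Steps 5-6: the guest acknowledges whatever is pending in the LRs. */
static uint32_t toy_guest_ack(struct toy_vcpu *v)
{
    assert(v != NULL);
    for (size_t i = 0; i < TOY_NR_LRS; i++) {
        if (v->lr[i].pending) {
            v->lr[i].pending = false;
            return v->lr[i].virq;
        }
    }
    return TOY_SPURIOUS; /* empty LRs: nothing for the guest to handle */
}
```

In this model, if Xen listens on the wrong hardware IRQ number for the virtual timer, toy_xen_irq never injects anything and the guest only ever sees empty LRs, which matches the symptom described earlier in the thread.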


    Also, based on lines 224-226 of xen/arch/arm/irq.c, it seems that
PPIs cannot be delivered to guests from the function "do_IRQ()":

     /* the irq cannot be a PPI, we only support delivery of SPIs to
      * guests */
      vgic_vcpu_inject_spi(info->d, info->virq);

This call is only used for IRQs owned by a domain.

The timer IRQs are owned by Xen and are therefore not directly
injected into the guest.


    Also, how do the functions in xen/arch/arm/vtimer.c fit into this
picture? Would they be the ones responsible for delivering the vtimer
interrupt to the guest?

The file vtimer.c contains everything related to the guest timer:
emulation of the physical timer and context save/restore of both the
virtual and physical timers.

The injection of the timer interrupts is done in different places:
     - virtual: this is done in vtimer_interrupt (xen/arch/arm/time.c)
when the domain is running. If the domain is not running, we create a
timer and may inject an interrupt if the timer has expired (see
virt_timer_expired in xen/arch/arm/vtimer.c).
     - physical: the timer is completely emulated. The injection of the
interrupt is done in phys_timer_expired (xen/arch/arm/vtimer.c).

You can add printk calls in those places to check whether the interrupt
is injected into the guest or not. It may give you an insight into
whether the timer has been correctly set up by the guest or not.

Note that you may want to only print when it's not domain 0 to avoid log
pollution.

     Yes, I'm using the 3.14 kernel for both dom0 and domU.
Unfortunately, I need the 3.14 kernel, but if these problems persist, I
may try to update to a newer mainline kernel.

May I ask you to provide
     - the .config used to build the kernel
     - a link to the git repo containing the branch (I wasn't able to
find it in your first mail)

You also said you had local changes in both Linux and Xen. Can you tell
us what kind of changes?

Regards,


Hi Julien,

Thanks for the clarifications on my questions about the interrupt/timer code flow in Xen. That made it a lot clearer. I figured out what is causing the bug, and it turned out to be one of the local changes I had made to Xen.

On the DRA72x device, there are a large number of peripheral devices, more than the SPI lines on the GIC can handle. Therefore, there's an internal mechanism that routes these peripherals to the interrupt lines on the GIC.

Unfortunately, this means that the "interrupt" property of a device in the device tree does not contain the interrupt ID of the device, but rather the index (ID) of the peripheral on the board. In the Linux kernel, the peripherals are mapped to SPI IRQ lines as they are needed. So, given the same device tree and configuration settings, the mapping will always be the same, but it cannot be easily inferred from the device tree. The mapping is not static, but once the interrupts are assigned, they do not change.

Naturally, Xen is unaware of this interrupt ID remapping, so it either maps interrupts prematurely or hits assertion failures, as the "interrupt ID" found in the device tree is larger than the total number of GIC lines. To work around this, I made the mapping of peripherals fully static, so that Xen would accept the default configuration given to it by the bootloader, and the Linux kernel would not attempt to create a new mapping from the devices in the device tree.

The change I made essentially involved adding an extra layer of translation to "gic_irq_xlate()": if the platform has defined an IRQ remap function, then it performs this remapping.

This led to an issue because I assumed that all devices in the device tree were peripheral devices tied to SPIs (thus requiring translation). PPIs, naturally, are independent of this peripheral remapping, since they are private to the physical processor. There was a timer node in the device tree, and I was incorrectly translating its interrupt numbers, treating them like SPIs instead of PPIs. So, I just had to make the simple change of performing the extra layer of translation only if the interrupt is an SPI.
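
As a rough illustration of that fix, here is a standalone sketch of an xlate step that applies the platform remap only to SPIs. It is not the actual Xen patch. It assumes the standard three-cell GIC device tree interrupt specifier (cell 0: 0 for SPI and 1 for PPI, cell 1: index, cell 2: flags), where SPI indices are offset by 32 and PPI indices by 16. The remap hook type, the function names, and the table contents are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DT_IRQ_SPI 0u
#define DT_IRQ_PPI 1u

/* Hypothetical platform hook: peripheral index -> SPI index. */
typedef int (*irq_remap_fn)(uint32_t periph_index, uint32_t *spi_index);

/* Hypothetical static table; the values are made up for illustration. */
static int example_static_remap(uint32_t periph_index, uint32_t *spi_index)
{
    static const struct { uint32_t periph, spi; } table[] = {
        { 400, 10 }, { 401, 11 },
    };

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].periph == periph_index) {
            *spi_index = table[i].spi;
            return 0;
        }
    }
    return -1; /* peripheral has no static mapping */
}

/* Translates a 3-cell specifier to a GIC interrupt ID. Returns 0 on
 * success and -1 on error. The remap is applied to SPIs only. */
static int xlate_sketch(const uint32_t *intspec, size_t intsize,
                        irq_remap_fn remap, uint32_t *out_irq)
{
    uint32_t index;

    assert(intspec != NULL && out_irq != NULL);
    if (intsize < 3)
        return -1;

    index = intspec[1];
    switch (intspec[0]) {
    case DT_IRQ_PPI:
        /* PPIs are per-CPU and bypass the peripheral remapping. */
        if (index >= 16)
            return -1;
        *out_irq = index + 16;
        return 0;
    case DT_IRQ_SPI:
        if (remap != NULL && remap(index, &index) != 0)
            return -1;
        if (index >= 988) /* SPI IDs span 32..1019 */
            return -1;
        *out_irq = index + 32;
        return 0;
    default:
        return -1;
    }
}
```

With this shape, a timer node entry such as <1 11 ...> resolves to interrupt ID 27 whether or not a remap hook is installed, while SPI entries still go through the static table.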

As a result, none of the timer interrupts were mapped to the correct IRQ lines, so they were never handled properly. In my guest kernel boot, this showed up as the guest getting stuck in an hrtimer sleep forever, since no virtual timer interrupts were ever received.

For the sake of completeness, I've attached my .config file for the kernel. It's a standard kernel configuration, with some extra parameters for Xen as outlined in [1]. The git repo is at git://git.omapzoom.org/kernel/omap.git, and the branch is "android-3.14-6AL.1.0".

[1] http://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/OMAP5432_uEVM

Brandon


Attachment: .config
Description: Text document
