Re: [Xen-devel] Question about running Xen on NVIDIA Jetson-TK1
Hi, Meng:

Julien is correct: a coworker and I are working on support for Tegra
SoCs, and we've made pretty good progress. There's work yet to be done,
but we have dom0 and guests booting on the Jetson TK1, Jetson TX1, and
the Google Pixel C. Unfortunately, our employer has to take some time to
verify that everything's okay to be open-sourced, so I can't send out
our work-in-progress just yet. We hope to have an RFC patchset out soon!

There are two main hardware differences that cause Tegra SoCs to have
trouble with Xen:

- The primary interrupt controller for those systems isn't a single GIC,
  as Xen expects. Instead, there's an NVIDIA Legacy Interrupt Controller
  (LIC, or ICTLR) that gates all peripheral interrupts before passing
  them to a standard GICv2. This interrupt controller has to be:
  programmed so that Xen can receive interrupts from the hardware it
  uses itself (e.g. serial); programmed so that interrupts for
  pass-through devices are correctly unmasked; and virtualized so dom0
  can program the "sections" related to interrupts that are not routed
  to Xen or to a domain for hardware passthrough.

- The serial controller on the Tegra SoCs doesn't behave in the same way
  as most NS16550-compatibles; it actually adheres to the NS16550 spec a
  little more rigidly than most compatible controllers. A coworker
  (Chris Patterson, cc'd) figured out what was going on. From what I
  understand, most 16550s generate the "transmit ready" interrupt once,
  when the device first can accept new FIFO entries. Both the original
  16550 and the Tegra implementation generate the "transmit ready"
  interrupt /continuously/ while there's space available in the FIFO,
  swamping the CPU with a constant stream of interrupts.

What you're seeing is likely a symptom of the first difference.
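
To make the first difference a bit more concrete, here is a rough C model of a LIC sitting in front of the GIC. It's only a sketch under my own assumptions: the struct, the function names, the bank count, and the IRQ numbers are all made up for illustration, and none of it is real Tegra register layout or Xen code. It just shows the two jobs described above: keeping the interrupts Xen needs unmasked, and filtering dom0's writes so it can only change the bits it owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LIC_IRQS_PER_BANK 32u
#define LIC_NUM_BANKS     5u  /* hypothetical bank count for this model */
#define LIC_NUM_IRQS      (LIC_IRQS_PER_BANK * LIC_NUM_BANKS)

/* Software model only; these are not real Tegra register offsets. */
struct lic_model {
    uint32_t enable[LIC_NUM_BANKS];   /* 1 = interrupt is passed on to the GIC */
    uint32_t reserved[LIC_NUM_BANKS]; /* 1 = owned by Xen or a passthrough domain */
};

/* Xen claims an interrupt (e.g. its serial console) and unmasks it. */
static inline void lic_reserve_and_unmask(struct lic_model *lic,
                                          unsigned int irq)
{
    assert(irq < LIC_NUM_IRQS);
    uint32_t bit = UINT32_C(1) << (irq % LIC_IRQS_PER_BANK);
    lic->reserved[irq / LIC_IRQS_PER_BANK] |= bit;
    lic->enable[irq / LIC_IRQS_PER_BANK] |= bit;
}

/*
 * Emulated dom0 write to one bank's enable mask: reserved bits keep
 * their current value, every other bit follows dom0's request.
 */
static inline void lic_dom0_write_enable(struct lic_model *lic,
                                         unsigned int bank, uint32_t value)
{
    assert(bank < LIC_NUM_BANKS);
    uint32_t keep = lic->reserved[bank];
    lic->enable[bank] = (lic->enable[bank] & keep) | (value & ~keep);
}

/* An interrupt only reaches the GIC if the LIC has it unmasked. */
static inline bool lic_forwards_to_gic(const struct lic_model *lic,
                                       unsigned int irq)
{
    assert(irq < LIC_NUM_IRQS);
    return (lic->enable[irq / LIC_IRQS_PER_BANK]
            >> (irq % LIC_IRQS_PER_BANK)) & 1u;
}
```

In this model, if Xen reserves IRQ 36 and dom0 then writes 0 to bank 1, IRQ 36 still gets through, while IRQ 40 only gets through once dom0 sets bit 8 of that bank.
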

In your logs, you see messages that indicate Xen is having trouble
correctly routing IRQs that are parented by the legacy interrupt
controller:

> irq 0 not connected to primary controller.Connected to
> /interrupt-controller@60004000

The issue here is that Xen currently explicitly opts not to route
legacy-interrupt-controller interrupts, as they don't belong to the
primary GIC. As a result, these interrupts never make it to dom0. The
logic that needs to be tweaked is here:

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/domain_build.c;h=00dc07af637b67153d33408f34331700dff84f93;hb=HEAD#l1137

We rewrite this logic in our forthcoming patch set to be more general.
As an interim workaround, you might opt to rewrite that logic so LIC
interrupts (which have an interrupt-parent compatible with
"tegra124-ictlr", in your case; double-check the exact string in your
device tree, since it may carry a vendor prefix) can be routed by Xen
as well. Off the top of my head, a workaround might look like:

     /*
      * Don't map IRQ that have no physical meaning
      * ie: IRQ whose controller is not the GIC
      */
-    if ( rirq.controller != dt_interrupt_controller )
+    if ( (rirq.controller != dt_interrupt_controller) &&
+         !dt_device_is_compatible(rirq.controller, "tegra124-ictlr") )

Of course, that's off-the-cuff code I haven't tried, but hopefully it
should help to get you started.

--Kyle

On Mon, May 16, 2016 at 3:39 PM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
> Hi Julien,
>
> On Mon, May 16, 2016 at 1:33 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
>> (CC Kyle who is also working on Tegra?)
>>
>> Hi Meng,
>>
>> Many people are working on Nvidia platform with different issues :/. I have
>> CCed another person which IIRC is also working on it.
>
> Sure. It's good to know others are also interested in this platform.
> It will be more useful to fix it...
> :-)
>
>>
>> On 16/05/16 17:33, Meng Xu wrote:
>>>
>>> On Mon, May 16, 2016 at 7:33 AM, Julien Grall <julien.grall@xxxxxxx>
>>> wrote:
>>>>
>>>>
>>>> On 15/05/16 20:35, Meng Xu wrote:
>>>>>
>>>>>
>>>>> I'm trying to run Xen on NVIDIA Jetson TK1 board. (Right now, Xen does
>>>>> not support the Jetson board officially. But I'm thinking it may be
>>>>> very interesting and useful to see it happens, since it has GPU inside
>>>>> which is quite popular in automotive.)
>>>>>
>>>>> Now I encountered some problem to boot dom0 in Xen environment. I want
>>>>> to debug the issues and maybe fix the issues, but I'm not so sure how
>>>>> I should debug the issue more efficiently. I really appreciate it if
>>>>> you advise me a little bit about the method of how to fix the issue.
>>>>> :-)
>>>>>
>>>>> ---Below is the details----
>>>>>
>>>>> I noticed the Dushyant from IBM also tried to run Xen on the Jetson
>>>>> board. (http://www.gossamer-threads.com/lists/xen/devel/422519). I
>>>>> used the same Linux kernel (Jan Kiszka's development tree -
>>>>> http://git.kiszka.org/linux.git/, branch queues/assorted) and Ian's
>>>>> Xen repo. with the hack for Jetson board. I can see the dom0 kernel
>>>>> can boot to some extend and then "stall/spin" before the dom0 kernel
>>>>> fully boot up.
>>>>>
>>>>> In order to figure out the possible issue, I boot the exact same Linux
>>>>> kernel in native environment on one CPU and collected the boot log
>>>>> information in [1]. I also boot the same Linux kernel as dom0 in Xen
>>>>> environment and collected the boot log information in [2].
>>>>>
>>>>> In Xen environment, dom0 hangs after the following message
>>>>> [ 10.541010] NET: Registered protocol family 10
>>>>> 6mip6: Mobile IPv6
>>>>> [ 10.542510] mi
>>>>>
>>>>> In native environment, the kernel has the following log after
>>>>> initializing NET.
>>>>> [ 2.934693] NET: Registered protocol family 10
>>>>> [ 2.940611] mip6: Mobile IPv6
>>>>> [ 2.943645] sit: IPv6 over IPv4 tunneling driver
>>>>> [ 2.951303] NET: Registered protocol family 17
>>>>> [ 2.955800] NET: Registered protocol family 15
>>>>> [ 2.960257] can: controller area network core (rev 20120528 abi 9)
>>>>> [ 2.966617] NET: Registered protocol family 29
>>>>> [ 2.971098] can: raw protocol (rev 20120528)
>>>>> [ 2.975384] can: broadcast manager protocol (rev 20120528 t)
>>>>> [ 2.981088] can: netlink gateway (rev 20130117) max_hops=1
>>>>> [ 2.986734] Bluetooth: RFCOMM socket layer initialized
>>>>> [ 2.991979] Bluetooth: RFCOMM ver 1.11
>>>>> [ 2.995757] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
>>>>> [ 3.001109] Bluetooth: BNEP socket layer initialized
>>>>> [ 3.006089] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
>>>>> [ 3.012052] Bluetooth: HIDP socket layer initialized
>>>>> [ 3.017894] Registering SWP/SWPB emulation handler
>>>>> [ 3.029675] tegra-pcie 1003000.pcie-controller: 2x1, 1x1
>>>>> configuration
>>>>> [ 3.036586] +3.3V_SYS: supplied by +VDD_MUX
>>>>> [ 3.040857] +3.3V_LP0: supplied by +3.3V_SYS
>>>>> [ 3.045509] +1.35V_LP0(sd2): supplied by +5V_SYS
>>>>> [ 3.050201] +1.05V_RUN_AVDD: supplied by +1.35V_LP0(sd2)
>>>>> [ 3.057131] tegra-pcie 1003000.pcie-controller: probing port 0, using
>>>>> 2 lanes
>>>>> [ 3.066479] tegra-pcie 1003000.pcie-controller: Slot present pin
>>>>> change, signature: 00000008
>>>>>
>>>>> I'm suspecting that my dom0 kernel hangs when it tries to initialize
>>>>> "can: controller area network core ". However, from Dushyant's post at
>>>>> http://www.gossamer-threads.com/lists/xen/devel/422519, it seems
>>>>> Dushyant's dom0 kernel hangs when it tries to initialize pci_bus. (The
>>>>> linux config I used may be different form Dushyant's. That could be
>>>>> the reason.)
>>>>>
>>>>> Right now, the system just hangs and has no output indicating what the
>>>>> problem could be.
>>>>> Although there are a lot of error message before the
>>>>> system hangs, I'm not that sure if I should start with solving all of
>>>>> those error messages. Maybe some errors can be ignored?
>>>>>
>>>>> My questions are:
>>>>> 1) Do you have suggestion on how to see more information about the
>>>>> reason why the dom0 hangs?
>>>>
>>>>
>>>>
>>>> Have you tried to dump the registers using Xen console (CTLR-x 3 times
>>>> then 0) and see where it get stucks?
>>>
>>>
>>>
>>> I tried to type CTLR -x 3 times and then 0, nothing happens... :-(
>>> Just to confirm, once the system got stuck, I directly type Ctrl-x for
>>> three times on the host's screen. Am I correct?
>>
>>
>> Sorry, I forgot the default way to switch console is CTLR-a three times.
>>
>> On my configuration I modified the default character to avoid issue with
>> screen.
>>
>
> Ah-ha, typing "Ctrl -a, a" for three times, I can switch to the Xen
> console now. :-)
>
> Thank you for the clarification. :-)
>
>>>
>>> Maybe the serial console is not correctly set up?
>>
>>
>> It's likely to be a problem with the serial drivers in driver. Can you tried
>> CTLR-a before the kernel is booting?
>>
>>> The serial console configuration I used is as follows, could you have
>>> a quick look to see if it's because I configure the serial
>>> incorrectly?
>>
>>
>> I am not familiar with the Nvidia board. If the serial configuration is
>> working for baremetal, then it should work for Xen.
>>
>>>
>>> I used screen program to attach to the serial port.
>>> The command I used is $screen /dev/ttyUSB0 115200n8 on the host machine.
>>>
>>> On the board, I set up the device tree's /chosen node as follows:
>>> #
>>> fdt print /chosen
>>>
>>> chosen {
>>>
>>> xen,xen-bootargs = "console=dtuart dtuart=serial0
>>> dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1
>>> dom0_vcpus_pin maxcpus=1";
>>>
>>> bootargs = "console=dtuart dtuart=serial0 dom0_mem=512M
>>> loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin
>>> maxcpus=1";
>>
>>
>> FIY, this property is not necessary.
>>
>>> module {
>>>
>>> bootargs = "console=hvc0 console=tty1 earlyprintk=xen
>>> root=/dev/mmcblk0p1 rw rootwait";
>>
>>
>> Can you try to add "clk_ignore_unused" on the Linux command line?
>
> Yes. After trying it, it gives me some different,but more useful log
> information.
>
> In the kernel booting log, it first shows this message:
>
> ---start of the message----
>
> [ 10.607251] Waiting for root device /dev/mmcblk0p1...
>
> ... /* I omit some other unrelated message */
>
> [ 5.347354] sdhci-tegra 700b0400.sdhci: Got WP GPIO
>
> 3mmc0: Unknown controller version (3). You may experience problems.
>
> [ 5.347464] mmc0: Unknown controller version (3). You may
> experience problems.
>
> sdhci-tegra 700b0400.sdhci: No vmmc regulator found
>
> [ 5.347647] sdhci-tegra 700b0400.sdhci: No vmmc regulator found
>
> 3mmc0: Unknown controller version (3). You may experience problems.
>
> [ 5.347933] mmc0: Unknown controller version (3). You may
> experience problems.
>
> sdhci-tegra 700b0600.sdhci: No vmmc regulator found
>
> [ 5.348099] sdhci-tegra 700b0600.sdhci: No vmmc regulator found
>
> sdhci-tegra 700b0600.sdhci: No vqmmc regulator found
>
> [ 5.348162] sdhci-tegra 700b0600.sdhci: No vqmmc regulator found
>
> 4mmc0: Invalid maximum block size, assuming 512 bytes
>
> [ 5.348222] mmc0: Invalid maximum block size, assuming 512 bytes
>
> 6mmc0: SDHCI controller on 700b0600.sdhci [700b0600.sdhci] using ADMA 64-bit
>
> [ 5.395208] mmc0: SDHCI controller on 700b0600.sdhci
> [700b0600.sdhci] using ADMA 64-bit
>
> 6usbcore: registered new interface driver usbhid
>
> ---end of the message----
>
> It seems that mmc0 is not correctly recognized by dom0.
>
> Later the dom0 kernel keeps printing this message:
>
> [ 426.405136] mmc0: Timeout waiting for hardware interrupt.
>
> Dom0 actually hangs here, because it cannot read the eMMC device. :-(
>
> Is it because the device tree is not properly recreated by Xen?
> Do you happen to know how to fix this issue or have some idea about
> how to fix it? I can have a look at it.
>
> BTW, dom0 didn't recognize the mmc1 controller either.
>
> Thank you very much for your time and help in this! :-)
>
> Best Regards,
>
> Meng
> -----------
> Meng Xu
> PhD Student in Computer and Information Science
> University of Pennsylvania
> http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel