  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>, bercarug@xxxxxxxxxx
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Wed, 25 Jul 2018 15:41:11 +0200
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, David Woodhouse <dwmw2@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, abelgun@xxxxxxxxxx
On 25/07/18 15:35, Roger Pau Monné wrote:
> On Wed, Jul 25, 2018 at 01:06:43PM +0300, bercarug@xxxxxxxxxx wrote:
>> On 07/24/2018 12:54 PM, Jan Beulich wrote:
>>>>>> On 23.07.18 at 13:50, <bercarug@xxxxxxxxxx> wrote:
>>>> For the last few days, I have been trying to get a PVH dom0 running,
>>>> however I encountered the following problem: the system seems to
>>>> freeze after the hypervisor boots, the screen goes black. I have tried to
>>>> debug it via a serial console (using Minicom) and managed to get some
>>>> more Xen output, after the screen turns black.
>>>> I mention that I have tried to boot the PVH dom0 using different kernel
>>>> images (from 4.9.0 to 4.18-rc3), different Xen versions (4.10, 4.11,
>>>> 4.12).
>>>> Below I attached my system / hypervisor configuration, as well as the
>>>> output captured through the serial console, corresponding to the latest
>>>> versions for Xen and the Linux Kernel (Xen staging and Kernel from the
>>>> xen/tip tree).
>>>> [...]
>>>> (XEN) [VT-D]iommu.c:919: iommu_fault_status: Fault Overflow
>>>> (XEN) [VT-D]iommu.c:921: iommu_fault_status: Primary Pending Fault
>>>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:14.0] fault addr 8deb3000, iommu reg = ffff82c00021b000
> Can you figure out which PCI device is 00:14.0?
>>>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>>>> (XEN) print_vtd_entries: iommu #0 dev 0000:00:14.0 gmfn 8deb3
>>>> (XEN) root_entry[00] = 1021c60001
>>>> (XEN) context[a0] = 2_1021d6d001
>>>> (XEN) l4[000] = 9c00001021d6c107
>>>> (XEN) l3[002] = 9c00001021d3e107
>>>> (XEN) l2[06f] = 9c000010218c0107
>>>> (XEN) l1[0b3] = 8000000000000000
>>>> (XEN) l1[0b3] not present
>>>> (XEN) Dom0 callback via changed to Direct Vector 0xf3
>>> This might be a hint at a missing RMRR entry in the ACPI tables, as
>>> we've seen to be the case for a number of systems (I dare to guess
>>> that 0000:00:14.0 is a USB controller, perhaps one with a keyboard
>>> and/or mouse connected). You may want to play with the respective
>>> command line option ("rmrr="). Note that "iommu_inclusive_mapping"
>>> as you're using it does not have any meaning for PVH (see
>>> intel_iommu_hwdom_init()).
>>> Jan
>> Hello,
>> Following Roger's advice, I rebuilt Xen (4.12) using the staging branch and
>> I managed to get a PVH dom0 starting. However, some other problems appeared:
>> 1) The USB devices are not usable anymore (keyboard and mouse), so the
>> system is only accessible through the serial port.
> Can you boot with iommu=debug and see if you get any extra IOMMU
> information on the serial console?
>> 2) I can run any usual command in dom0, but the ones involving xl (except
>> for xl info) will make the system run out of memory very fast. Eventually,
>> when there is no more free memory available, the OOM killer begins removing
>> processes until the system auto reboots.
>> I attached a file containing the output of a lsusb, as well as the output of
>> xl info and xl list -l.
>> After xl list -l, the “free -m” commands show the available memory
>> decreasing.
>> Each command has a timestamp appended, so it can be seen how fast the
>> available memory decreases.
>> I removed much of the process killing logs and kept the last one, since they
>> were following the same pattern.
>> Dom0 still appears to be of type PV (output of xl list -l), however during
>> boot, the following messages were displayed: “Building a PVH Dom0” and
>> “Booting paravirtualized kernel on Xen PVH”.
>> I mention that I had to add “workaround_bios_bug” in GRUB_CMDLINE_XEN for
>> iommu to get dom0 running.
> It seems to me like your ACPI DMAR table contains errors, and I
> wouldn't be surprised if those also cause the USB devices to
> malfunction.
>> What could be causing the available memory loss problem?
> That seems to be Linux aggressively ballooning out memory: you go from
> 7129M total memory to 246M. Are you creating a lot of domains?

This might be related to the tools thinking dom0 is a PV domain.
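
For reference, the table indices in the print_vtd_entries output quoted
above (l4[000], l3[002], l2[06f], l1[0b3]) follow directly from the fault
address 8deb3000. The sketch below assumes 4 KiB pages, 9 index bits per
level and a 4-level walk, with bit 0 / bit 1 of an entry meaning
read / write. The constant and function names are illustrative, not taken
from the Xen sources.

```python
# Minimal sketch: split a DMA fault address into per-level table indices
# and check whether a leaf entry grants any access.
# Assumptions: 4 KiB pages, 9 index bits per level, 4 levels; names are
# illustrative only.

PAGE_SHIFT = 12          # 4 KiB pages
LEVEL_BITS = 9           # 512 entries per table
DMA_PTE_READ = 1 << 0    # read permission bit of an entry
DMA_PTE_WRITE = 1 << 1   # write permission bit of an entry


def table_indices(fault_addr):
    """Return (l4, l3, l2, l1) indices for a fault address."""
    gfn = fault_addr >> PAGE_SHIFT
    mask = (1 << LEVEL_BITS) - 1
    return tuple((gfn >> (LEVEL_BITS * level)) & mask for level in (3, 2, 1, 0))


def pte_present(pte):
    """An entry counts as present if it allows reads or writes."""
    return bool(pte & (DMA_PTE_READ | DMA_PTE_WRITE))


fault_addr = 0x8DEB3000                   # from the DMAR fault line
l4, l3, l2, l1 = table_indices(fault_addr)
print(f"gmfn {fault_addr >> PAGE_SHIFT:x}")
print(f"l4[{l4:03x}] l3[{l3:03x}] l2[{l2:03x}] l1[{l1:03x}]")

# Entry values copied from the log: the l2 entry is usable, the l1 entry
# has neither permission bit set, hence "l1[0b3] not present".
print(pte_present(0x9C000010218C0107))    # l2[06f]
print(pte_present(0x8000000000000000))    # l1[0b3]
```

The l1 entry has neither permission bit set, which matches the reported
"PTE Write access is not set": the page at gmfn 8deb3 is simply not mapped
for the device.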


