Re: [Xen-devel] segfault in xl create for HVM with PCI passthrough

To keep the thread together, I am resubmitting the relevant parts of my last answer together with the new results of my gdb exercise. Through an error on my part, that answer originally went only to Ian, and I forwarded it to the list afterwards, so it appeared outside the thread. Sorry for any confusion this may have caused.

On 28.10.14 at 17:04, Ian Campbell wrote:
With regard to gdb: I can certainly run the command under gdb after
building the executables with debug support; that's no big deal.
I would, however, like your advice on what I need to recompile
with debugger support. Is xen-tools (which includes xl) sufficient,

I think just the Xen bits would be sufficient, at least to start with.

or do you think I also need to build gcc with debug support, since the
library mentioned in /var/log/messages (libgcc_s.so.1) seems to
belong to the gcc package? Or is this library a red herring that just
acts as the catch-all code that catches and finally handles the segfault?

I'd recommend ignoring it for now. If the backtrace from just the Xen
bits suggests a gcc issue, that might change, but right now my money is
on it being a Xen issue.

After recompiling xen-tools with debug support, I ran the following command:
# gdb --args /usr/sbin/xl create pfsense -c

The attached document named "create" contains the command's screen output from start-up until the segfault, followed by the output of the bt command issued after the segfault.

I then did the same for the destroy command:
# gdb --args /usr/sbin/xl destroy pfsense

The output of this command is in the attached document named "destroy".

I don't have much experience with gdb yet, so I can't interpret either output. If anything more or different is required, please let me know what to do next. Thanks.

pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]

You say in $subject that the failure is with PCI, is that because you've
tried an HVM domain without and it is ok, or is it just that all your
HVM domains happen to have passthrough enabled?
So far I haven't tried HVM domains without PCI passthrough, because all
my HVM domains require PCI devices. pfsense needs at least one network
card (in fact, more than one is passed through), and my second HVM,
which is used as a storage VM, needs a SATA controller. I have run PV
domains without PCI passthrough, and they did not segfault.

The VM doesn't need to be fully functional, it just needs to boot
without crashing the toolstack. Just running your existing VM with the
pci line commented out would be useful.
Before recompiling xen-tools, I made a quick test as you suggested
and commented out the pci line in my config file. The boot menu then showed up, which had not happened before when the segfault occurred. I did not boot the pfsense VM any further, because missing devices might lead to a change in my configuration. At first sight, though, this seemed to indicate that the problem has to do with the PCI passthrough functionality.
Because I did not want to boot the machine (and "xl shutdown" did
not work, not even with -F), I then ran
    xl destroy pfsense
This printed a segmentation fault message in both the shell window
where I had started the command and the console window where the
boot menu was shown, even though no PCI devices were being passed through.

To also check PCI passthrough with a PV domain, I added a PCI device to
the config file of a PV domain and started it with
    xl create voip -c
The boot menu appeared without issues. I then also tried
    xl destroy voip
from another window. This printed the following error messages in the
shell window (without any -vvv option):

# xl destroy voip
libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/4/0 not ready
libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/4/0 not ready
libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/4/0 not ready
Segmentation fault

The "Segmentation fault" message appeared in both the domU's console
window and the shell window.

This all seems a bit strange to me at the moment, but I am sure that with
your help we will get to the bottom of this.

Thanks and regards,
Atom

Attachment: create
Description: Text document

Attachment: destroy
Description: Text document

