
Re: [Xen-devel] [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough



On 05.11.14 at 10:45, Ian Campbell wrote:
On Tue, 2014-11-04 at 18:30 +0100, Atom2 wrote:
On 04.11.14 at 17:31, Ian Campbell wrote:
On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
On 04.11.14 at 16:44, Ian Campbell wrote:
On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
Sadly it looks like your version of valgrind doesn't know how to handle
the hypercalls made by the Xen toolstack, which means it produces a lot
of unrelated noise.

You seem to be using valgrind 3.9.0, which lacked knowledge of some of
the HVM related hypercalls that weren't added until 3.10.0. It's
probably not worth pursuing this angle any further (unless it is utterly
trivial to pull in the new version).
Many thanks again for your quick answers.

You were right: I used valgrind-3.9.0, which is the latest stable version for Gentoo. 3.10.0 is available under unstable, and it was indeed trivial to pull that in instead. The unrelated noise seems to have disappeared, so attached please find the output of running
        # valgrind xl create -F -c pfsense

The strange thing was that there was no segfault at the start, although passing through the PCI devices evidently still failed, as shown by the same error messages you flagged below. The boot menu now also showed up and I was able to boot the domain but, as the error message suggests, no network devices were passed through. Even a

        # xl shutdown -F pfsense
        Shutting down domain 2
        PV control interface not available: sending ACPI power button event.
        #

from another ssh connection to dom0 worked (no segfault message in that session), and as such the attached file 'valgrind.out' contains the complete screen output of the valgrind session from start to finish. However, towards the end of that file (line 235) you'll see a SEGFAULT message from valgrind. I hope you can make some sense out of that ... or should I rerun with some options to valgrind (like the ones mentioned in the output):
        --leak-check=full
        -v

To me, it looks as if something is broken in the PCI passthrough code and that this started with 4.3.3. Strangely, however, valgrind seems to work around the issue insofar as no segfault happens. Is there any explanation for the different behaviour between running xl natively and running xl under valgrind's control?

In any case, I am positive that there hasn't been any change to the hardware of the system, not even a slot change of an add-on card. So I have no clue why the system after the upgrade misbehaves.

Apart from the valgrind output there is a new message from libxl:
         libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:04:00.0 cannot be assigned - no IOMMU?
which suggests that it isn't passing things through (this might be
fallout from valgrind not understanding things) and no segfault.

Out of interest, what does "xl create -F ..." do without valgrind? (I'm
wondering if -F is responsible for the change in behaviour.)
I tried that as well:

        vm-host auto [526] # xl create -F -c pfsense
        Parsing config from pfsense
        xc: info: VIRTUAL MEMORY ARRANGEMENT:
          Loader:        0000000000100000->00000000001c12a4
          Modules:       0000000000000000->0000000000000000
          TOTAL:         0000000000000000->000000001f800000
          ENTRY ADDRESS: 0000000000100000
        xc: info: PHYSICAL MEMORY ALLOCATION:
          4KB PAGES: 0x0000000000000200
          2MB PAGES: 0x00000000000000fb
          1GB PAGES: 0x0000000000000000
        Segmentation fault
        vm-host auto [527] # xl list
        Name                        ID   Mem VCPUs      State   Time(s)
        Domain-0                     0  4094     8     r-----     451.5
        pfsense                      1   512     1     --p---       0.0
        vm-host auto [528] # xl destroy pfsense
        Segmentation fault
        vm-host auto [529] # xl list
        Name                        ID   Mem VCPUs      State   Time(s)
        Domain-0                     0  4096     8     r-----     452.1
        vm-host auto [529] #

and, as you can see, I again got the segfault, and the domU ended up in the same state as when the issues first started (i.e. paused, which you explained is normal right after a start).

Thanks,
Atom2

Attachment: valgrind.out
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel