
[Xen-devel] [xen-unstable-smoke test] 118229: regressions - FAIL



flight 118229 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118229/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build                fail REGR. vs. 118219

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt      1 build-check(1)               blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386  1 build-check(1)         blocked n/a
 build-amd64-libvirt           1 build-check(1)               blocked  n/a
 test-armhf-armhf-xl          13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          14 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      14 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  66bf4ef04869548128b70d8d371ec992189a6a1c
baseline version:
 xen                  56498d2cf9d3c5f7d3d894a89f7d66ed81548e01

Last test of basis   118219  2018-01-19 01:01:22 Z    0 days
Testing same since   118226  2018-01-19 11:02:00 Z    0 days    2 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  George Dunlap <george.dunlap@xxxxxxxxxx>
  Jan Beulich <jbeulich@xxxxxxxx>
  Julien Grall <julien.grall@xxxxxxxxxx>
  Paul Durrant <paul.durrant@xxxxxxxxxx>
  Roger Pau Monné <roger.pau@xxxxxxxxxx>
  Tim Deegan <tim@xxxxxxx>

jobs:
 build-arm64-xsm                                              pass    
 build-amd64                                                  fail    
 build-armhf                                                  pass    
 build-amd64-libvirt                                          blocked 
 test-armhf-armhf-xl                                          pass    
 test-arm64-arm64-xl-xsm                                      pass    
 test-amd64-amd64-xl-qemuu-debianhvm-i386                     blocked 
 test-amd64-amd64-libvirt                                     blocked 


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit 66bf4ef04869548128b70d8d371ec992189a6a1c
Author: Paul Durrant <paul.durrant@xxxxxxxxxx>
Date:   Fri Jan 19 11:17:30 2018 +0100

    x86/hvm: re-work viridian APIC assist code
    
    It appears there is a case where Windows enables the APIC assist
    enlightenment[1] but does not use it. This scenario is perfectly valid
    according to the documentation, but causes the state machine in Xen to
    become confused leading to a domain_crash() such as the following:
    
    (XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0
          build: 1db0
    (XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
    (XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
    (XEN) domain_crash called from viridian.c:452
    (XEN) Domain 4 (vcpu#0) crashed on cpu#1:
    
    The following sequence of events is an example of how this can happen:
    
     - On return to guest vlapic_has_pending_irq() finds a bit set in the IRR.
     - vlapic_ack_pending_irq() calls viridian_start_apic_assist() which latches
       the vector, sets the bit in the ISR and clears it from the IRR.
     - The guest then processes the interrupt but EOIs it normally, therefore
       clearing the bit in the ISR.
     - On next return to guest vlapic_has_pending_irq() calls
       viridian_complete_apic_assist(), which discovers the assist bit still set
       in the shared page and therefore leaves the latched vector in place, but
       also finds another bit set in the IRR.
     - vlapic_ack_pending_irq() is then called but, because the ISR bit was
       cleared by the EOI, another call is made to viridian_start_apic_assist()
       and this then calls domain_crash() because it finds the latched vector
       has not been cleared.
    
    Having re-visited the code I also conclude that Xen's implementation of the
    enlightenment is currently wrong and we are not properly following the
    specification.
    
    The specification says:
    
    "The hypervisor sets the “No EOI required” bit when it injects a virtual
     interrupt if the following conditions are satisfied:
    
     - The virtual interrupt is edge-triggered, and
     - There are no lower priority interrupts pending.
    
     If, at a later time, a lower priority interrupt is requested, the
     hypervisor clears the “No EOI required” such that a subsequent EOI causes
     an intercept.
     In case of nested interrupts, the EOI intercept is avoided only for the
     highest priority interrupt. This is necessary since no count is maintained
     for the number of EOIs performed by the OS. Therefore only the first EOI
     can be avoided and since the first EOI clears the “No EOI Required” bit,
     the next EOI generates an intercept."
    
    Thus it is quite legitimate to set the "No EOI required" bit and then
    subsequently take a higher priority interrupt without clearing the bit.
    The avoided EOI will then relate to that subsequent interrupt rather
    than the highest priority interrupt when the bit was set. Hence latching
    the vector when setting the bit is not entirely useful and somewhat
    misleading.
    
    This patch re-works the APIC assist code to simply track when the "No EOI
    required" bit is set and test if it has been cleared by the guest (i.e.
    'completing' the APIC assist), thus indicating a 'missed EOI'. Missed EOIs
    need to be dealt with in two places:
    
     - In vlapic_has_pending_irq(), to avoid comparing the IRR against a stale
       ISR, and
     - In vlapic_EOI_set() because a missed EOI for a higher priority vector
       should be dealt with before the actual EOI for the lower priority
       vector.
    
    Furthermore, because the guest is at liberty to ignore the "No EOI required"
    bit (which led to the crash detailed above), vlapic_EOI_set() must also make
    sure the bit is cleared to avoid confusing the state machine.
    
    Lastly the previous code did not properly emulate an EOI if a missed EOI
    was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
    the ISR. The new code instead calls vlapic_EOI_set().
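
The pending-flag approach described above can be illustrated with a small
standalone model. This is only a sketch under stated assumptions, not the code
from the patch: struct model_vcpu and every model_* function are hypothetical
names, the ISR is reduced to a 32-bit mask with one bit per toy vector, and
priority is taken to be the vector number. It shows the two cases the message
calls out: a guest that ignores the "No EOI required" bit and EOIs normally,
and a missed EOI for a higher priority vector that has to be dealt with before
the explicit EOI for the lower priority one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are not Xen's data structures. */
struct model_vcpu {
    bool no_eoi_required;   /* bit in the page shared with the guest */
    bool assist_pending;    /* hypervisor side: bit was set, not yet resolved */
    uint32_t isr;           /* in-service toy vectors 0..31, higher = higher priority */
};

/* Set the shared bit and remember that it was set (no vector is latched). */
static void model_start_assist(struct model_vcpu *v)
{
    v->no_eoi_required = true;
    v->assist_pending = true;
}

/* True if the guest cleared the bit, i.e. it skipped an EOI ("missed EOI"). */
static bool model_complete_assist(struct model_vcpu *v)
{
    if (!v->assist_pending || v->no_eoi_required)
        return false;
    v->assist_pending = false;
    return true;
}

/* Withdraw the assist so that stale state cannot confuse later injections. */
static void model_abandon_assist(struct model_vcpu *v)
{
    v->no_eoi_required = false;
    v->assist_pending = false;
}

/* Clear the highest priority in-service vector; returns it, or -1 if none. */
static int model_retire_highest(struct model_vcpu *v)
{
    for (int vec = 31; vec >= 0; vec--) {
        if (v->isr & (UINT32_C(1) << vec)) {
            v->isr &= ~(UINT32_C(1) << vec);
            return vec;
        }
    }
    return -1;
}

/* Inject a vector, optionally offering the guest the chance to skip the EOI. */
void model_inject(struct model_vcpu *v, int vec, bool use_assist)
{
    assert(vec >= 0 && vec < 32);
    v->isr |= UINT32_C(1) << vec;
    if (use_assist)
        model_start_assist(v);
}

/* Explicit EOI written by the guest. */
void model_eoi(struct model_vcpu *v)
{
    /* A missed EOI for a higher priority vector is emulated first. */
    if (model_complete_assist(v))
        model_retire_highest(v);

    /* Then the vector that this EOI actually relates to. */
    model_retire_highest(v);

    /* If the guest ignored the bit, make sure it does not stay set. */
    model_abandon_assist(v);
}
```

In this model a guest that is offered the assist but EOIs normally leaves both
flags clear afterwards, so a later injection starts from a clean state rather
than tripping over a leftover latched value.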
    
    [1] See section 10.3.5 of Microsoft's "Hypervisor Top Level Functional
        Specification v5.0b".
    
    NOTE: The changes to the save/restore code are safe because the layout
          of struct hvm_viridian_vcpu_context is unchanged and the new
          interpretation of the (previously so named) vp_assist_vector field
          as the boolean pending flag maintains the correct semantics.
    
    Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
    Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

commit 48a933ee590e2fdfa240484ebda4f76096277d7e
Author: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Date:   Fri Jan 19 11:16:58 2018 +0100

    x86/efi: fix build with linkers that support both coff-x86-64 and pe-x86-64
    
    When using a linker that supports both formats the following error
    will be triggered:
    
    efi/buildid.o: file not recognized: File format is ambiguous
    efi/buildid.o: matching formats: coff-x86-64 pe-x86-64
    
    Solve this by specifying the efi/buildid.o format to pe-x86-64.
    
    Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
    Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
    Reviewed-by: Doug Goldstein <cardoe@xxxxxxxxxx>

commit 97207ddd3b2bbbf6e723d8c5f2a93592a1cf5d81
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Fri Jan 19 11:16:10 2018 +0100

    x86/shadow: widen reference count
    
    Utilize as many of the bits available in the union as possible, without
    (just to be on the safe side) colliding with any of the bits outside of
    PGT_type_mask.
    
    Note that the first and last hunks of the xen/include/asm-x86/mm.h
    change are merely code motion.
    
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Acked-by: Tim Deegan <tim@xxxxxxx>
    Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

commit 7867181b2ad63f0d2f1ba97598e577538b83882f
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Fri Jan 19 11:14:42 2018 +0100

    x86/PoD: correctly handle non-order-0 decrease-reservation requests
    
    p2m_pod_decrease_reservation() at the moment only returns a boolean
    value: true for "nothing more to do", false for "something more to do".
    If it returns false, decrease_reservation() will loop over the entire
    range, calling guest_remove_page() for each page.
    
    Unfortunately, in the case p2m_pod_decrease_reservation() succeeds
    partially, some of the memory in the range will be not-present; at which
    point guest_remove_page() will return an error, and the entire operation
    will fail.
    
    Fix this by:
    1. Having p2m_pod_decrease_reservation() return exactly the number of
       gpfn pages it has handled (i.e., replaced with 'not present').
    2. Making guest_remove_page() return -ENOENT in the case that the gpfn
       in question was already empty (and in no other cases).
    3. When looping over guest_remove_page(), expect the number of -ENOENT
       failures to be no larger than the number of pages
       p2m_pod_decrease_reservation() removed.
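
The accounting in steps 1 to 3 can be sketched as a standalone toy, assuming a
p2m reduced to an array of "present" flags. The names model_remove_page() and
model_decrease_extent() are hypothetical stand-ins, not the functions touched
by the patch, and pod_handled plays the role of the page count that step 1
has the PoD code return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for removing one page: -ENOENT only if it was already empty. */
int model_remove_page(bool *present, size_t gfn)
{
    if (!present[gfn])
        return -ENOENT;
    present[gfn] = false;
    return 0;
}

/*
 * Remove 'count' pages starting at 'start'. 'pod_handled' is the number of
 * pages an earlier PoD step already replaced with 'not present', so at most
 * that many -ENOENT results are tolerated. Returns true on success.
 */
bool model_decrease_extent(bool *present, size_t start, size_t count,
                           size_t pod_handled)
{
    assert(present != NULL);

    for (size_t i = 0; i < count; i++) {
        int rc = model_remove_page(present, start + i);

        if (rc == -ENOENT && pod_handled > 0) {
            pod_handled--;      /* already accounted for by the PoD step */
            continue;
        }
        if (rc != 0)
            return false;       /* unexpected hole or any other error */
    }
    return true;
}
```

With a four-page range in which the PoD step emptied two pages, the loop
succeeds; the same kind of hole with pod_handled equal to zero makes it fail,
which is the distinction a plain boolean return value could not express.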
    
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx>
    Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Acked-by: Julien Grall <julien.grall@xxxxxxxxxx>

commit 75c47ae9b63483ac404ea7e4a28cb5fb1d989ef8
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Fri Jan 19 11:09:55 2018 +0100

    x86/HVM: make explicit that hvm_print_line() does output only
    
    On input "c" being 0xff should already have the effect of bailing early
    (due to the isprint()), but let's rather make this explicit. Also
    convert the BUG_ON() to an ASSERT() (nothing fatal happens in the
    function if this is violated), at the same time extending what is being
    checked.
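
A rough standalone illustration of the idea, under loud assumptions: the
function model_print_line(), the enum model_dir and the single-character
"out" parameter are all hypothetical, the plain assert() merely stands in for
ASSERT(), and filtering on isprint() alone is a simplification. The point is
only that an input request is rejected by an explicit direction check instead
of relying on 0xff happening to be non-printable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direction of the port access. */
enum model_dir { MODEL_DIR_READ, MODEL_DIR_WRITE };

/* Returns true if the byte was accepted and stored in *out. */
bool model_print_line(enum model_dir dir, unsigned int bytes, uint32_t val,
                      char *out)
{
    unsigned char c = (unsigned char)val;

    /* A violation is a caller bug, but nothing fatal would follow from it. */
    assert(bytes == 1);

    /* Output only: bail explicitly on input requests. */
    if (dir != MODEL_DIR_WRITE)
        return false;

    /* Simplified filter (assumption): accept printable characters only. */
    if (!isprint(c))
        return false;

    *out = (char)c;
    return true;
}
```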
    
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
(qemu changes not included)
