Xen project Mailing List

[Xen-devel] [xen-4.9-testing test] 144723: regressions - trouble: fail/pass/starved

To: xen-devel@xxxxxxxxxxxxxxxxxxxx, osstest-admin@xxxxxxxxxxxxxx

From: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>

Date: Thu, 12 Dec 2019 10:21:21 +0000

Delivery-date: Thu, 12 Dec 2019 10:21:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

flight 144723 xen-4.9-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/144723/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-win7-amd64 15 guest-saverestore.2 fail REGR. vs. 144545 test-amd64-amd64-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail REGR. vs. 144545 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail blocked in 144545 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail blocked in 144545 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144545 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 144545 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144545 test-amd64-i386-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail like 144545 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail like 144545 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt 13 migrate-support-check fail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass test-arm64-arm64-xl 13 migrate-support-check fail never pass test-arm64-arm64-xl 14 saverestore-support-check fail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass test-amd64-amd64-libvirt 13 migrate-support-check fail never pass test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass test-armhf-armhf-xl 13 migrate-support-check fail never pass test-armhf-armhf-xl 14 saverestore-support-check fail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass test-armhf-armhf-libvirt 13 migrate-support-check fail never pass test-armhf-armhf-libvirt 14 saverestore-support-check fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass test-arm64-arm64-xl-thunderx 2 hosts-allocate starved n/a version targeted for testing: xen 43ab30b13fe8b1d5f92a9ad2ca7d61f4c77b6cac baseline version: xen 8d1ee9f2c473fec54b5018c01ad556d7afd62c17 Last test of basis 144545 2019-12-05 12:05:32 Z 6 days Testing same since 144723 2019-12-11 15:10:41 Z 0 days 1 attempts ------------------------------------------------------------ People who touched revisions under test: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> George Dunlap <george.dunlap@xxxxxxxxxx> Jan Beulich <jbeulich@xxxxxxxx> Julien Grall <julien@xxxxxxx> Kevin Tian <kevin.tian@xxxxxxxxx> jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-prev pass build-i386-prev pass build-amd64-pvops pass build-arm64-pvops pass build-armhf-pvops pass build-i386-pvops pass test-xtf-amd64-amd64-1 pass test-xtf-amd64-amd64-2 pass test-xtf-amd64-amd64-3 pass test-xtf-amd64-amd64-4 pass test-xtf-amd64-amd64-5 pass test-amd64-amd64-xl pass test-arm64-arm64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm pass test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemut-debianhvm-i386-xsm pass test-amd64-i386-xl-qemut-debianhvm-i386-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass test-amd64-i386-xl-qemuu-debianhvm-i386-xsm pass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-xl-xsm pass test-arm64-arm64-xl-xsm pass test-amd64-i386-xl-xsm pass test-amd64-amd64-qemuu-nested-amd fail test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64 pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-amd64-xl-qemut-ws16-amd64 fail test-amd64-i386-xl-qemut-ws16-amd64 fail test-amd64-amd64-xl-qemuu-ws16-amd64 fail test-amd64-i386-xl-qemuu-ws16-amd64 fail test-armhf-armhf-xl-arndale pass test-amd64-amd64-xl-credit1 pass test-arm64-arm64-xl-credit1 pass test-armhf-armhf-xl-credit1 pass test-amd64-amd64-xl-credit2 pass test-arm64-arm64-xl-credit2 pass test-armhf-armhf-xl-credit2 pass test-armhf-armhf-xl-cubietruck pass test-amd64-i386-freebsd10-i386 pass test-amd64-amd64-qemuu-nested-intel pass test-amd64-i386-qemut-rhel6hvm-intel pass test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-amd64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-livepatch pass test-amd64-i386-livepatch pass test-amd64-amd64-migrupgrade pass test-amd64-i386-migrupgrade pass test-amd64-amd64-xl-multivcpu pass test-armhf-armhf-xl-multivcpu pass test-amd64-amd64-pair pass test-amd64-i386-pair pass test-amd64-amd64-libvirt-pair pass test-amd64-i386-libvirt-pair pass test-amd64-amd64-amd64-pvgrub pass test-amd64-amd64-i386-pvgrub pass test-amd64-amd64-pygrub pass test-amd64-amd64-xl-qcow2 pass test-armhf-armhf-libvirt-raw pass test-amd64-i386-xl-raw pass test-amd64-amd64-xl-rtds pass test-armhf-armhf-xl-rtds fail test-arm64-arm64-xl-seattle pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass test-amd64-amd64-xl-shadow pass test-amd64-i386-xl-shadow pass test-arm64-arm64-xl-thunderx starved test-amd64-amd64-libvirt-vhd pass test-armhf-armhf-xl-vhd pass ------------------------------------------------------------ sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. ------------------------------------------------------------ commit 43ab30b13fe8b1d5f92a9ad2ca7d61f4c77b6cac Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Date: Wed Dec 11 15:54:19 2019 +0100 AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables update_paging_mode() has multiple bugs: 1) Booting with iommu=debug will cause it to inform you that that it called without the pdev_list lock held. 2) When growing by more than a single level, it leaks the newly allocated table(s) in the case of a further error. Furthermore, the choice of default level for a domain has issues: 1) All HVM guests grow from 2 to 3 levels during construction because of the position of the VRAM just below the 4G boundary, so defaulting to 2 is a waste of effort. 2) The limit for PV guests doesn't take memory hotplug into account, and isn't dynamic at runtime like HVM guests. This means that a PV guest may get RAM which it can't map in the IOMMU. The dynamic height is a property unique to AMD, and adds a substantial quantity of complexity for what is a marginal performance improvement. Remove the complexity by removing the dynamic height. PV guests now get 3 or 4 levels based on any hotplug regions in the host. This only makes a difference for hardware which previously had all RAM below the 512G boundary, and a hotplug region above. HVM guests now get 4 levels (which will be sufficient until 256TB guests become a thing), because we don't currently have the information to know when 3 would be safe to use. The overhead of this extra level is not expected to be noticeable. It costs one page (4k) per domain, and one extra IO-TLB paging structure cache entry which is very hot and less likely to be evicted. This is XSA-311. Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Acked-by: Jan Beulich <jbeulich@xxxxxxxx> master commit: b4f042236ae0bb6725b3e8dd40af5a2466a6f971 master date: 2019-12-11 14:55:32 +0100 commit 55bd90db577c9e0d2248fc654274d8a2c207ccf0 Author: George Dunlap <george.dunlap@xxxxxxxxxx> Date: Wed Dec 11 15:53:39 2019 +0100 x86/mm: relinquish_memory: Grab an extra type ref when setting PGT_partial The PGT_partial bit in page->type_info holds both a type count and a general ref count. During domain tear-down, when free_page_type() returns -ERESTART, relinquish_memory() correctly handles the general ref count, but fails to grab an extra type count when setting PGT_partial. When this bit is eventually cleared, type_count underflows and triggers the following BUG in page_alloc.c:free_domheap_pages(): BUG_ON((pg[i].u.inuse.type_info & PGT_count_mask) != 0); As far as we can tell, this page underflow cannot be exploited any any other way: The page can't be used as a pagetable by the dying domain because it's dying; it can't be used as a pagetable by any other domain since it belongs to the dying domain; and ownership can't transfer to any other domain without hitting the BUG_ON() in free_domheap_pages(). (steal_page() won't work on a page in this state, since it requires PGC_allocated to be set, and PGC_allocated will already have been cleared.) Fix this by grabbing an extra type ref if setting PGT_partial in relinquish_memory. This is part of XSA-310. Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx> Acked-by: Jan Beulich <jbeulich@xxxxxxxx> master commit: 66bdc16aeed8ddb2ae724adc5ea6bde0dea78c3d master date: 2019-12-11 14:55:08 +0100 commit 173e805a1dd7dc05cc6d53e04cdaab5b6a8f302a Author: George Dunlap <george.dunlap@xxxxxxxxxx> Date: Wed Dec 11 15:53:15 2019 +0100 x86/mm: alloc/free_lN_table: Retain partial_flags on -EINTR When validating or de-validating pages (in alloc_lN_table and free_lN_table respectively), the `partial_flags` local variable is used to keep track of whether the "current" PTE started the entire operation in a "may be partial" state. One of the patches in XSA-299 addressed the fact that it is possible for a previously-partially-validated entry to subsequently be found to have invalid entries (indicated by returning -EINVAL); in which case page->partial_flags needs to be set to indicate that the current PTE may have the partial bit set (and thus _put_page_type() should be called with PTF_partial_set). Unfortunately, the patches in XSA-299 assumed that once put_page_from_lNe() returned -ERESTART on a page, it was not possible for it to return -EINTR. This turns out to be true for alloc_lN_table() and free_lN_table, but not for _get_page_type() and _put_page_type(): both can return -EINTR when called on pages with PGT_partial set. In these cases, the pages PGT_partial will still be set; failing to set partial_flags appropriately may allow an attacker to do a privilege escalation similar to those described in XSA-299. Fix this by always copying the local partial_flags variable into page->partial_flags when exiting early. NB that on the "get" side, no adjustment to nr_validated_entries is needed: whether pte[i] is partially validated or entirely un-validated, we want nr_validated_entries = i. On the "put" side, however, we need to adjust nr_validated_entries appropriately: if pte[i] is entirely validated, we want nr_validated_entries = i + 1; if pte[i] is partially validated, we want nr_validated_entries = i. This is part of XSA-310. Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> master commit: 4e70f4476c0c543559f971faecdd5f1300cddb0a master date: 2019-12-11 14:54:43 +0100 commit 248f22e0b67f4bd22d8175f770d02f52ca780a64 Author: George Dunlap <george.dunlap@xxxxxxxxxx> Date: Wed Dec 11 15:52:55 2019 +0100 x86/mm: Set old_guest_table when destroying vcpu pagetables Changeset 6c4efc1eba ("x86/mm: Don't drop a type ref unless you held a ref to begin with"), part of XSA-299, changed the calling discipline of put_page_type() such that if put_page_type() returned -ERESTART (indicating a partially de-validated page), subsequent calls to put_page_type() must be called with PTF_partial_set. If called on a partially de-validated page but without PTF_partial_set, Xen will BUG(), because to do otherwise would risk opening up the kind of privilege escalation bug described in XSA-299. One place this was missed was in vcpu_destroy_pagetables(). put_page_and_type_preemptible() is called, but on -ERESTART, the entire operation is simply restarted, causing put_page_type() to be called on a partially de-validated page without PTF_partial_set. The result was that if such an operation were interrupted, Xen would hit a BUG(). Fix this by having vcpu_destroy_pagetables() consistently pass off interrupted de-validations to put_old_page_type(): - Unconditionally clear references to the page, even if put_page_and_type failed - Set old_guest_table and old_guest_table_partial appropriately While here, do some refactoring: - Move clearing of arch.cr3 to the top of the function - Now that clearing is unconditional, move the unmap to the same conditional as the l4tab mapping. This also allows us to reduce the scope of the l4tab variable. - Avoid code duplication by looping to drop references on guest_table_user This is part of XSA-310. Reported-by: Sarah Newman <srn@xxxxxxxxx> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> master commit: ececa12b2c4c8e4433e4f9be83f5c668ae36fe08 master date: 2019-12-11 14:54:13 +0100 commit ec229c22656c82ed2acfa99c75e693435f36b094 Author: George Dunlap <george.dunlap@xxxxxxxxxx> Date: Wed Dec 11 15:52:24 2019 +0100 x86/mm: Don't reset linear_pt_count on partial validation "Linear pagetables" is a technique which involves either pointing a pagetable at itself, or to another pagetable the same or higher level. Xen has limited support for linear pagetables: A page may either point to itself, or point to another page of the same level (i.e., L2 to L2, L3 to L3, and so on). XSA-240 introduced an additional restriction that limited the "depth" of such chains by allowing pages to either *point to* other pages of the same level, or *be pointed to* by other pages of the same level, but not both. To implement this, we keep track of the number of outstanding times a page points to or is pointed to another page table, to prevent both from happening at the same time. Unfortunately, the original commit introducing this reset this count when resuming validation of a partially-validated pagetable, dropping some "linear_pt_entry" counts. On debug builds on systems where guests used this feature, this might lead to crashes that look like this: Assertion 'oc > 0' failed at mm.c:874 Worse, if an attacker could engineer such a situation to occur, they might be able to make loops or other abitrary chains of linear pagetables, leading to the denial-of-service situation outlined in XSA-240. This is XSA-309. Reported-by: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> master commit: 7473efd12fb7a6548f5303f1f4c5cb521543a813 master date: 2019-12-11 14:10:27 +0100 commit e879bfe73ad76412764f12f80bf0b3710c52ab88 Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Date: Wed Dec 11 15:51:11 2019 +0100 x86/vtx: Work around SingleStep + STI/MovSS VMEntry failures See patch comment for technical details. Concerning the timeline, this was first discovered in the aftermath of XSA-156 which caused #DB to be intercepted unconditionally, but only in its SingleStep + STI form which is restricted to privileged software. After working with Intel and identifying the problematic vmentry check, this workaround was suggested, and the patch was posted in an RFC series. Outstanding work for that series (not breaking Introspection) is still pending, and this fix from it (which wouldn't have been good enough in its original form) wasn't committed. A vmentry failure was reported to xen-devel, and debugging identified this bug in its SingleStep + MovSS form by way of INT1, which does not involve the use of any privileged instructions, and proving this to be a security issue. This is XSA-308 Reported-by: Håkon Alstadheim <hakon@xxxxxxxxxxxxxxxxxx> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> Acked-by: Kevin Tian <kevin.tian@xxxxxxxxx> master commit: 1d3eb8259804e5bec991a3462d69ba6bd80bb40e master date: 2019-12-11 14:09:30 +0100 commit ce126c91a3d18b9a87f58e713708b1b963e00610 Author: Jan Beulich <jbeulich@xxxxxxxx> Date: Wed Dec 11 15:50:20 2019 +0100 x86+Arm32: make find_next_{,zero_}bit() have well defined behavior These functions getting used with the 2nd and 3rd arguments being equal wasn't well defined: Arm64 reliably returns the value of the 2nd argument in this case, while on x86 for bitmaps up to 64 bits wide the return value was undefined (due to the undefined behavior of a shift of a value by the number of bits it's wide) when the incoming value was 64. On Arm32 an actual out of bounds access would happen when the size/offset value is a multiple of 32; if this access doesn't fault, the return value would have been sufficiently correct afaict. Make the functions consistently tolerate the last two arguments being equal (and in fact the 3rd argument being greater or equal to the 2nd), in favor of finding and fixing all the use sites that violate the original more strict assumption. This is XSA-307. Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> Acked-by: Julien Grall <julien@xxxxxxx> master commit: 7442006b9f0940fb36f1f8470a416ec836e0d2ce master date: 2019-12-11 14:06:18 +0100 commit 4b6942703dc8072fffa0d8f75169f304746abe2b Author: Jan Beulich <jbeulich@xxxxxxxx> Date: Wed Dec 11 15:49:49 2019 +0100 AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page Unmapping a page which has never been mapped should be a no-op (note how it already is in case there was no root page table allocated). There's in particular no need to grow the number of page table levels in use, and there's also no need to allocate intermediate page tables except when needing to split a large page. Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> Reviewed-by: Paul Durrant <paul@xxxxxxx> Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> master commit: ad591454f069647c36a7daaa9ec23384c0263f0b master date: 2019-11-12 11:08:34 +0100 (qemu changes not included) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.