
Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL



Hi Jan,

On 6/4/19 8:06 AM, Jan Beulich wrote:
On 03.06.19 at 19:15, <anthony.perard@xxxxxxxxxx> wrote:
On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
The same error cannot be reproduced on laxton*. Looking at the test history,
it looks like qemu-upstream-4.12-testing flight has run successfully a few
times on rochester*. So we may have fixed the error in Xen 4.12.

Potential candidates would be:
    - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
    - f60658c6ae "xen/arm: Stop relocating Xen"

Ian, is it something the bisector could automatically look at?
If not, I will need to find some time and borrow the board to bisect the
issues.

I attempted to do that bisection myself, and the first commit that git
wanted to try, a common commit to both branches, boots just fine.

Thanks for doing this!

One thing that, for now, completely escapes me: How come the
main 4.11 branch has progressed fine, but the qemuu one has
got stalled like this?

Because Xen on Arm today does not fully respect the Arm Arm when modifying the page-tables. This may result in TLB conflicts and a loss of coherency.


It turns out that the first commit that fails to boot on rochester is
   e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
(even with the "eb8acba82a xen: Fix backport of .." applied)

Now that's particularly odd a regression candidate. It doesn't
touch any Arm code at all (nor does the fixup commit). And the
common code changes don't look "risky" either; the one thing that
jumps out as the most likely of all the unlikely candidates would
seem to be the xen/common/efi/boot.c change, but if there was
a problem there then the EFI boot on Arm would be latently
broken in other ways as well. Plus, of course, you say that the
same change is no problem on 4.12.

Of course the commit itself could be further "bisected" - all
changes other than the introduction of cmdline_strcmp() are
completely independent of one another.
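
To make the construct named in that commit title concrete, here is a minimal sketch of why strncmp(s, LITERAL, ss - s) can misfire. The helper names prefix_match() and exact_match() are hypothetical and used only for illustration; exact_match() is not Xen's cmdline_strcmp(), just one way to require that the whole literal matches the fragment [s, ss).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the construct from the commit title:
 * compares only the first (ss - s) bytes, so a fragment that is merely
 * a prefix of the literal (or empty) is reported as a match. */
static int prefix_match(const char *s, const char *ss, const char *literal)
{
    return strncmp(s, literal, (size_t)(ss - s)) == 0;
}

/* Hypothetical stricter check (an assumption, not Xen's code): the
 * fragment [s, ss) must have the same length as the literal and the
 * same bytes. */
static int exact_match(const char *s, const char *ss, const char *literal)
{
    size_t len = (size_t)(ss - s);

    return strlen(literal) == len && memcmp(s, literal, len) == 0;
}
```

With the option string "de,other" and the fragment ending at the comma, prefix_match() accepts the literal "debug" while exact_match() rejects it; both accept the fragment "debug" taken from "debug,x".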

I think this is just a red herring. The commit probably changes the layout of Xen enough for a TLB conflict to appear.

Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on" makes staging-4.11 boot. This patch removes some of the potential causes of TLB conflicts.

I haven't suggested a backport of this patch so far, because TLB conflicts are still possible within the function it modifies. It might also expose more TLB conflicts, as more work is needed in Xen (see my MM-PARTn series).

I don't know whether backporting this patch is worth it compared to the risk it introduces.

Cheers,

--
Julien Grall


 

