
Re: Regressed XSA-286, was [xen-unstable test] 161917: regressions - FAIL



On 13.05.2021 22:15, Andrew Cooper wrote:
> On 13/05/2021 04:56, osstest service owner wrote:
>> Tests which are failing intermittently (not blocking):
>>  test-xtf-amd64-amd64-3 92 xtf/test-pv32pae-xsa-286 fail in 161909 pass in 161917
> 
> While noticing the ARM issue above, I also spotted this one by chance. 
> There are two issues.
> 
> First, I have reverted bed7e6cad30 and edcfce55917.  The XTF test is
> correct, and they really do reintroduce XSA-286.  It is a miracle of
> timing that we don't need an XSA/CVE against Xen 4.15.

I have to admit that from the description in the revert (on top of
what you say here) it does not really become clear to me what is
wrong with _either_ of these changes:

"The TLB flushing is for Xen's correctness, not the guest's."

XSA-286 was solely about guest correctness, which was broken by Xen's
behavior. Hence we're still only talking about guest observable
behavior here.

"The text in c/s bed7e6cad30 is technically correct, from the guests
 point of view, but clearly false as far as XSA-286 is concerned."

As a result I also don't understand this, nor the actual reason why
you did revert both, rather than just ...

"That said, it is edcfce55917 which introduced the regression, which
 demonstrates that the reasoning is flawed."

... this. Furthermore you merely state an observation here, without
going into any detail as to what's wrong with the reasoning, and
hence why it is the change that's wrong and the test that's correct
(and no issue elsewhere). Don't get me wrong - I'm not excluding
you're right, but you fail to explain things properly. I can't see
how avoiding a flush for a page table which isn't hooked up anywhere
(and which hence isn't accessible via lookups through the linear
page tables) can have caused a problem (except perhaps by uncovering
an issue, e.g. a missing flush, elsewhere). Nor can I see how the XTF
test would trigger the flush avoidance, as it doesn't play with
free-floating page tables. Plus this change affects 64-bit guests
as much as 32-bit ones, yet no (apparent) regression could be seen
there.

Similarly for the other change: Since only guest perspective matters,
the flush ought to be fine to defer until the guest actually reloads
CR3; until then using either the stale or updated linear page tables
is acceptable, and guests must not rely on either, just as would
be the case on bare metal (and there it's even stronger: an OS can
rely upon the prior page tables to continue to be used, as the PDPTEs
get reloaded _only_ during CR3 loads; mimicking this for PV would
not exactly be trivial, I think). And I notice that the XTF test
exercises an L3 entry update without a subsequent CR3 write, which
is wrong for PAE. (I therefore suspect it is bed7e6cad30 which has
caused the test failure, not edcfce55917 as you have said in the
description of the revert.)
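
The bare-metal PAE behaviour referred to above can be pictured with a
toy model. This is only an illustrative sketch, not Xen or XTF code:
every name in it (toy_cpu, toy_write_cr3, toy_walk_l3) is made up, and
it models nothing beyond "the four PDPTEs are copied when CR3 is
written and the L3 step of a walk uses those copies".

```c
#include <assert.h>
#include <stdint.h>

#define PAE_L3_ENTRIES 4

/* Hypothetical toy CPU state: PDPTEs are cached at CR3-load time. */
struct toy_cpu {
    const uint64_t *cr3;             /* in-memory L3 table currently loaded */
    uint64_t pdpte[PAE_L3_ENTRIES];  /* copies taken when CR3 was written */
};

/* Model of a CR3 write in PAE mode: re-read all four PDPTEs. */
static void toy_write_cr3(struct toy_cpu *cpu, const uint64_t *l3)
{
    cpu->cr3 = l3;
    for (unsigned int i = 0; i < PAE_L3_ENTRIES; i++)
        cpu->pdpte[i] = l3[i];
}

/* Model of the L3 step of a page walk: uses the cached copy, not memory. */
static uint64_t toy_walk_l3(const struct toy_cpu *cpu, unsigned int idx)
{
    assert(idx < PAE_L3_ENTRIES);
    return cpu->pdpte[idx];
}
```

In this model, writing a new value into the in-memory L3 table after
toy_write_cr3() leaves toy_walk_l3() returning the old value until
toy_write_cr3() is called again, which is the sense in which an L3
update without a subsequent CR3 write cannot be relied upon to take
effect.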

> Given that I was unhappy with the changes in the first place, I don't
> particularly want to see an attempt to resurrect them.  I did not find
> the claim that they were a perf improvement in the first place very
> convincing, and the XTF test demonstrates that the reasoning about their
> safety was incorrect.

Interesting: Where did you voice your unhappiness? All I can find on
that entire series' thread is a reply of yours on a post-commit-
message remark regarding a comment you had introduced with the 286
fix. All other discussion there was between Roger and me.

Additionally I don't see why you treated this as an emergency and
reverted without posting a patch and getting an ack.

> Second, the unexplained OSSTest behaviour.
> 
> When I repro'd this on pinot1, test-pv32pae-xsa-286 failing was totally
> deterministic and repeatable (I tried 100 times because the test is a
> fraction of a second).
> 
> From the log trawling which Ian already did, the first recorded failure
> was flight 160912 on April 11th.  All failures (12, but this number is a
> few flights old now) were on pinot*.
> 
> What would be interesting to see is whether there have been any passes
> on pinot since 160912.
> 
> I can't see any reason why the test would be reliable for me, but
> unreliable for OSSTest, so I'm wondering whether it is actually
> reliable, and something is wrong with the stickiness heuristic.

Isn't (un)reliability of this test, besides the sensitivity to IRQs
and context switches, tied to hardware behavior, in particular TLB
capacity and replacement policy? As I understand it, the test has

    xtf_success("Success: Probably not vulnerable to XSA-286\n");

for the combination of all of these reasons.
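
Why TLB capacity and replacement matter for observing a stale
translation can be sketched with a toy direct-mapped TLB. This is
purely illustrative: real TLBs are typically set-associative with
replacement policies that are not documented, and all names here
(toy_tlb, toy_tlb_fill, toy_tlb_lookup) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TLB_SLOTS 4

struct toy_tlb_entry {
    bool valid;
    uint64_t vpn;   /* virtual page number */
    uint64_t pfn;   /* cached translation */
};

struct toy_tlb {
    struct toy_tlb_entry slot[TOY_TLB_SLOTS];
};

/* Insert a translation, evicting whatever occupied the same slot. */
static void toy_tlb_fill(struct toy_tlb *tlb, uint64_t vpn, uint64_t pfn)
{
    struct toy_tlb_entry *e = &tlb->slot[vpn % TOY_TLB_SLOTS];

    e->valid = true;
    e->vpn = vpn;
    e->pfn = pfn;
}

/* Return true and the cached pfn on a hit; false means a fresh page walk. */
static bool toy_tlb_lookup(const struct toy_tlb *tlb, uint64_t vpn,
                           uint64_t *pfn)
{
    const struct toy_tlb_entry *e = &tlb->slot[vpn % TOY_TLB_SLOTS];

    if (!e->valid || e->vpn != vpn)
        return false;
    *pfn = e->pfn;
    return true;
}
```

A stale entry is visible only while it survives: once an unrelated
fill lands in the same slot (for example due to an interrupt or a
context switch touching other pages), the next lookup misses and a
fresh walk picks up the updated page tables, so a test of this kind
can pass even when a flush is missing.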

Jan



 

