[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xen-unstable test] 162771: regressions - FAIL



On 14.06.21 13:58, Jan Beulich wrote:
On 14.06.2021 08:41, Juergen Gross wrote:
On 14.06.21 04:21, osstest service owner wrote:
flight 162771 xen-unstable real [real]
flight 162783 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/162771/
http://logs.test-lab.xenproject.org/osstest/logs/162783/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
   test-amd64-amd64-xl-qemuu-ovmf-amd64 15 guest-saverestore fail REGR. vs. 
162533
   test-amd64-i386-xl-qemuu-ovmf-amd64 15 guest-saverestore fail REGR. vs. 
162533
   test-amd64-amd64-i386-pvgrub 17 guest-localmigrate       fail REGR. vs. 
162533
   test-amd64-amd64-amd64-pvgrub 17 guest-localmigrate      fail REGR. vs. 
162533

Hmm, this is rather unfortunate.

Those last 2 tests failed due to commit 7bd8989ab77b6ade3b, but just
reverting that patch doesn't seem right to me either.

The Linux kernel has a bug here: it will initially set max_pfn in the
shared_info page to the size of the p2m_list (so my reasoning for above
patch was wrong in this case), but when growing the p2m_list (e.g. due
to ballooning or grant mapping), it will store a real pfn number in
max_pfn. But even this pfn might be wrong, as only the pfn leading to
allocation of a new p2m page will be stored in max_pfn, any higher new
pfn having its p2m entry in the new p2m page won't result in a new
max_pfn entry.

As a result I think the only sane handling would be to assume the
max_pfn value read from the shared_info page is really a pfn.

This would be contrary to the public interface header having

     /*
      * Number of valid entries in the p2m table(s) anchored at
      * pfn_to_mfn_frame_list_list and/or p2m_vaddr.
      */
     unsigned long max_pfn;

IOW the name containing "max" is misleading (should be "num" or
alike), but there's no room imo to change this.

Oh, how nice! :-(

I have let myself been fooled by the correctly named max_pfn field in
libxenguest, together with the wrong usage in the kernel.


This
value should be adjusted to specify the last pfn of the related p2m
page, and the resulting last p2m page should be tolerated to not be
valid.

Another variant would be to just revert above commit and modify the
semantics of max_pfn in the shared_info page to really mean max_pfn+1.

But that's what it is already, according to the comment. Are you
suggesting there was code prior to the change you've quoted that
already violated this (in Xen or the tool stack, that is, not
the issue you suggest has been present in Linux)?

Reading the comment would have helped.

So a plain revert is the way to go.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.