
Re: [Xen-devel] Linux 4.1 reports wrong number of pages to toolstack



On Fri, 2015-09-04 at 01:40 +0100, Wei Liu wrote:
> Hi David
> 
> This issue is exposed by the introduction of migration v2. The symptom is
> that a guest with a 32-bit 4.1 kernel can't be restored because it asks for
> too many pages.

FWIW my adhoc tests overnight gave me:

37858: b953c0d234bc72e8489d3bf51a276c5c4ec85345 v4.1            Fail
37862: 39a8804455fb23f09157341d3ba7db6d7ae6ee76 v4.0            Fail
37860: bfa76d49576599a4b9f9b7a71f23d73d6dcff735 v3.19           Fail

37872: e36f014edff70fc02b3d3d79cead1d58f289332e v3.19-rc7       Fail
37866: 26bc420b59a38e4e6685a73345a0def461136dce v3.19-rc6       Fail
37868: ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc v3.19-rc5       Fail
37864: eaa27f34e91a14cdceed26ed6c6793ec1d186115 v3.19-rc4       Fail *
37867: b1940cd21c0f4abdce101253e860feff547291b0 v3.19-rc3       Pass *
37865: b7392d2247cfe6771f95d256374f1a8e6a6f48d6 v3.19-rc2       Pass

37863: 97bf6af1f928216fd6c5a66e8a57bfa95a659672 v3.19-rc1       Pass

37861: b2776bf7149bddd1f4161f14f79520f17fc1d71d v3.18           Pass

I have set the adhoc bisector working on the ~200 commits between rc3 and
rc4. It's running in the Citrix instance (which is quieter) so the interim
results are only visible within our network at
http://osstest.xs.citrite.net/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386-xl..html.

So far it has confirmed the basis fail and it is now rechecking the basis
pass.

Slightly strange though is:
$ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ include/xen/
$

i.e. there are no relevant-seeming Xen commits in that range. Maybe the
last one of these is more relevant?

$ git log --grep=[xX][eE][nN] --oneline v3.19-rc3..v3.19-rc4 -- 
bdec419 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
07ff890 xen-netback: fixing the propagation of the transmit shaper timeout
132978b x86: Fix step size adjustment during initial memory mapping
$

I don't think this particular issue is prone to false positives (i.e.
passing when it should fail) and the bisector has reconfirmed the fail case
already, so I think it is unlikely that the bisector is going to come back
and say it can't find a reliable basis for running.

Which might mean we have two issues: some as-yet-unknown issue between
v3.19-rc3 and -rc4, and the issue you have observed with the number of pages
the toolstack thinks it should be working on, which is masked by the unknown
issue (and which could very well be a toolstack bug exposed by a change in
Linux, not a Linux bug at all).

I'm going to leave the bisector going; hopefully it'll tell us something
interesting in whatever it fingers...

Ian.


> 
> Note that all guests have 512MB of memory, which means they have 131072
> pages (so max_pfn should be 0x1ffff).
> 
> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
> 
>    xc: detail: max_pfn 0x1ffff, p2m_frames 256
>    ...
>    xc: detail: Memory: 2048/131072    1%
>    ...
> 
> However, in both 4.1 tests [0] [1] the number of pages is quite wrong.
> 
> 4.1 32 bit:
> 
>    xc: detail: max_pfn 0xfffff, p2m_frames 1024
>    ...
>    xc: detail: Memory: 11264/1048576    1%
>    ...
> 
> It thinks it has 4096MB memory.
> 
> 4.1 64 bit:
> 
>    xc: detail: max_pfn 0x3ffff, p2m_frames 512
>    ...
>    xc: detail: Memory: 3072/262144    1%
>    ...
> 
> It thinks it has 1024MB memory.
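> 
> For reference, the apparent sizes follow directly from max_pfn (just a
> sanity-check calculation, nothing tool-specific):
> 
>    (0x1ffff + 1) * 4KiB =  131072 pages =  512MB   (3.14, correct)
>    (0xfffff + 1) * 4KiB = 1048576 pages = 4096MB   (4.1 32 bit)
>    (0x3ffff + 1) * 4KiB =  262144 pages = 1024MB   (4.1 64 bit)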
> 
> The total number of pages is determined in libxc by calling
> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from the
> hypervisor. And that value is clearly touched by Linux in some way.
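> 
> (A quick standalone sketch of the arithmetic this implies; it is not libxc
> code, just an illustration of what an inflated max_pfn does to the numbers
> the toolstack works with, using the value seen for the 4.1 32-bit guest:
> 
>    #include <stdio.h>
> 
>    int main(void)
>    {
>        /* max_pfn as read back from shared_info->arch.max_pfn;
>           0xfffff is the value seen in the 4.1 32-bit log [1]. */
>        unsigned long max_pfn = 0xfffff;
>        unsigned long nr_pages = max_pfn + 1;        /* pages the stream iterates over */
>        unsigned long p2m_frames = nr_pages / 1024;  /* 1024 4-byte p2m entries per
>                                                        4K frame for a 32-bit guest */
>        printf("pages=%lu (%luMB) p2m_frames=%lu\n",
>               nr_pages, nr_pages * 4 / 1024, p2m_frames);
>        return 0;
>    }
> 
> which prints "pages=1048576 (4096MB) p2m_frames=1024", matching [1].)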
> 
> I now think this is a bug in the Linux kernel. The biggest suspect is the
> introduction of the linear P2M.  If you think this is a bug in the
> toolstack, please let me know.
> 
> I don't know why the 4.1 64 bit guest [0] can still be successfully
> restored. I don't have a handy setup to experiment with. The restore path
> doesn't show enough information to tell anything. The thing I worry about
> is that migration v2 somehow makes the guest bigger than it should be. But
> that's another topic.
> 
> 
> Wei.
> 
> [0] 4.1 kernel 64 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/60785/test-amd64-amd64-xl/16.ts-guest-saverestore.log
> 
> [1] 4.1 kernel 32 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/60785/test-amd64-i386-xl/14.ts-guest-saverestore.log
> 
> [2] 3.14 kernel 64 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/61263/test-amd64-amd64-xl/16.ts-guest-saverestore.log
> 
> [3] 3.14 kernel 32 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/61263/test-amd64-i386-xl/16.ts-guest-saverestore.log

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
