
Re: [Xen-devel] [xen-unstable test] 18851: regressions - FAIL



On 05/09/13 13:20, Jan Beulich wrote:
>>>> On 05.09.13 at 13:24, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
>> On 04/09/13 11:41, Ian Jackson wrote:
>>> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
>>> ...
>>>>> I'm not sure why my osstest push gate didn't catch this, but the
>>>>> regression is indeed caused by the change from Jeremy's old tree to
>>>>> Linux 3.10.y.
>>>
>>> It appears that the push gate didn't catch it because it's host
>>> specific, and it got lucky and didn't run a test on that host.
>>>
>>>> So how do we want to deal with that? Linux maintainers - any
>>>> chance you could help out? The staging tree having been stuck
>>>> for over a week is certainly less than ideal...
>>>
>>> David Vrabel pointed out that more modern kernels have a different
>>> interpretation of things like "dom0_mem=256M", and can waste lots and
>>> lots of actual memory on pointless bookkeeping for future expansion
>>> (which the kernel envisages but we do not).
>>>
>>> I have changed it to "dom0_mem=256M,max:256M".  I got a push of this
>>> change at "Wed, 4 Sep 2013 03:50:14 +0100".  I don't think any of the
>>> test runs yet reported have used this change.
>>
>> Woodlouse's e820 as seen by the kernel looks like:
>>
>> [    0.000000] e820: BIOS-provided physical RAM map:
>> [    0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
>> [    0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
>> [    0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
>> [    0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
>> [    0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
>> [    0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
>> [    0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
>> [    0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
>> [    0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>> [    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
>> [    0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
>> [    0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
>> [    0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
>> [    0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
>> [    0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
>>
>> That last reserved entry I think confuses the early setup and it does
>> odd things like:
>>
>> [    0.000000] Set 266338518 page(s) to 1-1 mapping
>>
>> Possibly relevant kernel thread here:
>>
>> http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html 
>>
>> I note that the e820 as seen by Xen does not have this reserved region
>>
>> (XEN) Xen-e820 RAM map:
>> (XEN)  0000000000000000 - 000000000009a800 (usable)
>> (XEN)  000000000009a800 - 00000000000a0000 (reserved)
>> (XEN)  00000000000e6000 - 0000000000100000 (reserved)
>> (XEN)  0000000000100000 - 00000000d7f90000 (usable)
>> (XEN)  00000000d7f9e000 - 00000000d7fa0000 type 9
>> (XEN)  00000000d7fa0000 - 00000000d7fae000 (ACPI data)
>> (XEN)  00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
>> (XEN)  00000000d7fe0000 - 00000000d7fee000 (reserved)
>> (XEN)  00000000d7ff0000 - 00000000d8000000 (reserved)
>> (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
>> (XEN)  00000000fec00000 - 00000000fec03000 (reserved)
>> (XEN)  00000000fee00000 - 00000000fee01000 (reserved)
>> (XEN)  00000000ff700000 - 0000000100000000 (reserved)
>> (XEN)  0000000100000000 - 0000000228000000 (usable)
>>
>> So it must be being added by Xen?
> 
> Yes - see d838ac25 ("x86: don't allow Dom0 access to the HT
> address range"). But that's the case on all AMD systems, and
> I thought it wasn't just woodlouse that's an AMD one - Ian?
> 
> In any event - how can the kernel side code make _any_
> assumptions on what is or is not in the E820 table? I've
> recently seen logs from a system where reserved (MMIO)
> blocks appear right below the 1Tb (or maybe it was even 16Tb)
> boundary, without Xen inserting them.
> 
> I would certainly be willing to revert that patch for the time
> being if we have reasons to believe this helps, but only as long
> as it is clear that the kernel needs fixing, and that I'll want this
> back before 4.4 goes out. Do we have baseline (8a7769b4)
> test results including the new kernel, with part of it run on
> woodlouse?

This looks like a red herring.  Having poked about on woodlouse, it looks
like something is screwy with interrupts.  The tg3 cards aren't using
MSI and the USB controller is using edge rather than level handlers.
Another machine with the same chipset is happily using MSIs.

Malcolm (Cc) has some suggestions for things to try.
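
For what it's worth, the "Set 266338518 page(s) to 1-1 mapping" figure can
be reproduced from the kernel's e820 dump quoted above.  The sketch below
is only a cross-check of the numbers, not a description of what the kernel
setup code does: it assumes 4 KiB pages and treats "usable" and "unusable"
entries as RAM-like.  The names PAGE_SIZE, KERNEL_E820, RAM_LIKE and
pages_not_ram_like are made up for this illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages

# (start, inclusive end, type), copied from the kernel's e820 dump quoted above.
KERNEL_E820 = [
    (0x0000000000000000, 0x0000000000099fff, "usable"),
    (0x000000000009a800, 0x00000000000fffff, "reserved"),
    (0x0000000000100000, 0x00000000d7f8ffff, "usable"),
    (0x00000000d7f9e000, 0x00000000d7f9ffff, "type 9"),
    (0x00000000d7fa0000, 0x00000000d7fadfff, "ACPI data"),
    (0x00000000d7fae000, 0x00000000d7fdffff, "ACPI NVS"),
    (0x00000000d7fe0000, 0x00000000d7fedfff, "reserved"),
    (0x00000000d7ff0000, 0x00000000d7ffffff, "reserved"),
    (0x00000000e0000000, 0x00000000efffffff, "reserved"),
    (0x00000000fec00000, 0x00000000fec02fff, "reserved"),
    (0x00000000fee00000, 0x00000000feefffff, "reserved"),
    (0x00000000ff700000, 0x00000000ffffffff, "reserved"),
    (0x0000000100000000, 0x00000001884d1fff, "usable"),
    (0x00000001884d2000, 0x0000000227ffffff, "unusable"),
    (0x000000fd00000000, 0x000000ffffffffff, "reserved"),
]

# Entry types counted as RAM-like for this cross-check (an assumption).
RAM_LIKE = {"usable", "unusable"}


def pages_not_ram_like(e820, page_size=PAGE_SIZE):
    """Count page frames below the top of the map that no RAM-like entry covers."""
    top = max(end for _start, end, _kind in e820) + 1
    covered = sum(
        (end + 1 - start) // page_size
        for start, end, kind in e820
        if kind in RAM_LIKE
    )
    return top // page_size - covered


# With the reserved entry just below 1 TiB, as the kernel sees the map.
print(pages_not_ram_like(KERNEL_E820))       # 266338518

# The same map without that last entry.
print(pages_not_ram_like(KERNEL_E820[:-1]))  # 164054
```

So the large count is every page frame below 1 TiB that is not covered by
a usable or unusable entry; without the last reserved entry the same sum
drops to 164054 pages.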

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

