
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system



Mukesh Rathor wrote:
> On Fri, 18 Dec 2009 07:02:55 +0000
> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> 
>> On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@xxxxxxxxxx> wrote:
>>
>>> The other fix I thought of was to change INITIAL_JIFFIES to
>>> something sooner.
>>>
>>> Would appreciate any help, I don't understand xen time management
>>> well.
>> This isn't really Xen time code, but unchanged Linux time code. I
>> don't know which tree you quoted the code from -- 2.6.18 has similar
>> but not identical. Anyway, I suggest try using the jiffy-comparison
>> macros from <linux/jiffies.h>: time_before(), time_after(), etc.
>> These are designed to work even when jiffies wraps. Feel free to send
>> patch(es) for that, if you test that out and it works okay.
>>
>>  -- Keir
>>
> 
> Ok, I came up with the following patch. Jeremy, can you please take a
> look also, and comment on my fix since I noticed you've got the same 
> issue in your tree. Here's a summary for your benefit:
> 
> init/calibrate.c :  calibrate_delay_direct():
> 
>                 start_jiffies = get_jiffies_64();
>                 while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
>                         pre_start = start;
>                         read_current_timer(&start);
>                 }
> 

Linux time code deliberately initializes jiffies (32-bit) so that it wraps soon
after boot, to expose other kernel code that makes bad assumptions about jiffies
wrap.  In your case, I'm guessing that the scrubbing delay causes enough timer
interrupts to be delayed (queued up) that jiffies wraps earlier in the boot path
than expected.
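
To make the early-wrap arithmetic concrete, here is a small user-space sketch.
It assumes the usual kernel definition of INITIAL_JIFFIES as
((unsigned long)(unsigned int)(-300*HZ)); DEMO_HZ and the demo_* helper names
are made up for illustration, and the tick rate of 250 is only an assumed
value, not taken from the system in this thread.

```c
#include <stdint.h>

/* Assumed tick rate, for illustration only; a real kernel uses CONFIG_HZ. */
#define DEMO_HZ 250

/*
 * Hypothetical helper mirroring the kernel's INITIAL_JIFFIES idea:
 * start the 32-bit counter 300 seconds' worth of ticks below zero.
 */
static inline uint32_t demo_initial_jiffies(void)
{
    return (uint32_t)(-300 * DEMO_HZ);
}

/* Number of ticks that must be accounted before the 32-bit counter wraps. */
static inline uint32_t demo_ticks_until_wrap(void)
{
    return (uint32_t)0 - demo_initial_jiffies();
}
```

With these assumed numbers the counter is only 75000 ticks away from wrapping
at boot, so a single timer interrupt that accounts for a very large delta can
push it past zero.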

As Keir suggests, the correct solution is probably to use the time_before/after 
macros appropriately.

The proposed code avoids the problem by accessing jiffies_64 instead.
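
As a rough illustration of why the wrap-safe macros help, the sketch below
mimics the idea behind time_after() in user space.  It assumes the original
loop compared a 32-bit jiffies value with a plain "<=", which is my reading and
not something I have checked against your tree.  tick_after(),
naive_still_waiting() and safe_still_waiting() are hypothetical names, not
kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for time_after(a, b): true if tick count a is
 * later than b, even when the 32-bit counter has wrapped in between.
 * Relies on the usual two's-complement conversion to int32_t.
 */
static inline bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Plain comparison, as assumed for the original wait loop. */
static inline bool naive_still_waiting(uint32_t now, uint32_t start,
                                       uint32_t delta)
{
    return now <= start + delta;
}

/* Same loop condition expressed with the wrap-safe comparison. */
static inline bool safe_still_waiting(uint32_t now, uint32_t start,
                                      uint32_t delta)
{
    return !tick_after(now, start + delta);
}
```

For example, with start = 0xFFFFFFF0 and delta = 1, a counter that jumps past
the wrap to 0x5 still satisfies the plain "<=" test, so the loop keeps
spinning, while the wrap-safe form reports that the deadline has passed.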

> If the first ever timer interrupt comes after start_jiffies is set, dom0 boot
> may hang if delta in timer_interrupt() is so huge that it causes jiffies
> to wrap. It appears delta is very large when memory is more than 512GB on
> certain boxes, causing wrap around.
> 
> Why is delta in dom0->timer_interrupt() related to memory on the system?
> Because the hypervisor creates dom0, then page scrubs, then unpauses the vcpu.
> So it appears a lot of page scrubbing results in a huge delta on the first tick.

The problem here may be that timers are running in the domain while the vcpu is 
not.

Steve


