
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...



>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>Sent: Tuesday, June 01, 2010 5:31 PM
>To: Jiang, Yunhong; Xu, Jiajun; xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
>#a3e7c7...
>
>On 01/06/2010 08:43, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>
>> For issue 2 (CPU panic when running cpu offline), it should come from the
>> periodic_timer.
>>
>> When a CPU is pulled down, cpu_disable_scheduler will migrate the single-shot
>> timer, but the periodic_timer is not migrated.
>> After the vcpu is scheduled on another pCPU later, and then scheduled out from
>> that new pCPU, stop_timer(&prev->periodic_timer) will try to access the
>> per_cpu structure, which is still pointing to the offlined CPU's per_cpu area,
>> and will cause trouble. This should be caused by the per_cpu changes.
>
>Which xen-unstable changeset are you testing? All timers should be
>automatically migrated off a dead CPU and onto CPU0 by changeset 21424. Is
>that not working okay for you?

We are testing on 21492.

After more investigation, the root cause is that the periodic_timer is stopped
before take_cpu_down (in schedule()), so it is not covered by 21424.
When v->periodic_period == 0, the next vcpu's periodic_timer is not updated by
schedule(), so in a later schedule round its timer still refers to the old CPU
and causes trouble for stop_timer().
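
To make the sequence concrete, here is a minimal standalone model of it. This is
only a sketch and not Xen code: every toy_* name, the cpu_is_online array and the
rebind_when_idle flag are made up for illustration, and I am assuming that
re-arming the timer is what normally rebinds it to the current CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy stand-ins for the real structures; all names are hypothetical. */
struct toy_timer {
    unsigned int cpu;   /* CPU whose per-cpu timer state this timer uses */
    bool active;
};

struct toy_vcpu {
    unsigned long periodic_period;   /* 0 means no periodic timer wanted */
    struct toy_timer periodic_timer;
};

static bool cpu_is_online[TOY_NR_CPUS];

/*
 * Models vcpu_periodic_timer_work(): called when the vcpu is scheduled in
 * on this_cpu. With period == 0 it returns early, so timer.cpu keeps its
 * old value unless rebind_when_idle is set.
 */
static void toy_periodic_timer_work(struct toy_vcpu *v, unsigned int this_cpu,
                                    bool rebind_when_idle)
{
    assert(!v->periodic_timer.active);

    if (v->periodic_period == 0) {
        if (rebind_when_idle)
            v->periodic_timer.cpu = this_cpu;
        return;
    }

    /* Assumption: re-arming the timer binds it to the current CPU. */
    v->periodic_timer.cpu = this_cpu;
    v->periodic_timer.active = true;
}

/*
 * Models stop_timer(): it has to look at the per-cpu state of timer.cpu.
 * Returns false when that CPU is offline, i.e. the access would be bad.
 */
static bool toy_stop_timer(struct toy_timer *t)
{
    bool safe = cpu_is_online[t->cpu];
    t->active = false;
    return safe;
}

/*
 * The vcpu last ran on CPU1 with no periodic timer, CPU1 goes offline,
 * the vcpu is scheduled in on CPU0 and later scheduled out again.
 * Returns whether the final stop_timer() was safe.
 */
static bool toy_offline_scenario(bool rebind_when_idle)
{
    struct toy_vcpu v = {
        .periodic_period = 0,
        .periodic_timer = { .cpu = 1, .active = false },
    };
    unsigned int i;

    for (i = 0; i < TOY_NR_CPUS; i++)
        cpu_is_online[i] = true;

    cpu_is_online[1] = false;                          /* CPU1 is taken down */
    toy_periodic_timer_work(&v, 0, rebind_when_idle);  /* scheduled in on CPU0 */
    return toy_stop_timer(&v.periodic_timer);          /* scheduled out of CPU0 */
}
```

In this model toy_offline_scenario(false) reports an unsafe stop_timer(), because
the timer still names the offlined CPU1, while toy_offline_scenario(true) is safe.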

With the following small patch it works, but I'm not sure whether this is a good solution.

--jyh

diff -r 96917cf25bf3 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri May 28 10:54:07 2010 +0100
+++ b/xen/common/schedule.c     Wed Jun 02 15:18:56 2010 +0800
@@ -893,7 +893,10 @@ static void vcpu_periodic_timer_work(str
     ASSERT(!active_timer(&v->periodic_timer));
 
     if ( v->periodic_period == 0 )
+    {
+        v->periodic_timer.cpu = smp_processor_id();
         return;
+    }
 
     periodic_next_event = v->periodic_last_event + v->periodic_period;
 





>
> -- Keir
>
>> I also tried migrating the periodic_timer in cpu_disable_scheduler() and it
>> seems to work (commenting out the migration in cpu_disable_scheduler will
>> trigger the printk).
>> It seems that on your side the timer is always triggered before schedule out?
>>
>> --jyh
>>
>> diff -r 96917cf25bf3 xen/common/schedule.c
>> --- a/xen/common/schedule.c Fri May 28 10:54:07 2010 +0100
>> +++ b/xen/common/schedule.c Tue Jun 01 15:35:21 2010 +0800
>> @@ -487,6 +487,15 @@ int cpu_disable_scheduler(unsigned int c
>>                  migrate_timer(&v->singleshot_timer, cpu_mig);
>>              }
>>
>> +/*
>> +            if ( v->periodic_timer.cpu == cpu )
>> +            {
>> +                int cpu_mig = first_cpu(c->cpu_valid);
>> +                if ( cpu_mig == cpu )
>> +                    cpu_mig = next_cpu(cpu_mig, c->cpu_valid);
>> +                migrate_timer(&v->periodic_timer, cpu_mig);
>> +            }
>> +*/
>>              if ( v->processor == cpu )
>>              {
>>                  set_bit(_VPF_migrating, &v->pause_flags);
>> @@ -505,7 +514,10 @@ int cpu_disable_scheduler(unsigned int c
>>               * all locks.
>>               */
>>              if ( v->processor == cpu )
>> +            {
>> +                printk("we hit the EAGAIN here\n");
>>                  ret = -EAGAIN;
>> +            }
>>          }
>>      }
>>      return ret;
>> @@ -1005,6 +1017,11 @@ static void schedule(void)
>>
>>      perfc_incr(sched_ctx);
>>
>> +    if (prev->periodic_timer.cpu != smp_processor_id() && !cpu_online(prev->periodic_timer.cpu))
>> +    {
>> +        printk("I'm now at cpu %x, timer's cpu is %x\n", smp_processor_id(), prev->periodic_timer.cpu);
>> +    }
>> +
>>      stop_timer(&prev->periodic_timer);
>>
>>      /* Ensure that the domain has an up-to-date time base. */
>>
>>
>>
>> --jyh
>>
>>> -----Original Message-----
>>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Keir Fraser
>>> Sent: Tuesday, May 25, 2010 5:15 PM
>>> To: Xu, Jiajun; xen-devel@xxxxxxxxxxxxxxxxxxx
>>> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
>>> #a3e7c7...
>>>
>>> On 25/05/2010 10:13, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
>>>
>>>>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>>>>
>>>>> The bug occurs each time when I created the guest. I have attached the
>>>>> serial
>>>>> output on the bugzilla.
>>>>
>>>> I haven't been able to reproduce this.
>>>>
>>>>>>> 2. CPU panic when running cpu offline
>>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>>>>>
>>>>> Xen will panic when I offline cpu each time. The log is also attached on
>>>>> the
>>>>> bugzilla.
>>>>
>>>> Nor this. I even installed 32-bit Xen to match your environment more
>>>> closely.
>>>
>>> I'm running xen-unstable:21447 by the way. I ran 64-bit Xen for testing (1)
>>> above, and both 64-bit and 32-bit Xen for testing (2).
>>>
>>> K.
>>>
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

