
Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks



Hi Keir, Haitao, Eddie:

Here is the patch for timekeeping.
There are three components.

1. Increase the missed-ticks threshold to 100 seconds. Per the comment in
   the patch, I have seen scheduling delays of 5 seconds in a test lasting
   30 minutes. With the previous threshold, such a delay becomes an
   immediate error of 5 seconds in guest time that persists.

2.  In pt_timer_fn(), for the no_missed_tick_accounting option, only
   increment pt->pending_intr_nr if it is found to be zero.
   This handles the case where the guest has interrupts disabled
   for longer than a tick period.

3.  Call pt_process_missed_ticks() unconditionally and move the
   test for no_missed_tick_accounting inside pt_process_missed_ticks().
   This returns the calculation of the next timer expiration
   to the original method. The original method is important
   because it sets up a theoretical time grid, anchored at t=0, in which
   expirations occur only at n*period, where n is any integer. This, in
   turn, removes rounding errors.

Accuracy tests:

A.  32bit Linux guest (Item '1' above relevant)
   Test: 2 64bit Linux guests and 1 32bit Linux guest stacked.
         8 vcpus for each guest.
         2 physical processors on the node.
         All vcpus heavily loaded with work.
         Test duration: 3 hours.

       Result: clock error < .02%.
       (Before this patch, guest time was 16 seconds slow in a 3.5 minute test.)

B.  64bit Linux guest (Items '2' and '3' above relevant)
   Test: 2 64bit Linux guests.
         8 vcpus for each guest.
         2 physical processors on the node.
         All vcpus heavily loaded with work.
         Test duration: 3 hours.

       Result: clock error .02%.
       (Before this patch, the clock error was 1.3%. The contributions of
       items '2' and '3' above to the improvement in accuracy are .4%
       and .9%, respectively.)

Of course, these tests were run without ntpd in the guest.
Ntpd requires a local clock error of < .05%.
With the .02% accuracy, one can use ntpd to synchronize the clock.


thanks,
Dave


Keir Fraser wrote:

I'm going to apply Haitao's no_missed_ticks_fix.patch. If you think fixes
are needed apart from that, please provide a unified diff.

-- Keir

On 30/10/07 21:16, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:

Keir,

Here are my comments on your change.
I've attached an updated vpt.c with the changes.

1. For the no_missed_tick_accounting method, we still need the update
   to pt->scheduled that takes into account the time elapsed when
   missed ticks are calculated. Since pt_process_missed_ticks()
   is called from both pt_restore_timer() and pt_timer_fn(), it is
   easiest to put the check for the no_missed_tick_accounting method in
   pt_process_missed_ticks() itself.

2. In pt_timer_fn you don't want to increment pending_intr_nr beyond 1
   for no_missed_tick_accounting option.

thanks,
Dave


Keir Fraser wrote:

Applied as c/s 16274. Please take a look and make sure the mode works
as you expect.

-- Keir

On 30/10/07 14:28, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

   Hi, Keir,

   This patch adds a new timer mode in which missed ticks are not
   accounted. It can be used with the latest x86_64 Linux guests, since
   they can pick up missed ticks themselves.

    <<no_missed_ticks.patch>>

   Best Regards
   Haitao Shan

   ------------------------------------------------------------------------
   _______________________________________________
   Xen-devel mailing list
   Xen-devel@xxxxxxxxxxxxxxxxxxx
   http://lists.xensource.com/xen-devel


*** vpt.c.new.c 2007-10-30 16:45:26.000000000 -0400
--- vpt.c 2007-10-30 15:30:57.000000000 -0400
***************
*** 57,73 ****
          return;

      missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
!
!     if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
!         if ( missed_ticks > 1000 )
!         {
!             /* TODO: Adjust guest time together */
!             pt->pending_intr_nr++;
!         }
!         else
!         {
!             pt->pending_intr_nr += missed_ticks;
!         }
      }

      pt->scheduled += missed_ticks * pt->period;
--- 57,70 ----
          return;

      missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
!     if ( missed_ticks > 1000 )
!     {
!         /* TODO: Adjust guest time together */
!         pt->pending_intr_nr++;
!     }
!     else
!     {
!         pt->pending_intr_nr += missed_ticks;
      }

      pt->scheduled += missed_ticks * pt->period;
***************
*** 120,126 ****

      list_for_each_entry ( pt, head, list )
      {
!         pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
--- 117,124 ----

      list_for_each_entry ( pt, head, list )
      {
!         if ( !mode_is(v->domain, no_missed_tick_accounting) )
!             pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
***************
*** 135,151 ****

      pt_lock(pt);

!     if (mode_is(pt->vcpu->domain, no_missed_tick_accounting)) {
!         if(!pt->pending_intr_nr)
!             pt->pending_intr_nr++;
!     }
!     else
!         pt->pending_intr_nr++;

      if ( !pt->one_shot )
      {
          pt->scheduled += pt->period;
!         pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
--- 133,152 ----

      pt_lock(pt);

!     pt->pending_intr_nr++;

      if ( !pt->one_shot )
      {
          pt->scheduled += pt->period;
!         if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
!         {
!             pt_process_missed_ticks(pt);
!         }
!         else if ( (NOW() - pt->scheduled) >= 0 )
!         {
!             pt->pending_intr_nr++;
!             pt->scheduled = NOW() + pt->period;
!         }
          set_timer(&pt->timer, pt->scheduled);
      }
/*
* vpt.c: Virtual Platform Timer
*
* Copyright (c) 2006, Xiaowei Yang, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 59 Temple Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/

#include <xen/time.h>
#include <asm/hvm/support.h>
#include <asm/hvm/vpt.h>
#include <asm/event.h>

#define mode_is(d, name) \
   ((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)

static void pt_lock(struct periodic_time *pt)
{
   struct vcpu *v;

   for ( ; ; )
   {
       v = pt->vcpu;
       spin_lock(&v->arch.hvm_vcpu.tm_lock);
       if ( likely(pt->vcpu == v) )
           break;
       spin_unlock(&v->arch.hvm_vcpu.tm_lock);
   }
}

static void pt_unlock(struct periodic_time *pt)
{
   spin_unlock(&pt->vcpu->arch.hvm_vcpu.tm_lock);
}

static void pt_process_missed_ticks(struct periodic_time *pt)
{
   s_time_t missed_ticks;

   if ( pt->one_shot )
       return;

   missed_ticks = NOW() - pt->scheduled;
   if ( missed_ticks <= 0 )
       return;

   missed_ticks = missed_ticks / (s_time_t) pt->period + 1;

   if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
   {
       if ( missed_ticks > 1000 )
       {
           /* TODO: Adjust guest time together */
           pt->pending_intr_nr++;
       }
       else
       {
           pt->pending_intr_nr += missed_ticks;
       }
   }

   pt->scheduled += missed_ticks * pt->period;
}

static void pt_freeze_time(struct vcpu *v)
{
   if ( !mode_is(v->domain, delay_for_missed_ticks) )
       return;

   v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v);
}

static void pt_thaw_time(struct vcpu *v)
{
   if ( !mode_is(v->domain, delay_for_missed_ticks) )
       return;

   if ( v->arch.hvm_vcpu.guest_time == 0 )
       return;

   hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time);
   v->arch.hvm_vcpu.guest_time = 0;
}

void pt_save_timer(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   if ( test_bit(_VPF_blocked, &v->pause_flags) )
       return;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
       stop_timer(&pt->timer);

   pt_freeze_time(v);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void pt_restore_timer(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
   {
       pt_process_missed_ticks(pt);
       set_timer(&pt->timer, pt->scheduled);
   }

   pt_thaw_time(v);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

static void pt_timer_fn(void *data)
{
   struct periodic_time *pt = data;

   pt_lock(pt);

   if ( mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
   {
       if ( !pt->pending_intr_nr )
           pt->pending_intr_nr++;
   }
   else
       pt->pending_intr_nr++;

   if ( !pt->one_shot )
   {
       pt->scheduled += pt->period;
        pt_process_missed_ticks(pt);
       set_timer(&pt->timer, pt->scheduled);
   }

   vcpu_kick(pt->vcpu);

   pt_unlock(pt);
}

void pt_update_irq(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;
   uint64_t max_lag = -1ULL;
   int irq = -1;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
   {
       if ( !is_isa_irq_masked(v, pt->irq) && pt->pending_intr_nr &&
            ((pt->last_plt_gtime + pt->period_cycles) < max_lag) )
       {
           max_lag = pt->last_plt_gtime + pt->period_cycles;
           irq = pt->irq;
       }
   }

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);

   if ( is_lvtt(v, irq) )
   {
       vlapic_set_irq(vcpu_vlapic(v), irq, 0);
   }
   else if ( irq >= 0 )
   {
       hvm_isa_irq_deassert(v->domain, irq);
       hvm_isa_irq_assert(v->domain, irq);
   }
}

static struct periodic_time *is_pt_irq(
   struct vcpu *v, struct hvm_intack intack)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;
   struct RTCState *rtc = &v->domain->arch.hvm_domain.pl_time.vrtc;
   int vector;

   list_for_each_entry ( pt, head, list )
   {
       if ( !pt->pending_intr_nr )
           continue;

       if ( is_lvtt(v, pt->irq) )
       {
           if ( pt->irq != intack.vector )
               continue;
           return pt;
       }

       vector = get_isa_irq_vector(v, pt->irq, intack.source);

       /* RTC irq need special care */
       if ( (intack.vector != vector) ||
            ((pt->irq == 8) && !is_rtc_periodic_irq(rtc)) )
           continue;

       return pt;
   }

   return NULL;
}

void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
{
   struct periodic_time *pt;
   time_cb *cb;
   void *cb_priv;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   pt = is_pt_irq(v, intack);
   if ( pt == NULL )
   {
       spin_unlock(&v->arch.hvm_vcpu.tm_lock);
       return;
   }

   if ( pt->one_shot )
   {
       pt->enabled = 0;
       list_del(&pt->list);
   }
   else
   {
       pt->pending_intr_nr--;
       if ( mode_is(v->domain, no_missed_tick_accounting) )
           pt->last_plt_gtime = hvm_get_guest_time(v);
       else
           pt->last_plt_gtime += pt->period_cycles;
   }

   if ( mode_is(v->domain, delay_for_missed_ticks) &&
        (hvm_get_guest_time(v) < pt->last_plt_gtime) )
       hvm_set_guest_time(v, pt->last_plt_gtime);

   cb = pt->cb;
   cb_priv = pt->priv;

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);

   if ( cb != NULL )
       cb(v, cb_priv);
}

void pt_reset(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
   {
       pt->pending_intr_nr = 0;
       pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
       pt->scheduled = NOW() + pt->period;
       set_timer(&pt->timer, pt->scheduled);
   }

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void pt_migrate(struct vcpu *v)
{
   struct list_head *head = &v->arch.hvm_vcpu.tm_list;
   struct periodic_time *pt;

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   list_for_each_entry ( pt, head, list )
       migrate_timer(&pt->timer, v->processor);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void create_periodic_time(
   struct vcpu *v, struct periodic_time *pt, uint64_t period,
   uint8_t irq, char one_shot, time_cb *cb, void *data)
{
   destroy_periodic_time(pt);

   spin_lock(&v->arch.hvm_vcpu.tm_lock);

   pt->enabled = 1;
   pt->pending_intr_nr = 0;

   /* Periodic timer must be at least 0.9ms. */
   if ( (period < 900000) && !one_shot )
   {
       gdprintk(XENLOG_WARNING,
                "HVM_PlatformTime: program too small period %"PRIu64"\n",
                period);
       period = 900000;
   }

   pt->period = period;
   pt->vcpu = v;
   pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
   pt->irq = irq;
   pt->period_cycles = (u64)period * cpu_khz / 1000000L;
   pt->one_shot = one_shot;
   pt->scheduled = NOW() + period;
   /*
    * Offset LAPIC ticks from other timer ticks. Otherwise guests which use
    * LAPIC ticks for process accounting can see long sequences of process
    * ticks incorrectly accounted to interrupt processing.
    */
   if ( is_lvtt(v, irq) )
       pt->scheduled += period >> 1;
   pt->cb = cb;
   pt->priv = data;

   list_add(&pt->list, &v->arch.hvm_vcpu.tm_list);

   init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
   set_timer(&pt->timer, pt->scheduled);

   spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void destroy_periodic_time(struct periodic_time *pt)
{
   if ( !pt->enabled )
       return;

   pt_lock(pt);
   pt->enabled = 0;
   list_del(&pt->list);
   pt_unlock(pt);

   /*
    * pt_timer_fn() can run until this kill_timer() returns. We must do this
    * outside pt_lock() otherwise we can deadlock with pt_timer_fn().
    */
   kill_timer(&pt->timer);
}



diff -r 2717128cbdd1 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Wed Oct 31 10:07:42 2007 +0000
+++ b/xen/arch/x86/hvm/vpt.c    Thu Nov 01 16:19:06 2007 -0400
@@ -57,14 +57,23 @@ static void pt_process_missed_ticks(stru
         return;
 
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
-    if ( missed_ticks > 1000 )
-    {
-        /* TODO: Adjust guest time together */
-        pt->pending_intr_nr++;
-    }
-    else
-    {
-        pt->pending_intr_nr += missed_ticks;
+
+    if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) {
+       /* When many guests are stacked on a node and the total number of vcpus
+        * running loads is 10 times the number of physical processors,
+        * scheduling delays of 5 seconds are common. So set the threshold at
+        * 100 seconds to be safe. It would be reasonable to remove this
+        * check against threshold completely.
+        */
+       if ( missed_ticks > 100000 )
+           {
+               /* TODO: Adjust guest time together */
+               pt->pending_intr_nr++;
+           }
+       else
+           {
+               pt->pending_intr_nr += missed_ticks;
+           }
     }
 
     pt->scheduled += missed_ticks * pt->period;
@@ -117,15 +126,7 @@ void pt_restore_timer(struct vcpu *v)
 
     list_for_each_entry ( pt, head, list )
     {
-        if ( !mode_is(v->domain, no_missed_tick_accounting) )
-        {
-            pt_process_missed_ticks(pt);
-        }
-        else if ( (NOW() - pt->scheduled) >= 0 )
-        {
-            pt->pending_intr_nr++;
-            pt->scheduled = NOW() + pt->period;
-        }
+       pt_process_missed_ticks(pt);
         set_timer(&pt->timer, pt->scheduled);
     }
 
@@ -140,13 +141,14 @@ static void pt_timer_fn(void *data)
 
     pt_lock(pt);
 
-    pt->pending_intr_nr++;
+    if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) || !pt->pending_intr_nr ) {
+       pt->pending_intr_nr++;
+    }
 
     if ( !pt->one_shot )
     {
         pt->scheduled += pt->period;
-        if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
-            pt_process_missed_ticks(pt);
+       pt_process_missed_ticks(pt);
         set_timer(&pt->timer, pt->scheduled);
     }
 