[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/6] x86/time: implement tsc as clocksource



On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>> On 24.08.16 at 14:43, <joao.m.martins@xxxxxxxxxx> wrote:
>> This patch introduces support for using TSC as platform time source
>> which is the highest resolution time and most performant to get (~20
>> nsecs).
> 
> Is this indeed still the case with the recently added fences?
Hmm, may be not as fast with the added fences, But definitely the fastest time
source. If you prefer I can remove the mention to time taken.

> 
>> Though there are also several problems associated with its
>> usage, and there isn't a complete (and architecturally defined)
>> guarantee that all machines will provide reliable and monotonic TSC in
>> all cases (I believe Intel to be the only that can guarantee that?) For
>> this reason it's set with less priority when compared to HPET unless
>> adminstrator changes "clocksource" boot option to "tsc". Initializing
>> TSC clocksource requires all CPUs up to have the tsc reliability checks
>> performed. init_xen_time is called before all CPUs are up, and so we
>> would start with HPET at boot time, and switch later to TSC. The switch
>> then happens on verify_tsc_reliability initcall that is invoked when all
>> CPUs are up. When attempting to initialize TSC we also check for time
>> warps and if it has invariant TSC. And in case none of these conditions
>> are met,
> 
> DYM "in case any of these conditions is not met"?
Yeah I will change it. My english isn't at my best these days.

> 
>> we keep the clocksource that was previously initialized on
>> init_xen_time. Note that while we deem reliable a CONSTANT_TSC with no
>> deep C-states, it might not always be the case, so we're conservative
>> and allow TSC to be used as platform timer only with invariant TSC.
>>
>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>> constant-TSC calibration fast path") updates to cpu time use local
>> stamps, which means platform timer is only used to seed the initial cpu
>> time. With clocksource=tsc there is no need to be in sync with another
>> clocksource, so we reseed the local/master stamps to be values of TSC
>> and update the platform time stamps accordingly. Time calibration is set
>> to 1sec after we switch to TSC, thus these stamps are reseeded to also
>> ensure monotonic returning values right after the point we switch to
>> TSC. This is also to avoid the possibility of having inconsistent
>> readings in this short period (i.e. until calibration fires).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> ---
>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>
>> Changes since v2:
>>  - Suggest "HPET switching to TSC" only as an example as otherwise it
>>  would be misleading on platforms not having one.
> 
> What does this refer to? The description still unconditionally talks
> about HPET.
I remember adding "For example" in the beginning of the sentence. This was 
clearly a
distraction on my end (you also found another small mistake in the changelog of 
patch
3, despite having addressed the comment).

> 
>>  - Change init_tsctimer to skip all the tests and assume it's called
>>  only on reliable TSC conditions and no warps observed. Tidy
>>  initialization on verify_tsc_reliability as suggested by Konrad.
> 
> Which means that contrary to what you say in the cover letter
> there's nothing to prevent it from being used when CPU hotplug
> is possible.
Probably the cover letter wasn't completely clear on this. I mention that I 
split it
between Patch 2 and 5 (intent was for easier review), and you can see that I add
check in patch 5, plus preventing online of CPUs too.

> I don't think you should skip basic sanity checks, and
> place entirely on the admin the burden of knowing whether the
> option is safe to use.
Neither do I think it should be skipped. We mainly don't duplicate these 
checks. In
the end I am just running init_tsctimer, in the TSC_RELIABLE block and if no 
warps
are observed. So putting that in init_tsctimer would just duplicate the 
previously
done checks. Or would you still prefer as done in previous version i.e. all 
checks in
both init_tsctimer and verify_tsc_reliability?


>> +static struct platform_timesource __initdata plt_tsc =
>> +{
>> +    .id = "tsc",
>> +    .name = "TSC",
>> +    .read_counter = read_tsc,
>> +    .counter_bits = 64,
>> +    /* Not called by init_platform_timer as it is not on the plt_timers 
>> array */
>> +    .init = init_tsctimer,
> 
> I consider this comment misleading: It took me several rounds of
> looking until I understood that you don't mean to say the pointer
> gets filled here only to not leave a NULL one around. I'd prefer if
> you removed it.
> 
OK.

>> @@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
>>      unsigned int i;
>>      s64 rc = -1;
>>  
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         (strcmp(opt_clocksource, "tsc")) )
> 
> Stray parentheses around a function call.
Fixed.

> 
>> @@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
>>                     __func__);
>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>          }
>> +        else if ( !strcmp(opt_clocksource, "tsc") )
>> +        {
>> +            int cpu;
> 
> unsigned int
Fixed.

> 
>> +            if ( try_platform_timer(&plt_tsc) <= 0 )
>> +                return 0;
> 
> If you merged this into the if() above you could avoid this extra
> return path, which in turn lowers the risk of something getting
> added to the tail of the function and then not being taken care of
> here.
Good point, Fixed.

> 
>> +            /*
>> +             * Platform timer has changed and CPU time will only be updated
>> +             * after we set again the calibration timer, which means we 
>> need to
>> +             * seed again each local CPU time. At this stage TSC is known 
>> to be
>> +             * reliable i.e. monotonically increasing across all CPUs so 
>> this
>> +             * lets us remove the skew between platform timer and TSC, since
>> +             * these are now effectively the same.
>> +             */
>> +            for_each_online_cpu( cpu )
> 
> Please decide whether you consider for_each_online_cpu a
> control keyword like item. If you do, a blank is missing prior to the
> opening parenthesis. If you don't, the blanks inside the parentheses
> need to be dropped.
I will go with the control keyword like style.

>> +            {
>> +                struct cpu_time *t = &per_cpu(cpu_time, cpu);
>> +
>> +                t->stamp.local_tsc = boot_tsc_stamp;
>> +                t->stamp.local_stime = 0;
>> +                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>> +                t->stamp.master_stime = t->stamp.local_stime;
>> +            }
> 
> Without any synchronization, how "good" are the chances that
> this updating would race with something the other CPUs do?

I assumed that at this stage init calls are still being invoked that we update 
all
cpus time infos, but maybe it's a misconception. I can have this part in one 
function
and have it done on_selected_cpus() and wait until all cpu time structures get
updated. That perhaps would be better?

> 
>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>  
>>      preinit_pit();
>>      tmp = init_platform_timer();
>> +    plt_tsc.frequency = tmp;
> 
> How can this be the TSC frequency? init_platform_timer()
> specifically skips the TSC. And I can see no other place where
> plt_tsc.frequency gets initialized. Am I overlooking something?
> 
That's the calibrated TSC frequency. Before I was setting pts->frequency in
init_tsctimer through a temporary variable called tsc_freq. So I thought I 
could just
drop the variable and set plt_tsc directly. The difference though from previous
versions is that since commit 9334029 this value is returned from platform time
source init() and calibrated against platform timer, instead of always against 
PIT.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.