[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 2/5] x86/time: implement tsc as clocksource



On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>> On 14.09.16 at 19:37, <joao.m.martins@xxxxxxxxxx> wrote:
>> This patch introduces support for using TSC as platform time source
>> which is the highest resolution time and most performant to get.
>> Though there are also several problems associated with its usage, and
>> there isn't a complete (and architecturally defined) guarantee that
>> all machines will provide reliable and monotonic TSC in all cases (I
>> believe Intel to be the only that can guarantee that?) For this reason
>> it's set with less priority when compared to HPET unless adminstrator
>> changes "clocksource" boot option to "tsc".
> 
> In the following sentence you removed the exclusive mentioning
> of HPET, but above you don't. Furthermore I don't think this
> sentence is in line with what the patch does: There's no priority
> given to it, and it won't be used at all when not requested on the
> command line.
You're right, let me change this sentence to be:

For this reason it's not used unless administrator changes "clocksource" boot
option to "tsc".

> 
>> Initializing TSC
>> clocksource requires all CPUs up to have the tsc reliability checks
>> performed. init_xen_time is called before all CPUs are up, so for
>> example we would start with HPET (or ACPI, PIT) at boot time, and
>> switch later to TSC. The switch then happens on verify_tsc_reliability
>> initcall that is invoked when all CPUs are up. When attempting to
>> initialize TSC we also check for time warps and if it has invariant
>> TSC. Note that while we deem reliable a CONSTANT_TSC with no deep
>> C-states, it might not always be the case, so we're conservative and
>> allow TSC to be used as platform timer only with invariant TSC.
>> Additionally we check if CPU Hotplug isn't meant to be performed on
>> the host which will either be when max vcpus and num_present_cpu are
>> the same. This is because a newly hotplugged CPU may not satisfy the
>> condition of having all TSCs synchronized - so when having tsc
>> clocksource being used we allow offlining CPUs but not onlining any
>> ones back. Finally we prevent TSC from being used as clocksource on
>> multiple sockets because it isn't guaranteed to be invariant. Further
>> relaxing of this last requirement is added in a separate patch, such
>> that we allow vendors with such guarantee to use TSC as clocksource.
>> In case any of these conditions is not met, we keep the clocksource
>> that was previously initialized on init_xen_time.
>>
>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>> constant-TSC calibration fast path") updates to cpu time use local
>> stamps, which means platform timer is only used to seed the initial
>> cpu time.  With clocksource=tsc there is no need to be in sync with
>> another clocksource, so we reseed the local/master stamps to be values
>> of TSC and update the platform time stamps accordingly. Time
>> calibration is set to 1sec after we switch to TSC, thus these stamps
>> are reseeded to also ensure monotonic returning values right after the
>> point we switch to TSC. This is also to avoid the possibility of
>> having inconsistent readings in this short period (i.e. until
>> calibration fires).
> 
> And within this one second, which may cover some of Dom0's
> booting up, it is okay to have inconsistencies?
It's not okay which is why I am removing this possibility when switching to TSC.
The inconsistencies in those readings (if I wasn't adjusting) would be because
we would be using (in that 1-sec) those cpu time tuples calculated by the
previous calibration or platform time initialization (while still was HPET,
ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
change it to "remove the possibility" in this last sentence?

> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -475,6 +475,50 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  }
>>  
>>  /************************************************************
>> + * PLATFORM TIMER 4: TSC
>> + */
>> +
>> +/*
>> + * Called in verify_tsc_reliability() under reliable TSC conditions
>> + * thus reusing all the checks already performed there.
>> + */
>> +static s64 __init init_tsc(struct platform_timesource *pts)
>> +{
>> +    u64 ret = pts->frequency;
>> +
>> +    if ( nr_cpu_ids != num_present_cpus() )
>> +    {
>> +        printk(XENLOG_INFO "TSC: CPU Hotplug intended\n");
>> +        ret = 0;
>> +    }
>> +
>> +    if ( nr_sockets > 1 )
>> +    {
>> +        printk(XENLOG_INFO "TSC: Not invariant across sockets\n");
>> +        ret = 0;
>> +    }
>> +
>> +    if ( !ret )
>> +        printk(XENLOG_INFO "TSC: Not setting it as clocksource\n");
> 
> I think this last message is redundant with the former two. But since
> I also think that info level is too low for those earlier ones, perhaps
> keeping the latter (at info or even debug level) would be okay, once
> the other got bumped to warning level.
Makes sense and one can infer that message from the lack of "Switched to ..."
Let me change the first two into warning level and the last one as debug level
as you're suggesting.

> 
>> +static struct platform_timesource __initdata plt_tsc =
>> +{
>> +    .id = "tsc",
>> +    .name = "TSC",
>> +    .read_counter = read_tsc,
>> +    .counter_bits = 63,
> 
> Please add a brief comment explaining why this is not 64.
OK.

> 
>> @@ -604,7 +667,9 @@ static u64 __init init_platform_timer(void)
>>      unsigned int i;
>>      s64 rc = -1;
>>  
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         strcmp(opt_clocksource, "tsc") )
> 
> No real need to split this if() across two lines.
OK.

> 
>> +static void __init try_platform_timer_tail(void)
>> +{
>> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>> +    plt_overflow(NULL);
>> +
>> +    platform_timer_stamp = plt_stamp64;
>> +    stime_platform_stamp = NOW();
>> +
>> +    if ( !clocksource_is_tsc() )
>> +        init_percpu_time();
> 
> This isn't really dependent on whether TSC is used as clocksource,
> but solely on the point in time at which the call gets made, is it? If
> so, I think an explicit boolean function parameter (named e.g. "late")
> would be better than abusing the predicate here.

Correct, I will introduce this boolean parameter. Not that is critical but
probably add an likely(...) there too, since the late case only happens for
clocksource=tsc ?

> 
>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>          }
>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>> +                  (try_platform_timer(&plt_tsc) > 0) )
>> +        {
>> +            /*
>> +             * Platform timer has changed and CPU time will only be updated
>> +             * after we set again the calibration timer, which means we 
>> need to
>> +             * seed again each local CPU time. At this stage TSC is known 
>> to be
>> +             * reliable i.e. monotonically increasing across all CPUs so 
>> this
>> +             * lets us remove the skew between platform timer and TSC, since
>> +             * these are now effectively the same.
>> +             */
>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>> +
>> +            /* Finish platform timer switch. */
>> +            try_platform_timer_tail();
>> +
>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>> +                   freq_string(plt_src.frequency));
> 
> This message should have the same log level as the one at the end
> of init_platform_timer().
Agreed, but at the end of init_platform_timer there is a plain printk with an
omitted log level. Or do you mean to remove XENLOG_INFO from this printk above
or, instead add XENLOG_INFO to one printk at the end of init_platform_timer() ?

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.