[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 2/5] x86/time: implement tsc as clocksource



On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>> On 19.09.16 at 19:54, <joao.m.martins@xxxxxxxxxx> wrote:
>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>> On 19.09.16 at 18:11, <joao.m.martins@xxxxxxxxxx> wrote:
>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@xxxxxxxxxx> wrote:
>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>> calibration fires).
>>>>>
>>>>> And within this one second, which may cover some of Dom0's
>>>>> booting up, it is okay to have inconsistencies?
>>>> It's not okay which is why I am removing this possibility when switching 
>>>> to TSC.
>>>> The inconsistencies in those readings (if I wasn't adjusting) would be 
>>>> because
>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>> previous calibration or platform time initialization (while still was HPET,
>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and 
>>>> instead
>>>> change it to "remove the possibility" in this last sentence?
>>>
>>> Let's not do the 2nd step before the 1st, which is the question of
>>> what happens prior to and what actually changes at this first
>>> calibration (after 1 sec).
>> The first calibration won't change much - this 1-sec was meant when having
>> nop_rendezvous which is the first time platform timer would be used to set 
>> local
>> cpu_time (will adjust the mention above as it's misleading for the reader as 
>> it
>> doesn't refer to this patch).
> 
> So what makes it that it actually _is_ nop_rendezvous after that one
> second? (And yes, part of this may indeed be just bad placement of
> the description, as iirc nop_rendezvous gets introduced only in a later
> patch.)
Because with nop_rendezvous we will be using the platform timer to get a
monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
solves both ends of the problem, with nop_rendezvous until it is first
calibration fixes it, and without nop_rendezvous to remove the latch adjustment
from initial platform timer.

>> Though reseeding the cpu times to boot_tsc_stamp
>> prior to calibration (*without* the time latch values from previous platform
>> timer) we can ensure NOW/get_s_time or calls to update_vcpu_system_time() 
>> will
>> see monotonically increasing values. Otherwise keeping the previous ones,
>> calibration would just add up local TSC delta and any existing divergence
>> wouldn't be solved. On a later patch when we set the stable bit (with nop
>> rendezvous), if these cpu times weren't adjusted guests/xen would still see
>> small divergence between CPUs until the first calibration was fired (after we
>> switched to TSC).
> 
> Right. But my concern is regarding the time window _between_
> switching to TSC and running calibration the first time afterwards.
> Part of this indeed gets taken care of by the re-seeding (which,
> other than one could imply from what you say above, doesn't get
> done [immediately] prior to calibration, but upon switching to TSC).
Exactly.

> But is re-seeding without switching to nop_rendezvous really
> sufficient (or in other words, can the patches really be broken up
> like this)?
I think it is sufficient, unless I am missing out something. In my opinion what
requires it to be correct is that the tuple that is getting (re-)written to cpu
time has a matching stamp + stime and corresponds to the same time reference as
the next written after. Unless your doubts might originate from how
std_rendezvous currently updates the time infos but even there with your series
we fortunately add up to cpu_time with a monotonic delta (with a matching stamp
and stime tuple) and thus having the monotonic property even without
nop_rendezvous. I recall doing testing/validation of this patch alone and didn't
observe any divergences, but I must say that the most significant portion of
testing was with the whole series.

>>>>>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>>>>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>>>>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>>>>>          }
>>>>>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>>>>>> +                  (try_platform_timer(&plt_tsc) > 0) )
>>>>>> +        {
>>>>>> +            /*
>>>>>> +             * Platform timer has changed and CPU time will only be 
>>>>>> updated
>>>>>> +             * after we set again the calibration timer, which means we 
>>>>>> need to
>>>>>> +             * seed again each local CPU time. At this stage TSC is 
>>>>>> known to be
>>>>>> +             * reliable i.e. monotonically increasing across all CPUs 
>>>>>> so this
>>>>>> +             * lets us remove the skew between platform timer and TSC, 
>>>>>> since
>>>>>> +             * these are now effectively the same.
>>>>>> +             */
>>>>>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 
>>>>>> 1);
>>>>>> +
>>>>>> +            /* Finish platform timer switch. */
>>>>>> +            try_platform_timer_tail();
>>>>>> +
>>>>>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>>>>>> +                   freq_string(plt_src.frequency));
>>>>>
>>>>> This message should have the same log level as the one at the end
>>>>> of init_platform_timer().
>>>> Agreed, but at the end of init_platform_timer there is a plain printk with 
>>>> an
>>>> omitted log level. Or do you mean to remove XENLOG_INFO from this printk 
>>>> above
>>>> or, instead add XENLOG_INFO to one printk at the end of 
>>>> init_platform_timer() ?
>>>
>>> Well, info would again be too low a level for my taste. Hence either
>>> remove the level here (slightly preferred from my pov), or make both
>>> warning.
>> As your preference goes towards without the log level, I will re-introduce 
>> back
>> without it. Although I would find clearer to use printk with a log level as 
>> it
>> was advised in earlier reviews.
>>
>> NB: My suggestion of info as level is because my usual line of thought is to 
>> see
>> warning as something potentially erroneous that user should be warned about, 
>> and
>> error as being an actual error.
> 
> I understand and mostly agree. There are a few cases though (and
> we're dealing with one here imo) where the absence of a log level is
> better: We want these messages present in the log by default (which
> they wouldn't be if they were info), but they're also not really
> warnings. Perhaps simply for documentation purposes we could add
> XENLOG_DEFAULT (expanding to an empty string), but that's not
> something to be done in this patch.
I see, thanks for the clarification.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.