On 09/21/09 17:11, Dan Magenheimer wrote:
>>> Yes, I neglected an important pre-condition. ASSUME the first
>>> rdtscp on pcpu-A gets a version mismatch so that it must fetch
>>> the parameters again. Then: the vcpu switches pcpu TWICE
>>> from pcpu-A to pcpu-B and back to pcpu-A and does rdtscp
>>> each time on pcpu-A but reads one or more pvclock parameters
>>> (that are too big to be encoded in TSC_AUX) on pcpu-B.
>>>
>>> I agree that this is vanishingly low probability but on
>>> a pcpu-oversubscribed machine I think it only takes one
>>> vcpu-to-pcpu reschedule and then a poorly timed interrupt that
>>> causes the vcpu to be unscheduled, and then later rescheduled
>>> on the original processor.
>>>
>>>
>> Sure. It just has to keep iterating until it gets consistency. If it
>> iterates too long (10 times? 100? 1000?) it should give up and assume
>> something is inherently broken.
>>
> No, I'm not talking about iteration. In the scenario I'm
> trying to describe, the version number hasn't changed on
> pcpu-A so the algorithm doesn't iterate.
>
Well, not "change" so much as "not updated". If the program keeps doing
an rdtscp which shows that its local copy of the parameters is out of
date, but its attempts to get up-to-date parameters keep failing
(because it keeps migrating between cpus), then it will keep iterating
without converging. Specifically, the algorithm would be:

    u64 tsc, time_ns;
    u32 aux;
    unsigned int version, cpu;

again:
    rdtscp(&tsc, &aux);
    cpu = aux >> 24;                    /* physical cpu */
    version = aux & ((1 << 24) - 1);

    /* At this point tsc and cpu+version are all fetched
       atomically and consistent, so context switch doesn't
       matter here; apply_fixup is not dependent on currently
       executing cpu. */

    /* note that this prob. needs some local synchronization if
       the usermode program is multithreaded... */
    if (unlikely(version != pvclockinfo[cpu].version)) {
        struct pvclock info;
        int curcpu;                     /* again, physical cpu */

        /* Always fetches current cpu parameters,
           and tells us which cpu it is for. If we
           switched cpus since the rdtscp we won't end
           up updating the out-of-date info we detected
           but that doesn't matter because... */
        curcpu = get_new_pvclock_info(&info);
        pvclockinfo[curcpu] = info;

        /* ...we repeat assuming that we're almost certainly
           still on the same cpu when we do rdtscp again */
        goto again;
    }

    time_ns = apply_fixup(tsc, &pvclockinfo[cpu]);

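For anyone who wants to poke at the control flow, here's a rough, self-contained simulation of that loop. It's a sketch under stated assumptions, not Xen code: the "current pcpu" is simulated with globals, rdtscp(), get_new_pvclock_info() and apply_fixup() are toy stand-ins (apply_fixup uses a simplified integer scale), a cached version of 0 is assumed to mean "never fetched", and the retry cap borrows the "give up after N iterations" suggestion from upthread (MAX_RETRIES is an arbitrary, hypothetical value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PCPUS     4
#define VERSION_BITS 24
#define VERSION_MASK ((1u << VERSION_BITS) - 1)
#define MAX_RETRIES  100        /* hypothetical cap: give up eventually */

struct pvclock {                /* hypothetical per-pcpu parameters */
    uint32_t version;
    uint64_t offset_ns;
    uint32_t mul;               /* simplified: integer ns per tsc tick */
};

/* "Hypervisor-side" state, one entry per simulated pcpu. */
static struct pvclock hv_params[NR_PCPUS];
static unsigned sim_cpu;        /* pcpu we are "currently" running on */
static uint64_t sim_tsc;

/* Usermode cache; version 0 means "never fetched" (assumes the
 * hypervisor never publishes version 0). */
static struct pvclock pvclockinfo[NR_PCPUS];

/* Pack cpu number into the top 8 bits, version into the low 24. */
static uint32_t encode_aux(unsigned cpu, uint32_t version)
{
    assert(cpu < 256);
    return ((uint32_t)cpu << VERSION_BITS) | (version & VERSION_MASK);
}

/* Simulated rdtscp: tsc and aux come from the same cpu in one step. */
static void rdtscp(uint64_t *tsc, uint32_t *aux)
{
    *tsc = sim_tsc;
    *aux = encode_aux(sim_cpu, hv_params[sim_cpu].version);
}

/* Hypothetical: returns the *current* cpu's parameters and its number. */
static unsigned get_new_pvclock_info(struct pvclock *info)
{
    *info = hv_params[sim_cpu];
    return sim_cpu;
}

static uint64_t apply_fixup(uint64_t tsc, const struct pvclock *p)
{
    return p->offset_ns + tsc * p->mul;
}

/* Returns true and stores the time on success, false if the
 * parameters never became consistent within MAX_RETRIES tries. */
static bool read_time_ns(uint64_t *time_ns)
{
    for (int tries = 0; tries < MAX_RETRIES; tries++) {
        uint64_t tsc;
        uint32_t aux;

        rdtscp(&tsc, &aux);
        unsigned cpu = aux >> VERSION_BITS;
        uint32_t version = aux & VERSION_MASK;

        /* Compare only the 24 bits that fit in TSC_AUX. */
        if (version != (pvclockinfo[cpu].version & VERSION_MASK)) {
            struct pvclock info;
            unsigned curcpu = get_new_pvclock_info(&info);
            pvclockinfo[curcpu] = info;   /* may be a different cpu */
            continue;                     /* re-read tsc and aux */
        }
        *time_ns = apply_fixup(tsc, &pvclockinfo[cpu]);
        return true;
    }
    return false;
}
```

read_time_ns() returns false if it never converged; what a caller does then (fall back to a syscall, say) is left open here.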
get_new_pvclock_info() can be a syscall, a hypercall, or some other
mechanism that can take a good atomic snapshot of the parameters, along
with the cpu number, from a shared memory region.
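One possible shape for the shared-memory variant is a seqlock-style reader, sketched here with C11 atomics. The record layout (struct shared_pvclock) and both function names are hypothetical, a single writer is assumed, and the sketch only shows how to take a consistent snapshot of one record; how the caller learns which pcpu's record to read is left open, as above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record for one pcpu. The writer makes seq odd
 * while updating and even again when done. Fields are atomics so the
 * concurrent reads are well-defined in C11. */
struct shared_pvclock {
    atomic_uint      seq;
    atomic_uint      version;
    _Atomic uint64_t offset_ns;
    atomic_uint      mul;
};

struct pvclock_snapshot {
    uint32_t version;
    uint64_t offset_ns;
    uint32_t mul;
};

/* Reader: retry until all fields were copied with no writer active. */
static void snapshot_pvclock(struct shared_pvclock *s,
                             struct pvclock_snapshot *out)
{
    unsigned before, after;

    do {
        before = atomic_load_explicit(&s->seq, memory_order_acquire);
        out->version   = atomic_load_explicit(&s->version, memory_order_relaxed);
        out->offset_ns = atomic_load_explicit(&s->offset_ns, memory_order_relaxed);
        out->mul       = atomic_load_explicit(&s->mul, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((before & 1) || before != after);
}

/* Writer side, shown for completeness (single writer assumed). */
static void publish_pvclock(struct shared_pvclock *s, uint32_t version,
                            uint64_t offset_ns, uint32_t mul)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    assert((seq & 1) == 0);
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->version, version, memory_order_relaxed);
    atomic_store_explicit(&s->offset_ns, offset_ns, memory_order_relaxed);
    atomic_store_explicit(&s->mul, mul, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}
```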
> I realized after I sent this that I'm not really sure
> I understand the pvclock implementation, particularly
> under what circumstances the version number changes
> or doesn't. And if this is different in any way
> than the versions you are proposing that the app
> would see. So I'm not positive we are considering
> the same cases.
>
The pvclock algorithm only changes the version if either the tsc offset
or the scale has changed. In the standard pvclock algorithm, a vcpu sees
its own pvclock version change if either the pcpu undergoes some change
which affects the tsc, *or* the vcpu gets scheduled on a new pcpu (which
could have a different offset/scale).
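To make "offset" and "scale" concrete, here's one way apply_fixup() could turn a raw tsc into nanoseconds. The field names are borrowed from the existing pvclock ABI (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift) and the arithmetic follows the usual pvclock scaling, but the struct is a simplified stand-in rather than the real pvclock_vcpu_time_info layout, and overflow handling is deliberately minimal.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in: names follow pvclock, layout does not. */
struct pvclock_params {
    uint32_t version;
    uint64_t tsc_timestamp;      /* tsc when system_time was sampled */
    uint64_t system_time;        /* ns at tsc_timestamp (the "offset") */
    uint32_t tsc_to_system_mul;  /* ns = (shifted delta * mul) / 2^32 */
    int8_t   tsc_shift;          /* pre-shift of the tsc delta */
};

/* (delta * mul) >> 32 without a 128-bit type; exact as long as
 * (delta >> 32) * mul fits in 64 bits. */
static uint64_t mul_frac32(uint64_t delta, uint32_t mul)
{
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

static uint64_t apply_fixup(uint64_t tsc, const struct pvclock_params *p)
{
    /* Sketch assumes the tsc never reads earlier than the snapshot. */
    assert(tsc >= p->tsc_timestamp);
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;
    return p->system_time + mul_frac32(delta, p->tsc_to_system_mul);
}
```

With tsc_to_system_mul = 0x80000000 (0.5 as a 32-bit fraction) and tsc_shift = 0, each tick counts as 0.5 ns, i.e. a 2 GHz tsc.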
In the case we're talking about above, the code isn't pinned to a
particular pcpu or vcpu (as it is usermode code with no real control
over the kernel or Xen schedulers), so it has to cope with preemption at
any point. That's simplified by making the tsc and metadata fetch
atomic, so it can revalidate its parameters every time it fetches the
tsc. In that case, Xen need only update its internal version numbers
when there's an actual change to the tsc's offset/scale, without regard
to vcpu scheduling. (And of course if the offset/scale end up being
constant, then Xen will never need to bump the version, and usermode
will only ever end up fetching the parameters once per cpu.)
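That last point can be sketched as toy code on both sides. Everything here is hypothetical: update_params() plays the hypervisor and only bumps the version on a real offset/scale change, reader_check() plays the usermode cache, the version is kept to the 24 bits that fit in TSC_AUX, and 0 is reserved as a "never fetched" sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VERSION_MASK ((1u << 24) - 1)   /* only 24 bits fit in TSC_AUX */

struct tsc_params {
    uint32_t version;
    uint64_t offset;
    uint32_t scale;
};

/* Hypervisor side: republish only on a real change.
 * Returns true if the version was bumped. */
static bool update_params(struct tsc_params *p, uint64_t offset, uint32_t scale)
{
    if (p->offset == offset && p->scale == scale)
        return false;            /* vcpu migration alone changes nothing */
    p->offset = offset;
    p->scale = scale;
    p->version = (p->version + 1) & VERSION_MASK;
    if (p->version == 0)         /* keep 0 as the "never fetched" sentinel */
        p->version = 1;
    return true;
}

/* Usermode cache for one pcpu; counts how often it had to refetch. */
struct reader_cache {
    uint32_t version;            /* 0 = never fetched */
    unsigned fetches;
};

static void reader_check(struct reader_cache *c, const struct tsc_params *p)
{
    if (c->version != p->version) {
        c->version = p->version; /* stands in for get_new_pvclock_info() */
        c->fetches++;
    }
}
```

Calling update_params() repeatedly with unchanged values never bumps the version, so reader_check() fetches exactly once, matching the "once per cpu" case above.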
J
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel