
RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC

> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Friday, May 28, 2010 1:04 AM
> To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel
> (xen-devel@xxxxxxxxxxxxxxxxxxx); Ian Pratt
> Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
> On 28/05/2010 07:29, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
> >> It is impossible to meet that level of TSC consistency when doing
> >> physical-add, without emulating all guest TSCs. We may need to add
> >> that as an option, at least, to keep a small class of apps that
> >> care (like Oracle's DB, we assume) happy.
> >
> > So an option to make TSC_MODE_DEFAULT as d->arch.vtsc=0?
> > On CPU hot-add, we should at least warn if that option is not
> > set, am I right?
> Xen-unstable:21469.

Well, although it's better than nothing, it seems pretty
lame to put only an advisory warning in Xen's log about a
condition that may affect many guest OSes and applications,
with hard-to-identify symptoms and failures that may show
up at some random point days, weeks, or months after the
event occurs.  Consider a cloud service provider, for
example.

The advantage of turning hot-add-cpu off by default
is that, if it is turned on at boot-time, TSC emulation
can always be enabled for all guests at guest boot
and the condition never arises.

Are there any other questionable conditions that might
arise from hot-adding physical CPUs?  For example (my
favorite), are any order>0 allocations required?  Or
what if the hot-added cpu results in mixed generations
(e.g. a Nehalem is added to an all-Westmere system,
where the apps are using AES instructions)?  Anything

In other words, maybe it would be nice to be able
to rule out other special dynamic checks for hot-add
cpus that aren't done for simultaneously-reset cpus?
Requiring a boot option to allow hot-add of physical CPUs
might make a nasty future support problem a lot easier
to handle.
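
To illustrate the mixed-generation concern, here is a rough
sketch (Python, not Xen code) of the kind of dynamic check a
hot-add path might need.  The function names and feature sets
are hypothetical; the only fact assumed is that Westmere
supports AES instructions and Nehalem does not.

```python
def missing_features(exposed_to_guests, new_cpu_features):
    """Features guests may already be using that the new CPU lacks."""
    return sorted(set(exposed_to_guests) - set(new_cpu_features))


def may_hot_add(exposed_to_guests, new_cpu_features):
    """Hypothetical policy: allow hot-add only if nothing is missing."""
    return not missing_features(exposed_to_guests, new_cpu_features)


# Illustrative feature sets only, not complete CPUID data.
westmere = {"sse4_2", "popcnt", "aes"}
nehalem = {"sse4_2", "popcnt"}

print(missing_features(westmere, nehalem))
print(may_hot_add(westmere, nehalem))
```

CPUs that are all reset together can be checked once at boot;
a hot-added CPU would need a check like this at run time,
after guests may already depend on the missing feature.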

> "Undetectable" by Dan's definition means undetectable by
> a multi-threaded app on a multi-vcpu guest. Any detected
> warp would therefore be a problem.

This is actually Linux's definition, a requirement
for selecting tsc as Linux's default clocksource,
and measured by the same algorithm in Xen and Linux.
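
As a rough illustration of the idea behind that test (a
simulation in Python, not the actual Linux or Xen code):
CPUs take turns reading their TSC, and any reading lower
than the previous one, taken on another CPU, counts as a
warp.  The per-CPU offsets, the function name, and the
fixed step are all assumptions made for this sketch.

```python
def max_tsc_warp(offsets, rounds=1000, step=1):
    """Return the largest backwards jump seen across simulated CPUs.

    offsets: one hypothetical TSC offset per CPU, relative to true time.
    Each CPU in turn reads (true time + its offset); true time advances
    by `step` before every read, so a perfectly synchronized system
    never observes a reading smaller than the previous one.
    """
    last = None
    max_warp = 0
    true_time = 0
    for _ in range(rounds):
        for off in offsets:
            true_time += step          # time always moves forward
            now = true_time + off      # what this CPU's TSC reports
            if last is not None and now < last:
                max_warp = max(max_warp, last - now)
            last = now
    return max_warp


print(max_tsc_warp([0, 0]))     # synchronized CPUs
print(max_tsc_warp([0, 100]))   # second CPU runs 100 ticks ahead
```

A hot-added CPU whose TSC starts from a different value
corresponds to a nonzero offset here, which is why a nonzero
result is likely unless guest TSCs are emulated.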

Linux is a bit more flexible than apps in that, if
Linux detects a problem, it can fall back from using
tsc as the clocksource to some other clocksource.
But it remains to be seen how well this will work
in a virtual environment, where there are a number
of conditions that a bare-metal OS can detect
that a virtualized guest OS (or an app running
on a physical or virtualized OS) cannot.

But to summarize, IMHO, correctness comes first,
performance second, and functionality that might
be needed on only a small fraction of systems
comes third.  I think enterprise customers dependent
on Xen would agree.
