[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote: > On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > > On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote: > >> On Jan 6, 2015 4:01 AM, "Paolo Bonzini" <pbonzini@xxxxxxxxxx> wrote: > >> > > >> > > >> > > >> > On 06/01/2015 09:42, Paolo Bonzini wrote: > >> > > > > Still confused. So we can freeze all vCPUs in the host, then > >> > > > > update > >> > > > > pvti 1, then resume vCPU 1, then update pvti 0? In that case, we > >> > > > > have > >> > > > > a problem, because vCPU 1 can observe pvti 0 mid-update, and KVM > >> > > > > doesn't increment the version pre-update, and we can return > >> > > > > completely > >> > > > > bogus results. > >> > > > Yes. > >> > > But then the getcpu test would fail (1->0). Even if you have an ABA > >> > > situation (1->0->1), it's okay because the pvti that is fetched is the > >> > > one returned by the first getcpu. > >> > > >> > ... this case of partial update of pvti, which is caught by the version > >> > field, if of course different from the other (extremely unlikely) that > >> > Andy pointed out. That is when the getcpus are done on the same vCPU, > >> > but the rdtsc is another. > >> > > >> > That one can be fixed by rdtscp, like > >> > > >> > do { > >> > // get a consistent (pvti, v, tsc) tuple > >> > do { > >> > cpu = get_cpu(); > >> > pvti = get_pvti(cpu); > >> > v = pvti->version & ~1; > >> > // also acts as rmb(); > >> > rdtsc_barrier(); > >> > tsc = rdtscp(&cpu1); > >> > >> Off-topic note: rdtscp doesn't need a barrier at all. AIUI AMD > >> specified it that way and both AMD and Intel implement it correctly. > >> (rdtsc, on the other hand, definitely needs the barrier beforehand.) > >> > >> > // control dependency, no need for rdtsc_barrier? > >> > } while(cpu != cpu1); > >> > > >> > // ... compute nanoseconds from pvti and tsc ... > >> > rmb(); > >> > } while(v != pvti->version); > >> > >> Still no good. We can migrate a bunch of times so we see the same CPU > >> all three times and *still* don't get a consistent read, unless we > >> play nasty games with lots of version checks (I have a patch for that, > >> but I don't like it very much). The patch is here: > >> > >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=a69754dc5ff33f5187162b5338854ad23dd7be8d > >> > >> but I don't like it. > >> > >> Thus far, I've been told unambiguously that a guest can't observe pvti > >> while it's being written, and I think you're now telling me that this > >> isn't true and that a guest *can* observe pvti while it's being > >> written while the low bit of the version field is not set. If so, > >> this is rather strongly incompatible with the spec in the KVM docs. > >> > >> I don't suppose that you and Marcelo could agree on what the actual > >> semantics that KVM provides are and could write it down in a way that > >> people who haven't spent a long time staring at the request code > >> understand? And maybe you could even fix the implementation while > >> you're at it if the implementation is, indeed, broken. I have ugly > >> patches to fix it here: > >> > >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=3b718a050cba52563d831febc2e1ca184c02bac0 > >> > >> but I'm not thrilled with them. > >> > >> --Andy > > > > I suppose that separating the version write from the rest of the pvclock > > structure is sufficient, as that would guarantee the writes are not > > reordered even with fast string REP MOVS. > > > > Thanks for catching this Andy! > > > > Don't you stil need: > > version++; > write the rest; > version++; > > with possible smp_wmb() in there to keep the compiler from messing around? Correct. Could just as well follow the protocol and use odd/even, which is what your patch does. What is the point with the new flags bit though? > Also, if you do this, can you also make setting and clearing > STABLE_BIT properly atomic across all vCPUs? Or at least do something > like setting it last and clearing it first on vPCU 0? If the version "seqlock" works properly across vCPUs, why do you need STABLE_BIT "properly atomic" ? Please define what you mean by "properly atomic". _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |