[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code



On 01/24/2014 07:08 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
>> The paravirtualized clock used in KVM and Xen uses a version field to
>> allow the guest to see when the shared data structure is inconsistent.
>> The code reads the version field twice (before and after the data
>> structure is copied) and checks whether they haven't
>> changed and that the lowest bit is not set. As the second access is not
>> synchronized, the compiler could generate code that accesses version
>> more than two times and you end up with inconsistent data.
> 
> Could you paste in the code that the 'bad' compiler generates
> vs the compiler that generate 'good' code please?

At least 4.8 and probably older compilers compile this as intended. The
point is that the standard does not guarantee the indented behavior,
i.e. the code is wrong.

I can refer to this lwn article:
https://lwn.net/Articles/508991/

The whole point of ACCESS_ONCE is to avoid time bombs like that. There
are lots of place where ACCESS_ONCE is used in the kernel:

http://lxr.free-electrons.com/ident?i=ACCESS_ONCE

See for example the check_zero function here:
http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559

Julian

> 
>>
>> An example using pvclock_get_time_values:
>>
>> host starts updating data, sets src->version to 1
>> guest reads src->version (1) and stores it into dst->version.
>> guest copies inconsistent data
>> guest reads src->version (1) and computes xor with dst->version.
>> host finishes updating data and sets src->version to 2
>> guest reads src->version (2) and checks whether lower bit is not set.
>> while loop exits with inconsistent data!
>>
>> AFAICS the compiler is allowed to optimize the given code this way.
>>
>> Signed-off-by: Julian Stecklina <jsteckli@xxxxxxxxxxxxxxxxxxxx>
>> ---
>>  arch/x86/kernel/pvclock.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
>> index 42eb330..f62b41c 100644
>> --- a/arch/x86/kernel/pvclock.c
>> +++ b/arch/x86/kernel/pvclock.c
>> @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct 
>> pvclock_shadow_time *shadow)
>>  static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
>>                                      struct pvclock_vcpu_time_info *src)
>>  {
>> +    u32 nversion;
>> +
>>      do {
>>              dst->version = src->version;
>>              rmb();          /* fetch version before data */
>> @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
>> pvclock_shadow_time *dst,
>>              dst->tsc_shift         = src->tsc_shift;
>>              dst->flags             = src->flags;
>>              rmb();          /* test version after fetching data */
>> -    } while ((src->version & 1) || (dst->version != src->version));
>> +            nversion = ACCESS_ONCE(src->version);
>> +    } while ((nversion & 1) || (dst->version != nversion));
>>  
>>      return dst->version;
>>  }
>> @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
>> *wall_clock,
>>                          struct pvclock_vcpu_time_info *vcpu_time,
>>                          struct timespec *ts)
>>  {
>> -    u32 version;
>> +    u32 version, nversion;
>>      u64 delta;
>>      struct timespec now;
>>  
>> @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
>> *wall_clock,
>>              now.tv_sec  = wall_clock->sec;
>>              now.tv_nsec = wall_clock->nsec;
>>              rmb();          /* fetch time before checking version */
>> -    } while ((wall_clock->version & 1) || (version != wall_clock->version));
>> +            nversion = ACCESS_ONCE(wall_clock->version);
>> +    } while ((nversion & 1) || (version != nversion));
>>  
>>      delta = pvclock_clocksource_read(vcpu_time);    /* time since system 
>> boot */
>>      delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
>> -- 
>> 1.8.4.2
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel


Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.