
Re: [Xen-devel] [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: 19 November 2015 17:09
> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org);
> Jan Beulich
> Subject: Re: [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall
> 
> On 19/11/15 16:57, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> >> Sent: 19 November 2015 16:07
> >> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
> >> Cc: Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org);
> >> Jan Beulich
> >> Subject: Re: [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall
> >>
> >> On 19/11/15 13:19, Paul Durrant wrote:
> >>> @@ -561,10 +584,81 @@ int viridian_hypercall(struct cpu_user_regs *regs)
> >>>      switch ( input.call_code )
> >>>      {
> >>>      case HvNotifyLongSpinWait:
> >>> +        /*
> >>> +         * See Microsoft Hypervisor Top Level Spec. section 18.5.1.
> >>> +         */
> >>>          perfc_incr(mshv_call_long_wait);
> >>>          do_sched_op(SCHEDOP_yield, guest_handle_from_ptr(NULL, void));
> >>>          status = HV_STATUS_SUCCESS;
> >>>          break;
> >>> +
> >>> +    case HvFlushVirtualAddressSpace:
> >>> +    case HvFlushVirtualAddressList:
> >>> +    {
> >>> +        cpumask_t *pcpu_mask;
> >>> +        struct vcpu *v;
> >>> +        struct {
> >>> +            uint64_t address_space;
> >>> +            uint64_t flags;
> >>> +            uint64_t vcpu_mask;
> >>> +        } input_params;
> >>> +
> >>> +        /*
> >>> +         * See Microsoft Hypervisor Top Level Spec. sections 12.4.2
> >>> +         * and 12.4.3.
> >>> +         */
> >>> +        perfc_incr(mshv_flush);
> >>> +
> >>> +        /* These hypercalls should never use the fast-call convention. */
> >>> +        status = HV_STATUS_INVALID_PARAMETER;
> >>> +        if ( input.fast )
> >>> +            break;
> >>> +
> >>> +        /* Get input parameters. */
> >>> +        if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
> >>> +                                      sizeof(input_params)) != HVMCOPY_okay )
> >>> +            break;
> >>> +
> >>> +        /*
> >>> +         * It is not clear from the spec. if we are supposed to
> >>> +         * include the current virtual CPU in the set or not in this case,
> >>> +         * so err on the safe side.
> >>> +         */
> >>> +        if ( input_params.flags & HV_FLUSH_ALL_PROCESSORS )
> >>> +            input_params.vcpu_mask = ~0ul;
> >>> +
> >>> +        pcpu_mask = curr->arch.hvm_vcpu.viridian.flush_cpumask;
> >>> +        cpumask_clear(pcpu_mask);
> >>> +
> >>> +        /*
> >>> +         * For each specified virtual CPU flush all ASIDs to invalidate
> >>> +         * TLB entries the next time it is scheduled and then, if it
> >>> +         * is currently running, add its physical CPU to a mask of
> >>> +         * those which need to be interrupted to force a flush.
> >>> +         */
> >>> +        for_each_vcpu ( currd, v )
> >>> +        {
> >>> +            if ( !(input_params.vcpu_mask & (1ul << v->vcpu_id)) )
> >>> +                continue;
> >> You need to cap this loop at a vcpu_id of 63, or the above conditional
> >> will become undefined.
> > The guest should not issue the hypercall if it has more than 64 vCPUs, so
> > to some extent I don't care what happens, as long as it is not harmful to
> > the hypervisor in general, and I don't think that it is in this case.
> 
> The compiler is free to do anything it wishes when encountering
> undefined behaviour, including crashing the hypervisor.
> 

I don't believe so. See section 6.5.7 of the C99 spec. (top of page 85):

"The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are 
filled with
zeros. If E1 has an unsigned type, the value of the result is E1 x 2^E2, 
reduced modulo
one more than the maximum value representable in the result type. If E1 has a 
signed
type and nonnegative value, and E1 x 2^E2 is representable in the result type, 
then that is
the resulting value; otherwise, the behavior is undefined"

So, the undefined behaviour you refer to arises only when E1 is signed, and in
this case it is most definitely unsigned, so the modulo rule applies.
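
That said, the cap is cheap to add. Below is a minimal sketch of the loop
with the cap (and your __cpumask_set_cpu() suggestion) folded in; whether
break or continue is the right reaction to an out-of-range ID is an open
question, not something the spec. dictates:

    for_each_vcpu ( currd, v )
    {
        /*
         * vcpu_mask is only 64 bits wide, so no vCPU with an ID of 64
         * or above can be named in it; stopping here also keeps the
         * shift count below the width of unsigned long.
         */
        if ( v->vcpu_id >= 64 )
            break;

        if ( !(input_params.vcpu_mask & (1ul << v->vcpu_id)) )
            continue;

        /* Invalidate TLB entries the next time this vCPU is scheduled. */
        hvm_asid_flush_vcpu(v);
        if ( v->is_running )
            __cpumask_set_cpu(v->processor, pcpu_mask);
    }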

> Any undefined behaviour which can be triggered by a guest action
> warrants an XSA, because there is no telling what might happen.
> 
> >
> >> It might also be wise to fail the vcpu_initialise() for a
> >> viridian-enabled domain having more than 64 vcpus.
> >>
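
Something like the following at vCPU initialisation would do that; a sketch
only, since the hook name, its placement and the error code here are
assumptions:

    /* Hypothetical hook; the real placement may well differ. */
    static int viridian_vcpu_initialise(const struct vcpu *v)
    {
        /*
         * The Viridian flush hypercalls name their target vCPUs in a
         * single 64-bit mask, so IDs of 64 or above cannot be expressed.
         */
        if ( is_viridian_domain(v->domain) && v->vcpu_id >= 64 )
            return -EOPNOTSUPP;

        return 0;
    }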
> >>> +
> >>> +            hvm_asid_flush_vcpu(v);
> >>> +            if ( v->is_running )
> >>> +                cpumask_set_cpu(v->processor, pcpu_mask);
> >> __cpumask_set_cpu().  No need for atomic operations here.
> >>
> > Ok.
> >
> >>> +        }
> >>> +
> >>> +        /*
> >>> +         * Since ASIDs have now been flushed it just remains to
> >>> +         * force any CPUs currently running target vCPUs out of non-
> >>> +         * root mode. It's possible that re-scheduling has taken place
> >>> +         * so we may unnecessarily IPI some CPUs.
> >>> +         */
> >>> +        if ( !cpumask_empty(pcpu_mask) )
> >>> +            flush_tlb_mask(pcpu_mask);
> >> Wouldn't it be easier to simply AND input_params.vcpu_mask with
> >> d->vcpu_dirty_mask ?
> >>

Actually I realise your original statement makes no sense anyway. There is no
such mask as d->vcpu_dirty_mask. There is d->domain_dirty_cpumask, which is a
mask of *physical* CPUs, but since (as the name implies) input_params.vcpu_mask
is a mask of *virtual* CPUs, ANDing the two together would just yield garbage.

> > No, that may yield much too big a mask. All we need here is a mask of
> > where the vcpus are running *now*, not everywhere they've been.
> 
> The dirty mask is a "currently scheduled on" mask.

No it's not. The comment in sched.h clearly states that domain_dirty_cpumask is
a "Bitmask of CPUs which are holding onto this domain's state", which is, as I
said before, essentially everywhere the domain's vcpus have been scheduled
since the last time state was flushed. Since, in this case, I have already
invalidated ASIDs for all targeted virtual CPUs, I don't need to IPI that many
physical CPUs; I only need the mask of where the virtual CPUs are *currently*
running. If one of them gets descheduled before the IPI then the IPI was
unnecessary (but there is no low-cost way of determining or preventing that).
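
To put that difference in code; illustrative only, reusing the names from
the patch:

    /* What the patch builds: only pCPUs where a target vCPU runs *now*. */
    for_each_vcpu ( currd, v )
        if ( (input_params.vcpu_mask & (1ul << v->vcpu_id)) &&
             v->is_running )
            __cpumask_set_cpu(v->processor, pcpu_mask);

By contrast, currd->domain_dirty_cpumask also contains every pCPU still
holding stale state for the domain, i.e. everywhere its vCPUs have run since
state was last flushed. The mask above is always a subset of it, and
typically a much smaller one, so it is the cheaper set to IPI.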

  Paul

> 
> ~Andrew

