
Re: [Xen-devel] [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: 19 November 2015 17:09
> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org);
> Jan Beulich
> Subject: Re: [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall
> 
> On 19/11/15 16:57, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> >> Sent: 19 November 2015 16:07
> >> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
> >> Cc: Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org);
> >> Jan Beulich
> >> Subject: Re: [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall
> >>
> >> On 19/11/15 13:19, Paul Durrant wrote:
> >>> @@ -561,10 +584,81 @@ int viridian_hypercall(struct cpu_user_regs *regs)
> >>>      switch ( input.call_code )
> >>>      {
> >>>      case HvNotifyLongSpinWait:
> >>> +        /*
> >>> +         * See Microsoft Hypervisor Top Level Spec. section 18.5.1.
> >>> +         */
> >>>          perfc_incr(mshv_call_long_wait);
> >>>          do_sched_op(SCHEDOP_yield, guest_handle_from_ptr(NULL, void));
> >>>          status = HV_STATUS_SUCCESS;
> >>>          break;
> >>> +
> >>> +    case HvFlushVirtualAddressSpace:
> >>> +    case HvFlushVirtualAddressList:
> >>> +    {
> >>> +        cpumask_t *pcpu_mask;
> >>> +        struct vcpu *v;
> >>> +        struct {
> >>> +            uint64_t address_space;
> >>> +            uint64_t flags;
> >>> +            uint64_t vcpu_mask;
> >>> +        } input_params;
> >>> +
> >>> +        /*
> >>> +         * See Microsoft Hypervisor Top Level Spec. sections 12.4.2
> >>> +         * and 12.4.3.
> >>> +         */
> >>> +        perfc_incr(mshv_flush);
> >>> +
> >>> +        /* These hypercalls should never use the fast-call convention. */
> >>> +        status = HV_STATUS_INVALID_PARAMETER;
> >>> +        if ( input.fast )
> >>> +            break;
> >>> +
> >>> +        /* Get input parameters. */
> >>> +        if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
> >>> +                                      sizeof(input_params)) != HVMCOPY_okay )
> >>> +            break;
> >>> +
> >>> +        /*
> >>> +         * It is not clear from the spec. if we are supposed to
> >>> +         * include the current virtual CPU in the set or not in this case,
> >>> +         * so err on the safe side.
> >>> +         */
> >>> +        if ( input_params.flags & HV_FLUSH_ALL_PROCESSORS )
> >>> +            input_params.vcpu_mask = ~0ul;
> >>> +
> >>> +        pcpu_mask = curr->arch.hvm_vcpu.viridian.flush_cpumask;
> >>> +        cpumask_clear(pcpu_mask);
> >>> +
> >>> +        /*
> >>> +         * For each specified virtual CPU flush all ASIDs to invalidate
> >>> +         * TLB entries the next time it is scheduled and then, if it
> >>> +         * is currently running, add its physical CPU to a mask of
> >>> +         * those which need to be interrupted to force a flush.
> >>> +         */
> >>> +        for_each_vcpu ( currd, v )
> >>> +        {
> >>> +            if ( !(input_params.vcpu_mask & (1ul << v->vcpu_id)) )
> >>> +                continue;
> >> You need to cap this loop at a vcpu_id of 63, or the above conditional
> >> will become undefined.
> > The guest should not issue the hypercall if it has more than 64 vCPUs, so
> > to some extent I don't care what happens, as long as it is not harmful to
> > the hypervisor in general, and I don't think that it is in this case.
> 
> The compiler is free to do anything it wishes when encountering
> undefined behaviour, including crashing the hypervisor.
> 

I don't believe so. See section 6.5.7 of the C99 spec. (top of page 85):

"The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are 
filled with
zeros. If E1 has an unsigned type, the value of the result is E1 x 2^E2, 
reduced modulo
one more than the maximum value representable in the result type. If E1 has a 
signed
type and nonnegative value, and E1 x 2^E2 is representable in the result type, 
then that is
the resulting value; otherwise, the behavior is undefined"

So, the undefined behaviour you refer to arises only when E1 is signed, and in
this case it is most definitely unsigned, so the modulo rule applies.
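
That said, the cap is cheap to add. Below is a minimal sketch of the loop
with the cap (and your __cpumask_set_cpu() suggestion) folded in; whether
break or continue is the right reaction to an out-of-range ID is an open
question, not something the spec. dictates:

    for_each_vcpu ( currd, v )
    {
        /*
         * vcpu_mask is only 64 bits wide, so no vCPU with an ID of 64
         * or above can be named in it; stopping here also keeps the
         * shift count below the width of unsigned long.
         */
        if ( v->vcpu_id >= 64 )
            break;

        if ( !(input_params.vcpu_mask & (1ul << v->vcpu_id)) )
            continue;

        /* Invalidate TLB entries the next time this vCPU is scheduled. */
        hvm_asid_flush_vcpu(v);
        if ( v->is_running )
            __cpumask_set_cpu(v->processor, pcpu_mask);
    }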

> Any undefined behaviour which can be triggered by a guest action
> warrants an XSA, because there is no telling what might happen.
> 
> >
> >> It might also be wise to fail the vcpu_initialise() for a
> >> viridian-enabled domain having more than 64 vcpus.
> >>
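
Something like the following at vCPU initialisation would do that; a sketch
only, since the hook name, its placement and the error code here are
assumptions:

    /* Hypothetical hook; the real placement may well differ. */
    static int viridian_vcpu_initialise(const struct vcpu *v)
    {
        /*
         * The Viridian flush hypercalls name their target vCPUs in a
         * single 64-bit mask, so IDs of 64 or above cannot be expressed.
         */
        if ( is_viridian_domain(v->domain) && v->vcpu_id >= 64 )
            return -EOPNOTSUPP;

        return 0;
    }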
> >>> +
> >>> +            hvm_asid_flush_vcpu(v);
> >>> +            if ( v->is_running )
> >>> +                cpumask_set_cpu(v->processor, pcpu_mask);
> >> __cpumask_set_cpu().  No need for atomic operations here.
> >>
> > Ok.
> >
> >>> +        }
> >>> +
> >>> +        /*
> >>> +         * Since ASIDs have now been flushed it just remains to
> >>> +         * force any CPUs currently running target vCPUs out of non-
> >>> +         * root mode. It's possible that re-scheduling has taken place
> >>> +         * so we may unnecessarily IPI some CPUs.
> >>> +         */
> >>> +        if ( !cpumask_empty(pcpu_mask) )
> >>> +            flush_tlb_mask(pcpu_mask);
> >> Wouldn't it be easier to simply AND input_params.vcpu_mask with
> >> d->vcpu_dirty_mask ?
> >>

Actually I realise your original statement makes no sense anyway. There is no
such mask as d->vcpu_dirty_mask. There is d->domain_dirty_cpumask, which is a
mask of *physical* CPUs, but since (as the name implies) input_params.vcpu_mask
is a mask of *virtual* CPUs, ANDing the two together would just yield garbage.

> > No, that may yield much too big a mask. All we need here is a mask of
> > where the vcpus are running *now*, not everywhere they've been.
> 
> The dirty mask is a "currently scheduled on" mask.

No it's not. The comment in sched.h clearly states that domain_dirty_cpumask is
a "Bitmask of CPUs which are holding onto this domain's state", which is, as I
said before, essentially everywhere the domain's vcpus have been scheduled
since the last time state was flushed. Since, in this case, I have already
invalidated ASIDs for all targeted virtual CPUs, I don't need to IPI that many
physical CPUs; I only need the mask of where the virtual CPUs are *currently*
running. If one of them gets descheduled before the IPI then the IPI was
unnecessary (but there is no low-cost way of determining or preventing that).
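
To put that difference in code; illustrative only, reusing the names from
the patch:

    /* What the patch builds: only pCPUs where a target vCPU runs *now*. */
    for_each_vcpu ( currd, v )
        if ( (input_params.vcpu_mask & (1ul << v->vcpu_id)) &&
             v->is_running )
            __cpumask_set_cpu(v->processor, pcpu_mask);

By contrast, currd->domain_dirty_cpumask also contains every pCPU still
holding stale state for the domain, i.e. everywhere its vCPUs have run since
state was last flushed. The mask above is always a subset of it, and
typically a much smaller one, so it is the cheaper set to IPI.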

  Paul

> 
> ~Andrew

