
Re: [Xen-devel] [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall



On 20/11/15 09:15, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
>> Sent: 19 November 2015 17:09
>> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
>> Cc: Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); Jan Beulich
>> Subject: Re: [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall
>>
>> On 19/11/15 16:57, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
>>>> Sent: 19 November 2015 16:07
>>>> To: Paul Durrant; xen-devel@xxxxxxxxxxxxxxxxxxxx
>>>> Cc: Ian Jackson; Stefano Stabellini; Ian Campbell; Wei Liu; Keir (Xen.org); Jan Beulich
>>>> Subject: Re: [PATCH v3] x86/hvm/viridian: flush remote tlbs by hypercall
>>>>
>>>> On 19/11/15 13:19, Paul Durrant wrote:
>>>>> @@ -561,10 +584,81 @@ int viridian_hypercall(struct cpu_user_regs *regs)
>>>>>      switch ( input.call_code )
>>>>>      {
>>>>>      case HvNotifyLongSpinWait:
>>>>> +        /*
>>>>> +         * See Microsoft Hypervisor Top Level Spec. section 18.5.1.
>>>>> +         */
>>>>>          perfc_incr(mshv_call_long_wait);
>>>>>          do_sched_op(SCHEDOP_yield, guest_handle_from_ptr(NULL, void));
>>>>>          status = HV_STATUS_SUCCESS;
>>>>>          break;
>>>>> +
>>>>> +    case HvFlushVirtualAddressSpace:
>>>>> +    case HvFlushVirtualAddressList:
>>>>> +    {
>>>>> +        cpumask_t *pcpu_mask;
>>>>> +        struct vcpu *v;
>>>>> +        struct {
>>>>> +            uint64_t address_space;
>>>>> +            uint64_t flags;
>>>>> +            uint64_t vcpu_mask;
>>>>> +        } input_params;
>>>>> +
>>>>> +        /*
>>>>> +         * See Microsoft Hypervisor Top Level Spec. sections 12.4.2
>>>>> +         * and 12.4.3.
>>>>> +         */
>>>>> +        perfc_incr(mshv_flush);
>>>>> +
>>>>> +        /* These hypercalls should never use the fast-call convention. */
>>>>> +        status = HV_STATUS_INVALID_PARAMETER;
>>>>> +        if ( input.fast )
>>>>> +            break;
>>>>> +
>>>>> +        /* Get input parameters. */
>>>>> +        if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
>>>>> +                                      sizeof(input_params)) != HVMCOPY_okay )
>>>>> +            break;
>>>>> +
>>>>> +        /*
>>>>> +         * It is not clear from the spec. if we are supposed to
>>>>> +         * include current virtual CPU in the set or not in this case,
>>>>> +         * so err on the safe side.
>>>>> +         */
>>>>> +        if ( input_params.flags & HV_FLUSH_ALL_PROCESSORS )
>>>>> +            input_params.vcpu_mask = ~0ul;
>>>>> +
>>>>> +        pcpu_mask = curr->arch.hvm_vcpu.viridian.flush_cpumask;
>>>>> +        cpumask_clear(pcpu_mask);
>>>>> +
>>>>> +        /*
>>>>> +         * For each specified virtual CPU flush all ASIDs to invalidate
>>>>> +         * TLB entries the next time it is scheduled and then, if it
>>>>> +         * is currently running, add its physical CPU to a mask of
>>>>> +         * those which need to be interrupted to force a flush.
>>>>> +         */
>>>>> +        for_each_vcpu ( currd, v )
>>>>> +        {
>>>>> +            if ( !(input_params.vcpu_mask & (1ul << v->vcpu_id)) )
>>>>> +                continue;
>>>> You need to cap this loop at a vcpu_id of 63, or the above conditional
>>>> will become undefined.
>>> The guest should not issue the hypercall if it has more than 64 vCPUs, so
>>> to some extent I don't care what happens, as long as it is not harmful to
>>> the hypervisor in general, and I don't think that it is in this case.
>>
>> The compiler is free to do anything it wishes when encountering
>> undefined behaviour, including crash the hypervisor.
>>
> I don't believe so. See section 6.5.7 of the C99 spec. (top of page 85):
>
> "The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
> are filled with zeros. If E1 has an unsigned type, the value of the result
> is E1 x 2^E2, reduced modulo one more than the maximum value representable
> in the result type. If E1 has a signed type and nonnegative value, and
> E1 x 2^E2 is representable in the result type, then that is the resulting
> value; otherwise, the behavior is undefined"
>
> So, the undefined behaviour you refer to is only in the case that E1 is
> signed, and in this case it is most definitely unsigned, so the modulo
> rule applies.

Ok.
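A minimal standalone sketch of the capped check suggested earlier in the thread (the helper name is hypothetical, and this models only the mask test, outside Xen). Note that, independently of the signedness of E1, C99 6.5.7p3 makes a shift count greater than or equal to the width of the promoted left operand undefined, so the cap keeps vcpu_id strictly below 64 before it is used as a shift count:

```c
#include <stdint.h>

/* Hypothetical helper: is vcpu_id selected by a 64-bit guest-supplied mask?
 * The guest mask can only describe 64 virtual CPUs, so any vcpu_id >= 64 is
 * treated as not selected, which also keeps the shift count below the width
 * of the uint64_t operand (a shift count >= the operand width is undefined
 * behaviour per C99 6.5.7p3). */
static int vcpu_in_mask(uint64_t vcpu_mask, unsigned int vcpu_id)
{
    if ( vcpu_id >= 64 )     /* cap before shifting */
        return 0;
    return (vcpu_mask & (1ul << vcpu_id)) != 0;
}
```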

>
>> Any undefined-behaviour which can be triggered by a guest action
>> warrants an XSA, because there is no telling what might happen.
>>
>>>> It might also be wise to fail the vcpu_initialise() for a
>>>> viridian-enabled domain having more than 64 vcpus.
>>>>
>>>>> +
>>>>> +            hvm_asid_flush_vcpu(v);
>>>>> +            if ( v->is_running )
>>>>> +                cpumask_set_cpu(v->processor, pcpu_mask);
>>>> __cpumask_set_cpu().  No need for atomic operations here.
>>>>
>>> Ok.
>>>
>>>>> +        }
>>>>> +
>>>>> +        /*
>>>>> +         * Since ASIDs have now been flushed it just remains to
>>>>> +         * force any CPUs currently running target vCPUs out of non-
>>>>> +         * root mode. It's possible that re-scheduling has taken place
>>>>> +         * so we may unnecessarily IPI some CPUs.
>>>>> +         */
>>>>> +        if ( !cpumask_empty(pcpu_mask) )
>>>>> +            flush_tlb_mask(pcpu_mask);
>>>> Wouldn't it be easier to simply and input_params.vcpu_mask with
>>>> d->vcpu_dirty_mask ?
>>>>
> Actually I realise your original statement makes no sense anyway. There is
> no such mask as d->vcpu_dirty_mask. There is d->domain_dirty_cpumask, which
> is a mask of *physical* CPUs, but since (as the name implies)
> input_params.vcpu_mask is a mask of *virtual* CPUs, ANDing the two together
> would just yield garbage.
>
>>> No, that may yield much too big a mask. All we need here is a mask of
>>> where the vcpus are running *now*, not everywhere they've been.
>>
>> The dirty mask is a "currently scheduled on" mask.
> No it's not. The comment in sched.h clearly states that
> domain_dirty_cpumask is a "Bitmask of CPUs which are holding onto this
> domain's state", which is, as I said before, essentially everywhere the
> domain's vcpus have been scheduled since the last time state was flushed.
> Since, in this case, I have already invalidated ASIDs for all targeted
> virtual CPUs, I don't need to IPI that many physical CPUs; I only need the
> mask of where the virtual CPUs are *currently* running. If one of them
> gets descheduled before the IPI then the IPI was unnecessary (but there is
> no low-cost way of determining or preventing that).

In the case of a FlushAll, the domain dirty mask would be fine, as you
don't care about a subset of vcpus.  Having said that, it is likely not
worth doing unless we can optimise away the loop entirely.
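For reference, the selection logic being debated can be modelled outside Xen with plain-C stand-ins (all names below are hypothetical; cpumask_t and the vcpu list are reduced to a uint64_t and an array). It follows the shape of the patch: widen the mask for HV_FLUSH_ALL_PROCESSORS, skip unselected vCPUs, and collect the physical CPUs of those currently running:

```c
#include <stdint.h>

#define HV_FLUSH_ALL_PROCESSORS 1u

/* Hypothetical stand-in for Xen's vcpu bookkeeping, modelling only the
 * target-selection step from the patch. */
struct toy_vcpu {
    unsigned int vcpu_id;
    int is_running;
    unsigned int processor;   /* physical CPU the vCPU is running on */
};

static uint64_t build_pcpu_mask(const struct toy_vcpu *vcpus, unsigned int nr,
                                uint64_t vcpu_mask, unsigned int flags)
{
    uint64_t pcpu_mask = 0;
    unsigned int i;

    if ( flags & HV_FLUSH_ALL_PROCESSORS )
        vcpu_mask = ~0ul;        /* err on the safe side, as in the patch */

    for ( i = 0; i < nr; i++ )
    {
        const struct toy_vcpu *v = &vcpus[i];

        /* Cap at 63 so the shift count stays defined, then test the mask. */
        if ( v->vcpu_id >= 64 || !(vcpu_mask & (1ul << v->vcpu_id)) )
            continue;

        /* hvm_asid_flush_vcpu(v) would happen here in the real code; only
         * currently-running vCPUs need their pCPU IPI'd afterwards. */
        if ( v->is_running )
            pcpu_mask |= 1ul << v->processor;
    }
    return pcpu_mask;
}
```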

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

