
Re: [Xen-devel] [PATCH v5 10/11] viridian: add implementation of synthetic timers



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 13 March 2019 14:06
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: Julien Grall <julien.grall@xxxxxxx>; Andrew Cooper 
> <Andrew.Cooper3@xxxxxxxxxx>; Roger Pau Monne
> <roger.pau@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; George Dunlap 
> <George.Dunlap@xxxxxxxxxx>; Ian
> Jackson <Ian.Jackson@xxxxxxxxxx>; Stefano Stabellini 
> <sstabellini@xxxxxxxxxx>; xen-devel <xen-
> devel@xxxxxxxxxxxxxxxxxxxx>; Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; 
> Tim (Xen.org)
> <tim@xxxxxxx>
> Subject: Re: [PATCH v5 10/11] viridian: add implementation of synthetic timers
> 
> >>> On 11.03.19 at 14:41, <paul.durrant@xxxxxxxxxx> wrote:
> > +bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx,
> > +                                      unsigned int index,
> > +                                      uint64_t expiration,
> > +                                      uint64_t delivery)
> > +{
> > +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> > +    const union viridian_sint_msr *vs = &vv->sint[sintx];
> > +    HV_MESSAGE *msg = vv->simp.ptr;
> > +    struct {
> > +        uint32_t TimerIndex;
> > +        uint32_t Reserved;
> > +        uint64_t ExpirationTime;
> > +        uint64_t DeliveryTime;
> > +    } payload = {
> > +        .TimerIndex = index,
> > +        .ExpirationTime = expiration,
> > +        .DeliveryTime = delivery,
> > +    };
> > +
> > +    if ( test_bit(sintx, &vv->msg_pending) )
> > +        return false;
> > +
> > +    BUILD_BUG_ON(sizeof(*msg) != HV_MESSAGE_SIZE);
> > +    msg += sintx;
> > +
> > +    /*
> > +     * To avoid using an atomic test-and-set, and barrier before calling
> > +     * vlapic_set_irq(), this function must be called in context of the
> > +     * vcpu receiving the message.
> > +     */
> > +    ASSERT(v == current);
> > +    if ( msg->Header.MessageType != HvMessageTypeNone )
> > +    {
> > +        msg->Header.MessageFlags.MessagePending = 1;
> > +        __set_bit(sintx, &vv->msg_pending);
> > +        return false;
> > +    }
> > +
> > +    msg->Header.MessageType = HvMessageTimerExpired;
> > +    msg->Header.MessageFlags.MessagePending = 0;
> > +    msg->Header.PayloadSize = sizeof(payload);
> > +    memcpy(msg->Payload, &payload, sizeof(payload));
> 
> Since you can't use plain assignment here, how about a
> BUILD_BUG_ON(sizeof(payload) <= sizeof(msg->payload))?

Surely '>' rather than '<='?
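I.e. (sketch only; BUILD_BUG_ON() fires at build time when its condition is
true, so the condition should name the overflow case):

    BUILD_BUG_ON(sizeof(payload) > sizeof(msg->Payload));
    memcpy(msg->Payload, &payload, sizeof(payload));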

> 
> > +static uint64_t time_now(struct domain *d)
> > +{
> > +    const struct viridian_page *rt = &d->arch.hvm.viridian->reference_tsc;
> > +    HV_REFERENCE_TSC_PAGE *p = rt->ptr;
> > +    uint32_t start, end;
> > +    uint64_t tsc;
> > +    uint64_t scale;
> > +    uint64_t offset;
> 
> Would you mind grouping them all on one line, just like you do for start
> and end?

Sure.
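I.e.:

    uint64_t tsc, scale, offset;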

> 
> > +    /*
> > +     * If the reference TSC page is not enabled, or has been invalidated
> > +     * fall back to the partition reference counter.
> > +     */
> > +    if ( !p || !p->TscSequence )
> > +        return time_ref_count(d);
> > +
> > +    /*
> > +     * The following sampling algorithm for tsc, scale and offset is
> > +     * documented in the specification.
> > +     */
> > +    do {
> > +        start = p->TscSequence;
> > +        smp_rmb();
> > +
> > +        tsc = hvm_get_guest_tsc(pt_global_vcpu_target(d));
> > +        scale = p->TscScale;
> > +        offset = p->TscOffset;
> > +
> > +        smp_rmb();
> > +        end = p->TscSequence;
> > +    } while (end != start);
> 
> Blanks immediately inside the parentheses please.
> 

Yes, missed that.

> As to safety of this, I have two concerns:
> 
> 1) TscSequence gets updated as a result of a guest action (an MSR
> write). This makes it non-obvious that the loop above will get
> exited in due course.
> 

True. The domain could try to DoS this call. This could be avoided by doing a
domain_pause() if the test continuously fails for a number of iterations, or
maybe even just one iteration.

> 2) The way update_reference_tsc() deals with the two "invalid"
> values suggests ~0 and 0 should be special cased in general. I
> _think_ this is not necessary here, but it also seems to me as if
> the VM ever having a way to observe either of those two values
> would be wrong too. Shouldn't the function avoid to ever store
> ~0 into that field, i.e. increment into a local variable, update
> that local variable to skip the two "invalid" values, and only then
> store into the field?
> 
> Otoh, since making it into that function is a result of an MSR write,
> it may well be that the spec precludes the guest from reading
> the reference page while an update was invoked from one of its
> vCPU-s. If this was the case, then I also wouldn't have to
> wonder any longer how this entire mechanism can be race free
> in the first place (without a double increment like we do in the
> pv-clock protocol).

From observation, it looks like Windows initializes the reference TSC page
before it brings secondary CPUs online and then doesn't touch the MSR again, so
we should probably only tolerate one mismatch in time_now() before doing
domain_pause().
I also agree that the 'invalid' values should not be exposed to the guest, even
though the guest is highly unlikely ever to observe one in practice.
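Something along these lines in update_reference_tsc(), perhaps (just a
sketch; the helper name and the exact pair of values to skip are
illustrative):

    static void bump_tsc_sequence(HV_REFERENCE_TSC_PAGE *p)
    {
        uint32_t seq = p->TscSequence + 1;

        /* Never publish either of the two 'invalid' values */
        if ( seq == 0xffffffff || seq == 0 )
            seq = 1;

        p->TscSequence = seq;
    }

I.e. increment into a local variable, skip over the special values, and only
then store back into the field, as suggested.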

> 
> > +static void start_stimer(struct viridian_stimer *vs)
> > +{
> > +    const struct vcpu *v = vs->v;
> > +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> > +    unsigned int stimerx = vs - &vv->stimer[0];
> > +    int64_t now = time_now(v->domain);
> > +    s_time_t timeout;
> > +
> > +    if ( !test_and_set_bit(stimerx, &vv->stimer_enabled) )
> > +        printk(XENLOG_G_INFO "%pv: VIRIDIAN STIMER%u: enabled\n", v,
> > +               stimerx);
> > +
> > +    if ( vs->config.fields.periodic )
> > +    {
> > +        unsigned int missed = 0;
> > +        int64_t next;
> > +
> > +        /*
> > +         * The specification says that if the timer is lazy then we
> > +         * skip over any missed expirations so we can treat this case
> > +         * as the same as if the timer is currently stopped, i.e. we
> > +         * just schedule expiration to be 'count' ticks from now.
> > +         */
> > +        if ( !vs->started || vs->config.fields.lazy )
> > +            next = now + vs->count;
> > +        else
> > +        {
> > +            /*
> > +             * The timer is already started, so we're re-scheduling.
> > +             * Hence advance the timer expiration by one tick.
> > +             */
> > +            next = vs->expiration + vs->count;
> > +
> > +            /* Now check to see if any expirations have been missed */
> > +            if ( now - next > 0 )
> > +                missed = (now - next) / vs->count;
> > +
> > +            /*
> > +             * The specification says that if the timer is not lazy then
> > +             * a non-zero missed count should be used to reduce the period
> > +             * of the timer until it catches up, unless the count has
> > +             * reached a 'significant number', in which case the timer
> > +             * should be treated as lazy. Unfortunately the specification
> > +             * does not state what that number is so the choice of number
> > +             * here is a pure guess.
> > +             */
> > +            if ( missed > 3 )
> > +            {
> > +                missed = 0;
> > +                next = now + vs->count;
> > +            }
> > +        }
> > +
> > +        vs->expiration = next;
> > +        timeout = ((next - now) * 100ull) / (missed + 1);
> 
> The logic above guarantees next > now only if missed > 3. Yet
> with next < now, the multiplication by 100ull will produce a huge
> 64-bit value, and division by a value larger than 1 will make it a
> huge 62- or 63-bit value (sign bit lost). That's hardly the timeout
> you mean.

Indeed, I'll need to re-think that.
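Probably something along these lines (a rough sketch only): compute a signed
delta and, if the expiration is already in the past, just mark the timer
pending rather than trying to program a timeout, much like the one-shot case
below:

    vs->expiration = next;

    delta = next - now;    /* with delta declared as int64_t */
    if ( delta <= 0 )
    {
        set_bit(stimerx, &vv->stimer_pending);
        return;
    }

    timeout = (delta * 100ull) / (missed + 1);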

> 
> > +    }
> > +    else
> > +    {
> > +        vs->expiration = vs->count;
> > +        if ( vs->count - now <= 0 )
> 
> Unless you really mean != 0 you have an issue here, due to
> vs->count being uint64_t. Perhaps, taking also ...
> 
> > +        {
> > +            set_bit(stimerx, &vv->stimer_pending);
> > +            return;
> > +        }
> > +
> > +        timeout = (vs->expiration - now) * 100ull;
> 
> ... this, you want to calculate the difference into a local
> variable of type int64_t instead, and use it in both places?

Yes, this was broken when expiration and now became unsigned.
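So, something like this (sketch of the suggestion):

    int64_t delta = vs->count - now;

    vs->expiration = vs->count;
    if ( delta <= 0 )
    {
        set_bit(stimerx, &vv->stimer_pending);
        return;
    }

    timeout = delta * 100ull;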

> 
> > +static void poll_stimer(struct vcpu *v, unsigned int stimerx)
> > +{
> > +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> > +    struct viridian_stimer *vs = &vv->stimer[stimerx];
> > +
> > +    if ( !test_bit(stimerx, &vv->stimer_pending) )
> > +        return;
> > +
> > +    if ( !viridian_synic_deliver_timer_msg(v, vs->config.fields.sintx,
> > +                                           stimerx, vs->expiration,
> > +                                           time_now(v->domain)) )
> > +        return;
> > +
> > +    clear_bit(stimerx, &vv->stimer_pending);
> 
> While perhaps benign, wouldn't it be better to clear the pending bit
> before delivering the message?

No, because I only want to clear it if the delivery is successful.

> 
> > +void viridian_time_vcpu_freeze(struct vcpu *v)
> > +{
> > +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> > +    unsigned int i;
> > +
> > +    if ( !is_viridian_vcpu(v) )
> > +        return;
> 
> Don't you also want/need to check the HVMPV_stimer bit here (and
> also in the thaw counterpart)?

Yes, I probably do.
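I.e. something like the following, in both the freeze and thaw paths (sketch):

    if ( !is_viridian_vcpu(v) ||
         !(viridian_feature_mask(v->domain) & HVMPV_stimer) )
        return;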

> 
> > @@ -149,6 +398,63 @@ int viridian_time_wrmsr(struct vcpu *v, uint32_t idx, 
> > uint64_t val)
> >          }
> >          break;
> >
> > +    case HV_X64_MSR_TIME_REF_COUNT:
> > +        return X86EMUL_EXCEPTION;
> > +
> > +    case HV_X64_MSR_STIMER0_CONFIG:
> > +    case HV_X64_MSR_STIMER1_CONFIG:
> > +    case HV_X64_MSR_STIMER2_CONFIG:
> > +    case HV_X64_MSR_STIMER3_CONFIG:
> > +    {
> > +        unsigned int stimerx =
> > +            array_index_nospec((idx - HV_X64_MSR_STIMER0_CONFIG) / 2,
> > +                               ARRAY_SIZE(vv->stimer));
> > +        struct viridian_stimer *vs = &vv->stimer[stimerx];
> 
> I again think you'd better use array_access_nospec() here (also
> for the rdmsr counterparts).

I don't follow. I *am* using array_index_nospec().

> 
> > @@ -160,6 +466,7 @@ int viridian_time_wrmsr(struct vcpu *v, uint32_t idx, 
> > uint64_t val)
> >
> >  int viridian_time_rdmsr(const struct vcpu *v, uint32_t idx, uint64_t *val)
> >  {
> > +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> 
> const?
> 

I don't think so. A read of the reference TSC MSR updates a flag.

> > @@ -201,6 +508,37 @@ int viridian_time_rdmsr(const struct vcpu *v, uint32_t
> > idx, uint64_t *val)
> >          break;
> >      }
> >
> > +    case HV_X64_MSR_STIMER0_CONFIG:
> > +    case HV_X64_MSR_STIMER1_CONFIG:
> > +    case HV_X64_MSR_STIMER2_CONFIG:
> > +    case HV_X64_MSR_STIMER3_CONFIG:
> > +    {
> > +        unsigned int stimerx =
> > +            array_index_nospec((idx - HV_X64_MSR_STIMER0_CONFIG) / 2,
> > +                               ARRAY_SIZE(vv->stimer));
> > +
> > +        if ( !(viridian_feature_mask(d) & HVMPV_stimer) )
> > +            return X86EMUL_EXCEPTION;
> > +
> > +        *val = vv->stimer[stimerx].config.raw;
> > +        break;
> > +    }
> > +    case HV_X64_MSR_STIMER0_COUNT:
> 
> Blank line between case blocks please.

Ok.

> 
> > @@ -231,11 +590,36 @@ void viridian_time_domain_deinit(const struct domain 
> > *d)
> >  void viridian_time_save_vcpu_ctxt(
> >      const struct vcpu *v, struct hvm_viridian_vcpu_context *ctxt)
> >  {
> > +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> 
> const?

Possibly.

> 
> > +    unsigned int i;
> > +
> > +    BUILD_BUG_ON(ARRAY_SIZE(vv->stimer) !=
> > +                 ARRAY_SIZE(ctxt->stimer_config_msr));
> > +    BUILD_BUG_ON(ARRAY_SIZE(vv->stimer) !=
> > +                 ARRAY_SIZE(ctxt->stimer_count_msr));
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(vv->stimer); i++ )
> > +    {
> > +        struct viridian_stimer *vs = &vv->stimer[i];
> 
> const (if you really consider the variable useful in the first place)?
> 

Ok.

> > @@ -322,6 +324,15 @@ int guest_wrmsr_viridian(struct vcpu *v, uint32_t idx, 
> > uint64_t val)
> >      case HV_X64_MSR_TSC_FREQUENCY:
> >      case HV_X64_MSR_APIC_FREQUENCY:
> >      case HV_X64_MSR_REFERENCE_TSC:
> > +    case HV_X64_MSR_TIME_REF_COUNT:
> > +    case HV_X64_MSR_STIMER0_CONFIG:
> > +    case HV_X64_MSR_STIMER0_COUNT:
> > +    case HV_X64_MSR_STIMER1_CONFIG:
> > +    case HV_X64_MSR_STIMER1_COUNT:
> > +    case HV_X64_MSR_STIMER2_CONFIG:
> > +    case HV_X64_MSR_STIMER2_COUNT:
> > +    case HV_X64_MSR_STIMER3_CONFIG:
> > +    case HV_X64_MSR_STIMER3_COUNT:
> 
> For readability / brevity
> 
>     case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
> 
> ?

Certainly brevity, but I'm not sure about readability. I'll make the change.

  Paul

> 
> Jan

