
Re: [Xen-devel] [PATCH v5 10/11] viridian: add implementation of synthetic timers



>>> On 11.03.19 at 14:41, <paul.durrant@xxxxxxxxxx> wrote:
> +bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx,
> +                                      unsigned int index,
> +                                      uint64_t expiration,
> +                                      uint64_t delivery)
> +{
> +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> +    const union viridian_sint_msr *vs = &vv->sint[sintx];
> +    HV_MESSAGE *msg = vv->simp.ptr;
> +    struct {
> +        uint32_t TimerIndex;
> +        uint32_t Reserved;
> +        uint64_t ExpirationTime;
> +        uint64_t DeliveryTime;
> +    } payload = {
> +        .TimerIndex = index,
> +        .ExpirationTime = expiration,
> +        .DeliveryTime = delivery,
> +    };
> +
> +    if ( test_bit(sintx, &vv->msg_pending) )
> +        return false;
> +
> +    BUILD_BUG_ON(sizeof(*msg) != HV_MESSAGE_SIZE);
> +    msg += sintx;
> +
> +    /*
> +     * To avoid using an atomic test-and-set, and barrier before calling
> +     * vlapic_set_irq(), this function must be called in context of the
> +     * vcpu receiving the message.
> +     */
> +    ASSERT(v == current);
> +    if ( msg->Header.MessageType != HvMessageTypeNone )
> +    {
> +        msg->Header.MessageFlags.MessagePending = 1;
> +        __set_bit(sintx, &vv->msg_pending);
> +        return false;
> +    }
> +
> +    msg->Header.MessageType = HvMessageTimerExpired;
> +    msg->Header.MessageFlags.MessagePending = 0;
> +    msg->Header.PayloadSize = sizeof(payload);
> +    memcpy(msg->Payload, &payload, sizeof(payload));

Since you can't use plain assignment here, how about a
BUILD_BUG_ON(sizeof(payload) > sizeof(msg->Payload))?
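
For example (a sketch only, placed right next to the memcpy() it
would guard):

    BUILD_BUG_ON(sizeof(payload) > sizeof(msg->Payload));
    memcpy(msg->Payload, &payload, sizeof(payload));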

> +static uint64_t time_now(struct domain *d)
> +{
> +    const struct viridian_page *rt = &d->arch.hvm.viridian->reference_tsc;
> +    HV_REFERENCE_TSC_PAGE *p = rt->ptr;
> +    uint32_t start, end;
> +    uint64_t tsc;
> +    uint64_t scale;
> +    uint64_t offset;

Would you mind grouping them all on one line, just like you do for
start and end?
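
I.e. simply

    uint64_t tsc, scale, offset;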

> +    /*
> +     * If the reference TSC page is not enabled, or has been invalidated
> +     * fall back to the partition reference counter.
> +     */
> +    if ( !p || !p->TscSequence )
> +        return time_ref_count(d);
> +
> +    /*
> +     * The following sampling algorithm for tsc, scale and offset is
> +     * documented in the specification.
> +     */
> +    do {
> +        start = p->TscSequence;
> +        smp_rmb();
> +
> +        tsc = hvm_get_guest_tsc(pt_global_vcpu_target(d));
> +        scale = p->TscScale;
> +        offset = p->TscOffset;
> +
> +        smp_rmb();
> +        end = p->TscSequence;
> +    } while (end != start);

Blanks immediately inside the parentheses please.
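
I.e.

    } while ( end != start );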

As to safety of this, I have two concerns:

1) TscSequence gets updated as a result of a guest action (an MSR
write). This makes it non-obvious that the loop above will get
exited in due course.

2) The way update_reference_tsc() deals with the two "invalid"
values suggests ~0 and 0 should be special cased in general. I
_think_ this is not necessary here, but it also seems to me as if
the VM ever having a way to observe either of those two values
would be wrong too. Shouldn't the function avoid ever storing
~0 into that field, i.e. increment into a local variable, adjust
that local variable to skip the two "invalid" values, and only then
store into the field?
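
Something along these lines perhaps (a sketch of just the sequence
update, with p being the mapped reference TSC page as above):

    uint32_t seq = p->TscSequence + 1;

    /* Never store either of the two "invalid" values. */
    if ( seq == 0xffffffff || seq == 0 )
        seq = 1;

    p->TscSequence = seq;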

Otoh, with entry into that function being the result of an MSR
write, it may well be that the spec precludes the guest from
reading the reference page while an update is invoked from one of
its vCPUs. If this were the case, then I also wouldn't have to
wonder any longer how this entire mechanism can be race free
in the first place (without a double increment like we do in the
pv-clock protocol).

> +static void start_stimer(struct viridian_stimer *vs)
> +{
> +    const struct vcpu *v = vs->v;
> +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> +    unsigned int stimerx = vs - &vv->stimer[0];
> +    int64_t now = time_now(v->domain);
> +    s_time_t timeout;
> +
> +    if ( !test_and_set_bit(stimerx, &vv->stimer_enabled) )
> +        printk(XENLOG_G_INFO "%pv: VIRIDIAN STIMER%u: enabled\n", v,
> +               stimerx);
> +
> +    if ( vs->config.fields.periodic )
> +    {
> +        unsigned int missed = 0;
> +        int64_t next;
> +
> +        /*
> +         * The specification says that if the timer is lazy then we
> +         * skip over any missed expirations so we can treat this case
> +         * as the same as if the timer is currently stopped, i.e. we
> +         * just schedule expiration to be 'count' ticks from now.
> +         */
> +        if ( !vs->started || vs->config.fields.lazy )
> +            next = now + vs->count;
> +        else
> +        {
> +            /*
> +             * The timer is already started, so we're re-scheduling.
> +             * Hence advance the timer expiration by one tick.
> +             */
> +            next = vs->expiration + vs->count;
> +
> +            /* Now check to see if any expirations have been missed */
> +            if ( now - next > 0 )
> +                missed = (now - next) / vs->count;
> +
> +            /*
> +             * The specification says that if the timer is not lazy then
> +             * a non-zero missed count should be used to reduce the period
> +             * of the timer until it catches up, unless the count has
> +             * reached a 'significant number', in which case the timer
> +             * should be treated as lazy. Unfortunately the specification
> +             * does not state what that number is so the choice of number
> +             * here is a pure guess.
> +             */
> +            if ( missed > 3 )
> +            {
> +                missed = 0;
> +                next = now + vs->count;
> +            }
> +        }
> +
> +        vs->expiration = next;
> +        timeout = ((next - now) * 100ull) / (missed + 1);

The logic above guarantees next > now only if missed > 3. Yet
with next < now, the multiplication by 100ull will produce a huge
64-bit value, and division by a value larger than 1 will make it a
huge 62- or 63-bit value (sign bit lost). That's hardly the timeout
you mean.
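
I'd suggest computing the difference into a signed local variable,
e.g. (a sketch only):

    int64_t delta = next - now;

    /*
     * An expiration in the past would otherwise yield a huge
     * (unsigned) timeout; expire (almost) immediately instead.
     */
    if ( delta <= 0 )
        delta = 1;

    timeout = (delta * 100ull) / (missed + 1);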

> +    }
> +    else
> +    {
> +        vs->expiration = vs->count;
> +        if ( vs->count - now <= 0 )

Unless you really mean != 0 you have an issue here, due to
vs->count being uint64_t. Perhaps, taking also ...

> +        {
> +            set_bit(stimerx, &vv->stimer_pending);
> +            return;
> +        }
> +
> +        timeout = (vs->expiration - now) * 100ull;

... this, you want to calculate the difference into a local
variable of type int64_t instead, and use it in both places?
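
I.e. something like (sketch):

    int64_t expiration = vs->count - now;

    vs->expiration = vs->count;

    if ( expiration <= 0 )
    {
        set_bit(stimerx, &vv->stimer_pending);
        return;
    }

    timeout = expiration * 100ull;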

> +static void poll_stimer(struct vcpu *v, unsigned int stimerx)
> +{
> +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> +    struct viridian_stimer *vs = &vv->stimer[stimerx];
> +
> +    if ( !test_bit(stimerx, &vv->stimer_pending) )
> +        return;
> +
> +    if ( !viridian_synic_deliver_timer_msg(v, vs->config.fields.sintx,
> +                                           stimerx, vs->expiration,
> +                                           time_now(v->domain)) )
> +        return;
> +
> +    clear_bit(stimerx, &vv->stimer_pending);

While perhaps benign, wouldn't it be better to clear the pending bit
before delivering the message?
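
I.e. perhaps (a sketch, re-instating the bit should delivery fail):

    clear_bit(stimerx, &vv->stimer_pending);

    if ( !viridian_synic_deliver_timer_msg(v, vs->config.fields.sintx,
                                           stimerx, vs->expiration,
                                           time_now(v->domain)) )
        set_bit(stimerx, &vv->stimer_pending);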

> +void viridian_time_vcpu_freeze(struct vcpu *v)
> +{
> +    struct viridian_vcpu *vv = v->arch.hvm.viridian;
> +    unsigned int i;
> +
> +    if ( !is_viridian_vcpu(v) )
> +        return;

Don't you also want/need to check the HVMPV_stimer bit here (and
also in the thaw counterpart)?
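
I.e. something like (assuming use of viridian_feature_mask(), as
done elsewhere in the patch, is appropriate here):

    if ( !is_viridian_vcpu(v) ||
         !(viridian_feature_mask(v->domain) & HVMPV_stimer) )
        return;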

> @@ -149,6 +398,63 @@ int viridian_time_wrmsr(struct vcpu *v, uint32_t idx, 
> uint64_t val)
>          }
>          break;
>  
> +    case HV_X64_MSR_TIME_REF_COUNT:
> +        return X86EMUL_EXCEPTION;
> +
> +    case HV_X64_MSR_STIMER0_CONFIG:
> +    case HV_X64_MSR_STIMER1_CONFIG:
> +    case HV_X64_MSR_STIMER2_CONFIG:
> +    case HV_X64_MSR_STIMER3_CONFIG:
> +    {
> +        unsigned int stimerx =
> +            array_index_nospec((idx - HV_X64_MSR_STIMER0_CONFIG) / 2,
> +                               ARRAY_SIZE(vv->stimer));
> +        struct viridian_stimer *vs = &vv->stimer[stimerx];

I again think you'd better use array_access_nospec() here (also
for the rdmsr counterparts).
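
I.e. (sketch):

    struct viridian_stimer *vs =
        &array_access_nospec(vv->stimer,
                             (idx - HV_X64_MSR_STIMER0_CONFIG) / 2);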

> @@ -160,6 +466,7 @@ int viridian_time_wrmsr(struct vcpu *v, uint32_t idx, 
> uint64_t val)
>  
>  int viridian_time_rdmsr(const struct vcpu *v, uint32_t idx, uint64_t *val)
>  {
> +    struct viridian_vcpu *vv = v->arch.hvm.viridian;

const?

> @@ -201,6 +508,37 @@ int viridian_time_rdmsr(const struct vcpu *v, uint32_t 
> idx, uint64_t *val)
>          break;
>      }
>  
> +    case HV_X64_MSR_STIMER0_CONFIG:
> +    case HV_X64_MSR_STIMER1_CONFIG:
> +    case HV_X64_MSR_STIMER2_CONFIG:
> +    case HV_X64_MSR_STIMER3_CONFIG:
> +    {
> +        unsigned int stimerx =
> +            array_index_nospec((idx - HV_X64_MSR_STIMER0_CONFIG) / 2,
> +                               ARRAY_SIZE(vv->stimer));
> +
> +        if ( !(viridian_feature_mask(d) & HVMPV_stimer) )
> +            return X86EMUL_EXCEPTION;
> +
> +        *val = vv->stimer[stimerx].config.raw;
> +        break;
> +    }
> +    case HV_X64_MSR_STIMER0_COUNT:

Blank line between case blocks please.

> @@ -231,11 +590,36 @@ void viridian_time_domain_deinit(const struct domain *d)
>  void viridian_time_save_vcpu_ctxt(
>      const struct vcpu *v, struct hvm_viridian_vcpu_context *ctxt)
>  {
> +    struct viridian_vcpu *vv = v->arch.hvm.viridian;

const?

> +    unsigned int i;
> +
> +    BUILD_BUG_ON(ARRAY_SIZE(vv->stimer) !=
> +                 ARRAY_SIZE(ctxt->stimer_config_msr));
> +    BUILD_BUG_ON(ARRAY_SIZE(vv->stimer) !=
> +                 ARRAY_SIZE(ctxt->stimer_count_msr));
> +
> +    for ( i = 0; i < ARRAY_SIZE(vv->stimer); i++ )
> +    {
> +        struct viridian_stimer *vs = &vv->stimer[i];

const (if you really consider the variable useful in the first place)?

> @@ -322,6 +324,15 @@ int guest_wrmsr_viridian(struct vcpu *v, uint32_t idx, 
> uint64_t val)
>      case HV_X64_MSR_TSC_FREQUENCY:
>      case HV_X64_MSR_APIC_FREQUENCY:
>      case HV_X64_MSR_REFERENCE_TSC:
> +    case HV_X64_MSR_TIME_REF_COUNT:
> +    case HV_X64_MSR_STIMER0_CONFIG:
> +    case HV_X64_MSR_STIMER0_COUNT:
> +    case HV_X64_MSR_STIMER1_CONFIG:
> +    case HV_X64_MSR_STIMER1_COUNT:
> +    case HV_X64_MSR_STIMER2_CONFIG:
> +    case HV_X64_MSR_STIMER2_COUNT:
> +    case HV_X64_MSR_STIMER3_CONFIG:
> +    case HV_X64_MSR_STIMER3_COUNT:

For readability / brevity

    case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:

?

Jan



 

