[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/3] x86: refactor psr implementation in hypervisor.



>> >>> On 25.08.16 at 07:22, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
>> > +                         struct psr_socket_alloc_info *info);
>> > +    /*
>> > +     * get_old_set_new is used in set value process to get all features'
>> > +     * COS registers values according to original cos id of the domain.
>> > +     * Then, assemble them into an mask array as feature list order.
>> 
>> This sentence in particular doesn't make any sense to me. What
>> follows below also looks like it is in need of improvement.
>> 
> Do you mean the comments are not accurate?

I simply wasn't able to tell, because of not being able to interpret
the sentence.

> How about below description?
>  
> get_old_set_new will traverse all features in list. It is used to do below two
> things:
> 1. get old_cos register value of all supported features and
> 2. set the new value for appointed feature.
> 
> All the values are set into mask array according the traversal order, meaning
> the same order of feature list members.

Sounds reasonable. I suppose the example that you gave right
next wasn't meant to go into the comment.

>> > +    /*
>> > +     * exceed_range is used to check if the input cos id exceeds the
>> > +     * feature's cos_max and if the input mask value is not the default 
>> > one.
>> > +     * Even if the associated cos exceeds the cos_max, HW can work as 
>> > default
>> > +     * value. That is the reason we need check is mask value is default 
>> > one.
>> > +     * If both criteria are fulfilled, that means the input exceeds the
>> > +     * range.
>> > +     */
>> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
>> > +                                 unsigned int cos);
>> 
>> According to the comment this is kind of a predicate, which seems
>> unlikely to return an unsigned value. In fact without a word on the
>> return value I'd expect such to return bool. And I'd also expect the
>> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
>> for compare above I wonder whether come or all of the parameters
>> should be pointers to const (please go over the entire patch and do
>> constification wherever possible/sensible).
>> 
> Yes, you are right. I will change the function type to bool and add const
> for not changed input pointers.
> 
> This function is used to check if the input cos id exceeds the cos_max. If yes
> and the set value is not default value, we should return error. So, I think
> to change the function name to exceed_cos_max(). How do you think?

Okay, except that I continue to think you mean "exceeds".
"exceed_cos_max" to me is kind of a directive, not a predicate.

>> > +#define MAX_FEAT_INFO_SIZE 8
>> > +#define MAX_COS_REG_NUM  128
>> 
>> Are these numbers arbitrary, or based on something?
>> 
> MAX_FEAT_INFO_SIZE is got from the sizeof(struct psr_cat_lvl_info) and
> consider the extension for future feature.

In that case please use that sizeof() in the expression here.

> MAX_COS_REG_NUM is got from spec that the max COS registers number is 128
> for all PSR features so far.

"So far" makes me still wonder: Is this an architectural limit or one
resulting from current (hardware) implementations. In the former
case I don't think a comment is strictly needed, but in the latter
case the choice should be explained.

>> > +struct psr_socket_alloc_info {
>> 
>> I've yet to see whether the "alloc" in the name really makes sense.

And btw., having seen the entire patch I don't think this alloc_ infix
is warranted both here and in the variable name.

>> > +/* Common functions for supporting feature callback functions. */
>> > +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
>> > +{
>> > +    if ( NULL == pHead || NULL == pTmp )
>> > +        return;
>> > +
>> > +    while ( pHead->pNext )
>> > +        pHead = pHead->pNext;
>> > +
>> > +    pHead->pNext = pTmp;
>> > +}
>> 
>> Do you really need custom list management here?
>> 
> It seems xen list interfaces require the input list be a double linked list 
> but
> my list is a single linked list. Furthermore, I only need simple add to tail
> function and free function. So I create custom list management functions.

Unless there's a strong need, I'd like you to go with what is there,
or introduce _generic_ singly linked list management.

>> > +static void free_feature(struct psr_socket_alloc_info *info)
>> > +{
>> > +    struct feat_list *pTmp;
>> > +    struct feat_list *pPrev;
>> > +
>> > +    if ( NULL == info )
>> > +        return;
>> > +
>> > +    if ( NULL == info->pFeat )
>> > +        return;
>> > +
>> > +    pTmp = info->pFeat->pNext;
>> > +    while ( pTmp )
>> > +    {
>> > +        pPrev = pTmp;
>> > +        pTmp = pTmp->pNext;
>> > +        clear_bit(pPrev->feature, &(info->features));
>> > +        xfree(pPrev);
>> > +        pPrev = NULL;
>> > +    }
>> > +}
>> 
>> Why does info->pFeat not get freed here?
>> 
> The info->pFeat is a redundant list head to facilitate list operations.
> It is only freed when socket info is freed.
> 
> With this list head, the advantage is that we can insert/delete the first
> element same as others without special operations.

But it results in less obvious code (and a basically pointless extra
allocation). One more reason to use generic list management
primitives.

>> > +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
>> > +                                unsigned int ecx, unsigned int edx,
>> > +                                struct feat_list *pFeat,
>> > +                                struct psr_socket_alloc_info *info)
>> > +{
>> > +    struct psr_cat_lvl_info l3_cat;
>> > +    unsigned int socket;
>> > +    uint64_t val;
>> > +
>> > +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
>> > +        return;
>> 
>> DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
>> array is one with unsigned int as the base type, which doesn't get
>> reflected here.
>> 
> I will use BUILD_BUG_ON() for the error case.
> 
> As my explanation for MAX_FEAT_INFO_SIZE above, it reflects a size more
> than struct psr_cat_lvl_info. So, if MAX_FEAT_INFO_SIZE is bigger than the
> sizeof(struct psr_cat_lvl_info), the following operations on feat_info[]
> should be good.

In fact the BUILD_BUG_ON() may not be needed if you follow the
suggestion above.

>> > +    }
>> > +
>> > +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
>> 
>> Aforementioned BUILD_BUG_ON() would probably bets be placed
>> right ahead of this memcpy().
>> 
> Can we call BUILD_BUG_ON() at the top of the function to avoid executing above
> process if the size is not right?

Well, the purpose of BUILD_BUG_ON() is what its name implies: It
fails the _build_ if violated, so execution can't possibly come here
(or in fact anywhere) in that case.

>> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
>> > +                               unsigned int cos, bool_t *found)
>> > +{
>> > +    struct psr_cat_lvl_info cat_info;
>> > +    uint64_t l3_def_cbm;
>> > +
>> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
>> 
>> Already here I think this memcpy()ing gets unwieldy. Can't you
>> simply make the structure field a union of all types of interest?
>> 
> Sorry that I am not very clear about your meaning to make a union. Do you mean
> make feat_info a union? If so, it will lose the universality to cover all
> features. Future feature may have different info.

Which is the purpose of a union - you'd simply add a new member
then.

> I think I can replace the memcpy() to directly assign value to cat_info.

No, this copying (done in _many_ places) really should go away.

>> > +                printk(XENLOG_ERR "CDP exceed cos max.\n");
>> 
>> I haven't figured yet at what times this function will get called, but
>> I suspect it's inappropriate for it to log messages.
>> 
>> > +                return -ENOENT;
>> > +            }
>> > +            *found = 1;
>> > +            return 2;
>> 
>> Who or what is 2?
>> 
> CDP occupies 2 members in mask[] array. So, it returns 2 to make codes outside
> the function to skip to next feature's mask value.

I guess that's going to be understandable once the return values
of the hooks get properly explained in their comments.

>> > +    if ( mask[0] == pFeat->cos_reg_val[cos] )
>> > +        *found = 1;
>> > +    else
>> > +        *found = 0;
>> 
>> Please use a simple assignment in cases like this, instead of if/else.
>> 
> Do you mean codes like below?
> 
> *found = 0;
> if ( mask[0] == pFeat->cos_reg_val[cos] )
>     *found = 1;

No.

    *found = (mask[0] == pFeat->cos_reg_val[cos]);

>> > +    /* CAT */
>> > +    if ( old_cos > cat_info.cos_max )
>> > +        mask[0] =  pFeat->cos_reg_val[0];
>> > +    else
>> > +        mask[0] =  pFeat->cos_reg_val[old_cos];
>> 
>> Across the entire function - wouldn't it be easier to range check
>> old_cos once up front, forcing it to zero if the check fails? That
>> would reduce code size to a better readable amount. (And just to
>> avoid any misunderstanding - comments like this also may apply
>> to other functions; I'm not going to repeat them there.)
>> 
> Good idea! Thanks for the suggestion. I will check other functions like 
> this.
> 
>> Also note that there are undue double blanks in here.
>> 
> Do you mean two spaces? I cannot see it in my original patch and codes.

It's even still visible in the quoted text above.

>> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
>> > +        mask[0] = m;
>> 
>> This overwriting behavior also looks quite strange to me. What's
>> the purpose? And if this really is meant to be that way, why is
>> this not (leaving aside the other suggested adjustment)
>> 
>>     if ( type == PSR_MASK_TYPE_L3_CBM )
>>         mask[0] = m;
>>     else if ( old_cos > cat_info.cos_max )
>>         mask[0] = pFeat->cos_reg_val[0];
>>     else
>>         mask[0] = pFeat->cos_reg_val[old_cos];
>> 
>> ?
>> 
> get_old_set_new() is used to do below two things:
> 1. get old_cos register value of all supported features and
> 2. set the new value for appointed feature.
> 
> So, if the appointed feature is L3 CAT, we should set input vallue for it 
> here.

Okay, that answers the "what" aspect, but leaves open _why_ it
needs to be that way.

>> >  static inline void psr_assoc_init(void)
>> >  {
>> >      struct psr_assoc *psra = &this_cpu(psr_assoc);
>> > +    struct psr_socket_alloc_info *info;
>> > +    unsigned int cos_max;
>> > +    unsigned int socket;
>> >  
>> > -    if ( cat_socket_info )
>> > +    if ( socket_alloc_info )
>> >      {
>> > -        unsigned int socket = cpu_to_socket(smp_processor_id());
>> > +        socket = cpu_to_socket(smp_processor_id());
>> 
>> What's the point of moving the variable declaration? Quite the
>> other way around, the new variable you add could - afaict - all
>> live in this more narrow scope. And info could presumably become
>> a pointer to const.
>> 
> I remember it reports warning or error when compiling old codes.

I don't see why it would.

>> > -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
>> > -                        uint32_t *cos_max, uint32_t *flags)
>> > +int psr_get_info(unsigned int socket, enum mask_type type,
>> > +                 uint32_t *dat0, uint32_t *dat1,
>> > +                 uint32_t *dat2)
>> >  {
>> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
>> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
>> > +    struct feat_list *pTmp;
>> >  
>> >      if ( IS_ERR(info) )
>> >          return PTR_ERR(info);
>> >  
>> > -    *cbm_len = info->cbm_len;
>> > -    *cos_max = info->cos_max;
>> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
>> > +       return -ENODEV;
>> >  
>> > -    *flags = 0;
>> > -    if ( cdp_is_enabled(socket) )
>> > -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
>> > +    pTmp = info->pFeat->pNext;
>> > +    while ( pTmp )
>> > +    {
>> > +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
>> 
>> Considering that this is one of the generic ops: What guarantees
>> that 3 elements will suffice going forward? I.e. why is this not an
>> array?
>> 
> Good point. I consider this too. Per all features we know, three parameters
> are sufficient. To simplify codes, I write the function like this. I think
> we can make change if future feature has more requirements.

If you introduce an abstraction layer, you should design it with
abstract requirements in mind, not with the requirements of
currently known implementations.

>> > -    case PSR_CBM_TYPE_L3_DATA:
>> > -        if ( !cdp_enabled )
>> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
>> > -        else
>> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
>> > -        break;
>> > +    pTmp = info->pFeat->pNext;
>> > +    while ( pTmp )
>> > +    {
>> > +        /* mask getting and setting order is same as feature list */
>> > +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
>> > +        if ( ret < 0 )
>> > +            return ret;
>> >  
>> > -    default:
>> > -        ASSERT_UNREACHABLE();
>> > +        mask += ret;
>> 
>> With this - what ensures the array isn't going to be overrun?
>> 
> It seems you missed one line below. The while() loop traverse all features in
> list. It will exit when the last member has been accessed.
> 
> +        pTmp = pTmp->pNext;

No, I didn't miss any line. I'm asking because there's nothing but
programmer discipline keeping array size and list length match up.
That's fragile.

>> And I'm also having some difficulty seeing how an array of outputs
>> matches up with exactly one input (m). Can you explain this please?
>> 
> This while loop will traverse all features in list. It will get all features
> old_cos value back. If the feature is the one to set new value, the input
> new value (m) will replace the old_cos value. All the value are set into mask
> array according the traversal order, meaning the same order of feature list
> members, like L3 CAT as the first and L2 CAT as the second.

This gets us back to the question of _why_ you need this "get all but
set only one" functionality. Plus, wasn't it that CDP requires two
values per feature?

>> > -static void cat_cpu_init(void)
>> > +static void internal_cpu_init(void)
>> 
>> Is "internal_" really a useful prefix here?
>> 
> I wanted to express this is an internal function in psr.c to separate it from
> psr_cpu_init().

The static keyword already says it's an internal one.

>> > +        pL3CAT = NULL;
>> >  
>> > -        spin_lock_init(&info->cbm_lock);
>> > -
>> > -        set_bit(socket, cat_socket_enable);
>> > -
>> > -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
>> > -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
>> > -        {
>> > -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
>> > -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
>> > -
>> > -            /* We only write mask1 since mask0 is always all ones by 
>> > default. */
>> > -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
>> > -
>> > -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
>> > -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << 
>> > PSR_L3_QOS_CDP_ENABLE_BIT));
>> > -
>> > -            /* Cut half of cos_max when CDP is enabled. */
>> > -            info->cos_max >>= 1;
>> > -
>> > -            set_bit(socket, cdp_socket_enable);
>> > -        }
>> > -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, 
>> > cbm_len:%u, CDP:%s\n",
>> > -               socket, info->cos_max, info->cbm_len,
>> > -               cdp_is_enabled(socket) ? "on" : "off");
>> > +        /* Initialize this feature according to CPUID. */
>> > +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
>> > +        pTmp->ops = l3_cat_ops;
>> > +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
>> 
>> This function can return without having consumed pTmp, which means
>> you'd leak the memory in that case.
>> 
> Yes, right, this is by design.

I hope not. You shouldn't design in memory leaks.

> pL3CAT/pL2CAT are allocated in
> internal_cpu_prepare(). If this CPU does not support L2 CAT for example, the
> pL2CAT will not be added into feature list. But when CPU on other socket up,
> internal_cpu_prepare() will be called again and the pL2CAT will not be
> allocated again (check if it is NULL). We can use the buffer has been 
> allocated.
> That means we maintain one redundant buffer to simplify the codes.

I don't think that's the case - pL3CAT gets set to NULL further up. But
with you switching to generic list management, this bogus extra list
element will go away anyway, and so will the extra allocation.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.