[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/3] x86: refactor psr implementation in hypervisor.



Hi, Jan,

Thank you very much for reviewing my patches in details! Please
check my comments below.

On 16-09-06 01:40:00, Jan Beulich wrote:
> >>> On 25.08.16 at 07:22, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -23,22 +23,116 @@
> >  #define PSR_CAT        (1<<1)
> >  #define PSR_CDP        (1<<2)
> >  
> > -struct psr_cat_cbm {
> > -    union {
> > -        uint64_t cbm;
> > -        struct {
> > -            uint64_t code;
> > -            uint64_t data;
> > -        };
> > -    };
> > +/*
> > + * To add a new PSR feature, please follow below steps:
> > + * 1. Implement feature specific callback functions listed in
> > + *    "struct feat_ops";
> > + * 2. Implement a feature specific "struct feat_ops" object and register
> > + *    feature specific callback functions into it;
> > + * 3. Delcare feat_list object for the feature and malloc memory for it
> > + *    in internal_cpu_prepare(). Correspondingly, free them in
> > + *    free_feature();
> > + * 4. Add feature initialization codes in internal_cpu_init().
> > + */
> > +
> > +struct feat_list;
> > +struct psr_socket_alloc_info;
> > +
> > +/* Every feature enabled should implement such ops and callback functions. 
> > */
> > +struct feat_ops {
> > +    /*
> > +     * init_feature is used in cpu initialization process to do feature
> > +     * specific initialization works.
> > +     */
> > +    void (*init_feature)(unsigned int eax, unsigned int ebx,
> > +                         unsigned int ecx, unsigned int edx,
> > +                         struct feat_list *pFeat,
> 
> Here and below - we don't normally use identifiers with capital letters
> in the middle.
> 
Thanks! I will fix it.

> > +                         struct psr_socket_alloc_info *info);
> > +    /*
> > +     * get_old_set_new is used in set value process to get all features'
> > +     * COS registers values according to original cos id of the domain.
> > +     * Then, assemble them into an mask array as feature list order.
> 
> This sentence in particular doesn't make any sense to me. What
> follows below also looks like it is in need of improvement.
> 
Do you mean the comments are not accurate? How about below description?

get_old_set_new will traverse all features in list. It is used to do below two
things:
1. get old_cos register value of all supported features and
2. set the new value for appointed feature.

All the values are set into mask array according the traversal order, meaning
the same order of feature list members.

For example, old_cos of the domain is 1. User wants to set L3 CAT CBM to 0x1ff.
The original COS registers like below.

        -----------------------
        | COS 0 | COS 1 | ... |
        -----------------------
L3 CAT  | 0x7ff | 0x3ff | ... |
        -----------------------
L2 CAT  | 0xff  | 0x3f  | ... |
        -----------------------

Then, mask array should be:
mask[0]: 0x1ff
mask[1]: 0x3f

Then, we can use this mask array to find if there is matching COS through
compare_mask().

> > +     * Then, set the new feature value into feature corresponding position
> > +     * in the array. This function is used to pair with compare_mask.
> > +     */
> > +    int (*get_old_set_new)(uint64_t *mask,
> > +                           struct feat_list *pFeat,
> > +                           unsigned int old_cos,
> > +                           enum mask_type type,
> > +                           uint64_t m);
> > +    /*
> > +     * compare_mask is used to in set value process to compare if the
> > +     * input mask array can match all the features' COS registers values
> > +     * according to input cos id.
> > +     */
> > +    int (*compare_mask)(uint64_t *mask, struct feat_list *pFeat,
> > +                        unsigned int cos, bool_t *found);
> 
> For a "compare" function I'd expect a word on the meaning of the
> return value, and I'd expect most if not all pointer parameters to
> be pointers to const.
>
Thanks! I will explain the return value.
 
> Also plain bool please.
> 
Thanks! I will fix it.

> > +    /*
> > +     * get_cos_max_as_type is used to get the cos_max value of the feature
> > +     * according to input mask_type.
> > +     */
> > +    unsigned int (*get_cos_max_as_type)(struct feat_list *pFeat,
> > +                                        enum mask_type type);
> 
> DYM mean "get_cos_max_from_type()"?
> 
Yes, correct. I will revise the function name.

> > +    /*
> > +     * exceed_range is used to check if the input cos id exceeds the
> > +     * feature's cos_max and if the input mask value is not the default 
> > one.
> > +     * Even if the associated cos exceeds the cos_max, HW can work as 
> > default
> > +     * value. That is the reason we need check is mask value is default 
> > one.
> > +     * If both criteria are fulfilled, that means the input exceeds the
> > +     * range.
> > +     */
> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
> > +                                 unsigned int cos);
> 
> According to the comment this is kind of a predicate, which seems
> unlikely to return an unsigned value. In fact without a word on the
> return value I'd expect such to return bool. And I'd also expect the
> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
> for compare above I wonder whether come or all of the parameters
> should be pointers to const (please go over the entire patch and do
> constification wherever possible/sensible).
> 
Yes, you are right. I will change the function type to bool and add const
for not changed input pointers.

This function is used to check if the input cos id exceeds the cos_max. If yes
and the set value is not default value, we should return error. So, I think
to change the function name to exceed_cos_max(). How do you think?

> > +    /*
> > +     * write_msr is used to write feature specific MSR registers.
> > +     */
> 
> This is a single line comment.
> 
Yes, will change the format. Thanks!

> > +    int (*write_msr)(unsigned int cos, uint64_t *mask,
> > +                     struct feat_list *pFeat);
> 
> Such write operations don't normally alter the value to be written.
> If that's different here, that's perhaps worth a word in the comment.
> If it isn't, then I don't see why the address of the value gets passed
> instead of the value itself.
> 
> Oh, having gone further in the patch I see this is to update multiple
> MSRs. I guess this would best be reflected by making the parameter
> const uint64_t masks[] (albeit it's also not clear to me why this
> necessarily have to be masks kind items, and not just arbitrary
> values - proper choice of variable and parameter names helps the
> understanding).
> 
Thanks for your suggestion! Will change it to uint64_t val[].

> > +#define MAX_FEAT_INFO_SIZE 8
> > +#define MAX_COS_REG_NUM  128
> 
> Are these numbers arbitrary, or based on something?
> 
MAX_FEAT_INFO_SIZE is got from the sizeof(struct psr_cat_lvl_info) and
consider the extension for future feature.

MAX_COS_REG_NUM is got from spec that the max COS registers number is 128
for all PSR features so far.

> > +struct psr_socket_alloc_info {
> 
> I've yet to see whether the "alloc" in the name really makes sense.
>
Thanks! Will change it.
 
> > @@ -58,8 +150,427 @@ static unsigned int __read_mostly opt_cos_max = 255;
> >  static uint64_t rmid_mask;
> >  static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
> >  
> > -static struct psr_cat_cbm *temp_cos_to_cbm;
> > +static struct psr_ref *temp_cos_ref;
> > +/* Every feature has its own object. */
> > +struct feat_list *pL3CAT;
> 
> static. And a better name please, more in line with the other variable
> or similar purpose.
> 
Thanks! Will modify it.

> > +/* Common functions for supporting feature callback functions. */
> > +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
> > +{
> > +    if ( NULL == pHead || NULL == pTmp )
> > +        return;
> > +
> > +    while ( pHead->pNext )
> > +        pHead = pHead->pNext;
> > +
> > +    pHead->pNext = pTmp;
> > +}
> 
> Do you really need custom list management here?
> 
It seems xen list interfaces require the input list be a double linked list but
my list is a single linked list. Furthermore, I only need simple add to tail
function and free function. So I create custom list management functions.

> > +static void free_feature(struct psr_socket_alloc_info *info)
> > +{
> > +    struct feat_list *pTmp;
> > +    struct feat_list *pPrev;
> > +
> > +    if ( NULL == info )
> > +        return;
> > +
> > +    if ( NULL == info->pFeat )
> > +        return;
> > +
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        pPrev = pTmp;
> > +        pTmp = pTmp->pNext;
> > +        clear_bit(pPrev->feature, &(info->features));
> > +        xfree(pPrev);
> > +        pPrev = NULL;
> > +    }
> > +}
> 
> Why does info->pFeat not get freed here?
> 
The info->pFeat is a redundant list head to facilitate list operations.
It is only freed when socket info is freed.

With this list head, the advantage is that we can insert/delete the first
element same as others without special operations.

> > +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
> > +                                unsigned int ecx, unsigned int edx,
> > +                                struct feat_list *pFeat,
> > +                                struct psr_socket_alloc_info *info)
> > +{
> > +    struct psr_cat_lvl_info l3_cat;
> > +    unsigned int socket;
> > +    uint64_t val;
> > +
> > +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
> > +        return;
> 
> DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
> array is one with unsigned int as the base type, which doesn't get
> reflected here.
> 
I will use BUILD_BUG_ON() for the error case.

As my explanation for MAX_FEAT_INFO_SIZE above, it reflects a size more
than struct psr_cat_lvl_info. So, if MAX_FEAT_INFO_SIZE is bigger than the
sizeof(struct psr_cat_lvl_info), the following operations on feat_info[]
should be good.

> > +    /* No valid value so do not enable feature. */
> > +    if ( 0 == eax || 0 == edx )
> > +        return;
> > +
> > +    l3_cat.cbm_len = (eax & 0x1f) + 1;
> > +    l3_cat.cos_max = min(opt_cos_max, edx & 0xffff);
> >  
> > +    /* cos=0 is reserved as default cbm(all ones). */
> > +    pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> > +
> > +    pFeat->feature = PSR_SOCKET_L3_CAT;
> > +    set_bit(PSR_SOCKET_L3_CAT, &(info->features));
> 
> Does this need to be set_bit(), or would __set_bit() suffice? Also
> there are many cases of this strange &() construct - the
> parentheses are pointless there.
> 
This is inherited from original codes. After checking codes, I think this can
be replaced by non-atomic operation __set_bit(). Thanks!

Will remove unnecessary parentheses.

> > +    if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> > +         !test_bit(PSR_SOCKET_L3_CDP, &(info->features)) )
> > +    {
> > +        /* Data */
> > +        pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> > +        /* Code */
> > +        pFeat->cos_reg_val[1] = (1ull << l3_cat.cbm_len) - 1;
> > +
> > +        /* We only write mask1 since mask0 is always all ones by default. 
> > */
> > +        wrmsrl(MSR_IA32_PSR_L3_MASK(1),
> > +               (1ull << l3_cat.cbm_len) - 1);
> > +        rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> > +        wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << 
> > PSR_L3_QOS_CDP_ENABLE_BIT));
> > +
> > +        /* Cut half of cos_max when CDP is enabled. */
> > +        l3_cat.cos_max >>= 1;
> > +
> > +        pFeat->feature = PSR_SOCKET_L3_CDP;
> > +        set_bit(PSR_SOCKET_L3_CDP, &(info->features));
> > +        clear_bit(PSR_SOCKET_L3_CAT, &(info->features));
> 
> A few lines up you set this bit, and now you clear it again. Please
> instead set it in an else branch below.
> 
Ok, thanks for the suggestion!

> > +    }
> > +
> > +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
> 
> Aforementioned BUILD_BUG_ON() would probably bets be placed
> right ahead of this memcpy().
> 
Can we call BUILD_BUG_ON() at the top of the function to avoid executing above
process if the size is not right?

> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> > +                               unsigned int cos, bool_t *found)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +    uint64_t l3_def_cbm;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> 
> Already here I think this memcpy()ing gets unwieldy. Can't you
> simply make the structure field a union of all types of interest?
> 
Sorry that I am not very clear about your meaning to make a union. Do you mean
make feat_info a union? If so, it will lose the universality to cover all
features. Future feature may have different info.

I think I can replace the memcpy() to directly assign value to cat_info.

> > +    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
> > +
> > +    /* CDP */
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +    {
> > +        if ( cos > cat_info.cos_max )
> > +        {
> > +            if ( mask[0] != l3_def_cbm ||
> > +                 mask[1] != l3_def_cbm )
> > +            {
> > +                *found = 0;
> 
> false and true for literal boolean values please.
> 
Thanks! Will do it.

> > +                printk(XENLOG_ERR "CDP exceed cos max.\n");
> 
> I haven't figured yet at what times this function will get called, but
> I suspect it's inappropriate for it to log messages.
>
Ok, will remove the log. Thanks! 

> > +                return -ENOENT;
> > +            }
> > +            *found = 1;
> > +            return 2;
> 
> Who or what is 2?
> 
CDP occupies 2 members in mask[] array. So, it returns 2 to make codes outside
the function to skip to next feature's mask value.

> > +        }
> > +
> > +        if ( mask[0] == pFeat->cos_reg_val[cos * 2] &&
> > +             mask[1] == pFeat->cos_reg_val[cos * 2 + 1])
> > +            *found = 1;
> > +        else
> > +            *found = 0;
> > +
> > +        return 2;
> > +    }
> > +
> > +    /* CAT */
> > +    if ( cos > cat_info.cos_max )
> > +    {
> > +        if ( mask[0] != l3_def_cbm )
> > +        {
> > +            *found = 0;
> > +            printk(XENLOG_ERR "CAT exceed cos max.\n");
> > +            return -ENOENT;
> > +        }
> > +        *found = 1;
> > +        return 1;
> > +    }
> > +
> > +    if ( mask[0] == pFeat->cos_reg_val[cos] )
> > +        *found = 1;
> > +    else
> > +        *found = 0;
> 
> Please use a simple assignment in cases like this, instead of if/else.
> 
Do you mean codes like below?

*found = 0;
if ( mask[0] == pFeat->cos_reg_val[cos] )
    *found = 1;

> > +static unsigned int l3_cat_exceed_range(uint64_t *mask, struct feat_list 
> > *pFeat,
> > +                                        unsigned int cos)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +    uint64_t l3_def_cbm;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> > +    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
> > +
> > +    /* CDP */
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +    {
> > +        if ( cos > cat_info.cos_max )
> > +            if ( mask[0] != l3_def_cbm ||
> > +                 mask[1] != l3_def_cbm )
> 
> Please combine such if()s.
> 
Sure. Thanks!

> > +                /*
> > +                 * Exceed cos_max and value to set is not default,
> > +                 * return error.
> > +                 */
> > +                return 0;
> > +
> > +        return 2;
> 
> Same question as above - who or what is 2?
> 
Please check above comment.

> > +static int l3_cat_get_old_set_new(uint64_t *mask,
> > +                                  struct feat_list *pFeat,
> > +                                  unsigned int old_cos,
> > +                                  enum mask_type type,
> > +                                  uint64_t m)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> > +
> > +    if ( type == PSR_MASK_TYPE_L3_DATA ||
> > +         type == PSR_MASK_TYPE_L3_CODE ||
> > +         type == PSR_MASK_TYPE_L3_CBM )
> > +    {
> > +        if ( !psr_check_cbm(cat_info.cbm_len, m) )
> > +            return -EINVAL;
> > +    }
> > +
> > +    if ( ( type == PSR_MASK_TYPE_L3_DATA ||
> > +         type == PSR_MASK_TYPE_L3_CODE ) &&
> > +         pFeat->feature != PSR_SOCKET_L3_CDP )
> > +        return -ENXIO;
> 
> Please either combine the two if()s into a single switch() (with
> appropriate fall through behavior) or, perhaps better, move
> this second if() - suitably shrunk - past the following one. This is
> to avoid redundant code, which makes things more difficult to
> follow.
> 
Thanks! Will make modification.

> > +    /* CDP */
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +    {
> > +        if ( old_cos > cat_info.cos_max )
> > +        {
> > +            /* Data */
> > +            mask[0] = pFeat->cos_reg_val[0];
> > +            /* Code */
> > +            mask[1] = pFeat->cos_reg_val[1];
> > +        }
> > +        else
> > +        {
> > +            /* Data */
> > +            mask[0] = pFeat->cos_reg_val[old_cos * 2];
> > +            /* Code */
> > +            mask[1] = pFeat->cos_reg_val[old_cos * 2 + 1];
> > +        }
> > +
> > +        /* Set new mask */
> > +        if ( type == PSR_MASK_TYPE_L3_DATA )
> > +            mask[0] = m;
> > +        if ( type == PSR_MASK_TYPE_L3_CODE )
> > +            mask[1] = m;
> > +
> > +        return 2;
> > +    }
> > +
> > +    /* CAT */
> > +    if ( old_cos > cat_info.cos_max )
> > +        mask[0] =  pFeat->cos_reg_val[0];
> > +    else
> > +        mask[0] =  pFeat->cos_reg_val[old_cos];
> 
> Across the entire function - wouldn't it be easier to range check
> old_cos once up front, forcing it to zero if the check fails? That
> would reduce code size to a better readable amount. (And just to
> avoid any misunderstanding - comments like this also may apply
> to other functions; I'm not going to repeat them there.)
> 
Good idea! Thanks for the suggestion. I will check other functions like this.

> Also note that there are undue double blanks in here.
> 
Do you mean two spaces? I cannot see it in my original patch and codes.

> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
> > +        mask[0] = m;
> 
> This overwriting behavior also looks quite strange to me. What's
> the purpose? And if this really is meant to be that way, why is
> this not (leaving aside the other suggested adjustment)
> 
>     if ( type == PSR_MASK_TYPE_L3_CBM )
>         mask[0] = m;
>     else if ( old_cos > cat_info.cos_max )
>         mask[0] = pFeat->cos_reg_val[0];
>     else
>         mask[0] = pFeat->cos_reg_val[old_cos];
> 
> ?
> 
get_old_set_new() is used to do below two things:
1. get old_cos register value of all supported features and
2. set the new value for appointed feature.

So, if the appointed feature is L3 CAT, we should set input vallue for it here.

> > +static int l3_cat_get_feat_info(struct feat_list *pFeat, enum mask_type 
> > type,
> > +                                uint32_t *dat0, uint32_t *dat1,
> > +                                uint32_t *dat2)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +
> > +    if ( type != PSR_MASK_TYPE_L3_DATA &&
> > +         type != PSR_MASK_TYPE_L3_CODE &&
> > +         type != PSR_MASK_TYPE_L3_CBM )
> > +        return 0;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> > +
> > +    *dat0 = cat_info.cbm_len;
> > +    *dat1 = cat_info.cos_max;
> > +
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +        *dat2 |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> > +    else
> > +        *dat2 = 0;
> > +
> > +    return 1;
> > +}
> 
> The latest here it becomes clear that you want to use switch()
> much more widely.
> 
Ok, will consider it carefully.

> >  static inline void psr_assoc_init(void)
> >  {
> >      struct psr_assoc *psra = &this_cpu(psr_assoc);
> > +    struct psr_socket_alloc_info *info;
> > +    unsigned int cos_max;
> > +    unsigned int socket;
> >  
> > -    if ( cat_socket_info )
> > +    if ( socket_alloc_info )
> >      {
> > -        unsigned int socket = cpu_to_socket(smp_processor_id());
> > +        socket = cpu_to_socket(smp_processor_id());
> 
> What's the point of moving the variable declaration? Quite the
> other way around, the new variable you add could - afaict - all
> live in this more narrow scope. And info could presumably become
> a pointer to const.
> 
I remember it reports warning or error when compiling old codes.

> > -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
> > -                        uint32_t *cos_max, uint32_t *flags)
> > +int psr_get_info(unsigned int socket, enum mask_type type,
> > +                 uint32_t *dat0, uint32_t *dat1,
> > +                 uint32_t *dat2)
> >  {
> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> > +    struct feat_list *pTmp;
> >  
> >      if ( IS_ERR(info) )
> >          return PTR_ERR(info);
> >  
> > -    *cbm_len = info->cbm_len;
> > -    *cos_max = info->cos_max;
> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
> > +       return -ENODEV;
> >  
> > -    *flags = 0;
> > -    if ( cdp_is_enabled(socket) )
> > -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
> 
> Considering that this is one of the generic ops: What guarantees
> that 3 elements will suffice going forward? I.e. why is this not an
> array?
> 
Good point. I consider this too. Per all features we know, three parameters
are sufficient. To simplify codes, I write the function like this. I think
we can make change if future feature has more requirements.

If you like to get things done one time, I will make change.

> > -int psr_get_l3_cbm(struct domain *d, unsigned int socket,
> > -                   uint64_t *cbm, enum cbm_type type)
> > +static int get_old_set_new(uint64_t *mask,
> > +                           struct psr_socket_alloc_info *info,
> > +                           unsigned int old_cos,
> > +                           enum mask_type type,
> > +                           uint64_t m)
> >  {
> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> > -    bool_t cdp_enabled = cdp_is_enabled(socket);
> > -
> > -    if ( IS_ERR(info) )
> > -        return PTR_ERR(info);
> > +    struct feat_list *pTmp;
> > +    int ret;
> >  
> > -    switch ( type )
> > -    {
> > -    case PSR_CBM_TYPE_L3:
> > -        if ( cdp_enabled )
> > -            return -EXDEV;
> > -        *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> > -        break;
> > +    if ( !mask )
> > +        return -EINVAL;
> >  
> > -    case PSR_CBM_TYPE_L3_CODE:
> > -        if ( !cdp_enabled )
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> > -        else
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].code;
> > -        break;
> > +    if ( !info || !info->pFeat )
> > +        return -ENODEV;
> 
> Earlier on you used (slightly oddly arranged) checks against NULL.
> Please be consistent (and I'd prefer if you used ! across the board).
> 
Got it, thanks!

> > -    case PSR_CBM_TYPE_L3_DATA:
> > -        if ( !cdp_enabled )
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> > -        else
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
> > -        break;
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        /* mask getting and setting order is same as feature list */
> > +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
> > +        if ( ret < 0 )
> > +            return ret;
> >  
> > -    default:
> > -        ASSERT_UNREACHABLE();
> > +        mask += ret;
> 
> With this - what ensures the array isn't going to be overrun?
> 
It seems you missed one line below. The while() loop traverse all features in
list. It will exit when the last member has been accessed.

+        pTmp = pTmp->pNext;

> And I'm also having some difficulty seeing how an array of outputs
> matches up with exactly one input (m). Can you explain this please?
> 
This while loop will traverse all features in list. It will get all features
old_cos value back. If the feature is the one to set new value, the input
new value (m) will replace the old_cos value. All the value are set into mask
array according the traversal order, meaning the same order of feature list
members, like L3 CAT as the first and L2 CAT as the second.

For example, old_cos of the domain is 1. User wants to set L3 CAT CBM to 0x1ff.
The original COS registers like below.

        -----------------------
        | COS 0 | COS 1 | ... |
        -----------------------
L3 CAT  | 0x7ff | 0x3ff | ... |
        -----------------------
L2 CAT  | 0xff  | 0x3f  | ... |
        -----------------------

Then, mask array should be:
mask[0]: 0x1ff
mask[1]: 0x3f

Then, we can use this mask array to find if there is matching COS through
compare_mask().

> > -static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
> > +int psr_get_val(struct domain *d, unsigned int socket,
> > +                uint64_t *val, enum mask_type type)
> >  {
> > -    unsigned int first_bit, zero_bit;
> > -
> > -    /* Set bits should only in the range of [0, cbm_len). */
> > -    if ( cbm & (~0ull << cbm_len) )
> > -        return 0;
> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> > +    unsigned int cos = d->arch.psr_cos_ids[socket];
> > +    struct feat_list *pTmp;
> > +    int ret;
> >  
> > -    /* At least one bit need to be set. */
> > -    if ( cbm == 0 )
> > -        return 0;
> > +    if ( IS_ERR(info) )
> > +        return PTR_ERR(info);
> >  
> > -    first_bit = find_first_bit(&cbm, cbm_len);
> > -    zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext )
> > +        return -ENODEV;
> >  
> > -    /* Set bits should be contiguous. */
> > -    if ( zero_bit < cbm_len &&
> > -         find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
> > -        return 0;
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        ret = pTmp->ops.get_val(pTmp, cos, type, val);
> > +        if ( ret < 0 )
> > +            return -EINVAL;
> 
> Why don't you return "ret" here?
> 
This is an unnecessary check. Will remove it.

> > +        else if ( ret > 0 )
> 
> Pointless "else".
Will remove it.

> 
> > -static void do_write_l3_cbm(void *data)
> > +static void do_write_psr_ref(void *data)
> 
> What does "_ref" stand for?
> 
Sorry, a mistake.

> > -static int find_cos(struct psr_cat_cbm *map, unsigned int cos_max,
> > -                    uint64_t cbm_code, uint64_t cbm_data, bool_t 
> > cdp_enabled)
> > +static int find_cos(uint64_t *mask, enum mask_type type,
> > +                    struct psr_socket_alloc_info *info)
> >  {
> >      unsigned int cos;
> > +    struct psr_ref *map = info->cos_ref;
> > +    struct feat_list *pTmp;
> > +    uint64_t *pMask = mask;
> > +    int ret;
> > +    bool_t found = 0;
> > +    unsigned int cos_max = 0;
> > +
> > +    /* cos_max is the max COS of the feature to be set. */
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
> > +        if ( cos_max > 0 )
> > +            break;
> > +
> > +        pTmp = pTmp->pNext;
> > +    }
> >  
> > +    pTmp = info->pFeat->pNext;
> >      for ( cos = 0; cos <= cos_max; cos++ )
> >      {
> > -        if ( (map[cos].ref || cos == 0) &&
> > -             ((!cdp_enabled && map[cos].cbm == cbm_code) ||
> > -              (cdp_enabled && map[cos].code == cbm_code &&
> > -                              map[cos].data == cbm_data)) )
> > +        if ( 0 != cos && 0 == map[cos].ref )
> > +            continue;
> > +
> > +        while ( pTmp )
> > +        {
> > +            /*
> > +             * Compare mask according to feature list order.
> > +             * We must follow this order because mask is assembled
> > +             * as this order in get_old_set_new().
> > +             */
> > +            ret = pTmp->ops.compare_mask(pMask, pTmp, cos, &found);
> > +            if ( ret < 0 )
> > +                return ret;
> > +
> > +            /* If fail to match, go to next cos to compare. */
> > +            if ( !found )
> > +                break;
> > +
> > +            pMask += ret;
> > +            pTmp = pTmp->pNext;
> > +        }
> > +
> > +        /* Every feature can match, this cos is what we find. */
> > +        if ( found )
> >              return cos;
> > +
> > +        /* Not found, need find again from beginning. */
> > +        pTmp = info->pFeat->pNext;
> > +        pMask = mask;
> 
> Please do once (at the appropriate place inside the outer loop) what
> you have done prior to the loop and repeat here. As already alluded
> to - limiting the scope of variables also helps ensuring that things
> work as intended.
> 
Good suggestion, thanks!

> >      }
> >  
> > +    printk(XENLOG_INFO "%s: cannot find cos.\n", __func__);
> 
> Again a questionable log message with a questionable use of __func__.
> In any case log messages should normally not have a full stop at their
> end.
> 
Will remove this log.

> > -static int pick_avail_cos(struct psr_cat_cbm *map, unsigned int cos_max,
> > -                          unsigned int old_cos)
> > +static int check_exceed_range(uint64_t *mask, struct feat_list *pTmp,
> > +                              unsigned int cos)
> >  {
> > +    unsigned int ret = 0;
> 
> Pointless initializer.
> 
Will remove it.

> > +static int alloc_new_cos(struct psr_socket_alloc_info *info,
> > +                         uint64_t *mask, unsigned int old_cos,
> > +                         enum mask_type type)
> > +{
> > +    unsigned int cos_max = 0;
> > +    struct feat_list *pTmp;
> >      unsigned int cos;
> > +    struct psr_ref *map = info->cos_ref;
> > +
> > +    /*
> > +     * cos_max is the max COS of the feature to be set.
> > +     */
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
> > +        if ( cos_max > 0 )
> > +            break;
> > +
> > +        pTmp = pTmp->pNext;
> > +    }
> > +
> > +    if ( !cos_max )
> > +        return -ENOENT;
> >  
> >      /* If old cos is referred only by the domain, then use it. */
> >      if ( map[old_cos].ref == 1 && old_cos != 0 )
> > +    {
> > +        pTmp = info->pFeat->pNext;
> > +        if ( check_exceed_range(mask, pTmp, old_cos) )
> > +            goto find_new;
> > +
> >          return old_cos;
> > +    }
> >  
> > +find_new:
> 
> Labels indented by at least one blank please. In this case, though,
> it looks like you could easily do without any label and goto.
> 
Ok, I will consider not to use goto and label here. At least, will add one
blank. Thanks!

> > -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> > -                   uint64_t cbm, enum cbm_type type)
> > +int psr_set_val(struct domain *d, unsigned int socket,
> > +                uint64_t val, enum mask_type type)
> >  {
> > -    unsigned int old_cos, cos_max;
> > +    unsigned int old_cos;
> >      int cos, ret;
> > -    uint64_t cbm_data, cbm_code;
> > -    bool_t cdp_enabled = cdp_is_enabled(socket);
> > -    struct psr_cat_cbm *map;
> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> > +    struct psr_ref *map;
> > +    uint64_t *mask;
> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> >  
> >      if ( IS_ERR(info) )
> >          return PTR_ERR(info);
> >  
> > -    if ( !psr_check_cbm(info->cbm_len, cbm) )
> > -        return -EINVAL;
> > -
> > -    if ( !cdp_enabled && (type == PSR_CBM_TYPE_L3_CODE ||
> > -                          type == PSR_CBM_TYPE_L3_DATA) )
> > -        return -ENXIO;
> > -
> > -    cos_max = info->cos_max;
> > +    map = info->cos_ref;
> >      old_cos = d->arch.psr_cos_ids[socket];
> > -    map = info->cos_to_cbm;
> > -
> > -    switch ( type )
> > -    {
> > -    case PSR_CBM_TYPE_L3:
> > -        cbm_code = cbm;
> > -        cbm_data = cbm;
> > -        break;
> > -
> > -    case PSR_CBM_TYPE_L3_CODE:
> > -        cbm_code = cbm;
> > -        cbm_data = map[old_cos].data;
> > -        break;
> > -
> > -    case PSR_CBM_TYPE_L3_DATA:
> > -        cbm_code = map[old_cos].code;
> > -        cbm_data = cbm;
> > -        break;
> >  
> > -    default:
> > -        ASSERT_UNREACHABLE();
> > -        return -EINVAL;
> > -    }
> > +    /* Considering CDP liking features */
> 
> DYM "like"? And what do you want to say with this comment?
> 
Yes, "like" should be. CDP needs occupy two members in mask array. So, I
make nr_feat multiply 2 as the array size.

> > +    mask = xzalloc_array(uint64_t, info->nr_feat * 2);
> 
> Perhaps it's related to the doubling of nr_feat here? If so - what
> precludes another future feature to require 3 or 4 slots per element?
> I think this needs abstracting out, now that you create an abstract
> framework.
> 
Good point. I will consider it.

> > +    if ( (ret = get_old_set_new(mask, info, old_cos, type, val)) != 0 )
> 
> What if the xzalloc_array() returned NULL?
> 
Sorry, missed the check.

> > +        return ret;
> >  
> > -    spin_lock(&info->cbm_lock);
> > -    cos = find_cos(map, cos_max, cbm_code, cbm_data, cdp_enabled);
> > +    spin_lock(&info->mask_lock);
> > +    cos = find_cos(mask, type, info);
> >      if ( cos >= 0 )
> >      {
> >          if ( cos == old_cos )
> >          {
> > -            spin_unlock(&info->cbm_lock);
> > +            spin_unlock(&info->mask_lock);
> > +            xfree(mask);
> >              return 0;
> >          }
> >      }
> >      else
> >      {
> > -        cos = pick_avail_cos(map, cos_max, old_cos);
> > +        cos = alloc_new_cos(info, mask, old_cos, type);
> >          if ( cos < 0 )
> >          {
> > -            spin_unlock(&info->cbm_lock);
> > +            spin_unlock(&info->mask_lock);
> > +            xfree(mask);
> >              return cos;
> >          }
> >  
> > -        /* We try to avoid writing MSR. */
> > -        if ( (cdp_enabled &&
> > -             (map[cos].code != cbm_code || map[cos].data != cbm_data)) ||
> > -             (!cdp_enabled && map[cos].cbm != cbm_code) )
> > +        /* Write all features mask MSRs corresponding to the COS */
> > +        ret = write_psr_ref(socket, cos, mask);
> > +        if ( ret )
> >          {
> > -            ret = write_l3_cbm(socket, cos, cbm_code, cbm_data, 
> > cdp_enabled);
> > -            if ( ret )
> > -            {
> > -                spin_unlock(&info->cbm_lock);
> > -                return ret;
> > -            }
> > -            map[cos].code = cbm_code;
> > -            map[cos].data = cbm_data;
> > +            spin_unlock(&info->mask_lock);
> > +            xfree(mask);
> > +            return ret;
> >          }
> >      }
> >  
> >      map[cos].ref++;
> >      map[old_cos].ref--;
> > -    spin_unlock(&info->cbm_lock);
> > +
> > +    spin_unlock(&info->mask_lock);
> >  
> >      d->arch.psr_cos_ids[socket] = cos;
> > +    xfree(mask);
> > +    mask = NULL;
> >  
> >      return 0;
> 
> I don't think setting mask to NULL right ahead of a return statement
> is very helpful.
> 
Yes, this sentence can be removed.

> > -static void cat_cpu_init(void)
> > +static void internal_cpu_init(void)
> 
> Is "internal_" really a useful prefix here?
> 
I wanted to express this is an internal function in psr.c to separate it from
psr_cpu_init().

> >  {
> >      unsigned int eax, ebx, ecx, edx;
> > -    struct psr_cat_socket_info *info;
> > +    struct psr_socket_alloc_info *info;
> >      unsigned int socket;
> >      unsigned int cpu = smp_processor_id();
> > -    uint64_t val;
> >      const struct cpuinfo_x86 *c = cpu_data + cpu;
> > +    struct feat_list *pTmp;
> >  
> >      if ( !cpu_has(c, X86_FEATURE_PQE) || c->cpuid_level < 
> > PSR_CPUID_LEVEL_CAT )
> >          return;
> >  
> >      socket = cpu_to_socket(cpu);
> > -    if ( test_bit(socket, cat_socket_enable) )
> > +    info = socket_alloc_info + socket;
> > +    if ( info->features )
> >          return;
> >  
> >      cpuid_count(PSR_CPUID_LEVEL_CAT, 0, &eax, &ebx, &ecx, &edx);
> >      if ( ebx & PSR_RESOURCE_TYPE_L3 )
> >      {
> > -        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> > -        info = cat_socket_info + socket;
> > -        info->cbm_len = (eax & 0x1f) + 1;
> > -        info->cos_max = min(opt_cos_max, edx & 0xffff);
> > -
> > -        info->cos_to_cbm = temp_cos_to_cbm;
> > -        temp_cos_to_cbm = NULL;
> > -        /* cos=0 is reserved as default cbm(all ones). */
> > -        info->cos_to_cbm[0].cbm = (1ull << info->cbm_len) - 1;
> > +        pTmp = pL3CAT;
> > +        if ( !pTmp )
> > +            return;
> 
> You check the allocation result in internal_cpu_prepare() - can you
> really get here with this being NULL? If you can't, drop the if() (or
> make it an ASSERT()).
> 
Ok, thanks!

> > +        pL3CAT = NULL;
> >  
> > -        spin_lock_init(&info->cbm_lock);
> > -
> > -        set_bit(socket, cat_socket_enable);
> > -
> > -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> > -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
> > -        {
> > -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
> > -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
> > -
> > -            /* We only write mask1 since mask0 is always all ones by 
> > default. */
> > -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
> > -
> > -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> > -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << 
> > PSR_L3_QOS_CDP_ENABLE_BIT));
> > -
> > -            /* Cut half of cos_max when CDP is enabled. */
> > -            info->cos_max >>= 1;
> > -
> > -            set_bit(socket, cdp_socket_enable);
> > -        }
> > -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, 
> > cbm_len:%u, CDP:%s\n",
> > -               socket, info->cos_max, info->cbm_len,
> > -               cdp_is_enabled(socket) ? "on" : "off");
> > +        /* Initialize this feature according to CPUID. */
> > +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> > +        pTmp->ops = l3_cat_ops;
> > +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
> 
> This function can return without having consumed pTmp, which means
> you'd leak the memory in that case.
> 
Yes, right, this is by design. pL3CAT/pL2CAT are allocated in
internal_cpu_prepare(). If this CPU does not support L2 CAT for example, the
pL2CAT will not be added into feature list. But when CPU on other socket up,
internal_cpu_prepare() will be called again and the pL2CAT will not be
allocated again (check if it is NULL). We can use the buffer has been allocated.
That means we maintain one redundant buffer to simplify the codes.

> > -static void __init psr_cat_free(void)
> > +static void __init psr_free(void)
> >  {
> > -    xfree(cat_socket_enable);
> > -    cat_socket_enable = NULL;
> > -    xfree(cat_socket_info);
> > -    cat_socket_info = NULL;
> > +    int i;
> 
> unsigned int
> 
Will change it. Thanks!

> > -static void __init init_psr_cat(void)
> > +static void __init init_psr(void)
> >  {
> > +    int i;
> 
> Again.
> 
Will change it. Thanks!

> > +
> >      if ( opt_cos_max < 1 )
> >      {
> >          printk(XENLOG_INFO "CAT: disabled, cos_max is too small\n");
> >          return;
> >      }
> >  
> > -    cat_socket_enable = xzalloc_array(unsigned long, 
> > BITS_TO_LONGS(nr_sockets));
> > -    cat_socket_info = xzalloc_array(struct psr_cat_socket_info, 
> > nr_sockets);
> > -    cdp_socket_enable = xzalloc_array(unsigned long, 
> > BITS_TO_LONGS(nr_sockets));
> > +    socket_alloc_info = xzalloc_array(struct psr_socket_alloc_info, 
> > nr_sockets);
> > +
> > +    if ( !socket_alloc_info )
> > +    {
> > +        printk(XENLOG_INFO "Fail to alloc socket_alloc_info!\n");
> > +        return;
> > +    }
> >  
> > -    if ( !cat_socket_enable || !cat_socket_info )
> > -        psr_cat_free();
> > +    for ( i = 0; i < nr_sockets; i++ )
> > +    {
> > +        socket_alloc_info[i].pFeat =  xzalloc(struct feat_list);
> 
> What's the point of this dummy list element?
> 
The pFeat is a redundant list head to facilitate list operations.
With this list head, the advantage is that we can insert/delete the first
element same as others without special operations.

> > +        if ( NULL == socket_alloc_info[i].pFeat )
> > +        {
> > +            printk(XENLOG_INFO "Fail to alloc pFeat!\n");
> 
> How would the reader of the log easily associate such a message
> with something having gone wrong in PSR initialization?
> 
Yes, seems not necessary. Will remove it.

> > +            return;
> > +        }
> > +        socket_alloc_info[i].pFeat->pNext = NULL;
> 
> This is redundant with the use of xzalloc().
> 
Will remove it, thanks!

> >  static void psr_cpu_init(void)
> >  {
> > -    if ( cat_socket_info )
> > -        cat_cpu_init();
> > +    unsigned int socket;
> > +    struct psr_socket_alloc_info *info;
> > +
> > +    if ( socket_alloc_info )
> > +    {
> > +        socket = cpu_to_socket(smp_processor_id());
> > +        info = socket_alloc_info + socket;
> > +        info->cos_ref = temp_cos_ref;
> > +        temp_cos_ref = NULL;
> > +        spin_lock_init(&info->mask_lock);
> > +
> > +        internal_cpu_init();
> 
> Why is everything ahead of this call not part of that function?
> 
The original purpose is to make internal_cpu_init() be simpler. After
considering it again, move these codes into the function may be better.
Thanks!

> > --- a/xen/include/asm-x86/psr.h
> > +++ b/xen/include/asm-x86/psr.h
> > @@ -46,10 +46,10 @@ struct psr_cmt {
> >      struct psr_cmt_l3 l3;
> >  };
> >  
> > -enum cbm_type {
> > -    PSR_CBM_TYPE_L3,
> > -    PSR_CBM_TYPE_L3_CODE,
> > -    PSR_CBM_TYPE_L3_DATA,
> > +enum mask_type {
> > +    PSR_MASK_TYPE_L3_CBM,
> > +    PSR_MASK_TYPE_L3_CODE,
> > +    PSR_MASK_TYPE_L3_DATA,
> >  };
> 
> The enum tag is now too generic a name - please make this
> psr_mask_type or something similar.
> 
Ok, thanks!

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.