[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/3] x86: refactor psr implementation in hypervisor.



On 16-09-07 03:01:34, Jan Beulich wrote:
> >> >>> On 25.08.16 at 07:22, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> >> > +                         struct psr_socket_alloc_info *info);
> >> > +    /*
> >> > +     * get_old_set_new is used in set value process to get all features'
> >> > +     * COS registers values according to original cos id of the domain.
> >> > +     * Then, assemble them into an mask array as feature list order.
> >> 
> >> This sentence in particular doesn't make any sense to me. What
> >> follows below also looks like it is in need of improvement.
> >> 
> > Do you mean the comments are not accurate?
> 
> I simply wasn't able to tell, because of not being able to interpret
> the sentence.
> 
> > How about below description?
> >  
> > get_old_set_new will traverse all features in list. It is used to do below 
> > two
> > things:
> > 1. get old_cos register value of all supported features and
> > 2. set the new value for appointed feature.
> > 
> > All the values are set into mask array according the traversal order, 
> > meaning
> > the same order of feature list members.
> 
> Sounds reasonable. I suppose the example that you gave right
> next wasn't meant to go into the comment.
> 
Great, will replace the comments with the new sentences. The example will not
go in.

> >> > +    /*
> >> > +     * exceed_range is used to check if the input cos id exceeds the
> >> > +     * feature's cos_max and if the input mask value is not the default 
> >> > one.
> >> > +     * Even if the associated cos exceeds the cos_max, HW can work as 
> >> > default
> >> > +     * value. That is the reason we need check is mask value is default 
> >> > one.
> >> > +     * If both criteria are fulfilled, that means the input exceeds the
> >> > +     * range.
> >> > +     */
> >> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list 
> >> > *pFeat,
> >> > +                                 unsigned int cos);
> >> 
> >> According to the comment this is kind of a predicate, which seems
> >> unlikely to return an unsigned value. In fact without a word on the
> >> return value I'd expect such to return bool. And I'd also expect the
> >> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
> >> for compare above I wonder whether come or all of the parameters
> >> should be pointers to const (please go over the entire patch and do
> >> constification wherever possible/sensible).
> >> 
> > Yes, you are right. I will change the function type to bool and add const
> > for not changed input pointers.
> > 
> > This function is used to check if the input cos id exceeds the cos_max. If 
> > yes
> > and the set value is not default value, we should return error. So, I think
> > to change the function name to exceed_cos_max(). How do you think?
> 
> Okay, except that I continue to think you mean "exceeds".
> "exceed_cos_max" to me is kind of a directive, not a predicate.
> 
How about "beyond"?

> >> > +#define MAX_FEAT_INFO_SIZE 8
> >> > +#define MAX_COS_REG_NUM  128
> >> 
> >> Are these numbers arbitrary, or based on something?
> >> 
> > MAX_FEAT_INFO_SIZE is got from the sizeof(struct psr_cat_lvl_info) and
> > consider the extension for future feature.
> 
> In that case please use that sizeof() in the expression here.
> 
Sure. Thanks!

> > MAX_COS_REG_NUM is got from spec that the max COS registers number is 128
> > for all PSR features so far.
> 
> "So far" makes me still wonder: Is this an architectural limit or one
> resulting from current (hardware) implementations. In the former
> case I don't think a comment is strictly needed, but in the latter
> case the choice should be explained.
> 
It is the latter case. I will add comment to explain it. Thanks!

> >> > +struct psr_socket_alloc_info {
> >> 
> >> I've yet to see whether the "alloc" in the name really makes sense.
> 
> And btw., having seen the entire patch I don't think this alloc_ infix
> is warranted both here and in the variable name.
> 
Ok, will consider to remove it in codes. Thanks!

> >> > +/* Common functions for supporting feature callback functions. */
> >> > +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
> >> > +{
> >> > +    if ( NULL == pHead || NULL == pTmp )
> >> > +        return;
> >> > +
> >> > +    while ( pHead->pNext )
> >> > +        pHead = pHead->pNext;
> >> > +
> >> > +    pHead->pNext = pTmp;
> >> > +}
> >> 
> >> Do you really need custom list management here?
> >> 
> > It seems xen list interfaces require the input list be a double linked list 
> > but
> > my list is a single linked list. Furthermore, I only need simple add to tail
> > function and free function. So I create custom list management functions.
> 
> Unless there's a strong need, I'd like you to go with what is there,
> or introduce _generic_ singly linked list management.
> 
I will check below list management interfaces to see if I can reuse them.
Thanks!
xen/include/xen/list.h

> >> > +static void free_feature(struct psr_socket_alloc_info *info)
> >> > +{
> >> > +    struct feat_list *pTmp;
> >> > +    struct feat_list *pPrev;
> >> > +
> >> > +    if ( NULL == info )
> >> > +        return;
> >> > +
> >> > +    if ( NULL == info->pFeat )
> >> > +        return;
> >> > +
> >> > +    pTmp = info->pFeat->pNext;
> >> > +    while ( pTmp )
> >> > +    {
> >> > +        pPrev = pTmp;
> >> > +        pTmp = pTmp->pNext;
> >> > +        clear_bit(pPrev->feature, &(info->features));
> >> > +        xfree(pPrev);
> >> > +        pPrev = NULL;
> >> > +    }
> >> > +}
> >> 
> >> Why does info->pFeat not get freed here?
> >> 
> > The info->pFeat is a redundant list head to facilitate list operations.
> > It is only freed when socket info is freed.
> > 
> > With this list head, the advantage is that we can insert/delete the first
> > element same as others without special operations.
> 
> But it results in less obvious code (and a basically pointless extra
> allocation). One more reason to use generic list management
> primitives.
> 
I will check below list management interfaces to see if I can reuse them.
Thanks!
xen/include/xen/list.h

> >> > +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
> >> > +                                unsigned int ecx, unsigned int edx,
> >> > +                                struct feat_list *pFeat,
> >> > +                                struct psr_socket_alloc_info *info)
> >> > +{
> >> > +    struct psr_cat_lvl_info l3_cat;
> >> > +    unsigned int socket;
> >> > +    uint64_t val;
> >> > +
> >> > +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
> >> > +        return;
> >> 
> >> DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
> >> array is one with unsigned int as the base type, which doesn't get
> >> reflected here.
> >> 
> > I will use BUILD_BUG_ON() for the error case.
> > 
> > As my explanation for MAX_FEAT_INFO_SIZE above, it reflects a size more
> > than struct psr_cat_lvl_info. So, if MAX_FEAT_INFO_SIZE is bigger than the
> > sizeof(struct psr_cat_lvl_info), the following operations on feat_info[]
> > should be good.
> 
> In fact the BUILD_BUG_ON() may not be needed if you follow the
> suggestion above.
> 
Ok, thanks!

> >> > +    }
> >> > +
> >> > +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
> >> 
> >> Aforementioned BUILD_BUG_ON() would probably bets be placed
> >> right ahead of this memcpy().
> >> 
> > Can we call BUILD_BUG_ON() at the top of the function to avoid executing 
> > above
> > process if the size is not right?
> 
> Well, the purpose of BUILD_BUG_ON() is what its name implies: It
> fails the _build_ if violated, so execution can't possibly come here
> (or in fact anywhere) in that case.
> 
Thanks for the explanation!

> >> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> >> > +                               unsigned int cos, bool_t *found)
> >> > +{
> >> > +    struct psr_cat_lvl_info cat_info;
> >> > +    uint64_t l3_def_cbm;
> >> > +
> >> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct 
> >> > psr_cat_lvl_info));
> >> 
> >> Already here I think this memcpy()ing gets unwieldy. Can't you
> >> simply make the structure field a union of all types of interest?
> >> 
> > Sorry that I am not very clear about your meaning to make a union. Do you 
> > mean
> > make feat_info a union? If so, it will lose the universality to cover all
> > features. Future feature may have different info.
> 
> Which is the purpose of a union - you'd simply add a new member
> then.
>
I guess your idea likes below. Right?
union {
    struct l3_info {
        union {
            uint64_t cbm;
            struct {
                uint64_t code;
                uint64_t data;
            };
        };

    };

    struct l2_info {
        uint64_t cbm;
    };
};
 
My original design is to use this feat_info cover all features and eliminate
the feature's specific properties. If using above union, we still need to
know the current feature is which when handles feat_info. That loses the
abstraction.

If my thought is not right, please correct me. Thanks!

> > I think I can replace the memcpy() to directly assign value to cat_info.
> 
> No, this copying (done in _many_ places) really should go away.
> 
I want to replace memcpy() to below codes.
cat_info.cbm_len = feat_info[0];
cat_info.cos_max = feat_info[1];

> >> > +                printk(XENLOG_ERR "CDP exceed cos max.\n");
> >> 
> >> I haven't figured yet at what times this function will get called, but
> >> I suspect it's inappropriate for it to log messages.
> >> 
> >> > +                return -ENOENT;
> >> > +            }
> >> > +            *found = 1;
> >> > +            return 2;
> >> 
> >> Who or what is 2?
> >> 
> > CDP occupies 2 members in mask[] array. So, it returns 2 to make codes 
> > outside
> > the function to skip to next feature's mask value.
> 
> I guess that's going to be understandable once the return values
> of the hooks get properly explained in their comments.
> 
Yes, I will explain return value by adding comments.

> >> > +    if ( mask[0] == pFeat->cos_reg_val[cos] )
> >> > +        *found = 1;
> >> > +    else
> >> > +        *found = 0;
> >> 
> >> Please use a simple assignment in cases like this, instead of if/else.
> >> 
> > Do you mean codes like below?
> > 
> > *found = 0;
> > if ( mask[0] == pFeat->cos_reg_val[cos] )
> >     *found = 1;
> 
> No.
> 
>     *found = (mask[0] == pFeat->cos_reg_val[cos]);
> 
Thank you!

> >> > +    /* CAT */
> >> > +    if ( old_cos > cat_info.cos_max )
> >> > +        mask[0] =  pFeat->cos_reg_val[0];
> >> > +    else
> >> > +        mask[0] =  pFeat->cos_reg_val[old_cos];
> >> 
> >> Across the entire function - wouldn't it be easier to range check
> >> old_cos once up front, forcing it to zero if the check fails? That
> >> would reduce code size to a better readable amount. (And just to
> >> avoid any misunderstanding - comments like this also may apply
> >> to other functions; I'm not going to repeat them there.)
> >> 
> > Good idea! Thanks for the suggestion. I will check other functions like 
> > this.
> > 
> >> Also note that there are undue double blanks in here.
> >> 
> > Do you mean two spaces? I cannot see it in my original patch and codes.
> 
> It's even still visible in the quoted text above.
> 
Oh, really strange. Anyway, I need submit a new version. I will check this
carefully. Thanks!

> >> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
> >> > +        mask[0] = m;
> >> 
> >> This overwriting behavior also looks quite strange to me. What's
> >> the purpose? And if this really is meant to be that way, why is
> >> this not (leaving aside the other suggested adjustment)
> >> 
> >>     if ( type == PSR_MASK_TYPE_L3_CBM )
> >>         mask[0] = m;
> >>     else if ( old_cos > cat_info.cos_max )
> >>         mask[0] = pFeat->cos_reg_val[0];
> >>     else
> >>         mask[0] = pFeat->cos_reg_val[old_cos];
> >> 
> >> ?
> >> 
> > get_old_set_new() is used to do below two things:
> > 1. get old_cos register value of all supported features and
> > 2. set the new value for appointed feature.
> > 
> > So, if the appointed feature is L3 CAT, we should set input vallue for it 
> > here.
> 
> Okay, that answers the "what" aspect, but leaves open _why_ it
> needs to be that way.
> 
A scenario here to help to understand _why_. 

Like the example for explaining get_old_set_new(), old_cos of the domain is 1.
Then, User wants to set L3 CAT CBM to 0x1ff and L2 CAT 0x3f. The original
COS registers like below.

        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | 0x3ff | 0x1ff | ... |
        -------------------------------
L2 CAT  | 0xff  | 0x3f  | 0x3f  | ... |
        -------------------------------

Then, mask array should be assembled in get_old_set_new() like below:
mask[0]: 0x1ff
mask[1]: 0x3f

Then, we can use this mask array to find if there is matching COS through
compare_mask(). We can find COS 2 is the matching one. 

If there is already a COS registers combination (e.g. L3 COS 2 and L2 COS 2)
same as the mask array, we can reuse this combination directly but not to
allocate a new COS ID. By this way, we can save the COS registers. You know,
there is limitation on COS registers number.

Of course, if we cannot find it in existing combinations, we will call
alloc_new_cos() to allocate a new COS to use.

> >> >  static inline void psr_assoc_init(void)
> >> >  {
> >> >      struct psr_assoc *psra = &this_cpu(psr_assoc);
> >> > +    struct psr_socket_alloc_info *info;
> >> > +    unsigned int cos_max;
> >> > +    unsigned int socket;
> >> >  
> >> > -    if ( cat_socket_info )
> >> > +    if ( socket_alloc_info )
> >> >      {
> >> > -        unsigned int socket = cpu_to_socket(smp_processor_id());
> >> > +        socket = cpu_to_socket(smp_processor_id());
> >> 
> >> What's the point of moving the variable declaration? Quite the
> >> other way around, the new variable you add could - afaict - all
> >> live in this more narrow scope. And info could presumably become
> >> a pointer to const.
> >> 
> > I remember it reports warning or error when compiling old codes.
> 
> I don't see why it would.
> 
Hmm, I will check this again. Thanks!

> >> > -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
> >> > -                        uint32_t *cos_max, uint32_t *flags)
> >> > +int psr_get_info(unsigned int socket, enum mask_type type,
> >> > +                 uint32_t *dat0, uint32_t *dat1,
> >> > +                 uint32_t *dat2)
> >> >  {
> >> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> >> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> >> > +    struct feat_list *pTmp;
> >> >  
> >> >      if ( IS_ERR(info) )
> >> >          return PTR_ERR(info);
> >> >  
> >> > -    *cbm_len = info->cbm_len;
> >> > -    *cos_max = info->cos_max;
> >> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
> >> > +       return -ENODEV;
> >> >  
> >> > -    *flags = 0;
> >> > -    if ( cdp_is_enabled(socket) )
> >> > -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> >> > +    pTmp = info->pFeat->pNext;
> >> > +    while ( pTmp )
> >> > +    {
> >> > +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
> >> 
> >> Considering that this is one of the generic ops: What guarantees
> >> that 3 elements will suffice going forward? I.e. why is this not an
> >> array?
> >> 
> > Good point. I consider this too. Per all features we know, three parameters
> > are sufficient. To simplify codes, I write the function like this. I think
> > we can make change if future feature has more requirements.
> 
> If you introduce an abstraction layer, you should design it with
> abstract requirements in mind, not with the requirements of
> currently known implementations.
> 
Ok, will consider to make a general interface.

> >> > -    case PSR_CBM_TYPE_L3_DATA:
> >> > -        if ( !cdp_enabled )
> >> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> >> > -        else
> >> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
> >> > -        break;
> >> > +    pTmp = info->pFeat->pNext;
> >> > +    while ( pTmp )
> >> > +    {
> >> > +        /* mask getting and setting order is same as feature list */
> >> > +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
> >> > +        if ( ret < 0 )
> >> > +            return ret;
> >> >  
> >> > -    default:
> >> > -        ASSERT_UNREACHABLE();
> >> > +        mask += ret;
> >> 
> >> With this - what ensures the array isn't going to be overrun?
> >> 
> > It seems you missed one line below. The while() loop traverse all features 
> > in
> > list. It will exit when the last member has been accessed.
> > 
> > +        pTmp = pTmp->pNext;
> 
> No, I didn't miss any line. I'm asking because there's nothing but
> programmer discipline keeping array size and list length match up.
> That's fragile.
> 
Sorry, I misunderstood your meaning. I will add a mechanism to check mask array
size. Thanks for pointing this out!

> >> And I'm also having some difficulty seeing how an array of outputs
> >> matches up with exactly one input (m). Can you explain this please?
> >> 
> > This while loop will traverse all features in list. It will get all features
> > old_cos value back. If the feature is the one to set new value, the input
> > new value (m) will replace the old_cos value. All the value are set into 
> > mask
> > array according the traversal order, meaning the same order of feature list
> > members, like L3 CAT as the first and L2 CAT as the second.
> 
> This gets us back to the question of _why_ you need this "get all but
> set only one" functionality. Plus, wasn't it that CDP requires two
> values per feature?
> 
Please check above comment about "_why_".

After operation in get_old_set_new(), CDP occupies two elements of mask array.

> >> > -static void cat_cpu_init(void)
> >> > +static void internal_cpu_init(void)
> >> 
> >> Is "internal_" really a useful prefix here?
> >> 
> > I wanted to express this is an internal function in psr.c to separate it 
> > from
> > psr_cpu_init().
> 
> The static keyword already says it's an internal one.
> 
Yes, I did not figure out a better function name so I used 'internal'. Will
consider another one if you think this is not appropriate.

> >> > +        pL3CAT = NULL;
> >> >  
> >> > -        spin_lock_init(&info->cbm_lock);
> >> > -
> >> > -        set_bit(socket, cat_socket_enable);
> >> > -
> >> > -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> >> > -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
> >> > -        {
> >> > -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
> >> > -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
> >> > -
> >> > -            /* We only write mask1 since mask0 is always all ones by 
> >> > default. */
> >> > -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 
> >> > 1);
> >> > -
> >> > -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> >> > -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << 
> >> > PSR_L3_QOS_CDP_ENABLE_BIT));
> >> > -
> >> > -            /* Cut half of cos_max when CDP is enabled. */
> >> > -            info->cos_max >>= 1;
> >> > -
> >> > -            set_bit(socket, cdp_socket_enable);
> >> > -        }
> >> > -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, 
> >> > cbm_len:%u, CDP:%s\n",
> >> > -               socket, info->cos_max, info->cbm_len,
> >> > -               cdp_is_enabled(socket) ? "on" : "off");
> >> > +        /* Initialize this feature according to CPUID. */
> >> > +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> >> > +        pTmp->ops = l3_cat_ops;
> >> > +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
> >> 
> >> This function can return without having consumed pTmp, which means
> >> you'd leak the memory in that case.
> >> 
> > Yes, right, this is by design.
> 
> I hope not. You shouldn't design in memory leaks.
> 
Ok, I will free the redundant buffer in free_feature(). Thanks!

> > pL3CAT/pL2CAT are allocated in
> > internal_cpu_prepare(). If this CPU does not support L2 CAT for example, the
> > pL2CAT will not be added into feature list. But when CPU on other socket up,
> > internal_cpu_prepare() will be called again and the pL2CAT will not be
> > allocated again (check if it is NULL). We can use the buffer has been 
> > allocated.
> > That means we maintain one redundant buffer to simplify the codes.
> 
> I don't think that's the case - pL3CAT gets set to NULL further up. But
> with you switching to generic list management, this bogus extra list
> element will go away anyway, and so will the extra allocation.
> 
Yes, you are right. For above scenario I described, pL3CAT will be allocated
again because it has been added into feature list and set to NULL. Finally,
it will be freed in free_feature(). So, pL3CAT is not a memory leak.

I am trying to explain the memory leak case, pL2CAT. Because CPU on socket0
does not support L2 CAT, it is not added into feature list and not set to
NULL. But if CPU on socket1 up and it supports L2 CAT, the pL2CAT allocated
before can be used directly without allocating a new buffer. Of course, if
there is no such case, e.g. all CPUs only support L3 CAT, the pL2CAT is leaked.

Per your suggestion that not allowing any memory leak, I will free it in
free_feature().

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.