
Re: [Xen-devel] Prototype Code Review Dashboards (input required)



Daniel, (added Jan - search for @Jan)

thanks for the feedback

> On 2 Mar 2016, at 22:45, Daniel Izquierdo <dizquierdo@xxxxxxxxxxxx> wrote:
> 
> On 01/03/16 18:04, Lars Kurth wrote:
>> Daniel, Jesus,
>> 
>> I am going to break my comments down into different sections to make this 
>> more consumable. Let's focus on the A1-A3 use-cases in this mail.
>> 
>> First I wanted to start off with some questions about definitions, as I am 
>> seeing some discrepancies in some of the data shown and am trying to 
>> understand exactly what the data means, and then have a look at the 
>> individual sections.
>> 
>> General comments on the Xen-A1.A2.A3 dashboard
>> - I played with some filters and noticed some oddities, e.g. if I filter on 
>> "merged: 0" all views change as expected
>> - If I filter on "merged: 1", a lot of widgets show no data. Is this showing 
>> that there is an issue with the data somewhere?
>> - I see similar issues with other filters, e.g. 'emailtype: "patch"'
> 
> In order to bring some context to the dataset, ElasticSearch was initially 
> used for parsing Apache logs. That means that data should be formatted as 'a 
> row = an event'.
> 
> In this dataset there are several event types, defined by the field 
> 'EmailType': 'patchserie', 'patch', 'comment' and 'flag'. Depending on that 
> 'EmailType', each of the columns may or may not be meaningful.

That makes sense, is quite important background information, and is different 
from what I had originally assumed. I think for people to be able to build 
sophisticated search queries - once we have finalised all the fields - we 
should probably document this and the exact meaning of the fields (also see 
below). I was thinking - and I volunteer - to put together an extra view that 
acts like a legend. With the editable version of the dashboard, that should be 
easy to do.
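
For instance, the legend could boil down to a mapping from each EmailType to 
the fields that are meaningful for it. A minimal sketch in Python, using only 
the field names I have spotted so far (the 'flag' field name in particular is 
a guess on my part):

    # Sketch of a legend: which fields carry meaning per EmailType.
    # Field names are the ones visible in the dashboard; the real
    # dataset may well have more.
    FIELD_LEGEND = {
        "patchserie": ["sender", "sender_domain", "patchserie_id"],
        "patch":      ["sender", "sender_domain", "merged"],
        "comment":    ["sender", "sender_domain", "post_ack_comment",
                       "patchserie_id"],
        "flag":       ["sender", "sender_domain", "flag"],
    }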

I think we either
- need to look at the panels and come up with cleaner and clearer 
terminology 
- or split the panels and make sure that each panel only contains views which 
refer to the same type of event
Let me think about it and make a proposal.

> This structure uses 'EmailType' as the central key, where the rest of the 
> columns provide extra context. For instance, the post_ack_comment field only 
> makes sense for EmailType:comment.

Now I understand.

> Coming back to the comments:
> 
> There are fields that apply only to specific types of events. In the case of 
> 'merge', this applies only to patches. merge:1 filters for patches that were 
> merged (so the rest of the information is literally removed, as it is not 
> marked merged). If we filter by merge:0, we get the rest of the information 
> (even including flags).

OK. I think this is very confusing, because when you look at the 3 tables at 
the bottom, all of them show the same fields, and the default for an undefined 
field seems to be 0. Is there any way to make sure that undefined fields are 
set to na or -1 or similar? That would make things clearer.
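
Something along these lines at enrichment time would do it. A rough sketch, 
assuming the enrichment step builds one Python dict per event before pushing 
it to ElasticSearch:

    # Give fields that do not apply to an event's EmailType an explicit
    # "not applicable" marker (-1) instead of a default of 0.
    APPLICABLE = {
        "patch":   {"merged"},
        "comment": {"post_ack_comment"},
    }
    TYPED_FIELDS = {"merged", "post_ack_comment"}

    def mark_not_applicable(event):
        applicable = APPLICABLE.get(event["EmailType"], set())
        for field in TYPED_FIELDS - applicable:
            event[field] = -1  # or None, rendered as "na" in Kibana
        return event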

> Thus, using the filter merge:1 leads to having info only related to 'patches' 
> in this case.
> 
> As this panel shows information about types other than 'patch', if you filter 
> by some 'emailtype' such as 'patch' then you're focusing only on patch data, 
> and this will display both the merged and the not merged ones.
> 
> In order to improve this, we can either create a panel per type of analysis 
> (one panel for patches, one for comments, etc). Or we can play with adding 
> the 'merge' field to any flag, patchserie, patch and comment whose patch was 
> merged at some point. The latter may sound a bit weird, as a 'merged' status 
> does not apply to a flag (Reviewed-by) for instance.

Given that we have a lot of stuff on the panels, we should probably go for the 
1st option. But there may be corner cases where we may want to go for the 
second. When I go to the editable dashboard version, how can I tell which 
EmailType is used? I couldn't figure that out.

> 
>>> On 1 Mar 2016, at 13:53, Lars Kurth <lars.kurth.xen@xxxxxxxxx> wrote:
>>> 
>>> Case Study A.1: Identify top reviewers (for both individuals and 
>>> companies)
>>> ------------------------------------------------------------------------------
>>> 
>>> Goal: Highlight review contributions - ability to use the data to "reward" 
>>> review contributions and encourage more "review" contributions
>> The widgets in question are:
>> - Evolution 'Reviewed by-flag' (no patchseries, no patches)
>> - What is the difference to Evolution of patches
>> - Top People/Domains Reviewing patches
>> 
>> Q1: Are these the reviewed-by flags?
> 
> They are only the Reviewed-by flags.
> 
>> Q2: What is the scope? Do the numbers count
>> - the # files someone reviewed
>> - the # patches someone reviewed
>> - the # series someone reviewed
> 
> The number counts the number of reviews accomplished by a developer or by a 
> domain. A review is accomplished when the flag 'reviewed-by' is detected in 
> an email replying to a patch.
> 
> If a developer reviews several patches or several versions of the same patch, 
> each of those is counted as a different review.

So this is basically the number of reviewed-by flags, aggregated per developer.
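
In other words, something like this over the event stream (the 'flag' field 
name is my assumption):

    from collections import Counter

    def reviews_per_developer(events):
        # One review per Reviewed-by flag event, keyed by sender.
        return Counter(e["sender"] for e in events
                       if e["EmailType"] == "flag"
                       and e.get("flag") == "Reviewed-by")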

>> If a reviewer is solely defined by the reviewed-by tags, the data does not 
>> provide a correct picture.
> This is how this works so far.
> 
>> It may be better to use the following definition (although, others may 
>> disagree)
>> A reviewer is someone who did one of the following for a patch or series:
>> - Added a reviewed-by flag
>> - Added an acked-by flag (maintainers tend to use acked-by)
>> - Made a comment, but is NOT the author
> 
> We can update that definition. Do we want to have extra discussion in this 
> respect?

I think that would be more correct, in particular as we will still be able to 
[...]
@Jan, what is your view? This use-case was primarily created because of [...]
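
To make the proposed definition concrete, a sketch (the 'author' field on 
comment events is an assumption on my part; it may need a join via 
patchserie_id instead):

    def is_review_event(event):
        # A review is a Reviewed-by flag, an Acked-by flag, or a
        # comment from someone other than the patch author.
        if event["EmailType"] == "flag":
            return event.get("flag") in ("Reviewed-by", "Acked-by")
        if event["EmailType"] == "comment":
            return event["sender"] != event.get("author")
        return False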

> 
>> Related to that use-case are also the following widgets
>> - Evolution of Comments Activity
>> - Top People/Domains Commenting (which also contain post-ACK comments and 
>> are thus also related to A.3)
>> - Evolution of Email activity
>> 
>> Q3: Again, the scope isn't quite clear
> 
> This is the number of comments replying to a patch. A comment is defined as 
> an email reply to a patch.
> 
>> Q4: The figures are higher than those in "People/Domains Reviewing patches". 
>> Are comments on people's own patches included (these would be replies to the 
>> comments of others)?
> 
> I should check the last question. I'd say that we're including them, as they 
> are 'comments' on a patch. You can indeed comment on your own patches :). But 
> we can deal with this if it does not make sense.

Given the use-case of highlighting people commenting on the patches of others, 
we should probably not count comments on your own patches.
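
That would only need the comment counting to exclude self-comments, e.g. (same 
'author' assumption as above):

    from collections import Counter

    def comment_counts(events):
        # Comments per sender, skipping comments on one's own patches.
        return Counter(e["sender"] for e in events
                       if e["EmailType"] == "comment"
                       and e["sender"] != e.get("author"))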


>>> Possible places where this could be added : a separate table which is not 
>>> time based, but can be filtered by time
>>> Possible metrics: number of review comments by person, number of 
>>> patches/patch series a person is actively commenting on, number of ACKS and 
>>> reviewed by tags submitted by person
>>> 
>>> Actions: we could try to have a panel only focused on rewarding people and 
>>> only based on information per domain and individual with some large table 
>>> at the end with all of the reviews.
>> I don't have a strong view, but I think that there are too many tables and 
>> graphs and that they could possibly be consolidated. For example
>> 
>> - The "Top People/Domains Reviewing Patches" views are a subset of the 
>> imbalance tables. In particular exporting the data would be useful for me.
>>   - Personally, I don't mind just having the superset.
>>   - In particular, as the tables are sortable
>>   - And the only filters that can be added are sender and sender_domain
> 
> Each set of evolutionary charts and the two tables is based on a different 
> 'search' in ElasticSearch, which is in the end like a 'view' in SQL. 
> So the consolidation may require extra work here and is not that easy to 
> change, although we can discuss this. This is divided by use cases, so 
> consolidating this may mean consolidating the use cases in the first place.

Ah OK. This is not such a big deal. I think we may need to consolidate a little 
bit though, looking at the data. There seems to be a lot of duplicate 
information.

>> - Depending on the answer of Q2, the "Evolution 'Reviewed by-flag'..." and 
>> "Evolution of patches..." could probably be shown as a combined bar graph
>>   - If they have the same scope, then a bar chart may be better
>>   - I will probably have some further thoughts about this, based on the 
>> answer.
> 
> In the case of patches, this is the number of patches. The several versions 
> of the same patch are each counted as a new patch in this case.

OK: this just brings me back to the point that we need a legend with clear 
definitions and consistently applied terminology. Is this also true for the 
tables?

> We assumed that patches re-sent were those that required extra work, so they 
> could be counted as a new patch. Regarding the reviews, this is a similar 
> approach: the tables and the evolutionary chart show the total number of 
> reviews that were detected in the specific email.

OK: I think this is fine.


>>> Case Study A.2: Identify imbalances between reviewing and contribution
>>> ------------------------------------------------------------------------
>>> 
>>> Context: We suspect that we have some imbalances in the community, aka
>>> - some companies/individuals which primarily commit patches, but not review 
>>> code
>>> - some companies/individuals which primarily comment on patches, but do not 
>>> write code
>>> - some which do both
>> I think this works quite well, and the data is consistent with A.1.
>> The only comment I would have is that we should calculate the Balance as 
>> Reviews - Patches posted
> 
> We can change that.
> 
>> 
>>> Goal: Highlight imbalances
>>> 
>>> Possible places where this could be added : a separate table which is not 
>>> time based, but can be filtered by time or some histogram
>>> Possible metrics: number of patches/patch series a person/company is 
>>> actively commenting on divided by number of patches/patch series a 
>>> person/company is actively submitting
>>> 
>>> Actions: build this dashboard with pre-processed information about the 
>>> balances.
>> This seems to be quite good: the only observation is that authors (and 
>> domains) are case sensitive and probably should be normalised
>> There also seem to be entries such as "Jan Beulich 
>> [mailto:JBeulich@xxxxxxxx]";
>> 
>> I also found lots of entries, with multiple e-mail addresses such as
>> - "andrew.cooper3@xxxxxxxxxx; Ian.Campbell@xxxxxxxxxx; wei.liu2@xxxxxxxxxx; 
>> ian.jackson@xxxxxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx; Dong, Eddie; 
>> Nakajima, Jun; Tian, Kevin; xen-devel@xxxxxxxxxxxxx; keir@xxxxxxx"
>> - Most of these have (0,0,0) and can probably be removed, if that's possible
> 
> This needs some cleaning actions, thanks for the pointer, I'll have a look at 
> this!
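
In case it helps, a rough sketch of the kind of clean-up I mean (not tested 
against the real data):

    import re

    def normalise_senders(raw):
        # Lower-case, strip "[mailto:...]" remnants, and split
        # multi-address entries on ';' into individual addresses.
        raw = re.sub(r"\[mailto:[^\]]*\]", "", raw)
        return [p.strip().lower() for p in raw.split(";") if p.strip()]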

Before you start, I wanted to seed the following idea. I noticed that a lot of 
other CCs are in those lists also. Remember, we were talking about improving 
the matching rate between xen-devel@ e-mail traffic and the git repos. One of 
the key outstanding issues was that a significant portion of patch reviews 
posted to xen-devel are in fact for QEMU, Linux, FreeBSD, NetBSD, etc. All of 
them follow a review process similar to Xen's, but to identify the actual 
back-log in Xen - and to not have to deal with a lot of extra complexity - we 
originally decided to filter these out.

But that was before I understood how powerful Kibana was: an alternative may be 
to add an extra field to comments, patches, ... (I think that field would apply 
to EmailType=*) that deals with CC'ed mailing lists, such as 
- qemu-devel@xxxxxxxxxx
- linux-api@xxxxxxxxxxxxxxx
- linux-arch@xxxxxxxxxxxxxxx
  ... 
- linuxppc-dev@xxxxxxxxxxxxxxxx
- port-xen@xxxxxxxxxx
- (this may not be a complete list)

and put patchseries, patches, comments and flags into different buckets via an 
extra field (e.g. project: xen-only, linux, netbsd, freebsd, ...) and allow 
filtering (e.g. via a pie chart), as sketched below. This would mean that we 
could use a filter, instead of complex matching, to cut the data. The default 
could be 'project: "xen-only"'.

>>> Case Study A.3: Identify post-ACK commenting on patches
>>> ----------------------------------------------------------
>>> 
>>> Background: We suspect that post-ACK commenting on patches may be a key 
>>> issue in our community. Post-ACK comments would be an indicator for 
>>> something having gone wrong in the review process.
>>> Goal:
>>> - Identify people and companies which persistently comment post ACK
>>> - Potentially this could be very powerful, if we had a widget such as a pie 
>>> chart which shows the proportion of patches/patch series with no post-ACK 
>>> comments vs. with post-ACK comments
>> I think that
>> - Evolution of Comments Activity
>> - Top People/Domains Commenting (which also contain post-ACK comments and 
>> are thus also related to A.3)
>> - Evolution of Email activity
>> - _emailtype views : not quite sure what the difference is
>> 
>> serves two purposes: it contributes to A.1, but also to A.3
>> 
>> Related to this seem to be
>> - Evolution of Flags
>> - Top People/Domain analysis
>> - Evolution of Patch series
>> 
>> Q5: What are All Flags? This seems awfully high: maybe signed off?
>> Something doesn't quite seem to add up, e.g. if I look at March 16th, I get
>> - Patch e-mails = 1851
>> - All flags = 1533
>> - Reviewed by = 117
>> - Acked by = 150
>> - Comments = 480
>>  All flags seems to be tracking Patch e-mails
> 
> Flags are found in some replies to some patches. When a flag is found in an 
> email, there's a new entry in the list of flags. If there were three flags 
> (eg signed-off, cc and reported-by) those are three new entries in the table.
> In this case, 'all flags' is the aggregation of all of the flags found. Each 
> of those special flags counts.
> 
> That said, we can reduce the flags in the dataset down to the flags of 
> interest for the analysis: reviewed-by and acked-by. These flags contain 
> richer information.

Is there an easy way to identify the flags which are available? I can then 
decide which ones we care about.
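
If the flag values end up as their own field, a terms aggregation should list 
them. A sketch of the query body, assuming the field is called 'flag':

    # ElasticSearch terms aggregation listing the distinct flag values.
    query = {
        "size": 0,
        "aggs": {
            "flags": {"terms": {"field": "flag", "size": 50}},
        },
    }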

>> Q6: Same question about the scope. The background is whether we can 
>> consolidate some of these, and what we don't need.
>> 
>>> NOTE: Need to check the data. It seems there are many post-ACK comments in 
>>> a 5 year view, but none in the last year. That seems wrong.
>> However it seems that the data for post-ACK comments may be wrong: in the 
>> last year, there were 0 post-ACK comments. That is clearly wrong.
> 
> This sounds like a bug. I'll review this. Thanks again for this pointer.

Thanks.

>>> - AND if that could be used to see how all the other data was different if 
>>> one or the other were selected
>>> - In addition being able to get a table of people/companies which shows 
>>> data by person/company such as: #of comments post-ACK, #of patches/series 
>>> impacted by post-ACK comments and then being able to get to those series 
>>> would be incredible
>> This seems to not be there yet. If there was a filter (manual or through 
>> some widget) that would be good. But I guess we need to look at the data 
>> first.
> 
> With the current data and for this specific case, a workaround would be to 
> write in the search box: "sender_domain:citrix.com AND post_ack_comment:1". 
> This provides a list of comments in one of the bottom tables (with wrong 
> data so far, as you mentioned regarding the non-existing post-ACK comments 
> in 2015). There you can see that there's a patchserie_id that can later be 
> used for tracking those patchseries affected by post_ack_comments.

Let me try it, once you have fixed the bug.
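
For when it is fixed, the same workaround expressed as a query body, so the 
patchserie_ids can be pulled out with a script rather than by hand:

    # query_string takes exactly what one would type into the Kibana
    # search box; _source filtering keeps only the patchserie_id.
    query = {
        "query": {
            "query_string": {
                "query": "sender_domain:citrix.com AND post_ack_comment:1",
            },
        },
        "_source": ["patchserie_id"],
    }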

Regards
Lars
P.S.: The more I play with this, the more I like it. 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

