
Re: [Xen-devel] [xen 4.6 retrospective] [bad] review load near freeze point



Jan,

> On 12 Aug 2015, at 09:00, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> 
>>>> On 04.08.15 at 14:52, <lars.kurth.xen@xxxxxxxxx> wrote:
>   = Issue / Observation =
> 
> Maybe my memory regarding the 4.5 release has faded, but I'm
> having the impression that 4.6 was quite a bit worse. This was at
> parts due to the sheer number of patch series and their sizes,
> but also because frequently they took quite a bit more than just a
> couple of iterations.

it definitely feels as if 4.6 was worse than 4.5 (and other releases). You may 
remember that at last year's summit I highlighted that the average time from 
first post on xen-devel@ to commit has increased. Moving from the soft freeze 
we used to have to a hard freeze has, IMHO, exacerbated the "stress" for both 
reviewers and contributors, because of the shorter time window.

Anyway, Stefano ran some of the same tools that we ran last year on this 
release cycle. The data does show that the situation has become worse. Note 
that there is a "gap" from Sept - Dec 14: we didn't have time to run the 
scripts for that period, as they take forever to complete.

Attachment: PastedGraphic-1.pdf
Description: Adobe PDF document


If you look at the average time from first post on xen-devel@ to commit, the 
data looks even worse

July 2013 - Jan 2014, Average to Commit = 28.8 days
Feb 2014 - Aug 2014, Average to Commit =  31.5 days 
Jan 2015 - Jul 2015, Average to Commit =  70 days

Note that the 70-day figure is caused by about 40 patches which took more than 
one, two or even three years to commit. There are a number of reasons why a 
series may take more than a year: 
* One possible reason is excessively long review times, maybe combined with 
long wait times in between reviews
* Another one is that a series may have been parked by the contributor and 
picked up later, sometimes several times

If you remove the 1+ year outliers from the stats, you get...

July 2013 - Jan 2014, Average to Commit Adjusted = 26.4 days
Feb 2014 - Aug 2014, Average to Commit Adjusted =  26.7 days 
Jan 2015 - Jul 2015, Average to Commit Adjusted =  34.7 days

This shows that we still have a significant issue.
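
To illustrate the adjustment, here is a rough Python sketch of the kind of 
calculation behind these numbers. The CSV file and column name are made up for 
illustration; the real data comes from the scripts mentioned above.

    import csv

    def average_days(path, max_days=None):
        days = []
        with open(path) as f:
            for row in csv.DictReader(f):
                d = float(row["days_to_commit"])
                if max_days is not None and d > max_days:
                    continue  # drop the 1+ year outliers
                days.append(d)
        return sum(days) / len(days) if days else 0.0

    print("raw average:      %.1f days"
          % average_days("first_post_to_commit.csv"))
    print("adjusted average: %.1f days"
          % average_days("first_post_to_commit.csv", max_days=365))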

> Some of this is (as mentioned on past discussions) caused by too
> little review involvement of non-maintainers (the need of which
> exists because of limited bandwidth maintainers have for doing
> reviews), but I think there are also quality issues here (see below).

I definitely agree that we need more review involvement of non-maintainers 
(see later).

I did a bit more number crunching to try and get a clearer handle on the root 
causes of what we have seen in this release cycle. Unfortunately, with existing 
open source data mining tools it is near-impossible to model an e-mail based 
review process. With the tools I have today I can't identify whether there are 
clear patterns which contribute to the issues we are seeing with reviews. Of 
course individual reviewers and contributors will have their own experiences 
and get a sense of what is going on; but I would at least like to understand 
more clearly what is happening, backed up by data, before we make any drastic 
changes which could have unintended consequences.

I asked the Advisory Board to fund Bitergia to help us develop some 
tools/database support to model our review process. This was approved 3 weeks 
ago. I started working with them, and it appears that they can model our 
e-mail based review process (this assumes that we all use "git send-email" / 
patch bombs, which I think is almost always true). This will take until the 
end of September/beginning of October to complete because of summer holidays 
in Spain.
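
As a rough illustration of why that assumption matters: with "git send-email", 
every mail in a series is threaded under the cover letter via 
In-Reply-To/References, which is what makes an mbox archive of xen-devel@ 
minable. A first-approximation sketch in Python (archive file name made up; a 
real tool would need to walk the full References chains and tie versions of a 
series together):

    import mailbox

    series = {}
    for msg in mailbox.mbox("xen-devel.mbox"):
        subject = " ".join((msg.get("Subject") or "").split())
        if "[PATCH" not in subject:
            continue
        # the cover letter (or a standalone patch) has no In-Reply-To;
        # follow-up patches and review replies point back at it
        root = msg.get("In-Reply-To") or msg.get("Message-Id")
        series.setdefault(root, []).append(subject)

    print("%d patch threads found" % len(series))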

In any case, I did some work to try and understand what is happening, presented 
it to the Advisory Board and refined it after a couple of conversations with 
Bitergia. 

First I looked at the number of patches we typically accept per year: in the 
last 5 years we have accepted between 2000 and 2200 patches per year, and this 
year looks similar. 

Then I looked at who does reviews (the 2015 figures are partial)

Attachment: PastedGraphic-3.pdf
Description: Adobe PDF document

 
This shows that we have a small number of developers (about 14) who do almost 
all of the reviews. This clearly is an issue. The table shows the number of 
developers doing more/less than x% of reviews. This would become a problem 
if we saw something that upsets the status quo (e.g. more contributions, more 
complex contributions, more communication issues, ...).

So I started looking at patches posted and comments on patches posted (without 
being able to tie patch series together, or to tie versions of the same series 
together). 
Note that the 2015 figures count Jan to Aug 11th - I added a projection column 
based on past performance.

Attachment: PastedGraphic-4.pdf
Description: Adobe PDF document


This is actually staggering, but also interesting:
* The first observation is that, in each of the past 3 years, each patch 
version received on average 2.1 replies, and that this ratio is fairly stable
* The number of patches posted (and/or re-posted) has crept up slowly over the 
last few years
* This year, the number of patches posted (and/or re-posted) will be almost 
twice that of 2014. 
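
For reference, a crude sketch of the sort of counting behind the table above: 
treat every "[PATCH" subject without "Re:" as a (re)posted patch and every 
"Re: [PATCH" as a reply. The archive file name is illustrative only.

    import mailbox

    patches = replies = 0
    for msg in mailbox.mbox("xen-devel-2015.mbox"):
        subject = " ".join((msg.get("Subject") or "").split())
        if "[PATCH" not in subject:
            continue
        if subject.lower().startswith("re:"):
            replies += 1   # review comments / discussion on a patch
        else:
            patches += 1   # a posted or re-posted patch

    ratio = (float(replies) / patches) if patches else 0.0
    print("patches: %d, replies: %d, replies per patch: %.1f"
          % (patches, replies, ratio))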

So it appears that either 
a) we have seen *a lot* of new contributions, OR 
b) on average the number of iterations to get patches into xen.git has almost 
doubled for a number of reasons (somehow this doesn't feel right) 

Most likely both are happening. In any case, the effect of either would be an 
increased back-log, while our capacity to review code has remained static, 
which is causing the issues we are seeing with reviews.

Given that the average number of replies to patches stayed fairly constant, I 
thought I'd look at the ratio of patches (including re-posts) on the list 
divided by patches that made it into xen.git - and I also thought I'd do this 
for a similar project such as QEMU. We had a go at Linux as well, but it 
appears that Linux is a lot more de-centralized today, using lots of different 
mailing lists: to cut things short, we didn't know whether the data was sound.

Attachment: PastedGraphic-5.pdf
Description: Adobe PDF document


In any case, this appears to show that a ratio of 2-3 was normal for us and 
QEMU in the last few years (note that there is no research which shows an 
optimum), when we had no problems. It also shows that both Xen and QEMU have 
started to see similar issues in 2015 (albeit ours is more severe). Again, this 
doesn't allow us to pinpoint where exactly our problem is.
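
The ratio itself is simple to compute: patches on the list (e.g. the mbox 
count from the previous sketch) divided by commits that landed in xen.git in 
the same window. A hedged sketch, with the posted count and paths/dates as 
placeholder examples:

    import subprocess

    posted = 4200  # e.g. the mbox count from the previous sketch (illustrative)
    committed = int(subprocess.check_output(
        ["git", "-C", "xen.git", "rev-list", "--count",
         "--since=2015-01-01", "--until=2015-08-11", "HEAD"]))
    print("posted/committed ratio: %.1f" % (float(posted) / committed))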

What we are seeing could be due to ...

A) Increasing number of contributions 
Not enough review capacity to support growth; we know that review capacity has 
remained stable 
=> Increasing back-log (aka ongoing reviews on xen-devel@) 

B) Increasing number of review cycles per patch/patch series 

Note: we know that the average number of review comments per patch version is 
stable. Also the data could be *significantly* skewed when large patch series 
are reposted frequently 

Reasons for this could be:
- More disagreements amongst maintainers, reviewers & contributors 
- Lower quality contributions, requiring more review cycles 
- More complex and larger contributions, requiring more review cycles 
- Increasing standards required to get code up-streamed (aka quality) 
- ...

=> Increasing back-log (aka ongoing reviews on xen-devel@)

For the latter, with appropriate tooling, we should be able to see patterns in 
the review process data. This would allow us, for example, to correlate the 
size of a patch with review times/number of iterations, to identify whether 
individuals struggle in some form (and then help them), and to correlate areas 
of code with review times/number of iterations. Right now I can't do any of 
this: I can only ever be reactive towards issues and crises.
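
To give an idea of the kind of analysis such tooling should enable, here is a 
sketch correlating patch size with the number of iterations before commit. It 
assumes a hypothetical per-series CSV produced by that tooling; the file and 
column names are made up.

    import csv
    from math import sqrt

    rows = list(csv.DictReader(open("series_stats.csv")))
    sizes = [float(r["lines_changed"]) for r in rows]
    iters = [float(r["iterations"]) for r in rows]

    # Pearson correlation, written out to avoid extra dependencies
    n = len(rows)
    mx, my = sum(sizes) / n, sum(iters) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(sizes, iters))
    var = sqrt(sum((x - mx) ** 2 for x in sizes)
               * sum((y - my) ** 2 for y in iters))
    print("patch size vs. iterations correlation: %.2f" % (cov / var))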

C) A combination of all/some of the above

If the issue is mostly caused by A), we can only fix this with more reviewers. 
If the issue is B), there may be actions we can come up with to buy us some 
time and take the pressure off. Also, in some cases, more reviewers could be 
counterproductive.

In any case, I have been working with a number of vendors who say they are 
committed to helping out with code reviews. But we all know that training 
reviewers (and ultimately maintainers) takes time and commitment. It is also 
not something everyone is comfortable doing.
 
>   = Possible Solution / Improvement =
> 
> While I agree that some aspects (like coding style issues) could be
> taken care of by requiring submitters to have their patches be
> checked by (not yet existing) tools/scripts,

This would be one of the measures which may buy us some time.
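
As an example of the sort of pre-submission check being discussed, a minimal 
sketch that scans the added lines of a patch file before it is sent to the 
list. The specific rules here (trailing whitespace, hard tabs, 80-character 
lines) are only examples; a real tool would have to encode CODING_STYLE.

    import sys

    def check_patch(path):
        problems = 0
        for n, line in enumerate(open(path), 1):
            if not line.startswith("+") or line.startswith("+++"):
                continue  # only inspect lines the patch adds
            body = line[1:].rstrip("\n")
            if body != body.rstrip():
                print("%s:%d: trailing whitespace" % (path, n))
                problems += 1
            if "\t" in body:
                print("%s:%d: hard tab" % (path, n))
                problems += 1
            if len(body) > 80:
                print("%s:%d: line longer than 80 characters" % (path, n))
                problems += 1
        return problems

    sys.exit(1 if sum(check_patch(p) for p in sys.argv[1:]) else 0)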

> I think much of this is a
> problem of enforcing discipline on oneself: Sloppiness in style,
> according to my observation, frequently is accompanied by
> sloppiness in the actual coding.

I am not sure whether this is actually the primary issue. Everyone will most 
likely have a different take on the situation based on their personal 
experience and expectations. When I talk to reviewers and maintainers, there 
does not appear to be a single issue that sticks out. 

> While I think that suggestion may
> be considered harsh, I'd like to throw up for discussion a cut off
> on the number of iterations that may be submitted for a particular
> series within a certain time (with a high threshold on exceeding
> the limit for really minor adjustments). I would hope that making
> people aware that the number of tries they have is limited, they
> would start self-reviewing their changes more consciously.

Can you elaborate? I am not sure I fully understand what you propose. Also, 
there is a risk that this becomes complex. 

I would feel much more inclined and comfortable to
* get better tooling to help us understand the review process
* then actively work with people that are struggling 
** IMHO we have done this fairly successfully with design discussions

> And yes (to preempt respective responses), I'm aware that such a
> rule may raise the bar for first time contributors. But those often
> don't contribute large patches/series, and I don't think such a rule
> would need to be enforced as strictly on small contributions (plus
> newcomers could be granted some [limited] slack).

One of the challenges that newcomers often have is tracking review comments. 
Maybe we can put together a number of "recipes" which work - such as 
http://article.gmane.org/gmane.comp.emulators.xen.devel/254965 from Ian C - and 
include them in the training material at 
http://wiki.xenproject.org/wiki/Category:Developers

Maybe we should also start pointing people to training material, if the same 
issues come up with the same person. 

> And of course, people should submit their code early, and follow
> up diligently to review comments. Submitting a v1 half a year
> before the freeze, and being at say v4 around freeze time (and
> then perhaps even demanding the code to go in because "it was
> submitted early") is about as bad as starting work on a big series
> a month ahead of freeze.

One thing which we have seen this year, and which I believe you refer to, is 
an increase in cases where designs and implementations were attempted in the 
same release cycle, in the hope of getting it all done. Necessarily, the 
implementation can't start until after the design is agreed. 

Part of the issue here, IMHO, is that a 9-10 month release cadence is too long. 
The whole industry is moving at a very fast pace, which creates significant 
business pressure to get features into a specific release. If we were to move 
to, say, a 6-month cadence, then that pressure would mostly go away, as the 
cost of missing a release would be lower than it is now.

In other words:
a) the pressure would be less, and we would see fewer requests for exceptions - 
we could even consider not allowing freeze exceptions at all
b) the development window would be too short to do a design & implementation 
of a larger/complex feature - so that problem would go away
c) we would also need more predictable release dates (let's say March and 
September) - right now, release schedules are only visible up to the next 
release. That means that vendors/contributors can't plan for the longer term, 
which will lead most managers and project managers who work with developers to 
push them to get stuff into the release time-frame that is known.

However that should probably be a separate discussion/thread.

Sorry: that mail was a lot longer than I thought it would be

Regards
Lars

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

