
Re: [Xen-devel] [xen 4.6 retrospective] [bad] review load near freeze point



Jan,

> On 12 Aug 2015, at 09:00, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> 
>>>> On 04.08.15 at 14:52, <lars.kurth.xen@xxxxxxxxx> wrote:
>   = Issue / Observation =
> 
> Maybe my memory regarding the 4.5 release has faded, but I'm
> having the impression that 4.6 was quite a bit worse. This was at
> parts due to the sheer number of patch series and their sizes,
> but also because frequently they took quite a bit more than just a
> couple of iterations.

it definitely feels as if 4.6 was worse than 4.5 (and other releases). You may 
remember that at last year's summit I highlighted that the average time from 
first post on xen-devel@ to commit has increased. Moving from the soft freeze 
we used to have to a hard freeze has, IMHO, exacerbated the "stress" for both 
reviewers and contributors, because of the shorter time window.

Anyway, Stefano ran some of the same tools that we ran last year on this 
release cycle. The data does show that the situation has become worse. Note 
that there is a "gap" from Sept - Dec 14: we didn't have time to run the 
scripts for that period, as they take forever to complete.

Attachment: PastedGraphic-1.pdf
Description: Adobe PDF document


If you look at the average time from first post on xen-devel@ to commit, the 
data looks even worse

July 2013 - Jan 2014, Average to Commit = 28.8 days
Feb 2014 - Aug 2014, Average to Commit =  31.5 days 
Jan 2015 - Jul 2015, Average to Commit =  70 days

Note that the 70-day figure is caused by about 40 patches which took more than 
one, two or even three years to commit. There are a number of reasons why a 
series may take more than a year: 
* One possible reason is excessively long review times, maybe combined with 
long wait times in between reviews
* Another one is that a series may have been parked by the contributor and 
picked up later, sometimes several times

If you remove the 1+ year outliers from the stats, you get...

July 2013 - Jan 2014, Average to Commit Adjusted = 26.4 days
Feb 2014 - Aug 2014, Average to Commit Adjusted =  26.7 days 
Jan 2015 - Jul 2015, Average to Commit Adjusted =  34.7 days

This shows that we still have a significant issue.
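
To illustrate the adjustment, here is a rough Python sketch of the kind of 
calculation behind these numbers. The CSV file and column name are made up for 
illustration; the real data comes from the scripts mentioned above.

    import csv

    def average_days(path, max_days=None):
        days = []
        with open(path) as f:
            for row in csv.DictReader(f):
                d = float(row["days_to_commit"])
                if max_days is not None and d > max_days:
                    continue  # drop the 1+ year outliers
                days.append(d)
        return sum(days) / len(days) if days else 0.0

    print("raw average:      %.1f days"
          % average_days("first_post_to_commit.csv"))
    print("adjusted average: %.1f days"
          % average_days("first_post_to_commit.csv", max_days=365))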

> Some of this is (as mentioned on past discussions) caused by too
> little review involvement of non-maintainers (the need of which
> exists because of limited bandwidth maintainers have for doing
> reviews), but I think there are also quality issues here (see below).

I definitely agree that we need more review involvement of non-maintainers 
(see later).

I did a bit more number crunching to try and get a clearer handle on the root 
causes of what we have seen in this release cycle. Unfortunately, with existing 
open source data mining tools it is near-impossible to model an e-mail based 
review process. With the tools I have today I can't identify whether there are 
clear patterns which contribute to the issues we are seeing with reviews. Of 
course individual reviewers and contributors will have their own experiences 
and get a sense of what is going on; but I would at least like to understand 
more clearly what is happening, backed up by data, before we make any drastic 
changes which could have unintended consequences.

I asked the Advisory Board to fund Bitergia to help us develop some 
tools/database support to model our review process. This was approved 3 weeks 
ago. I started working with them, and it appears that they can model our 
e-mail based review process (this assumes that we all use "git send-email" / 
patch bombs, which I think is almost always true). This will take until the 
end of September/beginning of October to complete because of summer holidays 
in Spain.
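
As a rough illustration of why that assumption matters: with "git send-email", 
every mail in a series is threaded under the cover letter via 
In-Reply-To/References, which is what makes an mbox archive of xen-devel@ 
minable. A first-approximation sketch in Python (archive file name made up; a 
real tool would need to walk the full References chains and tie versions of a 
series together):

    import mailbox

    series = {}
    for msg in mailbox.mbox("xen-devel.mbox"):
        subject = " ".join((msg.get("Subject") or "").split())
        if "[PATCH" not in subject:
            continue
        # the cover letter (or a standalone patch) has no In-Reply-To;
        # follow-up patches and review replies point back at it
        root = msg.get("In-Reply-To") or msg.get("Message-Id")
        series.setdefault(root, []).append(subject)

    print("%d patch threads found" % len(series))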

In any case, I did some work to try and understand what is happening, presented 
it to the Advisory Board and refined it after a couple of conversations with 
Bitergia. 

First I looked at the number of patches we typically accept per year: in the 
last 5 years we have accepted between 2000 and 2200 patches per year, and this 
year looks similar. 

Then I looked at who does reviews (the 2015 figures are partial)

Attachment: PastedGraphic-3.pdf
Description: Adobe PDF document

 
This shows that we have a small number of developers (about 14) who do almost 
all of the reviews. This clearly is an issue. The table shows the number of 
developers doing more/less than x% of reviews. This would become a problem 
if we saw something that upsets the status quo (e.g. more contributions, more 
complex contributions, more communication issues, ...).

So I started looking at patches posted and comments on patches posted (without 
being able to tie patch series together, or to tie versions of the same series 
together). 
Note that the 2015 figures count Jan to Aug 11th - I added a projection column 
based on past performance.

Attachment: PastedGraphic-4.pdf
Description: Adobe PDF document


This is actually staggering, but also interesting:
* The first observation is that, in each of the past 3 years, each patch 
version received on average 2.1 replies, and that this ratio is fairly stable
* The number of patches posted (and/or re-posted) has crept up slowly over the 
last few years
* This year, the number of patches posted (and/or re-posted) will be almost 
twice that of 2014. 
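
For reference, a crude sketch of the sort of counting behind the table above: 
treat every "[PATCH" subject without "Re:" as a (re)posted patch and every 
"Re: [PATCH" as a reply. The archive file name is illustrative only.

    import mailbox

    patches = replies = 0
    for msg in mailbox.mbox("xen-devel-2015.mbox"):
        subject = " ".join((msg.get("Subject") or "").split())
        if "[PATCH" not in subject:
            continue
        if subject.lower().startswith("re:"):
            replies += 1   # review comments / discussion on a patch
        else:
            patches += 1   # a posted or re-posted patch

    ratio = (float(replies) / patches) if patches else 0.0
    print("patches: %d, replies: %d, replies per patch: %.1f"
          % (patches, replies, ratio))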

So it appears that either 
a) we have seen *a lot* of new contributions, OR 
b) on average the number of iterations to get patches into xen.git has almost 
doubled for a number of reasons (somehow this doesn't feel right) 

Most likely both are happening. In any case, the effect of either would be an 
increased back-log, while our capacity to review code has remained static, 
which is causing the issues we are seeing with reviews.

Given that the average number of replies to patches stayed fairly constant, I 
thought I'd look at the ratio of patches (including re-posts) on the list 
divided by patches that made it into xen.git - and I also thought I'd do this 
for a similar project such as QEMU. We had a go at Linux as well, but it 
appears that Linux is a lot more de-centralized today, using lots of different 
mailing lists: to cut things short, we didn't know whether the data was sound.

Attachment: PastedGraphic-5.pdf
Description: Adobe PDF document


In any case, this appears to show that a ratio of 2-3 was normal for us and 
QEMU in the last few years (note that there is no research which shows an 
optimum), when we had no problems. It also shows that both Xen and QEMU have 
started to see similar issues in 2015 (albeit ours is more severe). Again, this 
doesn't allow us to pinpoint where exactly our problem is.
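
The ratio itself is simple to compute: patches on the list (e.g. the mbox 
count from the previous sketch) divided by commits that landed in xen.git in 
the same window. A hedged sketch, with the posted count and paths/dates as 
placeholder examples:

    import subprocess

    posted = 4200  # e.g. the mbox count from the previous sketch (illustrative)
    committed = int(subprocess.check_output(
        ["git", "-C", "xen.git", "rev-list", "--count",
         "--since=2015-01-01", "--until=2015-08-11", "HEAD"]))
    print("posted/committed ratio: %.1f" % (float(posted) / committed))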

What we are seeing could be due to ...

A) Increasing number of contributions 
Not enough review capacity to support growth; we know that review capacity has 
remained stable 
=> Increasing back-log (aka ongoing reviews on xen-devel@) 

B) Increasing number of review cycles per patch/patch series 

Note: we know that the average number of review comments per patch version is 
stable. Also the data could be *significantly* skewed when large patch series 
are reposted frequently 

Reasons for this could be:
- More disagreements amongst maintainers, reviewers & contributors 
- Lower quality contributions, requiring more review cycles 
- More complex and larger contributions, requiring more review cycles 
- Increasing standards required to get code up-streamed (aka quality) 
- ...

=> Increasing back-log (aka ongoing reviews on xen-devel@)

For the latter, with appropriate tooling, we should be able to see patterns in 
the review process data. This would allow us, for example, to correlate the 
size of a patch with review times/number of iterations, to identify whether 
individuals struggle in some form (and then help them), and to correlate areas 
of code with review times/number of iterations. Right now I can't do any of 
this: I can only ever be reactive towards issues and crises.
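
To give an idea of the kind of analysis such tooling should enable, here is a 
sketch correlating patch size with the number of iterations before commit. It 
assumes a hypothetical per-series CSV produced by that tooling; the file and 
column names are made up.

    import csv
    from math import sqrt

    rows = list(csv.DictReader(open("series_stats.csv")))
    sizes = [float(r["lines_changed"]) for r in rows]
    iters = [float(r["iterations"]) for r in rows]

    # Pearson correlation, written out to avoid extra dependencies
    n = len(rows)
    mx, my = sum(sizes) / n, sum(iters) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(sizes, iters))
    var = sqrt(sum((x - mx) ** 2 for x in sizes)
               * sum((y - my) ** 2 for y in iters))
    print("patch size vs. iterations correlation: %.2f" % (cov / var))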

C) A combination of all/some of the above

If the issue is mostly caused by A), we can only fix this with more reviewers. 
If the issue is B), there may be actions we can come up with to buy us some 
time and take the pressure off. Also, in some cases, more reviewers could be 
counterproductive.

In any case, I have been working with a number of vendors who say they are 
committed to helping out with code reviews. But we all know that training 
reviewers (and ultimately maintainers) takes time and commitment. It is also 
not something everyone is comfortable doing.
 
>   = Possible Solution / Improvement =
> 
> While I agree that some aspects (like coding style issues) could be
> taken care of by requiring submitters to have their patches be
> checked by (not yet existing) tools/scripts,

This would be one of the measures which may buy us some time.
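
As an example of the sort of pre-submission check being discussed, a minimal 
sketch that scans the added lines of a patch file before it is sent to the 
list. The specific rules here (trailing whitespace, hard tabs, 80-character 
lines) are only examples; a real tool would have to encode CODING_STYLE.

    import sys

    def check_patch(path):
        problems = 0
        for n, line in enumerate(open(path), 1):
            if not line.startswith("+") or line.startswith("+++"):
                continue  # only inspect lines the patch adds
            body = line[1:].rstrip("\n")
            if body != body.rstrip():
                print("%s:%d: trailing whitespace" % (path, n))
                problems += 1
            if "\t" in body:
                print("%s:%d: hard tab" % (path, n))
                problems += 1
            if len(body) > 80:
                print("%s:%d: line longer than 80 characters" % (path, n))
                problems += 1
        return problems

    sys.exit(1 if sum(check_patch(p) for p in sys.argv[1:]) else 0)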

> I think much of this is a
> problem of enforcing discipline on oneself: Sloppiness in style,
> according to my observation, frequently is accompanied by
> sloppiness in the actual coding.

I am not sure whether this is actually the primary issue. Everyone will most 
likely have a different take on the situation based on their personal 
experience and expectations. When I talk to reviewers and maintainers, there 
does not appear to be a single issue that sticks out. 

> While I think that suggestion may
> be considered harsh, I'd like to throw up for discussion a cut off
> on the number of iterations that may be submitted for a particular
> series within a certain time (with a high threshold on exceeding
> the limit for really minor adjustments). I would hope that making
> people aware that the number of tries they have is limited, they
> would start self-reviewing their changes more consciously.

Can you elaborate? I am not sure I fully understand what you propose. Also, 
there is a risk that this becomes complex. 

I would feel much more inclined and comfortable to
* get better tooling to help us understand the review process
* then actively work with people that are struggling 
** IMHO we have done this fairly successfully with design discussions

> And yes (to preempt respective responses), I'm aware that such a
> rule may raise the bar for first time contributors. But those often
> don't contribute large patches/series, and I don't think such a rule
> would need to be enforced as strictly on small contributions (plus
> newcomers could be granted some [limited] slack).

One of the challenges that newcomers often have is tracking review comments. 
Maybe we can put together a number of "recipes" which work - such as 
http://article.gmane.org/gmane.comp.emulators.xen.devel/254965 from Ian C - and 
include them in the training material at 
http://wiki.xenproject.org/wiki/Category:Developers

Maybe we should also start pointing people to training material, if the same 
issues come up with the same person. 

> And of course, people should submit their code early, and follow
> up diligently to review comments. Submitting a v1 half a year
> before the freeze, and being at say v4 around freeze time (and
> then perhaps even demanding the code to go in because "it was
> submitted early") is about as bad as starting work on a big series
> a month ahead of freeze.

One thing which we have seen this year, and which I believe you refer to, is 
an increase in cases where designs and implementations were attempted in the 
same release cycle, in the hope of getting it all done. Necessarily, the 
implementation can't start until after the design is agreed. 

Part of the issue here, IMHO, is that a 9-10 month release cadence is too long. 
The whole industry is moving at a very fast pace, which creates significant 
business pressure to get features into a specific release. If we were to move 
to, say, a 6-month cadence, then that pressure would mostly go away, as the 
cost of missing a release would be lower than it is now.

In other words:
a) the pressure would be less, and we would see fewer requests for exceptions - 
we could even consider not allowing freeze exceptions at all
b) the development window would be too short to do a design & implementation 
of a larger/complex feature - so that problem would go away
c) we would also need more predictable release dates (let's say March and 
September) - right now, release schedules are only visible up to the next 
release. That means that vendors/contributors can't plan for the longer term, 
which will lead most managers and project managers who work with developers to 
push them to get stuff into the release time-frame that is known.

However that should probably be a separate discussion/thread.

Sorry: that mail was a lot longer than I thought it would be

Regards
Lars

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

