
Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...



Taking the advisory board list off the CC list: will summarize when we have 
more of a plan forward

On 03/07/2018, 11:47, "Juergen Gross" <jgross@xxxxxxxx> wrote:

    On 03/07/18 12:23, Lars Kurth wrote:
    > Combined reply to Jan and Roger
    > Lars
    > 
    > On 03/07/2018, 11:07, "Roger Pau Monne" <roger.pau@xxxxxxxxxx> wrote:
    > 
    >     On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
    >     > We then had a discussion around why the positive benefits didn't materialize:
    >     > * Andrew and a few others believe that the model isn't broken, but that the
    >     >   issue is with how we develop. In other words, moving to a 9 month model will
    >     >   *not* fix the underlying issues, but merely provide an incentive not to fix them.
    >     > * Issues highlighted were:
    >     >   * 2-3 months stabilizing period is too long
    >     
    >     I think one of the goals with the 6 month release cycle was to shrink
    >     the stabilizing period, but it didn't turn out that way, and the
    >     stabilizing period is quite similar with a 6 or a 9 month release
    >     cycle.
    > 
    > Right: we need to establish what the reasons are:
    > * One has to do with a race condition between incoming security issues and the
    >   desire to cut a release that has those issues fixed in it. If I remember
    >   correctly, that has in effect added almost a month to the last few releases
    >   (more to this one).
    
    The only way to avoid that would be to not allow any security fixes to
    be included in the release the last few weeks before the planned release
    date. I don't think this is a good idea. I'd rather miss the planned
    release date.

This partially comes back to opening master: when we are at the stage where we 
are only waiting for security issues, we should already have opened master. 
Although in this case, we also had 
    
    BTW: the problem wasn't waiting for the security patches, but that some
    fixes for those were needed. And this is something you can never rule out.
    And waiting for the fixes meant new security fixes being ready...

That is of course true. And some of the side-channel attack mitigations are 
complex and large, and introduce more risk than more traditional fixes. 
    
    > * One seems to have to do with issues with OSSTEST
    
    ... which in turn led to more security fixes being available.

Agreed: because we didn't release when we planned, another set of security 
fixes pushed out the release. 
    
    > * <Please add other reasons>
    
    We didn't look at the sporadically failing tests thoroughly enough. The
    hypercall buffer failure has been there for ages; a newer kernel just
    made it more probable. Catching it earlier would have saved us some weeks.

That is certainly something we could look at. It seems to me that there is a 
dynamic where, because there is too much noise (random issues, HW issues), we 
ignore OSSTEST failures too often. I am wondering whether there is a way of 
mapping some tests to maintainers. Maintainers should certainly care about test 
failures in their respective areas, but to make this practical, we need a way to 
map failures and also CC reports to the right people. We could also potentially 
use get_maintainers.pl on the patches which are being tested (aka the staging => 
master transition), but we would need to know that a test was "clean" beforehand; 
a rough sketch of that idea follows below. Maybe we need to build in an effort to 
deal with the sporadically failing tests: e.g. a commit moratorium until we get 
to a better base state.
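
To make that a bit more concrete, here is a very rough, untested sketch of what 
a small helper could look like: it collects the addresses to CC on a failure 
report by running the maintainers script over the commits currently in flight 
between master and staging. The repository path, branch names and the script 
location (scripts/get_maintainer.pl, as in xen.git) are my assumptions, not 
something that exists today:

#!/usr/bin/env python3
# Rough sketch only: work out whom to CC on an osstest failure report by
# running get_maintainer.pl over the commits between master and staging.
# Paths, branch names and the script location are assumptions.
import subprocess
import tempfile

XEN_GIT = "/path/to/xen.git"                 # local clone (assumed)
RANGE = "origin/master..origin/staging"      # the commits osstest is gating

def commits_in_range(repo, rev_range):
    # List the commit hashes currently sitting between master and staging.
    out = subprocess.run(["git", "-C", repo, "rev-list", rev_range],
                         check=True, capture_output=True, text=True)
    return out.stdout.split()

def maintainers_for_commit(repo, commit):
    # Turn the commit into a patch file and ask get_maintainer.pl about it.
    patch = subprocess.run(
        ["git", "-C", repo, "format-patch", "--stdout", "-1", commit],
        check=True, capture_output=True, text=True).stdout
    with tempfile.NamedTemporaryFile("w", suffix=".patch") as f:
        f.write(patch)
        f.flush()
        out = subprocess.run(["perl", "scripts/get_maintainer.pl", f.name],
                             cwd=repo, check=True,
                             capture_output=True, text=True)
    return {line.strip() for line in out.stdout.splitlines() if line.strip()}

if __name__ == "__main__":
    cc = set()
    for commit in commits_in_range(XEN_GIT, RANGE):
        cc |= maintainers_for_commit(XEN_GIT, commit)
    # A real bot would add these addresses to the CC of the failure mail;
    # here we just print them.
    print("\n".join(sorted(cc)))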

I also think that, from a mere psychological viewpoint, having some test 
capability at patch posting time, with a patchbot rejecting patches that fail, 
would change the contribution dynamic significantly. In other words, it would 
make dealing with quality issues part of the contribution process, which often 
seems to be deferred until commit time and/or release hardening time. Just a 
thought; a very rough sketch of what such a bot could do follows below.
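
Purely as an illustration (not a proposal for specific tooling), something along 
these lines could apply a posted patch to a scratch branch and do a smoke build 
before a human ever looks at it. How the patch is fetched from the list, the 
branch used and the build target are all assumptions on my side:

#!/usr/bin/env python3
# Illustration only: apply a posted patch (as an mbox file) to a scratch
# branch and try a smoke build before review. The clone path, branch and
# build target are assumptions.
import os
import subprocess
import sys

XEN_GIT = "/path/to/xen.git"   # throwaway clone used only for smoke builds

def smoke_test(mbox_path):
    def git(*args):
        return subprocess.run(["git", "-C", XEN_GIT, *args],
                              capture_output=True, text=True)

    git("checkout", "-B", "patchbot-scratch", "origin/staging")
    if git("am", os.path.abspath(mbox_path)).returncode != 0:
        git("am", "--abort")
        return "patch does not apply to staging"

    # Hypervisor-only build as a cheap first gate; a real bot would also
    # build tools, stubdoms etc.
    build = subprocess.run(["make", "-C", XEN_GIT, "-j8", "xen"],
                           capture_output=True, text=True,
                           stderr=subprocess.STDOUT)
    if build.returncode != 0:
        return "build failed:\n" + build.stdout[-2000:]
    return None

if __name__ == "__main__":
    failure = smoke_test(sys.argv[1])
    # A real bot would reply to the posting on the list; just print here.
    print(failure or "patch applies and builds")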

Also, coming back to Jan's bandwidth issue: if we had a set of more generic 
tests that could be offloaded to a cloud instance (e.g. by testing on QEMU), 
then we could reserve OSSTEST for tests which require real hardware, thus 
potentially reducing bottlenecks. I am also wondering whether the bottleneck we 
are seeing is caused by the lack of good Arm test hardware (aka is that the 
critical path for the entire system): if so, maybe the two things can somehow 
be de-coupled. A minimal sketch of the QEMU idea is below.
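
Again just to make the QEMU idea concrete: a minimal smoke test could boot a 
prebuilt image with Xen installed and check the serial console for the 
hypervisor banner. The image path, boot marker and timeout are assumptions, and 
this presumes the image's Xen is configured to log to the serial console; a 
real harness would then run an actual test payload inside the guest:

#!/usr/bin/env python3
# Minimal sketch: boot a prebuilt disk image that has Xen installed under
# QEMU in a cloud instance and treat it as a pass if the hypervisor banner
# shows up on the serial console within the timeout. Image path, marker and
# timeout are assumptions.
import subprocess

IMAGE = "/path/to/xen-test-image.qcow2"   # prebuilt image (assumed)
BOOT_MARKER = "(XEN) Xen version"         # early line in Xen's serial output
TIMEOUT = 300                             # seconds to wait for boot

def boot_smoke_test():
    cmd = ["qemu-system-x86_64",
           "-m", "2048", "-smp", "2",
           "-drive", f"file={IMAGE},format=qcow2,if=virtio",
           "-display", "none", "-serial", "stdio", "-no-reboot"]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        # Let the guest run for up to TIMEOUT seconds, then kill it and
        # search the captured console output for the boot marker.
        out, _ = proc.communicate(timeout=TIMEOUT)
    except subprocess.TimeoutExpired:
        proc.kill()
        out, _ = proc.communicate()
    return BOOT_MARKER in out

if __name__ == "__main__":
    print("PASS" if boot_smoke_test() else "FAIL: Xen did not boot under QEMU")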

These ideas are fairly half-baked right now, so I am opening them up for 
discussion. I wanted to get a good amount of input before we discuss this at 
the community call.

Regards
Lars
 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

