The Xen Project’s code contributions have been growing 10% a year. However, during this period of growth, the code review process became much slower, leading to issues in the community. Code review in the Xen Project—as in many other FOSS projects—is performed on mailing lists. During the last few years, the project observed an increase in the number of messages devoted to code review—in particular, an increase in the number of code review messages per patch series or individual patch.
Everyone in the community had a different theory as to the root causes of the issues based on their observations: some developers believed we didn’t have enough reviewers, some felt the project’s maintainers had become more aggressive, and some felt code review was not coordinated enough. Many observations contradicted each other and were based only on opinions. Consequently, key members of the project could not agree on how to deal with the perceived issues.
Lars Kurth and Daniel Izquierdo explain why the project decided to use software development analytics and data-mining techniques to address the issue. The project needed a detailed analysis to verify which theories were valid, which were not, and which had been missed. To do this, the team defined a number of parameters in the code review process to determine whether it was deteriorating in some way and to pinpoint the root causes of any deterioration. Lars and Daniel cover the project's journey through a number of stories and explore the techniques that enabled the community to improve its review process.
Requiring more debate
Requiring more people to agree
More code and more people require more coordination, which is not happening
The project cares more about quality and security than a few years ago: the bar has been raised
Newcomers felt unfairly treated
Cabal of maintainers bullying the rest of the community
Misunderstandings due to language issues
And slow-down of the process due to time zone issues
Visited vendors in the Far East four times to deliver it
Avoid late disagreements about design and architecture
Governance changes failed: no consensus as to what was wrong (we did make some changes though, such as a move to a fixed release cadence)
Seeking help from Bitergia to get accurate data: funded by the Advisory Board
To identify root cause and avoid community tensions
Houston
Thread = Patch Series, and each reply to the root message = a patch.
+ But there are cases where each patch is a new thread (hard to parse; perhaps use time windows)
+ External threads found. Those are of interest to the community but not for this process (e.g. coming from the Linux kernel).
Versions. A new version is a new thread with the same subject and a new version number: [PATCH vX Y/Z] subject
+ Versioning is not that formal, so we play with regular expressions and look for identical subjects
+ Some versions are missing (cases where the process starts at v5).
Number of patches. Each patch is identified by its number out of the total number of patches
+ Numbering is not that formal, so we play with regular expressions.
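The version and numbering heuristics could be sketched with a single regular expression. The pattern and field names below are illustrative assumptions, not the project's actual parser; real subjects on xen-devel are less uniform, which is exactly why fallbacks and fuzzy subject matching are needed.

```python
import re

# Hypothetical pattern for "[PATCH vX Y/Z] subject"; version and numbering
# are both optional, matching the informal conventions described above.
SUBJECT_RE = re.compile(
    r"\[PATCH(?:\s+v(?P<version>\d+))?(?:\s+(?P<num>\d+)/(?P<total>\d+))?\]"
    r"\s*(?P<title>.*)",
    re.IGNORECASE,
)

def parse_subject(subject):
    """Return (version, num, total, title), or None if no [PATCH] tag.

    When no explicit version is present we assume v1, which is one of the
    heuristics a real pipeline would have to make.
    """
    m = SUBJECT_RE.search(subject)
    if not m:
        return None
    version = int(m.group("version")) if m.group("version") else 1
    num = int(m.group("num")) if m.group("num") else None
    total = int(m.group("total")) if m.group("total") else None
    return version, num, total, m.group("title").strip()

print(parse_subject("[PATCH v3 2/5] xen: fix foo"))  # (3, 2, 5, 'xen: fix foo')
print(parse_subject("[PATCH] xen: fix bar"))         # (1, None, None, 'xen: fix bar')
```

Grouping all versions of a series would then reduce to looking for parsed subjects with the same title but different version numbers.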
Matching between thread and commit. If the commit message and the patch subject are the same, that commit is the merge of that patch.
+ Issues with the time when the commit took place
+ In some cases, several patches share the same subject
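A minimal sketch of this subject-based matching, under two assumptions: we normalize away the leading [PATCH ...] tag before comparing, and we skip subjects that appear more than once since they are ambiguous (the function names and sample data are made up for illustration).

```python
import re
from collections import Counter

def normalize(subject):
    """Drop any leading [PATCH ...] tag and collapse case/whitespace."""
    subject = re.sub(r"^\s*\[[^\]]*\]\s*", "", subject)
    return " ".join(subject.lower().split())

def match_commits(commits, patch_subjects):
    """commits: iterable of (sha, commit summary line).
    patch_subjects: subjects of individual patches seen on the list.
    Returns {sha: original patch subject}. Subjects occurring more than
    once are skipped as ambiguous, per the caveat above.
    """
    normed = [(s, normalize(s)) for s in patch_subjects]
    counts = Counter(n for _, n in normed)
    index = {n: s for s, n in normed if counts[n] == 1}
    return {sha: index[normalize(summary)]
            for sha, summary in commits
            if normalize(summary) in index}

commits = [("abc123", "xen: fix foo"), ("def456", "xen: add baz")]
subjects = ["[PATCH v2 1/3] xen: fix foo",
            "[PATCH 2/3] xen: dup", "[PATCH 3/3] xen: dup"]
print(match_commits(commits, subjects))
# {'abc123': '[PATCH v2 1/3] xen: fix foo'}
```

The commit-timestamp issue noted above would still need a separate time-window check, since a commit can land long after the review thread ends.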
There's a slight increase in the number of Patch Series
+ And a huge increase in the number of comments: there's a lot more activity on the mailing lists related to patch review.
+ The increase in the mean number of iterations and in the number of patches per patch series is also noticeable.
- Patch series of any size have followed a similar trend since 2011.
+ An increase up to 2014, then a decrease as the community brought it under control
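The metrics behind these observations (iterations per series, patches per series) could be aggregated roughly as follows; the record layout and sample data are invented for illustration, assuming one record per (series, version) as produced by the parsing step.

```python
from statistics import mean

# Hypothetical records: (series title, version, number of patches in that
# version). Real data would come from the parsed mailing-list threads.
series_versions = [
    ("xen: fix foo", 1, 4),
    ("xen: fix foo", 2, 4),
    ("xen: fix foo", 3, 5),
    ("libxl: add bar", 1, 2),
]

iterations = {}  # series -> highest version seen (= number of iterations)
patches = {}     # series -> patch count in the latest version
for title, version, n_patches in series_versions:
    if version >= iterations.get(title, 0):
        iterations[title] = version
        patches[title] = n_patches

print(mean(iterations.values()))  # mean iterations per series -> 2.0
print(mean(patches.values()))     # mean patches per series -> 3.5
```

Computing these per quarter or per year over the whole archive would yield the trend lines discussed above.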
Things like training and a focus on architecture and design reviews before coding
People and orgs cherry-picked subsets of results to support their own arguments (e.g. focusing on individual reviews vs. statistical analysis)
That was of course to be expected due to the Change Curve
Focus: Use Paint Ball Analogy
Required me to learn the workflow in detail
Required me to understand the data model and tools (Elasticsearch and Kibana)
Required me to customise the Kibana back-end
Lots of active convincing