This study compares academic research on modern code reviews with practitioner perceptions, revealing areas of alignment, significant gaps, and opportunities for future research.

Researchers and Developers Aren’t Fully Aligned on Modern Code Review Goals

1 INTRODUCTION

2 BACKGROUND AND RELATED WORK

3 RESEARCH DESIGN

4 MAPPING STUDY RESULTS

5 SURVEY RESULTS

6 COMPARING THE STATE-OF-THE-ART AND THE PRACTITIONERS’ PERCEPTIONS

7 DISCUSSION

8 CONCLUSIONS AND ACKNOWLEDGMENTS

REFERENCES


6 COMPARING THE STATE-OF-THE-ART AND THE PRACTITIONERS’ PERCEPTIONS

In this section, we answer RQ3 — To what degree are researchers and practitioners aligned on the goals of MCR research? — by juxtaposing the results of the mapping study (Section 4) with the responses from the survey (Section 5).

6.1 Comparing the number of research articles and the practitioners’ perceptions

In Figure 9, we map the survey responses, the percentage of papers representing each survey statement, and the modern code review themes. The percentage of negative and positive responses for each statement is shown on the x- and y-axes, respectively. Each bubble represents a statement from the survey, and its size indicates the percentage of representing papers. The colors represent the five themes we identified in the mapping study. In addition, we evaluated whether there is a statistical correlation between the number of papers and practitioners’ perceptions. Using the Shapiro-Wilk normality test, we determined that our data are normally distributed.

We then conducted a Pearson correlation test to evaluate whether there is a significant relation between the ratings and the number of papers in the different themes. The results of the correlation test are provided in Table 13; statistically significant results are in bold. Figure 9 accentuates a result we reported on the agreement levels in Section 5.2: while there is considerable research on solution support (SS) and on human and organizational factors (HOF), as indicated by the number and size of the bubbles, practitioners seem to have a rather negative attitude towards this research. None of the solution statements received more than 50% positive responses.
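The two-step procedure described above (a normality check followed by a correlation test) can be sketched as follows. This is a minimal illustration, not the paper's actual analysis: the rating percentages and paper counts below are hypothetical placeholders, and `scipy` is assumed as the statistics library.

```python
from scipy import stats

# Hypothetical per-statement data (NOT the paper's real values):
# percentage of positive responses and number of mapped papers per statement.
positive_ratings = [48, 35, 52, 61, 40, 57, 44, 38]
paper_counts = [36, 5, 12, 20, 8, 15, 10, 6]

# Step 1: Shapiro-Wilk test checks whether each variable is plausibly normal;
# a high p-value means normality is not rejected, so Pearson is appropriate.
for name, data in [("ratings", positive_ratings), ("papers", paper_counts)]:
    w, p = stats.shapiro(data)
    print(f"Shapiro-Wilk {name}: W={w:.3f}, p={p:.3f}")

# Step 2: Pearson correlation between ratings and paper counts.
r, p = stats.pearsonr(positive_ratings, paper_counts)
print(f"Pearson r={r:.3f}, p={p:.3f}")
```

Had normality been rejected in step 1, a rank-based alternative such as Spearman's correlation would be the usual fallback.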

Within this theme, research on support for understanding the code changes under review and support for selecting appropriate reviewers received the most positive responses and was also associated with the most papers. This is a good example of alignment between research and practitioners’ interests. The positive alignment is also confirmed by the correlation test, as solutions with fewer publications also received more negative ratings (c.f., Table 13). On the topic of reviewer selection, one respondent (P9) noted: "The most effective review is the one done by developers who are the most familiar with a particular functionality or have worked on a similar functionality on a different project. I think there is no helping tool to tell who is the most appropriate reviewer."

Several studies propose or evaluate tools that do exactly that. While this respondent’s answer is certainly not representative, a stronger focus on knowledge translation and transfer to practitioners about existing solutions could be a beneficial target for researchers in this area. Furthermore, as seen in Section 4.2.1, only two out of the 36 solutions supporting reviewer recommendation provide links to the tools, which could explain why practitioners are unaware of the existing solutions. Looking at Figure 9, we see more negative than positive responses for statements related to human and organizational factors (HOF). However, we did not find a statistically significant relation between the number of papers in the HOF theme and the ratings, as indicated in Table 13. The most positively received statement concerns the effect of the number of reviewers involved in a code review. The statement on review performance and reviewers’ age and experience is associated with the most studies in this theme, but it is also perceived mostly negatively. For example, one respondent (P2) wrote: "Age and experience is less important than code knowledge or ability to read code. An 18 year old with no experience writes the best comments, then that is the person I will invite to review".

Another participant (P7) elaborated on the age factor: "I don’t understand how the age of reviewer can help in performance, Experience to certain extent but that doesn’t mean the experienced person knows new technologies that are emerging so this statement should be viewed as 2 separate things with respect to experience yes important to investigate to certain extent. But with respect to age some younger ones are actually doing more reviews now a days". Another respondent (P25) emphasised that a standard review process is more important than reviewer age and experience: "Standard review procedure is to be independent of individual/team members’ age and experience".

Looking at the top-left corner of Figure 9, the area with high positive and low negative ratings is dominated by statements related to research on the impact of code reviews on product quality and human aspects (IOF) and on modern code review process properties (CRP). However, only the relation between the ratings and the papers in the IOF theme is statistically significant (c.f., Table 13). This result indicates that practitioners are interested in research that investigates causal relationships, as one respondent (P11) put it: "Understanding how people approach and make decisions when performing a code review may open up some other interesting questions in how to structure and format code reviews to be more effective". However, there are relatively few studies in this area.


6.2 Comparing research impact and practitioners’ perceptions

We retrieved the citations of all primary studies as of August 2022. Peer citation counts are one way of assessing the research impact and the activity of a theme. We compared this research impact with the practitioners’ responses from the survey. Since we have the practitioners’ responses on each statement, we calculated the research impact for each statement as the sum of the citations of all primary studies representing that statement (see Table 7).

We grouped the analysis into bins by publication year, since more recent publications are likely to have fewer citations than older publications, which have simply had more time to accumulate citations. The primary studies were published between 2007 and 2021 (Figure 10). The percentage of negative and positive responses for each statement is shown on the x- and y-axes, and the colors represent the different themes. Each bubble represents a statement from the survey, and its size indicates the total number of citations of all primary studies associated with that statement.
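The aggregation described above — per-statement impact as a citation sum, plus year-based bins — can be sketched as follows. The study records are made up for illustration; only the 3-year bin boundaries (2007–2009, …, 2019–2021) follow the time frames mentioned in the text.

```python
from collections import defaultdict

# Hypothetical primary studies: (statement id, publication year, citations).
studies = [
    ("S1", 2008, 120), ("S1", 2017, 45),
    ("S2", 2014, 200), ("S2", 2020, 15),
]

# Research impact per statement = sum of citations of its primary studies.
impact = defaultdict(int)
for stmt, year, cites in studies:
    impact[stmt] += cites

def year_bin(year, start=2007, width=3):
    """Map a publication year to its 3-year bin label, e.g. 2014 -> '2013-2015'."""
    lo = start + ((year - start) // width) * width
    return f"{lo}-{lo + width - 1}"

# Group (statement, citations) pairs by publication-year bin.
bins = defaultdict(list)
for stmt, year, cites in studies:
    bins[year_bin(year)].append((stmt, cites))
```

Binning by year before correlating keeps older, citation-rich studies from dominating the comparison against newer ones.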

In addition, we evaluated whether there is a statistical correlation between research impact and practitioners’ perceptions. Using the Shapiro-Wilk normality test, we determined that our data are normally distributed. We then conducted a Pearson correlation test to evaluate whether there is a significant relation between the ratings and the research impact in different years. Table 14 shows the results of Pearson’s correlation test per year. We also evaluated the correlation between the ratings and the research impact of the papers in each theme (see Table 15).

Although the overall positive ratings are low for the support systems for code reviews (SS) theme, high-impact papers received higher positive ratings than low-impact papers. When considering all years together, the SS theme exhibits a significant negative correlation between negative ratings and research impact (r = -0.5087684, p = 0.004827), indicating that when impact is high, negative ratings are low. Similarly, the correlation between positive ratings and research impact is significant (r = 0.5502959, p = 0.001982). In the human and organizational factors (HOF) theme, Figure 10 shows that some high-impact statements were perceived negatively by practitioners, particularly in the 2016-2018 time frame.

However, we did not find a statistically significant relation between the ratings and the statements in the HOF theme. In the theme on the impact of code reviews on product and human factors (IOF), statements with high impact also received more positive ratings. We also observed a statistically significant correlation between positive ratings and impact in the 2013-2015 time frame (r = 0.7670108, p = 0.04419). We did not find any notable patterns in the other themes.

:::info Authors:

  1. Deepika Badampudi
  2. Michael Unterkalmsteiner
  3. Ricardo Britto

:::

:::info This paper is available on arxiv under CC BY-NC-SA 4.0 license.

:::

