Social Recommendation for Software Comprehension
When I joined Microsoft Research in the summer of 2004, I was faced with a challenging task: “find one of the bottlenecks for software developers and create a solution that addresses it”. For three months, I surveyed and interviewed developers in product teams at Microsoft, designed and ran controlled experiments and usability studies, wrote Visual Studio plugins and log analysis tools, and created a large part of the solution described below. The project has evolved considerably since I left at the end of 2004 to join the Microsoft Expression Blend team, and it is now a large collaboration between several teams inside Microsoft Research.
 
One of the major bottlenecks of the software development process is code comprehension. When a developer joins a new project, the first challenge she faces is understanding the existing code base. Surveying users and running ethnographic studies revealed a basic pattern in which two questions were frequently asked:
1- Where do I start?
2- Where do I go next?
 
The first step was to run a study with seven experienced programmers, who were exposed to a new code base and asked to perform typical engineering tasks that involved debugging and adding functionality to the existing code. Visual Studio was chosen as the coding environment, and it was instrumented to log the participants’ activities during the experiment.
 
Based on observations and post-experiment questionnaires, all subjects agreed that finding the entry point and understanding the control flow were the most difficult tasks, since the code presented them with a broader working set than they needed for the task. The large number of members and variables in each class made it difficult for most of them to navigate using Class View. Many participants expressed a need to eliminate from view those areas of code irrelevant to the current task, in order to identify areas that had not been explored yet. They also expressed the need for an advanced highlighting and annotation tool that they could use as simply as noting something on paper, while keeping the notes linked to the code. When asked what they would say to subsequent code owners, all participants agreed that they would communicate the names of the important areas in the code. Some participants also communicated a "navigation path" that would lead others through the best route to understand the code.
 
The results of the study indicate that interaction history can be used to distinguish parts of the code based on their importance. An initial overview that provides developers with code hotspots should accelerate the comprehension process by highlighting the areas which should receive early attention. Additionally, developers need to specify which of these members are relevant to their tasks and constitute a working set for later use.
 
To address these problems, we created a prototype that integrates three solutions:
 
1- The FAN List: implicit queries of history:
Computational wear was combined with social filtering and implicit queries in our mockup of the Frequently Accessed Next (FAN) List design. At the top of the figure is the usual code editor. At the bottom left is the FAN List. Based on the current definition (method or field) under the cursor in the editor window, the FAN List displays all definitions to which previous users navigated directly after leaving the current definition more than 5% of the time. Clicking on an item in the FAN List causes the definition to appear in the preview window to the right of the FAN List. This allows the user to inspect the given definition (and the code around it) without changing the current focus in the editor window. (Double-clicking on an item in the FAN List puts that definition in the editor window, so the user can also navigate with the FAN List.)
 
FAN List (left) showing the most Frequently Accessed Next methods for the current function, and the source preview (right) showing a read-only view of the selected method.
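 
The FAN List query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code: the log format (one ordered list of visited definitions per developer session) and the names build_transition_counts and fan_list are hypothetical. Only the 5% cutoff comes from the design itself.

```python
from collections import Counter, defaultdict

# "more than 5% of the time" is the cutoff stated in the text.
FAN_THRESHOLD = 0.05


def build_transition_counts(sessions):
    """Count how often each definition was visited directly after another.

    `sessions` is an assumed log format: one list of definition names per
    developer session, in the order they were visited.
    """
    counts = defaultdict(Counter)
    for visits in sessions:
        for current, following in zip(visits, visits[1:]):
            if current != following:  # ignore staying in the same definition
                counts[current][following] += 1
    return counts


def fan_list(counts, current, threshold=FAN_THRESHOLD):
    """Return (definition, share) pairs navigated to next from `current`.

    Only definitions whose share of all departures from `current` is
    strictly greater than `threshold` are kept, most frequent first.
    """
    followers = counts.get(current, Counter())
    total = sum(followers.values())
    if total == 0:
        return []
    return [
        (name, hits / total)
        for name, hits in followers.most_common()
        if hits / total > threshold
    ]
```

Calling fan_list(counts, name) with the definition under the cursor returns the candidates to show in the list. Keeping the counts per source definition means the query is a single lookup each time the cursor moves.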
 
2- Code Favorites: wear-filtered overviews
Code Favorites is a prototype providing a customizable class browser to navigate classes and members. Class View displays in a tree view all of the projects in the system, the types defined in those projects and the members defined in those types. Code Favorites, instead, filters the tree based on interaction history. In this case, only members that previously received 50 or more visits are displayed as children of their containing class node. The remaining members are displayed as children of an ellipsis node, labeled “More members”, that is a child of their containing type. Similarly, only the most frequently accessed projects and types are shown. Checkboxes allow the user to move items in and out of the ellipsis folders. Checking an item in an ellipsis folder moves it out to the parent node; unchecking an item moves it to the parent’s ellipsis folder. The novelty of this design over the original favorite folders is seeding the list of favorites based on the team’s interaction history.
 
 
Code Favorites (left) showing only the most frequently used classes and members, compared to the traditional class viewer (right) showing all classes and methods
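 
The filtering rule behind Code Favorites can be sketched as follows. This is a minimal Python sketch, assuming visit counts are available as a dictionary keyed by (class, member); the function name build_favorites and the checked/unchecked override sets are hypothetical stand-ins for the checkbox state. The threshold of 50 visits and the "More members" label come from the design described above.

```python
VISIT_THRESHOLD = 50             # "50 or more visits", from the text
ELLIPSIS_LABEL = "More members"  # label of the ellipsis node, from the text


def build_favorites(visit_counts, members_by_class,
                    checked=frozenset(), unchecked=frozenset()):
    """Split each class's members into favorites and an ellipsis folder.

    visit_counts     -- assumed shape: {(class_name, member_name): visits}
    members_by_class -- assumed shape: {class_name: [member_name, ...]}
    checked          -- members the user moved out of the ellipsis folder
    unchecked        -- members the user moved into the ellipsis folder
    """
    tree = {}
    for class_name, members in members_by_class.items():
        shown, more = [], []
        for member in members:
            key = (class_name, member)
            # Seed from the team's interaction history...
            favorite = visit_counts.get(key, 0) >= VISIT_THRESHOLD
            # ...then let the user's checkbox choices override the seed.
            if key in checked:
                favorite = True
            if key in unchecked:
                favorite = False
            (shown if favorite else more).append(member)
        tree[class_name] = {"members": shown, ELLIPSIS_LABEL: more}
    return tree
```

The history only seeds the tree. The user's explicit choices always win, which matches the checkbox behavior described above.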
 
3- Wear for degree-of-interest highlights
Since modern IDEs are capable of generating UML class diagrams, the third and final design proposed filtering these diagrams based on interaction history. The advantage of such a diagram is that it shows the whole system and the degree to which previous programmers worked on various parts. The display and its highlights remain unchanged as the user navigates among the definitions, which should help to keep the user oriented, particularly if a “you are here” marker is kept in sync with the editor’s cursor.
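 
One way to turn interaction history into highlights is to scale each class's visit count against the most visited class. The sketch below assumes that linear scaling, which the design above does not specify, and it uses a plain-text listing as a stand-in for the UML diagram. The names wear_intensity and render_overview are hypothetical.

```python
def wear_intensity(visit_counts):
    """Map each class to a highlight intensity between 0.0 and 1.0.

    Assumption: intensity is the visit count divided by the highest
    count, so the most visited class gets full intensity.
    """
    peak = max(visit_counts.values(), default=0)
    if peak == 0:
        return {name: 0.0 for name in visit_counts}
    return {name: count / peak for name, count in visit_counts.items()}


def render_overview(visit_counts, current=None, width=10):
    """Render a text stand-in for the highlighted class diagram.

    Each class gets a bar proportional to its wear; the class holding
    the editor's cursor gets a "you are here" marker.
    """
    lines = []
    for name, level in wear_intensity(visit_counts).items():
        bar = "#" * round(level * width)
        marker = " <- you are here" if name == current else ""
        lines.append(f"{name:<12}{bar}{marker}")
    return lines
```

Because the intensities depend only on the recorded history, the display stays the same as the user moves around. Only the marker changes position.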
 
 
 
 
 
Team Tracks