Abstract
Developers perform a wide variety of tasks during software development and maintenance. The
most common tasks are documentation, bug fixing, enhancing capabilities, removing obsolete capabilities, and optimizations. Each task varies in complexity and effort. Developers spend significant time
acquiring knowledge of the software system. Specifically, they find comprehending the collected information laborious and often repetitive. This difficulty arises because information is dispersed across multiple
artifacts such as source code, unit tests, bug reports, API documentation, commit history, and developer
discussions.
The comprehension effort is compounded as the number of artifacts increases. Also, complete
comprehension is difficult as software systems grow in size and get more distributed. Due to this,
developers have a fragmented or incomplete understanding of the software system. To overcome this,
they adopt tactical strategies to familiarize themselves with the program. They either read code, add
trace statements, use debuggers, establish and test hypotheses, or observe the behavior. As a result,
the changes are tedious and of lower quality. So, helping developers acquire task-specific
knowledge rapidly is essential to develop and maintain a software system effectively. In this context, we
analyze developers’ information needs, propose a model, and develop a technique to support faster
code comprehension.
Developers choose to read software code. They do so because they lack time and because reading code
presents minimal barriers. Developers read code opportunistically, on an as-needed basis and without a plan. They choose either case-based or scavenging approaches [86]. Large-scale studies have observed that developers read
code despite the availability of modern tools and Integrated Development Environments (IDEs) [91, 125].
While code reading is prevalent among developers, the effectiveness of approaches followed by them is
unknown. So, we study the effectiveness of comprehension by reading code in the context of reusing an
external module, which is a common developer task.
Reuse is a complex task, but developers are still required to integrate an external module rapidly.
Observing developers undertake the reuse task is difficult due to the unplanned nature of the task. So,
analyzing the historical bug reports and commit history helps us reason about the effectiveness of code
reading practice. We observed reuse of the JMapViewer module in two independently managed open-source software applications: Mobile Atlas Creator and JTileDownloader. Based on the bug reports and
large commits, we conjecture that comprehension was limited to the extent required to accomplish the integration. We observed many inconsistencies, emergent behavior, conflicting scenarios, and fragility in the software structure. From this study, we infer that developers limit knowledge acquisition
in an opportunistic task. Furthermore, we conclude that this limitation is due to the information overload
they face when reading code.
There is a growing interest in understanding developers’ tasks and how they comprehend code. Nevertheless, little is known about developers’ information needs when reading code. For example, how do
developers locate concepts in a new program? What information do they seek to assess the impact of a
change? What knowledge about the system do they need to triage bugs? How do they decide where to
start for root cause analysis? When working with code, developers are constantly engaged in reasoning
about the choices, implications, purpose, relevance, and expectations. In doing so, they assess a code
fragment’s relevance to the task. Typically, a comprehension task begins with an information-seeking
activity.
We find that current developer toolsets fall short in helping developers reduce the information
overhead they face during comprehension. When developers read code, they create rich mental models
of the program structure. However, these models are volatile, and there are no formal methods to record
them or share them among developers. As there are no records, developers repeatedly read the same code and rebuild
the mental model. Having to repeat this across multiple source files is tedious, more so if the code
is unfamiliar.
Recently, researchers have been examining large software projects using Mining Software Repositories
(MSR) techniques to address a variety of Software Engineering questions. Overwhelmingly, recent
research studies have focused on what can happen or why it happens. However, when reading code,
questions related to what is happening are essential. So, we explored how to assist a developer in
reading code by identifying and extracting task-specific information from software artifacts. We base
our approach on the program comprehension model and the program navigation models [23, 111, 114].
These models suggested that developers use beacons or cues to establish the rele