Building a “BugSwarm” Database to Help Fix Software Glitches
by Kelley Weiss
DAVIS, Calif., October 24, 2016 – Most of the time, our cell phones and computers run software to help us drive, cook, bank and shop without a hitch. But when these programs have defects, they can bring our daily activities to a screeching halt.
Software engineering researchers are hard at work developing new ways to find and fix these bugs. To help jumpstart this field, computer science researchers in the UC Davis College of Engineering are developing a dramatically new approach to evaluating techniques that deal with software bugs. The National Science Foundation has awarded the group a $1 million grant to fund this work.
Assistant Professor Cindy Rubio Gonzalez, Department of Computer Science, is the lead researcher for the project called BugSwarm. She says her co-PI, Professor Prem Devanbu, coined the term.

“The idea is that by leveraging open-source project information we can gather and reproduce hundreds of thousands of bugs, i.e. a swarm, with their respective fixes,” Rubio Gonzalez says.
Since she started her graduate research eight years ago, Rubio Gonzalez has been developing program-analysis tools to automatically find software defects and improve program performance. The BugSwarm idea, she says, was born in the fall of 2015 over discussions with her co-PI Devanbu, a postdoctoral researcher, and two undergraduates.
The postdoctoral researcher, Bogdan Vasilescu, is now an assistant professor at Carnegie Mellon University’s School of Computer Science but will continue to collaborate on the research grant with Rubio Gonzalez and Devanbu. Two UC Davis undergraduate students, a master’s student and a full-time programmer are also part of the team.
In the past, programmers relied on small datasets, largely made up of glitches created artificially by intentionally introducing defects into the code.

“We plan to identify and reproduce bugs that have been ‘naturally’ introduced as software is developed, and we plan to do this on a very large scale,” Rubio Gonzalez says.
In addition, she says that the team’s work will provide a new resource for researchers to access existing tests, patches, and scripts to easily build the code, run tests, and reproduce bugs. And all of this will be released in a public database.
By using this large-scale dataset of reproducible defects, tests and patches, Rubio Gonzalez says, researchers can, for example, apply bug repair techniques to a large and diverse set of “buggy” programs to automatically create patches. Then, she says, they can evaluate the automatically generated fixes against the programmers’ patches.
The National Science Foundation award is a three-year grant that will end in August 2019. Rubio Gonzalez says UC Davis will use the grant funding to create the database and administer it for the first three years.