Premkumar Devanbu Decodes “Natural” Software

Premkumar Devanbu, a professor in the UC Davis Department of Computer Science, is attracting worldwide attention with his intriguing discovery that most software code, because of its tendency to be repetitive and predictable, can be analyzed productively by using statistical models that have become common in natural language processing.

This finding is somewhat counter-intuitive, since programming languages are known to be rich, powerful and expressive. The point, however, is that programmers rarely exploit this potential; the code they write instead resembles everyday human speech and writing, which is highly predictable and full of repetition. (This is why Google Translate and Siri work so well with spoken and written language.) As Devanbu has discovered, the statistical and semantic properties of “software naturalness” are rich in scientific questions and engineering promise. For example, by building suitable statistical models into tools, one could save programmers effort on the boring, repetitive elements that make up a big portion of their work.
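To make the idea concrete, here is a minimal sketch of how a statistical model could suggest the next token of code. It assumes a simple bigram model (counts of which token follows which) and a crude regex tokenizer; the article does not say which models Devanbu's group uses, and the names `tokenize`, `train_bigrams`, `suggest` and the tiny `corpus` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(code):
    # Crude tokenizer (an assumption): identifiers, integers,
    # or any single non-space symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def train_bigrams(lines):
    # Count how often each token follows each previous token.
    counts = defaultdict(Counter)
    for line in lines:
        tokens = ["<s>"] + tokenize(line)  # "<s>" marks the start of a line
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(counts, prev, k=3):
    # Return up to k tokens most often seen after `prev`.
    return [tok for tok, _ in counts.get(prev, Counter()).most_common(k)]


# Toy corpus (hypothetical); a real tool would train on millions of lines.
corpus = [
    "for i in range(n):",
    "for j in range(m):",
    "for k in range(len(xs)):",
    "if x in seen:",
]
model = train_bigrams(corpus)
print(suggest(model, "in"))  # -> ['range', 'seen']
```

Even on four lines, the repetition in the code makes `range` the obvious guess after `in`. A larger corpus and a longer context window would sharpen such suggestions, which is the kind of predictability the research exploits.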

“Given all the millions of lines of code out there, and modern statistical techniques, we can estimate very good models of this code,” says Devanbu. “I believe that we can use these models to make transformative improvements to software development tools that will help programmers write code, correct mistakes, and even produce documentation.”

Devanbu and his colleagues Zhendong Su, Abram Hindle, Earl T. Barr and Mark Gabel presented a paper, “On the Naturalness of Software,” at the 34th International Conference on Software Engineering (ICSE), which took place June 2-9, 2012, in Zurich, Switzerland. More recently, on Sept. 18, 2013, Devanbu discussed the same topic as an invited guest of the Distinguished Lecturer Series hosted by the School of Computer Science at the University of Massachusetts, Amherst.

The expanding research is funded by the U.S. National Science Foundation and the UK’s Engineering and Physical Sciences Research Council (EPSRC). Collaborators include Su, in the UC Davis Department of Computer Science; Roni Rosenfeld and William Cohen, at Carnegie Mellon University; Barr and Mark Harman, at University College London; Hindle, at the University of Alberta; Yuriy Brun, at UMass Amherst; and Charles Sutton, at the University of Edinburgh.

Devanbu earned his B.Tech at the Indian Institute of Technology in Chennai, India, and followed that with a doctorate from Rutgers University. After spending nearly 20 years as both a developer and researcher at AT&T Bell Labs and its various offshoots, he left industry to join the UC Davis College of Engineering faculty in 1997. He serves on the editorial boards of Empirical Software Engineering and the Wiley Journal of Software Process and Maintenance. In 2006, he was PI on a project funded by a three-year, $750,000 grant from the National Science Foundation, to study how collaborative teams construct open-source software such as the Apache web server, the PostgreSQL database and the Python scripting language.