Computer scientist Dan Gusfield shapes new disciplines, one book at a time

“Some university departments traditionally are perceived as ‘book departments’,” says Dan Gusfield, a professor in the UC Davis Department of Computer Science. “History, English, most of the humanities, where it’s expected that faculty will publish books as the principal vehicle for disseminating their thoughts. In the sciences, that often isn’t true; typically, engineers publish individual papers on much narrower topics.”

Dan Gusfield, professor, UC Davis Department of Computer Science.

Obviously not one to subscribe to such distinctions, Gusfield recently published his third book: ReCombinatorics: The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks. “If you want to make a bigger impact, and bring new people into a just-developing area,” Gusfield explains, “then a book is the right way to go. It gets much more exposure than a collection of papers. And a book definitely helps to shape a developing field.”

Gusfield grew up in Illinois, came to California to earn his doctorate at UC Berkeley, and subsequently served for six years as an assistant professor at Yale University. He then returned to California and joined UC Davis in January 1987.

Gusfield wasted no time embarking on his first book: 1989’s The Stable Marriage Problem: Structure and Algorithms, a collaboration with the University of Glasgow’s Robert W. Irving. In computer science, mathematics and economics, the so-called “stable marriage problem” involves finding a stable “matching” between two sets of elements, given a set of preferences for each element.
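
To make the idea of a stable matching concrete, here is a minimal sketch of the classic Gale-Shapley "deferred acceptance" procedure, a standard textbook way to find one. It is an illustration only and is not taken from Gusfield and Irving's book. The names `stable_matching` and `is_stable` and the toy preference lists are made up for this example, and the code assumes two equal-sized sets in which every element ranks every element of the other set.

```python
from collections import deque


def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley deferred acceptance; returns {proposer: receiver}.

    Assumes equal-sized sets and complete preference lists (an assumption
    of this sketch, not a claim about the general problem).
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index each proposer tries
    engaged_to = {}                               # receiver -> current proposer
    free = deque(proposer_prefs)                  # proposers without a partner

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # r was unmatched: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # r trades up; old partner is free
            free.append(current)
        else:
            free.append(p)                        # r rejects p; p tries again later

    return {p: r for r, p in engaged_to.items()}


def is_stable(matching, proposer_prefs, receiver_prefs):
    """True if no proposer and receiver both prefer each other to their partners."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break                             # remaining receivers rank lower for p
            r_list = receiver_prefs[r]
            if r_list.index(p) < r_list.index(partner_of[r]):
                return False                      # p and r would both rather switch
    return True


# Hypothetical toy data: three proposers and three receivers.
proposer_prefs = {"a": ["x", "y", "z"], "b": ["y", "x", "z"], "c": ["x", "y", "z"]}
receiver_prefs = {"x": ["b", "a", "c"], "y": ["a", "b", "c"], "z": ["a", "b", "c"]}

matching = stable_matching(proposer_prefs, receiver_prefs)
print(matching, is_stable(matching, proposer_prefs, receiver_prefs))
```

Here "stable" means that no two elements from opposite sets would both prefer each other over the partners they were assigned, which is the condition `is_stable` checks.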

“At the time, that was a very small field,” Gusfield recalls, “and it’s interesting to see how it has blossomed. The book has gotten about 750 citations from research articles, so it had the desired effect of attracting more people from those three disciplines: computer science, mathematics and economics.”

In addition to his writing, research and teaching, Gusfield has made numerous contributions to the College of Engineering. He chaired the Department of Computer Science for four years, and he wrote the bioinformatics section of the genomics/bioinformatics initiative proposal that resulted in the creation of UC Davis’ Genomics Center. He also co-chaired the campus initiative on “Computational Characterization and Exploitation of Biological Networks,” which resulted in the hiring of seven new faculty members in seven different departments.

But publishing remained in his blood. No doubt emboldened by the impact of his initial venture, Gusfield chose a more ambitious topic for his second book, which he wrote on his own: 1997’s Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology.

“I wrote that one explicitly to shape the field, which burst into life right around that time. Today it’s called bioinformatics, or computational biology, but I was involved before it even had a name. I wrote my first paper on this topic in the early 1980s, and sent it to a math/computer science journal, and it was rejected. ‘Nice paper,’ they said, ‘but it’s biology.’ Well, of course I knew that wasn’t true.

“I put the paper in a drawer, and then the Human Genome Project came along in 1988, and the field exploded. I pulled that paper out a few years later, cut it in half, and submitted only the first half. It was published, and now it has become my most-cited paper!”

Gusfield took a rather novel approach to the development of his second book.

“I actually wrote it as I was learning stuff: reading papers that I often didn’t understand, so I’d rewrite the information for myself. After a while, I had a fair amount of notes, so I decided, if I just stop watching late-night TV, I could turn those notes into a book.

“It was designed as the background that I wanted computer scientists to know, if they planned on going into computational biology. It definitely helped define the intersection of those two disciplines, and has been cited by more than 5,000 research papers. Pretty much everybody in the field has a copy, and people still write and tell me how much they’ve enjoyed it. It has remained relevant because it focuses on principles and ideas, rather than glitzy software, which changes all the time. But the book’s fundamental ideas remain sound.”

Gusfield hopes to have a similar impact with his newest book, which was published in July 2014.

“This one is directed toward the emerging field of phylogenetic networks, and my piece has to do with ancestral recombination graphs. A couple of previous books discuss those from a biological perspective, since they’re written by biologists, but nothing thus far has examined combinatorial approaches.”

Phylogenetic network graphs are used to visualize evolutionary relationships — abstractly or explicitly — between nucleotide sequences, genes, chromosomes, genomes or species.

“Phylogenetic trees — a subset of phylogenetic networks — are the paradigm for how evolution is depicted; the idea is that evolutionary processes are bifurcating, which creates a tree-like phenomenon. But there’s a new awareness that evolutionary networks are more appropriate, particularly when studying evolution in a particular population, such as humans.

“We differentiate not through mutation or bifurcation, but through what is known as recombination. The chromosomes that we pass on to our children aren’t exact copies of those passed to us by our parents, but instead are a mixture: That’s recombination. In order to depict the history of one’s family, you’ll see mixtures of two DNA sequences that previously had been separate. This requires something that has cycles: a network, rather than a tree.”
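
The difference between a tree and a network can be seen in a toy example. The sketch below is purely illustrative and is not drawn from Gusfield's book: the two-site binary sequences, the `recombine` helper and the `parents` table are all hypothetical, and a single crossover point is assumed. A sequence produced by recombination has two parents, so the history can no longer be drawn as a tree; the cycle appears once the direction of the edges is ignored.

```python
def recombine(left_parent, right_parent, crossover):
    """Single-crossover recombination (an assumption of this sketch):
    take the prefix of one parent and the suffix of the other."""
    return left_parent[:crossover] + right_parent[crossover:]


# Hypothetical history over two binary sites; each sequence maps to its parents.
parents = {
    "00": [],            # ancestral sequence
    "10": ["00"],        # mutation at the first site
    "01": ["00"],        # mutation at the second site
    "11": ["10", "01"],  # recombination of the two mutated sequences
}


def is_tree(parent_table):
    """A rooted tree gives every node at most one parent."""
    return all(len(ps) <= 1 for ps in parent_table.values())


child = recombine("10", "01", crossover=1)
print(child, is_tree(parents))
```

In this toy history the sequence "11" is reached from the ancestor "00" along two different paths, one through "10" and one through "01", which is exactly the kind of structure a tree cannot represent.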

“The whole field — called ‘association mapping’ — is being used to determine which gene contributed to a given disease, and where, ancestrally, it came from. This has become a huge area in genetics research today, although it was quite controversial when first suggested.”

Putting theory into practice, Gusfield has been spending time at UC Berkeley, working with a geneticist who hopes to use such techniques to better understand Type 2 diabetes.

Not content to rest on his laurels, Gusfield is already working on his next book.

“People often ask me: How does one write an entire book? The answer is obvious: You write yourself a postcard every day, and after a few years, you’ll have a book.

“But if you’re intimidated, and you never write that first postcard, you’ll never write a book!”