UA team creates program to read through libraries of scientific articles and draw conclusions about cancer
College students know the perils of over-reading well. The text is flowing by, and you vaguely imagine that these shapes all mean something in some wonderful realm of well-rested alertness. Too bad it’s 1 a.m. and you’re so focused on pushing through the pages that you’re not absorbing any of it. At this point, most people either fall asleep on their textbook or utter some version of the familiar “I wish I could just download this into my brain.” Well, if you’re a cancer researcher, now you can. Rather, a computer can, and you get to edit the results.
REACH, or Reading and Assembling Contextual and Holistic Mechanisms from Text, is a UA project proposed and funded by the Defense Advanced Research Projects Agency, or DARPA. The program is designed to scan the huge body of publicly available, peer-reviewed scientific papers indexed by PubMed, specifically biology papers relating to cancer, and have the computer make inferences on its own.
Why aren’t humans enough? Two factors limit human comprehension of large bodies of knowledge. The first is sheer boredom in the face of so much material.
“I think since 2010 there are a million papers being generated a year,” said Clayton Morrison, associate professor at the UA School of Information. “For a human, that’s going to be incredibly boring.” Not to mention impractical. The second factor is specialization: most researchers can read widely only in the area they specialize in, and researchers say experts in different cancer fields do not necessarily talk to each other.
“Although almost everything has been started to a certain degree, research doesn’t interact; […] I’m the expert in this protein; we don’t talk to each other,” said Mihai Surdeanu, associate professor of computer science at the UA. “We develop drugs on our own. So the whole idea of this project was to provide a holistic understanding by reading all the publications that have been published on cancer research, and integrating all the paper[s] into a single picture of cancer.”
The ultimate goal of the program is to speed up human comprehension of fields with large bodies of data, as well as to find causal chains that might otherwise be overlooked.
First, the program reads fragments of papers. Next, the plan is to assemble the fragments into causal chains of protein signaling pathways that will help detect cancer. Ideally, at least a few of these pathways will be previously unknown to researchers. Or perhaps other unexpected patterns will emerge, giving hints about how cancer behaves.
One success REACH has already had on this front is resolving the problem of certain redundant signaling pathways. For example, one cancer patient may have a change to his or her DNA. Another cancer patient may have an alteration in a different area of the DNA, one that makes a different molecular machine. These two gene alterations never seem to happen together; they are mutually exclusive. The program was able to detect this because both pathways activate the same protein, which makes them redundant. Perhaps this result will alert other researchers to similar issues as they continue to construct a holistic mechanism of the disease.
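As a rough illustration only, and not REACH's actual code, the idea of assembling fragments into pathways and spotting two alterations that converge on the same protein can be sketched in a few lines of Python. Everything here is an assumption made for the example: the gene and protein names are placeholders, each "fragment" is reduced to a simple "A activates B" pair, and the helper functions `build_graph`, `downstream` and `shared_targets` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fragments, as if extracted from papers.
# Each pair (source, target) means "source activates target".
# All names are placeholders, not real genes or proteins.
FRAGMENTS = [
    ("GENE_A", "KINASE_1"),
    ("KINASE_1", "PROTEIN_X"),
    ("GENE_B", "KINASE_2"),
    ("KINASE_2", "PROTEIN_X"),
    ("GENE_B", "PROTEIN_Y"),
]


def build_graph(fragments):
    """Assemble the fragments into a directed graph of activations."""
    graph = defaultdict(set)
    for source, target in fragments:
        graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return everything reachable from `start` by following activations."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:  # skip nodes already visited (handles cycles)
                seen.add(nxt)
                stack.append(nxt)
    return seen


def shared_targets(graph, gene_1, gene_2):
    """Proteins activated, directly or indirectly, by both genes."""
    return downstream(graph, gene_1) & downstream(graph, gene_2)


graph = build_graph(FRAGMENTS)
print(sorted(shared_targets(graph, "GENE_A", "GENE_B")))  # ['PROTEIN_X']
```

In this toy data, both placeholder genes lead to the same protein, which is the kind of overlap that would flag two pathways as potentially redundant. Reading real papers and judging real redundancy is far harder than this sketch suggests.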
The ongoing usefulness of the program is always a concern. Morrison is working with biologists Guang Yao and Ryan Gutenkunst, assistant professors of molecular and cellular biology at the UA. They are still trying to determine how clear the data is, which has been one of the challenges, according to Morrison.
“[The biologists’] job is to keep us honest,” Morrison said. “We’re actively trying to understand, as we’re building up these different levels, what are the additional kinds of [language] that we need to respect and how things get connected together and how they behave. And it’s very, very challenging.”
The impact this program may have, not only on cancer research but on any large body of research, is exciting, especially considering the success it has already had.
“I’ve been in a lot of DARPA projects. This one has felt, to me, very productive in a way I haven’t seen in other projects,” Morrison said. “Again, why that is, [is] a combination of luck, having the right people, timing, certain tools having been developed far enough that when you bring them together the sweet spot of enormous amounts of data are already together.”
What's the biggest challenge? Synthesizing the biological and computational, according to Surdeanu.
“It forced us to very, very quickly pick up stuff we didn’t know. Instead of focusing on the trending models in machine learning, for example, ‘Stimulate the brain with big graphs of neurons,’ it forced us to focus on models that the biologists understand,” Surdeanu said. “It forced us to focus on machine learning that can service a bridge between these two different tribes.”
The year 2016 will mark year two for the project. Though REACH has a long way to go, it may bridge the gap between the literature and the lab and help scientists find new insights into treating cancer.
Follow Alexandria Farrar on Twitter.