Biochemistry in the age of big data

Artist's depiction of scientist working on an electronic chip with genetic information.

The complexity of life makes it difficult to study. In biochemistry, there are often just too many processes and reactions taking place in a cell for humans to wrap their heads around. What helps biochemists make sense of it all?

Cue computational biology and biochemistry. Computation has been used in biology and biochemistry since the dawn of computers and is used today by many researchers in the University of Wisconsin–Madison Department of Biochemistry.

With the advent of high-throughput technologies, researchers are able to collect more and more data, and the era of “big data” is here to stay. Computational biochemistry can be defined as the use of computational methods and simulations to make sense of that data in order to predict and understand biological processes.

“The main reason we use computers in biochemistry is because biological systems are composed of thousands of interacting components and predicting the behaviors of these complex systems becomes way too difficult to grasp within our brains,” says Assistant Professor Philip Romero, who uses computation in his lab. “Computers are much better at handling large amounts of information.”

Still classic biochemistry: Sequence begets structure begets function

One of the most important applications of computation in biochemistry is the prediction of the structure and function of a protein or other macromolecule.

“We investigate how proteins in the cell membrane come together to form complexes,” Associate Professor Alessandro Senes explains. “We use molecular modeling tools to study something that is often very hard to do with conventional experimental structural methods.”

Photo: Hridindu Roychowdhury, an Integrated Program in Biochemistry (IPiB) graduate student in the Romero Lab, performs a microfluidic experiment to analyze the impact of mutations on human caspases. Caspases are enzymes involved in executing programmed cell death.

The Senes Lab uses structural prediction to obtain an “educated guess” of what a certain membrane protein may look like. Often, the lab integrates the modeling with experimental information — such as the result of a mutation or data suggested by analyzing the evolutionary variation of the protein — which makes the prediction more reliable. The computation informs the experiments and vice versa. The prediction process can also go a step further and predict a protein’s function based on the sequence and structure in a similar way.

One project in the Senes Lab is the study of the bacterial divisome, a large complex of membrane proteins that helps bacterial cells divide but whose structure is unknown. Understanding how the complex works could help other researchers find ways to prevent bacteria from dividing, and could potentially lead to the discovery of new antibiotics. Another project in the lab focuses on the biophysics regulating how membrane proteins interact with each other.

“One of my favorite parts of learning how to program was looking at the physical code and then treating it like a puzzle and figuring out where things went wrong to make it work,” says Samantha Anderson, an Integrated Program in Biochemistry (IPiB) graduate student in the Senes Lab. IPiB is the joint graduate program of the Department of Biochemistry and Department of Biomolecular Chemistry. “It’s fun because it’s a logic puzzle. A lot of people think computation and coding are very abstract but they are actually tangible things you can see and work with concretely, which makes them really rewarding.”

Computation is for biochemistry what Pandora is for music

Algorithms often come up in conversations about social media platforms like Facebook or music streaming services like Pandora. These services learn from users what they want to see or listen to and tailor what is presented accordingly. A lot of computational work in biochemistry functions in the same way.

“It’s actually really similar,” Romero says. “You have to teach the machine what is good and bad so you can make a prediction or design. We look at a sequence and know there are some parts that work poorly for what we want — we give those a thumbs down — and others that maybe produce something we are interested in — those we might give a thumbs up. The computer can then slowly learn what makes a good sequence good and a bad sequence bad. The examples allow the computer to extrapolate what makes a really good sequence and deliver that to us.”
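The thumbs-up, thumbs-down idea can be illustrated with a minimal Python sketch. It is not the Romero Lab's method: it assumes a deliberately simple additive model with one weight per position and residue, and the sequences, labels, and function names (`train`, `score`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def train(examples):
    """Learn one weight per (position, residue) from labeled sequences.

    examples: list of (sequence, label) pairs, where label is +1 for a
    "thumbs up" and -1 for a "thumbs down". Hypothetical toy model.
    """
    weights = defaultdict(float)
    for sequence, label in examples:
        for position, residue in enumerate(sequence):
            # Residues seen in good sequences gain weight; residues seen
            # in bad sequences lose weight.
            weights[(position, residue)] += label
    return weights


def score(weights, sequence):
    """Sum the learned weights for each residue in a new sequence."""
    return sum(
        weights.get((position, residue), 0.0)
        for position, residue in enumerate(sequence)
    )


# Made-up four-letter sequences; these are not real experimental data.
examples = [
    ("ACDK", +1),
    ("ACDR", +1),
    ("GCDK", -1),
    ("GCEK", -1),
]

weights = train(examples)

# Rank sequences, including one the model has never seen ("ACEK").
candidates = ["ACDK", "GCEK", "ACEK"]
ranked = sorted(candidates, key=lambda s: score(weights, s), reverse=True)
print(ranked)
```

Even in this toy version, the model ranks the unseen sequence between the known good and known bad examples, because it shares features with both. Real protein engineering models are far richer, but the principle of learning from labeled examples is the same.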

Like the Senes Lab, the Romero Lab is broadly interested in understanding the relationships between protein sequence, structure, and function, and in learning about these relationships from large data sets. Its researchers then apply those principles to design new proteins with optimized properties for applied uses in areas such as bioenergy, chemical production, or human health.

“Computer power is getting faster and faster and cheaper and cheaper,” Romero says. “This technology is only getting better and getting your foot in the door and investing in these tools and skills can be very valuable. They will play an increasingly important role in biological research. All of us, and the department as a whole, have a mission to stay ahead of this rapidly evolving technology.”

Using computation to design new proteins with novel functions

While Assistant Professor Vatsan Raman also works on large-scale experiments and protein design, he is exploring a third angle: precision medicine. He was recently awarded a $2.2 million grant from the NIH to support his research on allostery, the process by which a protein senses and conveys a signal that causes a change in a different part of itself.

Photo: Gladys Díaz-Vázquez, a graduate student in the lab of biochemistry professor Alessandro Senes, analyzes a molecular dynamics simulation on a computer. The use of computational methods, along with experiments, helps her better understand the structural properties and the function of membrane proteins.

“Another thing we can do is pick proteins relevant to disease and work to predict the pathogenic consequences of new mutations that may occur in those proteins,” he explains. “That is super important. If we could build this across the top ten or twenty highly mutated cancer genes that would be a repository worth its weight in gold.”

For example, a physician who finds an arbitrary mutation in a patient could consult such a database to make an educated guess about what the mutation might do to the patient’s health and possibly how to treat it. A database of this size is only possible with high-throughput methods and computation, Raman says.

“In my lab we are picking nuclear receptors that are highly relevant in disease and analyzing them to ask how we can figure out the rules that govern them so if we have a new mutation we can figure out what it does,” he says. “We are working to get enough data to be able to make predictions.”

Along with Senes, Romero, and Raman, many other labs use computational methods in their research. For example, Assistant Professor Ophelia Venturelli also works in this area. The department’s expansion of these techniques is creating new opportunities, including coursework on quantitative approaches, for researchers and students interested in this area of biochemistry.

“Many students don’t come in with a lot of knowledge in this area but are able to learn on the job,” Raman says. “At this point computation is a necessity. Students might not come into the program knowing how to code but if you have a large dataset from a big experiment, you can’t just stare at it. You’ve got to start writing your own code to begin making sense of it. And so we dive in.”

Story: Kaine Korzekwa, Photos: Robin Davies.
