Hertz Fellow Emma Pierson Looks to Computational Statistical Analysis to Stem the Tide of Cancer Deaths

January 28, 2016
Jeremy Thomas
Livermore, Calif.

The American Cancer Society estimates that in 2016 there will be 1.7 million new cancer diagnoses and more than 595,000 cancer deaths in the U.S. alone. For Hertz Fellow Emma Pierson, a PhD candidate in computer science at Stanford University, statistically analyzing genetic data to improve a person’s odds of surviving cancer is more than academic work; it’s a personal quest.

When Pierson was 13, her mother was diagnosed with breast cancer. Her mother recovered, but in 2011 Pierson learned that she had inherited the BRCA1 mutation carried by her mother and grandfather, which greatly increases her risk of developing breast and ovarian cancer. Rather than dwell on it, Pierson has turned that knowledge into a catalyst for her academic research.

“It definitely made me better at what I do,” Pierson said. “It’s made me live much more urgently. I’m very impatient when I want something to get done, but it also means things get done.”

In the cancer research lab at Stanford, Pierson, 24, and her colleagues are trying to make sense of large breast cancer datasets. She is fresh off a year of study at Oxford University, where she helped develop a new statistical method for analyzing single-cell gene expression data, giving researchers better tools to analyze RNA levels in individual cells. The resulting paper was published in Genome Biology in November 2015. Sadly, just a week before she was to submit her thesis, her grandfather was diagnosed with brain cancer. He passed away in May 2015.

“I am not going to cure cancer, not even the BRCA cancers,” Pierson wrote in “Seeking a Cancer-Free World,” a November 2015 New York Times blog post. “And I am going to watch the people I love die from diseases I cannot understand or prevent. I would be lying if I told you I have made my peace with that. It gives me hope only to fight, as my grandfather did, for futures unseen: to strive, to seek, to find and not to yield.”

After earning her master’s degree from Stanford, Pierson was awarded the Fannie and John Hertz Foundation Fellowship, but deferred it to study at Oxford as a Rhodes Scholar. Before heading to Oxford, she worked for 23andMe, a company known for analyzing genetic databases. Then, with the Genotype-Tissue Expression (GTEx) Project, she helped develop a new statistical method to more accurately infer gene coexpression networks across 35 human tissues. The findings were published in PLOS Computational Biology in May 2015. See the chart below.

Chart from the PLOS Computational Biology paper “Sharing and Specificity of Co-expression Networks across 35 Human Tissues.” The tissue hierarchy was built by hierarchical clustering: the mean expression of each gene was computed for each tissue, and tissues with similar gene expression patterns were merged into clusters. Lower branching points represent clusters with more similar gene expression patterns. Many biologically plausible clusters are apparent, including separate brain and non-brain clusters and clusters for the basal ganglia, cortex, adipose tissue, heart, artery, and skin.
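The clustering step described in the caption can be illustrated with a minimal Python sketch. It assumes Euclidean distance between tissue mean-expression vectors and average linkage for merging clusters; the caption does not say which distance or linkage the paper used, and the tissue names and expression values below are made-up toy data, not GTEx measurements.

```python
import math
from itertools import combinations


def tissue_profiles(samples):
    """Average each gene's expression across the samples of each tissue.

    `samples` maps a tissue name to a list of per-sample expression vectors,
    all listing the same genes in the same order.
    """
    return {
        tissue: [sum(gene_values) / len(rows) for gene_values in zip(*rows)]
        for tissue, rows in samples.items()
    }


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(profiles):
    """Naive average-linkage agglomerative clustering (an assumed choice).

    Returns a list of merges (cluster_a, cluster_b, height), where clusters are
    tuples of tissue names. A lower height means the merged tissues have more
    similar mean expression, i.e. a lower branching point in the tree.
    """

    def linkage(c1, c2):
        # Average distance over all tissue pairs drawn from the two clusters.
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    clusters = [(tissue,) for tissue in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Find and merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Hypothetical toy data: 4 tissues, 2 samples each, 3 genes per sample.
samples = {
    "cortex": [[9.0, 1.0, 2.0], [8.0, 1.0, 2.0]],
    "basal_ganglia": [[8.0, 2.0, 2.0], [8.0, 1.0, 2.0]],
    "heart": [[1.0, 9.0, 3.0], [2.0, 8.0, 3.0]],
    "artery": [[2.0, 7.0, 5.0], [2.0, 8.0, 4.0]],
}

for a, b, height in agglomerate(tissue_profiles(samples)):
    print(f"{a} + {b} at height {height:.2f}")
```

With this toy data the two brain tissues merge first, then heart and artery, a miniature version of the grouping the caption describes. The loop recomputes every pairwise linkage on each pass, which is fine for a few dozen tissues; a real analysis would more likely use a library routine such as SciPy’s hierarchical clustering functions.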

“Imagine a genetic network with 10,000 nodes where every node can be connected to every other node,” Pierson said. “You’re trying to find how these nodes are all connected to each other using the data that you have. But the problem is you don’t have very much data and there’s a huge number of possible connections. And to make the problem even more complicated you’re not trying to figure out a single network, you’re trying to learn a network for every single tissue in the human body.”
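To put numbers on the scale Pierson describes, the sketch below counts the candidate connections in a 10,000-gene network and then shows the simplest possible way to infer coexpression edges: flag gene pairs whose expression is strongly correlated across samples. This is a deliberately naive stand-in, not the method from the PLOS Computational Biology paper. It assumes undirected edges without self-loops, a separate network for each of the 35 tissues, and an arbitrary correlation cutoff of 0.8; the gene names and values are hypothetical.

```python
import math
from itertools import combinations


def possible_edges(n_genes):
    """Distinct undirected gene-gene connections among n_genes nodes (no self-loops)."""
    return math.comb(n_genes, 2)


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Naive network inference: keep gene pairs with |correlation| >= threshold.

    `expr` maps a gene name to its expression values across the samples of a
    single tissue (hypothetical input format).
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


per_tissue = possible_edges(10_000)
print(per_tissue)       # 49995000 candidate edges in one tissue's network
print(35 * per_tissue)  # 1749825000 across 35 separate tissue networks

# Hypothetical data: three genes measured in five samples of one tissue.
tissue = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
for g1, g2, r in coexpression_edges(tissue):
    print(f"{g1} <-> {g2}: r = {r:.3f}")
```

With only a handful of samples each correlation estimate is noisy, and screening tens of millions of pairs this way would yield many spurious edges. That is the “not very much data” problem in the quote, and one reason approaches that borrow strength across related tissues can be attractive.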

Data analysis isn’t all work for Pierson. On her blog, Obsession with Regression, she often examines politically charged social issues such as police shootings, abortion, and campus sexual assault. Her analyses have appeared in the New York Times, the statistical website FiveThirtyEight, and the Washington Post.

“If you’re a statistician, you have an obligation to focus on questions which are important and not just stuff which is really trivial,” Pierson said. “If you can remove yourself from an issue and just look at the numbers, it’s easier to have a debate that is less politically charged.”

As she continues to use data to understand the world around her, Pierson said that at Stanford she would like to keep researching BRCA1 carriers like herself and to work on the challenging problem of early detection of ovarian cancer, a particularly deadly cancer that is often not caught until its late stages.

“I would really like to do research that actually makes people’s lives better,” Pierson said.
