Ryan N Gutenkunst

Department Head, Molecular and Cellular Biology
Professor, Molecular and Cellular Biology
Professor, Ecology and Evolutionary Biology
Professor, BIO5 Institute
Professor, Applied BioSciences - GIDP
Professor, Applied Mathematics - GIDP
Professor, Genetics - GIDP
Professor, Statistics-GIDP
Professor, Public Health
Professor, Cancer Biology - GIDP
Member of the Graduate Faculty

Contact

rgutenk@arizona.edu

Degrees

Ph.D. Physics

Cornell University, Ithaca, New York, USA
Sloppiness, Modeling, and Evolution in Biochemical Networks

B.S. Physics

California Institute of Technology, Pasadena, California, USA

Work Experience

Los Alamos National Laboratory (2009 - 2010)
Cornell University, Ithaca, New York (2007 - 2008)

Awards

Kavli Fellow

US National Academy of Sciences and The Kavli Foundation, Summer 2014

Distinguished Early-Career Teaching Award

College of Science, Summer 2013

Interests

Research

My central research goal is to understand the evolution and function of the complex molecular networks that underlie life. The connection between changes in genes and changes in organism function has long been a critical gap in molecular evolution research. Systems biology research is now revealing the mechanisms that bridge that gap. My research program thus integrates population genomics and systems biology to incorporate molecular mechanisms into evolutionary genetics.

Teaching

Prof. Gutenkunst’s central educational goal is to prepare undergraduate and graduate students for the increasingly quantitative world of biology. He furthers this goal through his teaching at the graduate and undergraduate levels at the University of Arizona and through his teaching at international workshops.

Courses

2025-26 Courses

APPL Research

APPL 900 (Spring 2026)
Genetics and Race

MCB 295E (Spring 2026)
Honors Thesis

MATH 498H (Spring 2026)
Honors Thesis

MCB 498H (Spring 2026)
Honors Thesis

PSIO 498H (Spring 2026)
MCB Journal Club

MCB 595 (Spring 2026)
Rsrch Ecology+Evolution

ECOL 610A (Spring 2026)
Spc Tps Ecol+Evol B

ECOL 596X (Spring 2026)
APPL Research

APPL 900 (Fall 2025)
Big Data Molecular Biology

MCB 447 (Fall 2025)
Big Data Molecular Biology

MCB 547 (Fall 2025)
Honors Thesis

MATH 498H (Fall 2025)
Honors Thesis

MCB 498H (Fall 2025)
Honors Thesis

PSIO 498H (Fall 2025)
MCB Journal Club

MCB 595 (Fall 2025)

2024-25 Courses

Thesis

MCB 910 (Summer I 2025)
Honors Independent Study

MATH 399H (Spring 2025)
Independent Study

MCB 599 (Spring 2025)
MCB Journal Club

MCB 595 (Spring 2025)
Research

MCB 900 (Spring 2025)
Thesis

MCB 910 (Spring 2025)
Genetics and Race

MCB 295E (Fall 2024)
Independent Study

MCB 599 (Fall 2024)
MCB Journal Club

MCB 595 (Fall 2024)
Spc Tps Ecol+Evol A

ECOL 596W (Fall 2024)

2023-24 Courses

Directed Research

ABBS 792 (Spring 2024)
Dissertation

GENE 920 (Spring 2024)
Honors Thesis

MCB 498H (Spring 2024)
Independent Study

MCB 599 (Spring 2024)
Big Data Molecular Biology

MCB 447 (Fall 2023)
Big Data Molecular Biology

MCB 547 (Fall 2023)
Directed Rsrch

MCB 392 (Fall 2023)
Dissertation

GENE 920 (Fall 2023)
Genetics and Race

MCB 295E (Fall 2023)
Honors Thesis

MCB 498H (Fall 2023)
Spc Tps Ecol+Evol B

ECOL 596X (Fall 2023)

2022-23 Courses

Dissertation

GENE 920 (Spring 2023)
Genetics and Race

MCB 295E (Spring 2023)
Honors Thesis

DATA 498H (Spring 2023)
Internship

MCB 493 (Spring 2023)
Master's Report

ABS 909 (Spring 2023)
Research

MCB 900 (Spring 2023)
Rsrch Ecology+Evolution

ECOL 610A (Spring 2023)
Spc Tps Ecol+Evol B

ECOL 596X (Spring 2023)
Dissertation

GENE 920 (Fall 2022)
Genomic Medicine Colloquium

MCB 195B (Fall 2022)
Honors Thesis

DATA 498H (Fall 2022)
Master's Report

ABS 909 (Fall 2022)
Quantitative Biology

MCB 315 (Fall 2022)
Research

MCB 900 (Fall 2022)
Scientific Communication

MCB 575 (Fall 2022)
Spc Tps Ecol+Evol B

ECOL 596X (Fall 2022)

2021-22 Courses

Internship

MCB 493 (Summer I 2022)
Master's Report

ABS 909 (Summer I 2022)
Directed Research

MCB 792 (Spring 2022)
Internship

MCB 493 (Spring 2022)
Internship in Applied Biosci

ABS 593A (Spring 2022)
Master's Report

ABS 909 (Spring 2022)
Research

GENE 900 (Spring 2022)
Rsrch Ecology+Evolution

ECOL 610A (Spring 2022)
Spc Tps Ecol+Evol B

ECOL 596X (Spring 2022)
Big Data Molecular Biology

MCB 447 (Fall 2021)
Big Data Molecular Biology

MCB 547 (Fall 2021)
Directed Research

MCB 792 (Fall 2021)
Genomic Medicine Colloquium

MCB 195B (Fall 2021)
Internship

MCB 493 (Fall 2021)
Internship in Applied Biosci

ABS 593A (Fall 2021)
Research

GENE 900 (Fall 2021)
Scientific Communication

MCB 575 (Fall 2021)
Spc Tps Ecol+Evol B

ECOL 596X (Fall 2021)

2020-21 Courses

Internship

MCB 493 (Summer I 2021)
Internship in Applied Biosci

ABS 593A (Summer I 2021)
Directed Research

ECOL 492 (Spring 2021)
Independent Study

MCB 499 (Spring 2021)
Internship

MCB 493 (Spring 2021)
Master's Report

ABS 909 (Spring 2021)
Research

GENE 900 (Spring 2021)
Spc Tps Ecol+Evol B

ECOL 596X (Spring 2021)
Internship

MCB 493 (Winter 2020)
Genomic Medicine Colloquium

MCB 195B (Fall 2020)
Independent Study

ECOL 499 (Fall 2020)
Internship

MCB 493 (Fall 2020)
Master's Report

ABS 909 (Fall 2020)
Quantitative Biology

MCB 315 (Fall 2020)
Research

GENE 900 (Fall 2020)
Scientific Communication

MCB 575 (Fall 2020)
Spc Tps Ecol+Evol B

ECOL 596X (Fall 2020)

2019-20 Courses

Community Ecology

ECOL 596F (Spring 2020)
Honors Thesis

MCB 498H (Spring 2020)
Master's Report

ABS 909 (Spring 2020)
Big Data Molecular Biology

MCB 447 (Fall 2019)
Dissertation

BIOS 920 (Fall 2019)
Honors Thesis

MCB 498H (Fall 2019)
Internship in Applied Biosci

ABS 593A (Fall 2019)
Master's Report

ABS 909 (Fall 2019)

2018-19 Courses

Internship

MCB 493 (Summer I 2019)
Internship in Applied Biosci

ABS 593A (Summer I 2019)
Directed Research

MCB 392C (Spring 2019)
Dissertation

BIOS 920 (Spring 2019)
Internship in Applied Biosci

ABS 593A (Spring 2019)
Introduction to Research

MCB 795A (Spring 2019)
MCB Journal Club

MCB 595 (Spring 2019)
Master's Report

ABS 909 (Spring 2019)
Community Ecology

ECOL 596F (Fall 2018)
Dissertation

BIOS 920 (Fall 2018)
Genomic Medicine Colloquium

MCB 195B (Fall 2018)
Internship in Applied Biosci

ABS 593A (Fall 2018)
Quantitative Biology

MCB 315 (Fall 2018)
Scientific Communication

MCB 575 (Fall 2018)

2017-18 Courses

Internship in Applied Biosci

ABS 593A (Summer I 2018)
Master's Report

ABS 909 (Summer I 2018)
Honors Thesis

MCB 498H (Spring 2018)
Internship in Applied Biosci

ABS 593A (Spring 2018)
Master's Report

ABS 909 (Spring 2018)
Research

BIOS 900 (Spring 2018)
Honors Thesis

MCB 498H (Fall 2017)
Internship in Applied Biosci

ABS 593A (Fall 2017)
Master's Report

ABS 909 (Fall 2017)
Research

BIOS 900 (Fall 2017)

2016-17 Courses

Research

CPH 900 (Spring 2017)
Senior Capstone

BIOC 498 (Spring 2017)
Cell Systems

MCB 572A (Fall 2016)
Dissertation

MATH 920 (Fall 2016)
Introduction to Research

MCB 795A (Fall 2016)
Key Concepts:Quantitative Bio

MCB 315 (Fall 2016)
Research

EPID 900 (Fall 2016)

2015-16 Courses

Dissertation

MATH 920 (Spring 2016)
Independent Study

MCB 599 (Spring 2016)
Research

MCB 900 (Spring 2016)
Thesis

MCB 910 (Spring 2016)

Scholarly Contributions

Chapters

Mannakee, B. K., Ragsdale, A. P., Transtrum, M. K., & Gutenkunst, R. N. (2016). Sloppiness and the geometry of parameter space. In Uncertainty in Biology: a Computational Modeling Approach. Springer International.

Journals/Publications

Struck, T. J., Vaughn, A. H., Daigle, A., Ray, D. D., Noskova, E., Sequeira, J. J., Antonets, S., Alekseevskaya, E., Grigoreva, E., Raines, E., McMaster, E. S., Kovacs, T. G., Ragsdale, A. P., Moreno-Estrada, A., Lotterhos, K. E., Siepel, A., & Gutenkunst, R. N. (2025). GHIST 2024: The 1st Genomic History Inference Strategies Tournament. bioRxiv : the preprint server for biology.
Evaluating population genetic inference methods is challenging due to the complexity of evolutionary histories, potential model misspecification, and unconscious biases in self-assessment. The Genomic History Inference Strategies Tournament (GHIST) is a community-driven competition designed to evaluate methods for inferring evolutionary history from population genomic data. The inaugural GHIST competition ran from July to November 2024 and featured four demographic history inference challenges of varying complexity: a bottleneck model, a split with isolation model, a secondary contact model with demographic complexity, and an archaic admixture model. Data were provided as error-free VCF files, and participants submitted numerical parameter estimates that were scored by relative root mean squared error. Approximately 60 participants competed, using diverse approaches. Results revealed the current dominance of methods based on site frequency spectra, while highlighting the advantages of flexible model-building approaches for complex demographic histories. We discuss insights regarding the competition and outline the next iteration, which is ongoing with expanded challenge diversity. By providing standardized benchmarks and highlighting areas for improvement, GHIST represents a substantial step toward more reliable inference of evolutionary history from genomic data.
Prata, K. E., Riginos, C., Gutenkunst, R. N., Latijnhouwers, K. R., Sánchez, J. A., Englebert, N., Hay, K. B., & Bongaerts, P. (2022). Deep connections: Divergence histories with gene flow in mesophotic Agaricia corals. Molecular ecology.
Largely understudied, mesophotic coral ecosystems lie below shallow reefs (at >30 m depth) and comprise ecologically distinct communities. Brooding reproductive modes appear to predominate among mesophotic-specialist corals and may limit genetic connectivity among populations. Using reduced representation genomic sequencing, we assessed spatial population genetic structure at 50 m depth in an ecologically important mesophotic-specialist species Agaricia grahamae, among locations in the Southern Caribbean. We also tested for hybridisation with the closely related (but depth-generalist) species Agaricia lamarcki, within their sympatric depth zone (50 m). In contrast to our expectations, no spatial genetic structure was detected between the reefs of Curaçao and Bonaire (~40 km apart) within A. grahamae. However, cryptic taxa were discovered within both taxonomic species, with those in A. lamarcki (incompletely) partitioned by depth and those in A. grahamae occurring sympatrically (at the same depth). Hybrid analyses and demographic modelling identified contemporary and historical gene flow among cryptic taxa, both within and between A. grahamae and A. lamarcki. These results (1) indicate that spatial connectivity and subsequent replenishment may be possible between islands of moderate geographic distances for A. grahamae, an ecologically important mesophotic species, (2) that cryptic taxa occur in the mesophotic zone and environmental selection along shallow to mesophotic depth gradients may drive divergence in depth-generalists such as A. lamarcki, and (3) highlight that gene flow links taxa within this relatively diverse Caribbean genus.
Shaheen, M. F., Tse, J. Y., Sokol, E. S., Masterson, M., Bansal, P., Rabinowitz, I., Tarleton, C. A., Dobroff, A. S., Smith, T. L., Bocklage, T. J., Mannakee, B. K., Gutenkunst, R. N., Bischoff, J. E., Ness, S. A., Riedlinger, G. M., Groisberg, R., Pasqualini, R., Ganesan, S., & Arap, W. (2022). Genomic Landscape of Lymphatic Malformations: A Case Series and Response to the PI3Kα Inhibitor Alpelisib in an N-of-One Clinical Trial. medRxiv, 2022.01.03.21267856.
Blischak, P. D., Barker, M. S., & Gutenkunst, R. N. (2021). Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks. Molecular ecology resources, 21(8), 2676-2688.
Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their nonindependence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here, we use CNNs for selecting among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (dXY) calculated in windows across the genome. Using coalescent simulations to train and independently test a neural network showed that our method, HyDe-CNN, was able to accurately perform model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies, as well as comparing it to phylogeny-based introgression statistics. Given the flexibility of our approach, the dropping cost of long-read sequencing and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.
Gutenkunst, R. N. (2021). dadi.CUDA: Accelerating Population Genetics Inference with Graphics Processing Units. Molecular biology and evolution, 38, 2177.
dadi is a popular but computationally intensive program for inferring models of demographic history and natural selection from population genetic data. I show that running dadi on a Graphics Processing Unit can dramatically speed computation compared to the CPU implementation, with minimal user burden. Motivated by this speed increase, I also extended dadi to four- and five-population models. This functionality is available in dadi version 2.1.0, https://bitbucket.org/gutenkunstlab/dadi/.
Huang, X., Fortier, A. L., Coffman, A. J., Struck, T. J., Irby, M. N., James, J. E., León-Burguete, J. E., Ragsdale, A. P., & Gutenkunst, R. N. (2021). Inferring Genome-Wide Correlations of Mutation Fitness Effects between Populations. Molecular biology and evolution, 38(10), 4588-4602.
The effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.
Marchi, N., Winkelbach, L., Schulz, I., Brami, M., Hofmanová, Z., Blöcher, J., Reyna-Blanco, C. S., Diekmann, Y., Thiéry, A., Kapopoulou, A., Link, V., Piuz, V., Kreutzer, S., Figarska, S. M., Ganiatsou, E., Pukaj, A., Struck, T. J., Gutenkunst, R. N., Karul, N., Gerritsen, F., et al. (2021). Demogenomic modeling of the timing and the processes of early European farmers differentiation. bioRxiv, 2020.11.23.394502.
Adrion, J. R., Cole, C. B., Dukler, N., Galloway, J. G., Gladstein, A. L., Gower, G., Kyriazis, C. C., Ragsdale, A. P., Tsambos, G., Baumdicker, F., Carlson, J., Cartwright, R. A., Durvasula, A., Gronau, I., Kim, B. Y., McKenzie, P., Messer, P. W., Noskova, E., Ortega-Del Vecchyo, D., Racimo, F., et al. (2020). A community-maintained standard library of population genetic models. eLife, 9.
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Blischak, P. D., Barker, M. S., & Gutenkunst, R. N. (2020). Inferring the Demographic History of Inbred Species from Genome-Wide SNP Frequency Data. Molecular biology and evolution, 37(7), 2124-2136.
Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model's ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.
Mahadevan, D., Hammer, M. F., Mannakee, B. K., Gutenkunst, R. N., Placencia, C., Ramos, K., Lau, B., Johnstone, L., Chalasani, P., Babiker, H. M., Perkins, B., & Sprissler, R. S. (2020). Whole exome sequencing of rare tumor and matched germline DNA identifies somatic and inherited variants of clinical significance. Cancers.
Mannakee, B. K., & Gutenkunst, R. N. (2020). BATCAVE: calling somatic mutations with a tumor- and site-specific prior. NAR genomics and bioinformatics, 2(1), lqaa004.
Detecting somatic mutations within tumors is key to understanding treatment resistance, patient prognosis and tumor evolution. Mutations at low allelic frequency, those present in only a small portion of tumor cells, are particularly difficult to detect. Many algorithms have been developed to detect such mutations, but none models a key aspect of tumor biology. Namely, every tumor has its own profile of mutation types that it tends to generate. We present BATCAVE (Bayesian Analysis Tools for Context-Aware Variant Evaluation), an algorithm that first learns the individual tumor mutational profile and mutation rate then uses them in a prior for evaluating potential mutations. We also present an R implementation of the algorithm, built on the popular caller MuTect. Using simulations, we show that adding the BATCAVE algorithm to MuTect improves variant detection. It also improves the calibration of posterior probabilities, enabling more principled tradeoff between precision and recall. We also show that BATCAVE performs well on real data. Our implementation is computationally inexpensive and straightforward to incorporate into existing MuTect pipelines. More broadly, the algorithm can be added to other variant callers, and it can be extended to include additional biological features that affect mutation generation.
Mannakee, B. K., Balaji, U., Witkiewicz, A. K., Gutenkunst, R. N., & Knudsen, E. S. (2018). Sensitive and specific post-call filtering of genetic variants in xenograft and primary tumors. Bioinformatics (Oxford, England), 34(10), 1713-1718.
Tumor genome sequencing offers great promise for guiding research and therapy, but spurious variant calls can arise from multiple sources. Mouse contamination can generate many spurious calls when sequencing patient-derived xenografts. Paralogous genome sequences can also generate spurious calls when sequencing any tumor. We developed a BLAST-based algorithm, Mouse And Paralog EXterminator (MAPEX), to identify and filter out spurious calls from both these sources.
Struck, T. J., Mannakee, B. K., & Gutenkunst, R. N. (2018). The impact of genome-wide association studies on biomedical research publications. Human genomics, 12(1), 38.
The past decade has seen major investment in genome-wide association studies (GWAS). Among the many goals of GWAS, a major one is to identify and motivate research on novel genes involved in complex human disease. To assess whether this goal is being met, we quantified the effect of GWAS on the overall distribution of biomedical research publications and on the subsequent publication history of genes newly associated with complex disease. We found that the historical skew of publications toward genes involved in Mendelian disease has not changed since the advent of GWAS. Genes newly implicated by GWAS in complex disease do experience additional publications compared to control genes, and they are more likely to become exceptionally studied. But the magnitude of both effects has declined over the past decade. Our results suggest that reforms to encourage follow-up studies may be needed for GWAS to most successfully guide biomedical research toward the molecular mechanisms underlying complex human disease.
Struck, T. J., Mannakee, B. K., & Gutenkunst, R. N. (2018). The impact of genome-wide association studies on biomedical research publications. Human Genomics.
Hsieh, P., Hallmark, B., Watkins, J. C., Karafet, T. M., Osipova, L. P., Gutenkunst, R. N., & Hammer, M. F. (2017). Exome sequencing provides evidence of polygenic adaptation to a fat-rich animal diet in indigenous Siberian populations. Molecular Biology and Evolution, 34, 2914.
Hsieh, P., Hallmark, B., Watkins, J., Karafet, T. M., Osipova, L. P., Gutenkunst, R. N., & Hammer, M. F. (2017). Exome Sequencing Provides Evidence of Polygenic Adaptation to a Fat-Rich Animal Diet in Indigenous Siberian Populations. Molecular biology and evolution, 34(11), 2913-2926.
Siberia is one of the coldest environments on Earth and has great seasonal temperature variation. Long-term settlement in northern Siberia undoubtedly required biological adaptation to severe cold stress, dramatic variation in photoperiod, and limited food resources. In addition, recent archeological studies show that humans first occupied Siberia at least 45,000 years ago; yet our understanding of the demographic history of modern indigenous Siberians remains incomplete. In this study, we use whole-exome sequencing data from the Nganasans and Yakuts to infer the evolutionary history of these two indigenous Siberian populations. Recognizing the complexity of the adaptive process, we designed a model-based test to systematically search for signatures of polygenic selection. Our approach accounts for stochasticity in the demographic process and the hitchhiking effect of classic selective sweeps, as well as potential biases resulting from recombination rate and mutation rate heterogeneity. Our demographic inference shows that the Nganasans and Yakuts diverged ∼12,000-13,000 years ago from East-Asian ancestors in a process involving continuous gene flow. Our polygenic selection scan identifies seven candidate gene sets with Siberian-specific signals. Three of these gene sets are related to diet, especially to fat metabolism, consistent with the hypothesis of adaptation to a fat-rich animal diet. Additional testing rejects the effect of hitchhiking and favors a model in which selection yields small allele frequency changes at multiple unlinked genes.
Qi, X., An, H., Ragsdale, A. P., Hall, T. E., Gutenkunst, R. N., Chris Pires, J., & Barker, M. S. (2017). Genomic inferences of domestication events are corroborated by written records in Brassica rapa. Molecular ecology, 26(13), 3373-3388.
Demographic modelling is often used with population genomic data to infer the relationships and ages among populations. However, relatively few analyses are able to validate these inferences with independent data. Here, we leverage written records that describe distinct Brassica rapa crops to corroborate demographic models of domestication. Brassica rapa crops are renowned for their outstanding morphological diversity, but the relationships and order of domestication remain unclear. We generated genomewide SNPs from 126 accessions collected globally using high-throughput transcriptome data. Analyses of more than 31,000 SNPs across the B. rapa genome revealed evidence for five distinct genetic groups and supported a European-Central Asian origin of B. rapa crops. Our results supported the traditionally recognized South Asian and East Asian B. rapa groups with evidence that pak choi, Chinese cabbage and yellow sarson are likely monophyletic groups. In contrast, the oil-type B. rapa subsp. oleifera and brown sarson were polyphyletic. We also found no evidence to support the contention that rapini is the wild type or the earliest domesticated subspecies of B. rapa. Demographic analyses suggested that B. rapa was introduced to Asia 2,400-4,100 years ago, and that Chinese cabbage originated 1,200-2,100 years ago via admixture of pak choi and European-Central Asian B. rapa. We also inferred significantly different levels of founder effect among the B. rapa subspecies. Written records from antiquity that document these crops are consistent with these inferences. The concordance between our age estimates of domestication events with historical records provides unique support for our demographic inferences.
Qi, X., An, H., Ragsdale, A. P., Hall, T. E., Gutenkunst, R. N., Pires, J. C., & Barker, M. S. (2017). Genome wide analyses of diverse Brassica rapa cultivars reveal significant genetic structure and corroborate historical record of domestication. Molecular Ecology, 206, 315.
Ragsdale, A. P., & Gutenkunst, R. N. (2017). Inferring Demographic History Using Two-Locus Statistics. Genetics, 206(2), 1037-1048.
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference.
Ragsdale, A. P., & Gutenkunst, R. N. (2017). Inferring demographic history using two-locus statistics. Genetics, 206, 1037.
Edwards, T., Tollis, M., Hsieh, P., Gutenkunst, R. N., Liu, Z., Kusumi, K., Culver, M., & Murphy, R. W. (2016). Assessing models of speciation under different biogeographic scenarios; an empirical study using multi-locus and RNA-seq analyses. Ecology and evolution, 6(2), 379-96.
Evolutionary biology often seeks to decipher the drivers of speciation, and much debate persists over the relative importance of isolation and gene flow in the formation of new species. Genetic studies of closely related species can assess if gene flow was present during speciation, because signatures of past introgression often persist in the genome. We test hypotheses on which mechanisms of speciation drove diversity among three distinct lineages of desert tortoise in the genus Gopherus. These lineages offer a powerful system to study speciation, because different biogeographic patterns (physical vs. ecological segregation) are observed at opposing ends of their distributions. We use 82 samples collected from 38 sites, representing the entire species' distribution and generate sequence data for mtDNA and four nuclear loci. A multilocus phylogenetic analysis in *BEAST estimates the species tree. RNA-seq data yield 20,126 synonymous variants from 7665 contigs from two individuals of each of the three lineages. Analyses of these data using the demographic inference package ∂a∂i serve to test the null hypothesis of no gene flow during divergence. The best-fit demographic model for the three taxa is concordant with the *BEAST species tree, and the ∂a∂i analysis does not indicate gene flow among any of the three lineages during their divergence. These analyses suggest that divergence among the lineages occurred in the absence of gene flow and in this scenario the genetic signature of ecological isolation (parapatric model) cannot be differentiated from geographic isolation (allopatric model).
Hsieh, P., Veeramah, K. R., Lachance, J., Tishkoff, S. A., Wall, J. D., Hammer, M. F., & Gutenkunst, R. N. (2016). Whole-genome sequence analyses of Western Central African Pygmy hunter-gatherers reveal a complex demographic history and identify candidate genes under positive natural selection. Genome research, 26(3), 279-90.
African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically diverged from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations that includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 yr ago. We also find that bidirectional asymmetric gene flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors.
Hsieh, P., Woerner, A. E., Wall, J. D., Lachance, J., Tishkoff, S. A., Gutenkunst, R. N., & Hammer, M. F. (2016). Model-based analyses of whole-genome data reveal a complex evolutionary history involving archaic introgression in Central African Pygmies. Genome research, 26(3), 291-300.
Comparisons of whole-genome sequences from ancient and contemporary samples have pointed to several instances of archaic admixture through interbreeding between the ancestors of modern non-Africans and now extinct hominids such as Neanderthals and Denisovans. One implication of these findings is that some adaptive features in contemporary humans may have entered the population via gene flow with archaic forms in Eurasia. Within Africa, fossil evidence suggests that anatomically modern humans (AMH) and various archaic forms coexisted for much of the last 200,000 yr; however, the absence of ancient DNA in Africa has limited our ability to make a direct comparison between archaic and modern human genomes. Here, we use statistical inference based on high coverage whole-genome data (greater than 60×) from contemporary African Pygmy hunter-gatherers as an alternative means to study the evolutionary history of the genus Homo. Using whole-genome simulations that consider demographic histories that include both isolation and gene flow with neighboring farming populations, our inference method rejects the hypothesis that the ancestors of AMH were genetically isolated in Africa, thus providing the first whole genome-level evidence of African archaic admixture. Our inferences also suggest a complex human evolutionary history in Africa, which involves at least a single admixture event from an unknown archaic population into the ancestors of AMH, likely within the last 30,000 yr.
Lynch, M., Gutenkunst, R., Ackerman, M., Spitze, K., Ye, Z., Maruki, T., & Jia, Z. (2017). Population Genomics of Daphnia pulex. Genetics, 206, 315.
Using data from 83 isolates from a single population, the population genomics of the microcrustacean Daphnia pulex are described and compared to current knowledge for the only other well-studied invertebrate, Drosophila melanogaster. These two species are quite similar with respect to effective population sizes and mutation rates, although some features of recombination appear to be different, with linkage disequilibrium being elevated at short (< 100 bp) distances in D. melanogaster and at long distances in D. pulex. The study population adheres closely to the expectations under Hardy-Weinberg equilibrium, and reflects a past population history of no more than a two-fold range of variation in effective population size. Four-fold redundant silent sites and a restricted region of intronic sites appear to evolve in a nearly neutral fashion, providing a powerful tool for population-genetic analyses. Amino-acid replacement sites are predominantly under strong purifying selection, as are a large fraction of sites in UTRs and intergenic regions, but the majority of SNPs at such sites that rise to frequencies > 0.05 appear to evolve in a nearly neutral fashion. All forms of genomic sites (including replacement sites within codons, and intergenic and UTR regions) appear to be experiencing an ~2× higher level of selection scaled to the power of drift in D. melanogaster, but this may in part be a consequence of recent demographic changes. These results establish D. pulex as an excellent system for future work on the evolutionary genomics of natural populations.
Mannakee, B. K., & Gutenkunst, R. N. (2016). Selection on Network Dynamics Drives Differential Rates of Protein Domain Evolution. PLoS genetics, 12(7), e1006132.
The long-held principle that functionally important proteins evolve slowly has recently been challenged by studies in mice and yeast showing that the severity of a protein knockout only weakly predicts that protein's rate of evolution. However, the relevance of these studies to evolutionary changes within proteins is unknown, because amino acid substitutions, unlike knockouts, often only slightly perturb protein activity. To quantify the phenotypic effect of small biochemical perturbations, we developed an approach to use computational systems biology models to measure the influence of individual reaction rate constants on network dynamics. We show that this dynamical influence is predictive of protein domain evolutionary rate within networks in vertebrates and yeast, even after controlling for expression level and breadth, network topology, and knockout effect. Thus, our results not only demonstrate the importance of protein domain function in determining evolutionary rate, but also the power of systems biology modeling to uncover unanticipated evolutionary forces.
Ragsdale, A. P., Coffman, A. J., Hsieh, P., Struck, T. J., & Gutenkunst, R. N. (2016). Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations. Genetics, 203(1), 513-23.
The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.
Coffman, A. J., Hsieh, P. H., Gravel, S., & Gutenkunst, R. N. (2016). Computationally Efficient Composite Likelihood Statistics for Demographic Inference. Molecular biology and evolution, 33, 591.
Many population genetics tools employ composite likelihoods, because fully modeling genomic linkage is challenging. But traditional approaches to estimating parameter uncertainties and performing model selection require full likelihoods, so these tools have relied on computationally expensive maximum-likelihood estimation (MLE) on bootstrapped data. Here, we demonstrate that statistical theory can be applied to adjust composite likelihoods and perform robust computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS. On both simulated and real data, the adjustments perform comparably to MLE bootstrapping while using orders of magnitude less computational time.
Edwards, T., Tollis, M., Hsieh, P., Gutenkunst, R. N., Liu, Z., Kusumi, K., Culver, M., & Murphy, R. W. (2016). Assessing models of speciation under different biogeographic scenarios; an empirical study using multi-locus and RNA-seq analyses. Ecology and Evolution.
Hermansen, R. A., Mannakee, B. K., Knecht, W., Liberles, D. A., & Gutenkunst, R. N. (2015). Characterizing selective pressures on the pathway for de novo biosynthesis of pyrimidines in yeast. BMC evolutionary biology, 15, 232.
Selection on proteins is typically measured with the assumption that each protein acts independently. However, selection more likely acts at higher levels of biological organization, requiring an integrative view of protein function. Here, we built a kinetic model for de novo pyrimidine biosynthesis in the yeast Saccharomyces cerevisiae to relate pathway function to selective pressures on individual protein-encoding genes.
Holmes, W. M., Mannakee, B. K., Gutenkunst, R. N., & Serio, T. R. (2014). Loss of amino-terminal acetylation suppresses a prion phenotype by modulating global protein folding. Nature communications, 5, 4383.
Amino-terminal acetylation is among the most ubiquitous of protein modifications in eukaryotes. Although loss of N-terminal acetylation is associated with many abnormalities, the molecular basis of these effects is known for only a few cases, where acetylation of single factors has been linked to binding avidity or metabolic stability. In contrast, the impact of N-terminal acetylation for the majority of the proteome, and its combinatorial contributions to phenotypes, are unknown. Here, by studying the yeast prion [PSI(+)], an amyloid of the Sup35 protein, we show that loss of N-terminal acetylation promotes general protein misfolding, a redeployment of chaperones to these substrates, and a corresponding stress response. These proteostasis changes, combined with the decreased stability of unacetylated Sup35 amyloid, reduce the size of prion aggregates and reverse their phenotypic consequences. Thus, loss of N-terminal acetylation, and its previously unanticipated role in protein biogenesis, globally resculpts the proteome to create a unique phenotype.
Jilkine, A., & Gutenkunst, R. N. (2014). Effect of dedifferentiation on time to mutation acquisition in stem cell-driven cancers. PLoS computational biology, 10(3), e1003481.
Accumulating evidence suggests that many tumors have a hierarchical organization, with the bulk of the tumor composed of relatively differentiated short-lived progenitor cells that are maintained by a small population of undifferentiated long-lived cancer stem cells. It is unclear, however, whether cancer stem cells originate from normal stem cells or from dedifferentiated progenitor cells. To address this, we mathematically modeled the effect of dedifferentiation on carcinogenesis. We considered a hybrid stochastic-deterministic model of mutation accumulation in both stem cells and progenitors, including dedifferentiation of progenitor cells to a stem cell-like state. We performed exact computer simulations of the emergence of tumor subpopulations with two mutations, and we derived semi-analytical estimates for the waiting time distribution to fixation. Our results suggest that dedifferentiation may play an important role in carcinogenesis, depending on how stem cell homeostasis is maintained. If the stem cell population size is held strictly constant (due to all divisions being asymmetric), we found that dedifferentiation acts like a positive selective force in the stem cell population and thus speeds carcinogenesis. If the stem cell population size is allowed to vary stochastically with density-dependent reproduction rates (allowing both symmetric and asymmetric divisions), we found that dedifferentiation beyond a critical threshold leads to exponential growth of the stem cell population. Thus, dedifferentiation may play a crucial role, the common modeling assumption of constant stem cell population size may not be adequate, and further progress in understanding carcinogenesis demands a more detailed mechanistic understanding of stem cell homeostasis.
Pandya, S., Struck, T. J., Mannakee, B. K., Paniscus, M., & Gutenkunst, R. N. (2015). Testing whether Metazoan Tyrosine Loss Was Driven by Selection against Promiscuous Phosphorylation. Molecular biology and evolution, 32, 144.
Protein tyrosine phosphorylation is a key regulatory modification in metazoans, and the corresponding kinase enzymes have diversified dramatically. This diversification is correlated with a genome-wide reduction in protein tyrosine content, and it was recently suggested that this reduction was driven by selection to avoid promiscuous phosphorylation that might be deleterious. We tested three predictions of this intriguing hypothesis. 1) Selection should be stronger on residues that are more likely to be phosphorylated due to local solvent accessibility or structural disorder. 2) Selection should be stronger on proteins that are more likely to be promiscuously phosphorylated because they are abundant. We tested these predictions by comparing distributions of tyrosine within and among human and yeast orthologous proteins. 3) Selection should be stronger against mutations that create tyrosine versus remove tyrosine. We tested this prediction using human population genomic variation data. We found that all three predicted effects are modest for tyrosine when compared with the other amino acids, suggesting that selection against deleterious phosphorylation was not dominant in driving metazoan tyrosine loss.
Robinson, J. D., Coffman, A. J., Hickerson, M. J., & Gutenkunst, R. N. (2014). Sampling strategies for frequency spectrum-based population genomic inference. BMC evolutionary biology, 14(1), 254.
Background: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module ∂a∂i, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models. Results: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 - 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. Conclusions: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.
Veeramah, K. R., Gutenkunst, R. N., Woerner, A. E., Watkins, J. C., & Hammer, M. F. (2014). Evidence for increased levels of positive and negative selection on the X chromosome versus autosomes in humans. Molecular biology and evolution, 31(9), 2267-82.
Partially recessive variants under positive selection are expected to go to fixation more quickly on the X chromosome as a result of hemizygosity, an effect known as faster-X. Conversely, purifying selection is expected to reduce substitution rates more effectively on the X chromosome. Previous work in humans contrasted divergence on the autosomes and X chromosome, with results tending to support the faster-X effect. However, no study has yet incorporated both divergence and polymorphism to quantify the effects of both purifying and positive selection, which are opposing forces with respect to divergence. In this study, we develop a framework that integrates previously developed theory addressing differential rates of X and autosomal evolution with methods that jointly estimate the level of purifying and positive selection via modeling of the distribution of fitness effects (DFE). We then utilize this framework to estimate the proportion of nonsynonymous substitutions fixed by positive selection (α) using exome sequence data from a West African population. We find that varying the female to male breeding ratio (β) has minimal impact on the DFE for the X chromosome, especially when compared with the effect of varying the dominance coefficient of deleterious alleles (h). Estimates of α range from 46% to 51% and from 4% to 24% for the X chromosome and autosomes, respectively. While dependent on h, the magnitude of the difference between α values estimated for these two systems is highly statistically significant over a range of biologically realistic parameter values, suggesting faster-X has been operating in humans.
Ma, X., Kelley, J. L., Eilertson, K., Musharoff, S., Degenhardt, J. D., Martins, A. L., Vinar, T., Kosiol, C., Siepel, A., Gutenkunst, R. N., & Bustamante, C. D. (2013). Population genomic analysis reveals a rich speciation and demographic history of orang-utans (Pongo pygmaeus and Pongo abelii). PloS one, 8(10), e77175.
To gain insights into evolutionary forces that have shaped the history of Bornean and Sumatran populations of orang-utans, we compare patterns of variation across more than 11 million single nucleotide polymorphisms found by previous mitochondrial and autosomal genome sequencing of 10 wild-caught orang-utans. Our analysis of the mitochondrial data yields a far more ancient split time between the two populations (~3.4 million years ago) than estimates based on autosomal data (0.4 million years ago), suggesting a complex speciation process with moderate levels of primarily male migration. We find that the distribution of selection coefficients consistent with the observed frequency spectrum of autosomal non-synonymous polymorphisms in orang-utans is similar to the distribution in humans. Our analysis indicates that 35% of genes have evolved under detectable negative selection. Overall, our findings suggest that purifying natural selection, genetic drift, and a complex demographic history are the dominant drivers of genome evolution for the two orang-utan populations.
Smith, A. M., Adler, F. R., Ribeiro, R. M., Gutenkunst, R. N., McAuley, J. L., McCullers, J. A., & Perelson, A. S. (2013). Kinetics of coinfection with influenza A virus and Streptococcus pneumoniae. PLoS pathogens, 9(3), e1003238.
Secondary bacterial infections are a leading cause of illness and death during epidemic and pandemic influenza. Experimental studies suggest a lethal synergism between influenza and certain bacteria, particularly Streptococcus pneumoniae, but the precise processes involved are unclear. To address the mechanisms and determine the influences of pathogen dose and strain on disease, we infected groups of mice with either the H1N1 subtype influenza A virus A/Puerto Rico/8/34 (PR8) or a version expressing the 1918 PB1-F2 protein (PR8-PB1-F2(1918)), followed seven days later with one of two S. pneumoniae strains, type 2 D39 or type 3 A66.1. We determined that, following bacterial infection, viral titers initially rebound and then decline slowly. Bacterial titers rapidly rise to high levels and remain elevated. We used a kinetic model to explore the coupled interactions and study the dominant controlling mechanisms. We hypothesize that viral titers rebound in the presence of bacteria due to enhanced viral release from infected cells, and that bacterial titers increase due to alveolar macrophage impairment. Dynamics are affected by initial bacterial dose but not by the expression of the influenza 1918 PB1-F2 protein. Our model provides a framework to investigate pathogen interaction during coinfections and to uncover dynamical differences based on inoculum size and strain.
Chylek, L. A., Hu, B., Blinov, M. L., Emonet, T., Faeder, J. R., Goldstein, B., Gutenkunst, R. N., Haugh, J. M., Lipniacki, T., Posner, R. G., Yang, J., & Hlavacek, W. S. (2011). Guidelines for visualizing and annotating rule-based models. Molecular bioSystems, 7(10), 2779-95.
Rule-based modeling provides a means to represent cell signaling systems in a way that captures site-specific details of molecular interactions. For rule-based models to be more widely understood and (re)used, conventions for model visualization and annotation are needed. We have developed the concepts of an extended contact map and a model guide for illustrating and annotating rule-based models. An extended contact map represents the scope of a model by providing an illustration of each molecule, molecular component, direct physical interaction, post-translational modification, and enzyme-substrate relationship considered in a model. A map can also illustrate allosteric effects, structural relationships among molecular components, and compartmental locations of molecules. A model guide associates elements of a contact map with annotation and elements of an underlying model, which may be fully or partially specified. A guide can also serve to document the biological knowledge upon which a model is based. We provide examples of a map and guide for a published rule-based model that characterizes early events in IgE receptor (FcεRI) signaling. We also provide examples of how to visualize a variety of processes that are common in cell signaling systems but not considered in the example model, such as ubiquitination. An extended contact map and an associated guide can document knowledge of a cell signaling system in a form that is visual as well as executable. As a tool for model annotation, a map and guide can communicate the content of a model clearly and with precision, even for large models.
Gravel, S., Henn, B. M., Gutenkunst, R. N., Indap, A. R., Marth, G. T., Clark, A. G., Yu, F., Gibbs, R. A., 1000 Genomes Project, & Bustamante, C. D. (2011). Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences of the United States of America, 108(29), 11983-8.
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
Gutenkunst, R. N., Coombs, D., Starr, T., Dustin, M. L., & Goldstein, B. (2011). A biophysical model of cell adhesion mediated by immunoadhesin drugs and antibodies. PloS one, 6(5), e19701.
A promising direction in drug development is to exploit the ability of natural killer cells to kill antibody-labeled target cells. Monoclonal antibodies and drugs designed to elicit this effect typically bind cell-surface epitopes that are overexpressed on target cells but also present on other cells. Thus it is important to understand adhesion of cells by antibodies and similar molecules. We present an equilibrium model of such adhesion, incorporating heterogeneity in target cell epitope density, nonspecific adhesion forces, and epitope immobility. We compare with experiments on the adhesion of Jurkat T cells to bilayers containing the relevant natural killer cell receptor, with adhesion mediated by the drug alefacept. We show that a model in which all target cell epitopes are mobile and available is inconsistent with the data, suggesting that more complex mechanisms are at work. We hypothesize that the immobile epitope fraction may change with cell adhesion, and we find that such a model is more consistent with the data, although discrepancies remain. We also quantitatively describe the parameter space in which binding occurs. Our model elaborates substantially on previous work, and our results offer guidance for the refinement of therapeutic immunoadhesins. Furthermore, our comparison with data from Jurkat T cells also points toward mechanisms relating epitope immobility to cell adhesion.
Locke, D. P., Hillier, L. W., Warren, W. C., Worley, K. C., Nazareth, L. V., Muzny, D. M., Yang, S., Wang, Z., Chinwalla, A. T., Minx, P., Mitreva, M., Cook, L., Delehaunty, K. D., Fronick, C., Schmidt, H., Fulton, L. A., Fulton, R. S., Nelson, J. O., Magrini, V., Pohl, C., et al. (2011). Comparative and demographic analysis of orang-utan genomes. Nature, 469(7331), 529-33.
'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
Locke, D. P., Hillier, L. W., Warren, W. C., Worley, K. C., Nazareth, L. V., Muzny, D. M., Yang, S., Wang, Z., Chinwalla, A. T., Minx, P., Mitreva, M., Cook, L., Delehaunty, K. D., Fronick, C., Schmidt, H., Fulton, L. A., Fulton, R. S., Nelson, J. O., Magrini, V., Pohl, C., et al. (2011). Comparative and demographic analysis of orang-utan genomes. Nature, 469(7331), 529-533.
PMID: 21270892;PMCID: PMC3060778;Abstract: 'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period.
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts. © 2011 Macmillan Publishers Limited. All rights reserved.
Skar, H., Gutenkunst, R. N., Ramsay, K. W., Alaeus, A., Albert, J., & Leitner, T. (2011). Daily sampling of an HIV-1 patient with slowly progressing disease displays persistence of multiple env subpopulations consistent with neutrality. PLoS ONE, 6(8).
PMID: 21829600;PMCID: PMC3149046;Abstract: The molecular evolution of HIV-1 is characterized by frequent substitutions, indels and recombination events. In addition, a HIV-1 population may adapt through frequency changes of its variants. To reveal such population dynamics we analyzed HIV-1 subpopulation frequencies in an untreated patient with stable, low plasma HIV-1 RNA levels and close to normal CD4+ T-cell levels. The patient was intensively sampled during a 32-day period as well as approximately 1.5 years before and after this period (days -664, 1, 2, 3, 11, 18, 25, 32 and 522). 77 sequences of HIV-1 env (approximately 3100 nucleotides) were obtained from plasma by limiting dilution with 7-11 sequences per time point, except day -664. Phylogenetic analysis using maximum likelihood methods showed that the sequences clustered in six distinct subpopulations. We devised a method that took into account the relatively coarse sampling of the population. Data from days 1 through 32 were consistent with constant within-patient subpopulation frequencies. However, over longer time periods, i.e. between days 1...32 and 522, there were significant changes in subpopulation frequencies, which were consistent with evolutionarily neutral fluctuations. We found no clear signal of natural selection within the subpopulations over the study period, but positive selection was evident on the long branches that connected the subpopulations, which corresponds to >3 years as the subpopulations already were established when we started the study. Thus, selective forces may have been involved when the subpopulations were established. Genetic drift within subpopulations caused by de novo substitutions could be resolved after approximately one month. Overall, we conclude that subpopulation frequencies within this patient changed significantly over a time period of 1.5 years, but that this does not imply directional or balancing selection. 
We show that the short-term evolution we study here is likely representative for many patients of slow and normal disease progression.
Skar, H., Gutenkunst, R. N., Wilbe Ramsay, K., Alaeus, A., Albert, J., & Leitner, T. (2011). Daily sampling of an HIV-1 patient with slowly progressing disease displays persistence of multiple env subpopulations consistent with neutrality. PloS one, 6(8), e21747.
The molecular evolution of HIV-1 is characterized by frequent substitutions, indels and recombination events. In addition, a HIV-1 population may adapt through frequency changes of its variants. To reveal such population dynamics we analyzed HIV-1 subpopulation frequencies in an untreated patient with stable, low plasma HIV-1 RNA levels and close to normal CD4+ T-cell levels. The patient was intensively sampled during a 32-day period as well as approximately 1.5 years before and after this period (days -664, 1, 2, 3, 11, 18, 25, 32 and 522). 77 sequences of HIV-1 env (approximately 3100 nucleotides) were obtained from plasma by limiting dilution with 7-11 sequences per time point, except day -664. Phylogenetic analysis using maximum likelihood methods showed that the sequences clustered in six distinct subpopulations. We devised a method that took into account the relatively coarse sampling of the population. Data from days 1 through 32 were consistent with constant within-patient subpopulation frequencies. However, over longer time periods, i.e. between days 1...32 and 522, there were significant changes in subpopulation frequencies, which were consistent with evolutionarily neutral fluctuations. We found no clear signal of natural selection within the subpopulations over the study period, but positive selection was evident on the long branches that connected the subpopulations, which corresponds to >3 years as the subpopulations already were established when we started the study. Thus, selective forces may have been involved when the subpopulations were established. Genetic drift within subpopulations caused by de novo substitutions could be resolved after approximately one month. Overall, we conclude that subpopulation frequencies within this patient changed significantly over a time period of 1.5 years, but that this does not imply directional or balancing selection. 
We show that the short-term evolution we study here is likely representative for many patients of slow and normal disease progression.
Smith, A. M., Adler, F. R., McAuley, J. L., Gutenkunst, R. N., Ribeiro, R. M., McCullers, J. A., & Perelson, A. S. (2011). Effect of 1918 PB1-F2 expression on influenza A virus infection kinetics. PLoS computational biology, 7(2), e1001081.
Relatively little is known about the viral factors contributing to the lethality of the 1918 pandemic, although its unparalleled virulence was likely due in part to the newly discovered PB1-F2 protein. This protein, while unnecessary for replication, increases apoptosis in monocytes, alters viral polymerase activity in vitro, enhances inflammation and increases secondary pneumonia in vivo. However, the effects the PB1-F2 protein have in vivo remain unclear. To address the mechanisms involved, we intranasally infected groups of mice with either influenza A virus PR8 or a genetically engineered virus that expresses the 1918 PB1-F2 protein on a PR8 background, PR8-PB1-F2(1918). Mice inoculated with PR8 had viral concentrations peaking at 72 hours, while those infected with PR8-PB1-F2(1918) reached peak concentrations earlier, 48 hours. Mice given PR8-PB1-F2(1918) also showed a faster decline in viral loads. We fit a mathematical model to these data to estimate parameter values. The model supports a higher viral production rate per cell and a higher infected cell death rate with the PR8-PB1-F2(1918) virus. We discuss the implications these mechanisms have during an infection with a virus expressing a virulent PB1-F2 on the possibility of a pandemic and on the importance of antiviral treatments.
Smith, A. M., Adler, F. R., McAuley, J. L., Gutenkunst, R. N., Ribeiro, R. M., McCullers, J. A., & Perelson, A. S. (2011). Effect of 1918 PB1-F2 expression on influenza A virus infection kinetics. PLoS Computational Biology, 7(2).
PMID: 21379324;PMCID: PMC3040654;Abstract: Relatively little is known about the viral factors contributing to the lethality of the 1918 pandemic, although its unparalleled virulence was likely due in part to the newly discovered PB1-F2 protein. This protein, while unnecessary for replication, increases apoptosis in monocytes, alters viral polymerase activity in vitro, enhances inflammation and increases secondary pneumonia in vivo. However, the effects the PB1-F2 protein have in vivo remain unclear. To address the mechanisms involved, we intranasally infected groups of mice with either influenza A virus PR8 or a genetically engineered virus that expresses the 1918 PB1-F2 protein on a PR8 background, PR8-PB1-F2(1918). Mice inoculated with PR8 had viral concentrations peaking at 72 hours, while those infected with PR8-PB1-F2(1918) reached peak concentrations earlier, 48 hours. Mice given PR8-PB1-F2(1918) also showed a faster decline in viral loads. We fit a mathematical model to these data to estimate parameter values. The model supports a higher viral production rate per cell and a higher infected cell death rate with the PR8-PB1-F2(1918) virus. We discuss the implications these mechanisms have during an infection with a virus expressing a virulent PB1-F2 on the possibility of a pandemic and on the importance of antiviral treatments.
Xu, X., Liu, X., Ge, S., Jensen, J. D., Hu, F., Li, X., Dong, Y., Gutenkunst, R. N., Fang, L., Huang, L., Li, J., He, W., Zhang, G., Zheng, X., Zhang, F., Li, Y., Yu, C., Kristiansen, K., Zhang, X., Wang, J., et al. (2011). Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nature biotechnology, 30(1), 105-11.
Rice is a staple crop that has undergone substantial phenotypic and physiological changes during domestication. Here we resequenced the genomes of 40 cultivated accessions selected from the major groups of rice and 10 accessions of their wild progenitors (Oryza rufipogon and Oryza nivara) to >15 × raw data coverage. We investigated genome-wide variation patterns in rice and obtained 6.5 million high-quality single nucleotide polymorphisms (SNPs) after excluding sites with missing data in any accession. Using these population SNP data, we identified thousands of genes with significantly lower diversity in cultivated but not wild rice, which represent candidate regions selected during domestication. Some of these variants are associated with important biological features, whereas others have yet to be functionally characterized. The molecular markers we have identified should be valuable for breeding and for identifying agronomically important genes in rice.
Altshuler, D. L., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., M., F., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D. A., Peltonen, L., et al. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073.
PMID: 20981092;PMCID: PMC3042601;Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. © 2010 Macmillan Publishers Limited. All rights reserved.
Colvin, J., Monine, M. I., Gutenkunst, R. N., Hlavacek, W. S., Von Hoff, D. D., & Posner, R. G. (2010). RuleMonkey: Software for stochastic simulation of rule-based models. BMC Bioinformatics, 11.
PMID: 20673321;PMCID: PMC2921409;Abstract: Background: The system-level dynamics of many molecular interactions, particularly protein-protein interactions, can be conveniently represented using reaction rules, which can be specified using model-specification languages, such as the BioNetGen language (BNGL). A set of rules implicitly defines a (bio)chemical reaction network. The reaction network implied by a set of rules is often very large, and as a result, generation of the network implied by rules tends to be computationally expensive. Moreover, the cost of many commonly used methods for simulating network dynamics is a function of network size. Together these factors have limited application of the rule-based modeling approach. Recently, several methods for simulating rule-based models have been developed that avoid the expensive step of network generation. The cost of these "network-free" simulation methods is independent of the number of reactions implied by rules. Software implementing such methods is now needed for the simulation and analysis of rule-based models of biochemical systems. Results: Here, we present a software tool called RuleMonkey, which implements a network-free method for simulation of rule-based models that is similar to Gillespie's method. The method is suitable for rule-based models that can be encoded in BNGL, including models with rules that have global application conditions, such as rules for intramolecular association reactions. In addition, the method is rejection free, unlike other network-free methods that introduce null events, i.e., steps in the simulation procedure that do not change the state of the reaction system being simulated. We verify that RuleMonkey produces correct simulation results, and we compare its performance against DYNSTOC, another BNGL-compliant tool for network-free simulation of rule-based models.
We also compare RuleMonkey against problem-specific codes implementing network-free simulation methods. Conclusions: RuleMonkey enables the simulation of rule-based models for which the underlying reaction networks are large. It is typically faster than DYNSTOC for benchmark problems that we have examined. RuleMonkey is freely available as a stand-alone application http://public.tgen.org/rulemonkey. It is also available as a simulation engine within GetBonNie, a web-based environment for building, analyzing and sharing rule-based models. © 2010 Colvin et al; licensee BioMed Central Ltd.
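For readers unfamiliar with the Gillespie method that the abstract above refers to, the following is a minimal sketch of the standard direct method applied to a single reversible binding reaction, A + B <-> C. It is not RuleMonkey's network-free algorithm, and it does not read BNGL; the function name `gillespie_binding`, the rate constants, and the molecule counts are all hypothetical and chosen only for illustration.

```python
import random

def gillespie_binding(n_a, n_b, n_c, k_on, k_off, t_end, seed=0):
    """Direct-method simulation of A + B <-> C; returns a list of (t, A, B, C)."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    t = 0.0
    trajectory = [(t, n_a, n_b, n_c)]
    while True:
        a_bind = k_on * n_a * n_b      # propensity of A + B -> C
        a_unbind = k_off * n_c         # propensity of C -> A + B
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break                      # no reaction can fire
        # The waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_bind:
            n_a, n_b, n_c = n_a - 1, n_b - 1, n_c + 1
        else:
            n_a, n_b, n_c = n_a + 1, n_b + 1, n_c - 1
        trajectory.append((t, n_a, n_b, n_c))
    return trajectory

# Hypothetical starting counts and rate constants.
traj = gillespie_binding(n_a=50, n_b=40, n_c=0, k_on=0.01, k_off=0.1, t_end=10.0)
```

Every step in this loop changes the state of the system, so the sketch has no null events. Its cost per step, however, depends on enumerating every reaction explicitly, which is the expense that network-free methods are described as avoiding.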
Colvin, J., Monine, M. I., Gutenkunst, R. N., Hlavacek, W. S., Von Hoff, D. D., & Posner, R. G. (2010). RuleMonkey: software for stochastic simulation of rule-based models. BMC bioinformatics, 11, 404.
The system-level dynamics of many molecular interactions, particularly protein-protein interactions, can be conveniently represented using reaction rules, which can be specified using model-specification languages, such as the BioNetGen language (BNGL). A set of rules implicitly defines a (bio)chemical reaction network. The reaction network implied by a set of rules is often very large, and as a result, generation of the network implied by rules tends to be computationally expensive. Moreover, the cost of many commonly used methods for simulating network dynamics is a function of network size. Together these factors have limited application of the rule-based modeling approach. Recently, several methods for simulating rule-based models have been developed that avoid the expensive step of network generation. The cost of these "network-free" simulation methods is independent of the number of reactions implied by rules. Software implementing such methods is now needed for the simulation and analysis of rule-based models of biochemical systems.
Andrés, A. M., Hubisz, M. J., Indap, A., Torgerson, D. G., Degenhardt, J. D., Boyko, A. R., Gutenkunst, R. N., White, T. J., Green, E. D., Bustamante, C. D., Clark, A. G., & Nielsen, R. (2009). Targets of balancing selection in the human genome. Molecular Biology and Evolution, 26(12), 2755-2764.
PMID: 19713326;PMCID: PMC2782326;Abstract: Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.
Andrés, A. M., Hubisz, M. J., Indap, A., Torgerson, D. G., Degenhardt, J. D., Boyko, A. R., Gutenkunst, R. N., White, T. J., Green, E. D., Bustamante, C. D., Clark, A. G., & Nielsen, R. (2009). Targets of balancing selection in the human genome. Molecular biology and evolution, 26(12), 2755-64.
Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.
Auton, A., Bryc, K., Boyko, A. R., Lohmueller, K. E., Novembre, J., Reynolds, A., Indap, A., Wright, M. H., Degenhardt, J. D., Gutenkunst, R. N., King, K. S., Nelson, M. R., & Bustamante, C. D. (2009). Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Research, 19(5), 795-803.
PMID: 19218534;PMCID: PMC2675968;Abstract: Characterizing patterns of genetic variation within and among human populations is important for understanding human evolutionary history and for careful design of medical genetic studies. Here, we analyze patterns of variation across 443,434 single nucleotide polymorphisms (SNPs) genotyped in 3845 individuals from four continental regions. This unique resource allows us to illuminate patterns of diversity in previously under-studied populations at the genome-wide scale including Latin America, South Asia, and Southern Europe. Key insights afforded by our analysis include quantifying the degree of admixture in a large collection of individuals from Guadalajara, Mexico; identifying language and geography as key determinants of population structure within India; and elucidating a north-south gradient in haplotype diversity within Europe. We also present a novel method for identifying long-range tracts of homozygosity indicative of recent common ancestry. Application of our approach suggests great variation within and among populations in the extent of homozygosity, suggesting both demographic history (such as population bottlenecks) and recent ancestry events (such as consanguinity) play an important role in patterning variation in large modern human populations. © 2009 by Cold Spring Harbor Laboratory Press.
Auton, A., Bryc, K., Boyko, A. R., Lohmueller, K. E., Novembre, J., Reynolds, A., Indap, A., Wright, M. H., Degenhardt, J. D., Gutenkunst, R. N., King, K. S., Nelson, M. R., & Bustamante, C. D. (2009). Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome research, 19(5), 795-803.
Characterizing patterns of genetic variation within and among human populations is important for understanding human evolutionary history and for careful design of medical genetic studies. Here, we analyze patterns of variation across 443,434 single nucleotide polymorphisms (SNPs) genotyped in 3845 individuals from four continental regions. This unique resource allows us to illuminate patterns of diversity in previously under-studied populations at the genome-wide scale including Latin America, South Asia, and Southern Europe. Key insights afforded by our analysis include quantifying the degree of admixture in a large collection of individuals from Guadalajara, Mexico; identifying language and geography as key determinants of population structure within India; and elucidating a north-south gradient in haplotype diversity within Europe. We also present a novel method for identifying long-range tracts of homozygosity indicative of recent common ancestry. Application of our approach suggests great variation within and among populations in the extent of homozygosity, suggesting both demographic history (such as population bottlenecks) and recent ancestry events (such as consanguinity) play an important role in patterning variation in large modern human populations.
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics, 5(10).
PMID: 19851460;PMCID: PMC2760211;Abstract: Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40-270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17-43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3-26.9 kya), and our analysis yields no evidence for subsequent migration. 
Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS genetics, 5(10), e1000695.
Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40-270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17-43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3-26.9 kya), and our analysis yields no evidence for subsequent migration. 
Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
Nielsen, R., Hubisz, M. J., Hellmann, I., Torgerson, D., Andrés, A. M., Albrechtsen, A., Gutenkunst, R., Adams, M. D., Cargill, M., Boyko, A., Indap, A., Bustamante, C. D., & Clark, A. G. (2009). Darwinian and demographic forces affecting human protein coding genes. Genome Research, 19(5), 838-849.
PMID: 19279335;PMCID: PMC2675972;Abstract: Past demographic changes can produce distortions in patterns of genetic variation that can mimic the appearance of natural selection unless the demographic effects are explicitly removed. Here we fit a detailed model of human demography that incorporates divergence, migration, admixture, and changes in population size to directly sequenced data from 13,400 protein coding genes from 20 European-American and 19 African-American individuals. Based on this demographic model, we use several new and established statistical methods for identifying genes with extreme patterns of polymorphism likely to be caused by Darwinian selection, providing the first genome-wide analysis of allele frequency distributions in humans based on directly sequenced data. The tests are based on observations of excesses of high frequency-derived alleles, excesses of low frequency-derived alleles, and excesses of differences in allele frequencies between populations. We detect numerous new genes with strong evidence of selection, including a number of genes related to psychiatric and other diseases. We also show that microRNA controlled genes evolve under extremely high constraints and are more likely to undergo negative selection than other genes. Furthermore, we show that genes involved in muscle development have been subject to positive selection during recent human history. In accordance with previous studies, we find evidence for negative selection against mutations in genes associated with Mendelian disease and positive selection acting on genes associated with several complex diseases. © 2009 by Cold Spring Harbor Laboratory Press.
Casey, F. P., Waterfall, J. J., Gutenkunst, R. N., Myers, C. R., & Sethna, J. P. (2008). Variational method for estimating the rate of convergence of Markov-chain Monte Carlo algorithms. Physical review. E, Statistical, nonlinear, and soft matter physics, 78(4 Pt 2), 046704.
We demonstrate the use of a variational method to determine a quantitative lower bound on the rate of convergence of Markov chain Monte Carlo (MCMC) algorithms as a function of the target density and proposal density. The bound relies on approximating the second largest eigenvalue in the spectrum of the MCMC operator using a variational principle and the approach is applicable to problems with continuous state spaces. We apply the method to one dimensional examples with Gaussian and quartic target densities, and we contrast the performance of the random walk Metropolis-Hastings algorithm with a "smart" variant that incorporates gradient information into the trial moves, a generalization of the Metropolis adjusted Langevin algorithm. We find that the variational method agrees quite closely with numerical simulations. We also see that the smart MCMC algorithm often fails to converge geometrically in the tails of the target density except in the simplest case we examine, and even then care must be taken to choose the appropriate scaling of the deterministic and random parts of the proposed moves. Again, this calls into question the utility of smart MCMC in more complex problems. Finally, we apply the same method to approximate the rate of convergence in multidimensional Gaussian problems with and without importance sampling. There we demonstrate the necessity of importance sampling for target densities which depend on variables with a wide range of scales.
Daniels, B. C., Chen, Y., Sethna, J. P., Gutenkunst, R. N., & Myers, C. R. (2008). Sloppiness, robustness, and evolvability in systems biology. Current Opinion in Biotechnology, 19(4), 389-395.
PMID: 18620054;Abstract: The functioning of many biochemical networks is often robust - remarkably stable under changes in external conditions and internal reaction parameters. Much recent work on robustness and evolvability has focused on the structure of neutral spaces, in which system behavior remains invariant to mutations. Recently we have shown that the collective behavior of multiparameter models is most often sloppy: insensitive to changes except along a few 'stiff' combinations of parameters, with an enormous sloppy neutral subspace. Robustness is often assumed to be an emergent evolved property, but the sloppiness natural to biochemical networks offers an alternative nonadaptive explanation. Conversely, ideas developed to study evolvability in robust systems can be usefully extended to characterize sloppy systems. © 2008 Elsevier Ltd. All rights reserved.
Casey, F. P., Baird, D., Feng, Q., Gutenkunst, R. N., Waterfall, J. J., Myers, C. R., Brown, K. S., Cerione, R. A., & Sethna, J. P. (2007). Optimal experimental design in an epidermal growth factor receptor signalling and down-regulation model. IET Systems Biology, 1(3), 190-202.
PMID: 17591178;Abstract: We apply the methods of optimal experimental design to a differential equation model for epidermal growth factor receptor signalling, trafficking and down-regulation. The model incorporates the role of a recently discovered protein complex made up of the E3 ubiquitin ligase, Cbl, the guanine exchange factor (GEF), Cool-1 (β-Pix) and the Rho family G protein Cdc42. The complex has been suggested to be important in disrupting receptor down-regulation. We demonstrate that the model interactions can accurately reproduce the experimental observations, that they can be used to make predictions with accompanying uncertainties, and that we can apply ideas of optimal experimental design to suggest new experiments that reduce the uncertainty on unmeasurable components of the system. © 2007 The Institution of Engineering and Technology.
Gutenkunst, R. N., Casey, F. P., Waterfall, J. J., Myers, C. R., & Sethna, J. P. (2007). Extracting falsifiable predictions from sloppy models. Annals of the New York Academy of Sciences, 1115, 203-211.
PMID: 17925353;Abstract: Successful predictions are among the most compelling validations of any model. Extracting falsifiable predictions from nonlinear multiparameter models is complicated by the fact that such models are commonly sloppy, possessing sensitivities to different parameter combinations that range over many decades. Here we discuss how sloppiness affects the sorts of data that best constrain model predictions, makes linear uncertainty approximations dangerous, and introduces computational difficulties in Monte-Carlo uncertainty analysis. We also present a useful test problem and suggest refinements to the standards by which models are communicated. © 2007 New York Academy of Sciences.
Gutenkunst, R. N., Waterfall, J. J., Casey, F. P., Brown, K. S., Myers, C. R., & Sethna, J. P. (2007). Universally sloppy parameter sensitivities in systems biology models. PLoS Computational Biology, 3(10), 1871-1878.
PMID: 17922568;PMCID: PMC2000971;Abstract: Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring in vivo biochemical parameters is difficult, and collectively fitting them to other experimental data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that the model had a "sloppy" spectrum of parameter sensitivities, with eigenvalues roughly evenly distributed over many decades. Here we use a collection of models from the literature to test whether such sloppy spectra are common in systems biology. Strikingly, we find that every model we examine has a sloppy spectrum of sensitivities. We also test several consequences of this sloppiness for building predictive models. In particular, sloppiness suggests that collective fits to even large amounts of ideal time-series data will often leave many parameters poorly constrained. Tests over our model collection are consistent with this suggestion. This difficulty with collective fits may seem to argue for direct parameter measurements, but sloppiness also implies that such measurements must be formidably precise and complete to usefully constrain many model predictions. We confirm this implication in our growth-factor-signaling model. Our results suggest that sloppy sensitivity spectra are universal in systems biology models. The prevalence of sloppiness highlights the power of collective fits and suggests that modelers should focus on predictions rather than on parameters. © 2007 Gutenkunst et al.
Gutenkunst, R., Newlands, N., Lutcavage, M., & Edelstein-Keshet, L. (2007). Inferring resource distributions from Atlantic bluefin tuna movements: An analysis based on net displacement and length of track. Journal of Theoretical Biology, 245(2), 243-257.
PMID: 17140603;Abstract: We use observed movement tracks of Atlantic bluefin tuna in the Gulf of Maine and mathematical modeling of this movement to identify possible resource patches. We infer bounds on the overall sizes and distribution of such patches, even though they are difficult to quantify by direct observation in situ. To do so, we segment individual fish tracks into intervals of distinct motion types based on the ratio of net displacement to length of track (ΔD/ΔL) over a time window Δt. To find the best segmentation, we optimize the fit of a random-walk movement model to each motion type. We compare results from two distinct movement models: biased turning and biased speed, to check the model-dependence of our inferences, and find that uncertainty in choice of movement model dominates the uncertainties of our conclusions. We find that our data are best described using two motion types: "localized" (ΔD/ΔL small) and "long-ranged" (ΔD/ΔL large). The biased turning model leads to significantly better resolution of localized movement intervals than the biased speed model. We hypothesize that localized movement corresponds to exploitation of resource patches. Comparison with visual behavior observations made during tracking suggests that many inferred intervals of localized motion do indeed correspond to feeding activity. From our analysis, we estimate that, on average, bluefin tuna in the Gulf of Maine encounter a resource patch every 2 h, that those patches have an average radius of 0.7-1.2 km, and that, overall, there are at most 5-9 such patches per 100 km² in the region studied. © 2006 Elsevier Ltd. All rights reserved.
Myers, C. R., Gutenkunst, R. N., & Sethna, J. P. (2007). Python unleashed on systems biology. Computing in Science and Engineering, 9(3), 34-37.
Abstract: Cornell University has developed an open source software system called SloppyCell, written in Python, to model biomolecular reaction networks. SloppyCell improves standard dynamical modeling by focusing on inference of model parameters from data and quantification of the uncertainties of model prediction. An important role in the software is to combine together many diverse modules that provide specific functionality. NumPy and SciPy were used for numeric, particularly for integrating differential equations, optimizing parameters by least squares fits to data, and analyzing the Hessian matrix about a best-fit set of parameters. Models are read and written in a standardized XML-based file format and the Systems Biology Markup Language (SBML) with assistance from a Python interface to the libSBML library.
Waterfall, J. J., Casey, F. P., Gutenkunst, R. N., Brown, K. S., Myers, C. R., Brouwer, P. W., Elser, V., & Sethna, J. P. (2006). Sloppy-model universality class and the Vandermonde matrix. Physical Review Letters, 97(15), 150601.
PMID: 17155311;Abstract: In a variety of contexts, physicists study complex, nonlinear models with many unknown or tunable parameters to explain experimental data. We explain why such systems so often are sloppy: the system behavior depends only on a few "stiff" combinations of the parameters and is unchanged as other "sloppy" parameter combinations vary by orders of magnitude. We observe that the eigenvalue spectra for the sensitivity of sloppy models have a striking, characteristic form with a density of logarithms of eigenvalues which is roughly constant over a large range. We suggest that the common features of sloppy models indicate that they may belong to a common universality class. In particular, we motivate focusing on a Vandermonde ensemble of multiparameter nonlinear models and show in one limit that they exhibit the universal features of sloppy models. © 2006 The American Physical Society.
Black, E. D., & Gutenkunst, R. N. (2003). An introduction to signal extraction in interferometric gravitational wave detectors. American Journal of Physics, 71(4), 365-378.
Abstract: In the very near future gravitational wave astronomy is expected to become a reality, giving us a completely new tool for exploring the universe around us. We provide an introduction to how interferometric gravitational wave detectors work, suitable for students entering the field and teachers who wish to cover the subject matter in an advanced undergraduate or beginning graduate level course. © 2003 American Association of Physics Teachers.

Presentations

Gutenkunst, R. N. (2021). DFEnitely Different: Correlations of mutation fitness effects across populations and proteins. Washington State University, School of Biological Sciences.
Gutenkunst, R. N. (2018, Spring). Mutation fitness effects across populations and proteins. Bioinformatics Forum. Philadelphia, PA: University of Pennsylvania.
Gutenkunst, R. N. (2018, Spring). Mutation fitness effects across populations and proteins. Institute for Genomics and Evolutionary Medicine. Philadelphia, PA: Temple University.
Gutenkunst, R. N. (2018, Summer). Joint Distribution of Fitness Effects. Population, Evolutionary, and Quantitative Genetics.
Gutenkunst, R. N. (2018, Winter). DFEnitely Different: The joint distribution of mutation fitness effects between populations. Probabilistic Modeling in Genetics. Cold Spring Harbor Laboratory.
Gutenkunst, R. N. (2018, Winter). Elephants in the Room: Challenges in inferring demographic history. PopSim meeting.
Gutenkunst, R. N. (2017, October). Inferring natural selection on proteins using network- and population-scale models. Systems Biology seminar, Boston University (invited departmental seminar).
Gutenkunst, R. N. (2017, Spring). Inferring natural selection from network- and population-scale models. Center for Bioinformatics Research, Indiana University (invited departmental seminar).
Gutenkunst, R. N. (2016, March). Inferring selection via population- and network-scale models. Department of Physics, Brigham Young University (invited departmental seminar).
Gutenkunst, R. N. (2016, Spring). Demographic history inference using dadi. Workshop on Population and Speciation Genomics (teaching at international workshop).
Gutenkunst, R. N. (2016, Summer). Selection on network dynamics constrains protein evolution in signaling and metabolic networks. Symposium on Cell Signaling (invited conference presentation). Santa Fe, NM.
Gutenkunst, R. N. (2015, March). Inferring selection via network- and population-scale models. Program in Computational Biology, U Pittsburgh/Carnegie Mellon U (invited departmental seminar).
Gutenkunst, R. N. (2015, Spring). Inferring selection via network- and population-scale models. Department of Biology, Temple University (invited departmental seminar).
Gutenkunst, R. N. (2015, Spring). Inferring selection via population- and network-scale models. Center for Computational Biology, UC-Berkeley (invited departmental seminar).
Gutenkunst, R. N. (2015, Summer). Inferring selection via population- and network-scale models. Society for Molecular Biology and Evolution annual meeting (invited international conference presentation).
Gutenkunst, R. N. (2014, Summer). Natural selection and mutations affecting biomolecular networks. Indonesian-American Kavli Frontiers of Science Symposium (international invited conference presentation). Medan, Indonesia.
Gutenkunst, R. N. (2014, Summer). Was metazoan tyrosine loss driven by selection against promiscuous phosphorylation? Society for Molecular Biology and Evolution Annual Meeting (contributed abstract chosen for talk).
Gutenkunst, R. N. (2013, April). Parameter-rich models of biochemical networks: Fitting, prediction, and evolution. Networks Seminar, University of Houston (invited departmental seminar).
Gutenkunst, R. N. (2013, Jan). Parameters in biochemical models: Fitting, prediction, and evolution. Mathematical Biology Research Program, University of Utah (invited departmental seminar).
Gutenkunst, R. N. (2013, June). How network dynamics constrain protein evolution. Society for Mathematical Biology Annual Meeting.
Gutenkunst, R. N. (2012, Feb). Protein domain evolution is constrained by network robustness. Mathematical Biosciences Institute workshop: Robustness in Biological Systems (invited conference presentation).
Gutenkunst, R. N. (2012, Oct). Protein Domains with Greater Influence on Network Dynamics Evolve More Slowly. American Mathematical Society Fall Western Sectional Meeting (invited conference presentation).
Gutenkunst, R. N. (2011, Dec). Protein Domains with Greater Influence on Network Dynamics Evolve More Slowly. Mechanisms of Protein Evolution meeting.
Gutenkunst, R. N. (2011, Feb). Sloppy modeling and natural selection in biochemical networks. Department of Engineering Sciences and Applied Mathematics, Northwestern University (invited departmental seminar).
Gutenkunst, R. N. (2011, July). Protein Domains with Greater Influence on Network Dynamics Evolve More Slowly. Society for Molecular Biology and Evolution Annual Meeting (contributed abstract chosen for talk). Kyoto, Japan.
Gutenkunst, R. N. (2011, June). Population genomic inferences of natural history and natural selection in orangutans. Society for the Study of Evolution Annual Meeting.
Gutenkunst, R. N. (2010, Aug). Inferring the demographic history of multiple populations from genomic polymorphism data. Fourth Annual q-bio Conference on Cellular Information Processing (contributed abstract selected for talk).
Gutenkunst, R. N. (2010, Feb). Selection in biochemical networks and the history of human populations. Department of Biology, Boston College (invited departmental seminar).
Gutenkunst, R. N. (2010, Feb). Sloppy modeling in biochemical networks and human genetic history. Department of Physics, Emory University (invited departmental seminar).
Gutenkunst, R. N. (2010, July). A model of cell adhesion mediated by immunoadhesin drugs and antibodies. FASEB Summer Research Conference: Immunoreceptors (contributed abstract selected for talk).
Gutenkunst, R. N. (2010, June). Diffusion Approximations for Demographic Inference: ∂a∂i. iEvoBio (Informatics for Phylogenetics, Evolution and Biodiversity) conference (contributed abstract chosen for talk).
Gutenkunst, R. N. (2009, Dec). Sloppy modeling and natural selection in biochemical networks. Program in Bioinformatics and Integrative Biology, University of Massachusetts, Worcester (invited departmental seminar).
Gutenkunst, R. N. (2009, Jun). Sloppiness in biochemical modeling and evolution. Centre for Integrative Bioinformatics, Vrije University (international invited departmental seminar).
Gutenkunst, R. N. (2009, June). Sloppiness in biochemical modeling and evolution. Lorentz workshop: Data Analysis, Parameter Identification and Experimental Design in Systems Biology (international invited conference presentation). Leiden, Netherlands.
Gutenkunst, R. N. (2009, Sept). Inferring the joint demographic history of multiple populations. Banff International Research Station Workshop: New Mathematical Challenges from Molecular Biology and Genetics (international invited conference presentation). Banff, Canada.
Gutenkunst, R. N. (2008, Feb). Sloppiness in Biochemical Modeling and Evolution. Santa Fe Institute (invited departmental seminar).
Gutenkunst, R. N. (2008, Feb). Sloppiness in Biochemical Modeling and Evolution. Department of Computational and Systems Biology, University of Pittsburgh (invited departmental seminar).
Gutenkunst, R. N. (2008, Jan). Sloppiness in Biochemical Modeling and Evolution. Mathematical Biology Program, University of British Columbia (international invited departmental seminar).
Gutenkunst, R. N. (2007, Jun). “Sloppy” biochemistry and evolution. Society for Molecular Biology and Evolution Annual Meeting (contributed abstract chosen for talk).
Gutenkunst, R. N. (2005, Oct). Biological Models are Sloppy: Constraining Predictions Without Constraining Parameters. Sixth International Conference on Systems Biology (contributed abstract selected for talk).

Poster Presentations

Gutenkunst, R. N. (2013, July). Diffusion approach to the non-equilibrium two-locus sampling distribution of allele frequencies and linkage disequilibrium. Society for Molecular Biology and Evolution Annual Meeting (contributed abstract chosen for talk).
Gutenkunst, R. N. (2008, Jun). Inferring Human History from the Joint Site-Frequency Spectrum. Cornell Postdoc Research Day (best poster presentation award).
