Joseph C Watkins

Professor, Mathematics
Professor, BIO5 Institute
Professor, Applied Mathematics - GIDP
Professor, Genetics - GIDP
Professor, Public Health
Member of the Graduate Faculty
Professor, Statistics-GIDP

Contact

jwatkins@arizona.edu

Degrees

Ph.D. Mathematics

University of Wisconsin, Madison, Wisconsin, United States
A Central Limit Problem in Random Evolutions

M.S. Mathematics

University of Wisconsin, Madison, Wisconsin, United States
none

M.A. Mathematics

University of Tennessee, Knoxville, Tennessee, United States
Second Quantization

B.A. Mathematics

University of Tennessee, Knoxville, Tennessee, United States

Work Experience

University of Arizona, Tucson, Arizona (2007 - Ongoing)
University of Arizona, Tucson, Arizona (1996 - 2007)
University of Arizona, Tucson, Arizona (1992 - 1996)
Northwestern University, Evanston, Illinois (1987)
University of Southern California, Los Angeles, California (1986 - 1992)
Institute for Mathematics and its Applications, University of Minnesota (1985)
University of British Columbia, Vancouver, British Columbia (1982 - 1985)
Freie Universität Berlin (1980)

Interests

Teaching

probability and statistics, stochastic processes, quantitative and mathematical biology

Research

probability theory and stochastic processes, statistics, applications to the life sciences, especially to genetics and genomics

Courses

2026-27 Courses

Capstone for Data Science

DATA 498D (Fall 2026)
Honors Thesis

DATA 498H (Fall 2026)
Theory of Probability

MATH 564 (Fall 2026)
Theory of Probability

STAT 564 (Fall 2026)

2025-26 Courses

Capstone for Data Science

DATA 498D (Spring 2026)
Honors Thesis

DATA 498H (Spring 2026)
Intro Statistical Method

DATA 363 (Spring 2026)
Intro Statistical Method

MATH 363 (Spring 2026)
Thesis

STAT 910 (Spring 2026)
Honors Thesis

DATA 498H (Fall 2025)
Intro Statistical Method

DATA 363 (Fall 2025)
Intro Statistical Method

MATH 363 (Fall 2025)
Senior Capstone

DATA 498 (Fall 2025)

2024-25 Courses

Honors Independent Study

MATH 499H (Spring 2025)
Honors Thesis

DATA 498H (Spring 2025)
Honors Thesis

MATH 498H (Spring 2025)
Intro Statistical Method

DATA 363 (Spring 2025)
Intro Statistical Method

MATH 363 (Spring 2025)
Honors Independent Study

MATH 399H (Fall 2024)
Honors Thesis

DATA 498H (Fall 2024)
Honors Thesis

MATH 498H (Fall 2024)
Intro Statistical Method

DATA 363 (Fall 2024)
Intro Statistical Method

MATH 363 (Fall 2024)

2023-24 Courses

Honors Thesis

DATA 498H (Spring 2024)
Honors Thesis

MATH 498H (Spring 2024)
Intro Statistical Method

DATA 363 (Spring 2024)
Intro Statistical Method

MATH 363 (Spring 2024)
Thesis

STAT 910 (Spring 2024)
Topics in Math

MATH 596A (Spring 2024)
Honors Thesis

DATA 498H (Fall 2023)
Honors Thesis

MATH 498H (Fall 2023)
Topics in Math

MATH 596A (Fall 2023)

2022-23 Courses

Honors Thesis

DATA 498H (Spring 2023)
Topics in Math

MATH 596A (Spring 2023)
Honors Thesis

DATA 498H (Fall 2022)
Topics in Math

MATH 596A (Fall 2022)

2021-22 Courses

Topics in Math

MATH 596A (Spring 2022)
Topics in Math

MATH 596A (Fall 2021)

2020-21 Courses

Thesis

STAT 910 (Spring 2021)
Topics in Math

MATH 596A (Spring 2021)
Theory of Probability

MATH 564 (Fall 2020)
Theory of Probability

STAT 564 (Fall 2020)
Topics in Math

MATH 596A (Fall 2020)

2019-20 Courses

Honors Thesis

DATA 498H (Spring 2020)
Intro to Statistical Computing

DATA 375 (Spring 2020)
Theory of Statistics

MATH 566 (Spring 2020)
Theory of Statistics

STAT 566 (Spring 2020)
Thesis

STAT 910 (Spring 2020)
Topics in Math

MATH 596A (Spring 2020)
Honors Thesis

MATH 498H (Fall 2019)
Independent Study

STAT 599 (Fall 2019)
Research

STAT 900 (Fall 2019)
Theory of Probability

MATH 564 (Fall 2019)
Theory of Probability

STAT 564 (Fall 2019)
Thesis

STAT 910 (Fall 2019)
Topics in Math

MATH 596A (Fall 2019)

2018-19 Courses

Intro Statistical Method

DATA 363 (Spring 2019)
Research

STAT 900 (Spring 2019)
Topics in Math

MATH 596A (Spring 2019)
Intro Statistical Method

MATH 363 (Fall 2018)
Research

STAT 900 (Fall 2018)
Topics in Math

MATH 596A (Fall 2018)

2017-18 Courses

Thesis

STAT 910 (Summer I 2018)
Directed Research

MATH 492 (Spring 2018)
Dissertation

STAT 920 (Spring 2018)
Independent Study

MATH 499 (Spring 2018)
Intro Statistical Method

MATH 363 (Spring 2018)
Thesis

STAT 910 (Spring 2018)
Topics in Math

MATH 596A (Spring 2018)
Dissertation

STAT 920 (Fall 2017)
Intro Statistical Method

MATH 363 (Fall 2017)
Thesis

STAT 910 (Fall 2017)
Topics in Math

MATH 596A (Fall 2017)

2016-17 Courses

Thesis

STAT 910 (Summer I 2017)
Dissertation

STAT 920 (Spring 2017)
Honors Thesis

MATH 498H (Spring 2017)
Intro Statistical Method

MATH 363 (Spring 2017)
Thesis

STAT 910 (Spring 2017)
Topics in Math

MATH 596A (Spring 2017)
Topics in Undergrad Math

MATH 396T (Spring 2017)
Directed Research

MATH 392 (Fall 2016)
Dissertation

GENE 920 (Fall 2016)
Dissertation

STAT 920 (Fall 2016)
Honors Thesis

MATH 498H (Fall 2016)
Independent Study

GENE 699 (Fall 2016)
Intro Statistical Method

MATH 363 (Fall 2016)
Thesis

STAT 910 (Fall 2016)
Topics in Math

MATH 596A (Fall 2016)

2015-16 Courses

Intro Ord Diff Equations

MATH 254 (Summer I 2016)
Directed Research

MATH 392 (Spring 2016)
Dissertation

GENE 920 (Spring 2016)
Dissertation

STAT 920 (Spring 2016)
Independent Study

STAT 599 (Spring 2016)
Research

STAT 900 (Spring 2016)
Topics in Math

MATH 596A (Spring 2016)

Scholarly Contributions

Books

Harris, T. E., Alexander, K. S., & Watkins, J. C. (1991). Spatial stochastic processes: a festschrift in honor of Ted Harris on his seventieth birthday.
This volume has been created in honor of the seventieth birthday of Ted Harris, which was celebrated on January 11th, 1989. The papers represent the wide range of subfields of probability theory in which Ted has made profound and fundamental contributions. This breadth in Ted's research complicates the task of putting together in his honor a book with a unified theme. One common thread noted was the spatial, or geometric, aspect of the phenomena Ted investigated. This volume has been organized around that theme, with papers covering four major subject areas of Ted's research: branching processes, percolation, interacting particle systems, and stochastic flows. These four topics do not exhaust his research interests; his major work on Markov chains is commemorated in the standard terminology "Harris chain" and "Harris recurrent". The editors would like to take this opportunity to thank the speakers at the symposium and the contributors to this volume. Their enthusiastic support is a tribute to Ted Harris. We would like to express our appreciation to Annette Mosley for her efforts in typing the manuscripts and to Arthur Ogawa for typesetting the volume. Finally, we gratefully acknowledge the National Science Foundation and the University of Southern California for their financial support.

Chapters

Didelot, X., Taylor, J. E., & Watkins, J. C. (2008). A Duality Identity between a Model of Bacterial Recombination and the Wright–Fisher Diffusion. In Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz. Institute of Mathematical Statistics. doi:10.1214/074921708000000453
In this article, we establish, using a duality argument, an identity stating that the Laplace transform of the length of a contiguous bacterial recombination region equals the probability of choosing a given allele in a stationary population evolving according to the one-dimensional Wright-Fisher diffusion model. Beyond giving us an improved inferential strategy for parameter estimation in bacterial recombination, the matching of the selection and recombination parameters in the identity also suggests the existence of an intriguing formal relationship between gene conversion and the ancestral selection graph.

Journals/Publications

Chilton, F., Zhang, H., Yao, G., Umans, J., Sergeant, S., Schembre, S. M., Thomson, C. A., Watkins, J. C., Liu, E., Hallmark, B. R., Johnstone, L., Hara, A., & Sun, S. (2024). Optimal Pair Matching Combined with Machine Learning Predicts that Omega-3 Fatty Acid Supplementation Markedly Reduces the Risk of Myocardial Infarction in African Americans. DK.
Conflicting results from clinical trials have contributed to a lack of consensus about cardioprotective effects of omega-3 (n-3) highly unsaturated fatty acids (HUFA). Although the VITAL trial did not demonstrate an overall benefit of n-3 HUFA supplementation on composite cardiovascular disease (CVD) and cancer outcomes, it afforded a unique opportunity for a post-hoc analysis of racial/ethnic differences in the supplementation response, given the substantial enrollment of African Americans (AfAm). We employed propensity score matching to address potential covariate imbalances between AfAm and European American (EuAm) subgroups, analyzing data from 3,766 participants across both groups. Using Kaplan-Meier curves and two machine learning methodologies, we found that n-3 HUFA supplementation was significantly associated with a reduced risk of myocardial infarction (MI) exclusively in the AfAm subgroup, as evidenced by an odds ratio of 0.17 (95% CI [0.048, 0.59]). These findings indicate a potential cardioprotective benefit of n-3 supplementation in AfAm, specifically in reducing MI risks. Considering the significant association identified, further investigation through a hypothesis-driven randomized clinical trial is needed to explore the possibility of race-specific recommendations for n-3 HUFA supplementation.
Chilton, F., Zhang, H., Yao, G., Umans, J., Sergeant, S., Schembre, S. M., Thomson, C. A., Watkins, J. C., Liu, E., Hallmark, B. R., Johnstone, L., Hara, A., & Sun, S. (2024). Optimal Pair Matching Combined with Machine Learning Predicts that Omega-3 Fatty Acid Supplementation Markedly Reduces the Risk of Myocardial Infarction in African Americans. Nutrients, 16(2933).
Hack, J. B., Watkins, J. C., & Hammer, M. F. (2024). Machine learning models reveal distinct disease subgroups and improve diagnostic and prognostic accuracy for individuals with pathogenic SCN8A gain-of-function variants. Biology Open, 13(4). doi:10.1242/bio.060286
Distinguishing clinical subgroups for patients suffering with diseases characterized by a wide phenotypic spectrum is essential for developing precision therapies. Patients with gain-of-function (GOF) variants in the SCN8A gene exhibit substantial clinical heterogeneity, viewed historically as a linear spectrum ranging from mild to severe. To test for hidden clinical subgroups, we applied two machine-learning algorithms to analyze a dataset of patient features collected by the International SCN8A Patient Registry. We used two research methodologies: a supervised approach that incorporated feature severity cutoffs based on clinical conventions, and an unsupervised approach employing an entirely data-driven strategy. Both approaches found statistical support for three distinct subgroups and were validated by correlation analyses using external variables. However, distinguishing features of the three subgroups within each approach were not concordant, suggesting a more complex phenotypic landscape. The unsupervised approach yielded strong support for a model involving three partially ordered subgroups rather than a linear spectrum. Application of these machine-learning approaches may lead to improved prognosis and clinical management of individuals with SCN8A GOF variants and provide insights into the underlying mechanisms of the disease.
Hack, J., Watkins, J., Schreiber, J., & Hammer, M. (2024). Patients carrying pathogenic SCN8A variants with loss- and gain-of-function effects can be classified into five subgroups exhibiting varying developmental and epileptic components of encephalopathy. Epilepsia, 65(11). doi:10.1111/epi.18118
Objective: Phenotypic heterogeneity presents challenges in providing clinical care to patients with pathogenic SCN8A variants, which underlie a wide disease spectrum ranging from neurodevelopmental delays without seizures to a continuum of mild to severe developmental and epileptic encephalopathies (DEEs). An important unanswered question is whether there are clinically important subgroups within this wide spectrum. Using both supervised and unsupervised machine learning (ML) approaches, we previously found statistical support for two and three subgroups associated with loss- and gain-of-function variants, respectively. Here, we test the hypothesis that the unsupervised subgroups (U1–U3) are distinguished by differential contributions of developmental and epileptic components. Methods: We predicted that patients in the U1 and U2 subgroups would differ in timing of developmental delay and seizure onset, with earlier and concurrent onset of both features for the U3 subgroup. Standard statistical procedures were used to test these predictions, as well as to investigate clinically relevant associations among all five subgroups. Results: Two-population proportion and Kruskal–Wallis tests supported the hypothesis of a reversed order of developmental delay and seizure onset for patients in U1 and U2, and nearly synchronous developmental delay/seizure onset for the U3 (termed DEE) subgroup. Association testing identified subgroup variation in treatment response, frequency of initial seizure type, and comorbidities, as well as different median ages of developmental delay onset for all five subgroups. Significance: Unsupervised ML approaches discern differential developmental and epileptic components among patients with SCN8A-related epilepsy.
Patients in U1 (termed developmental encephalopathy) typically gain seizure control yet rarely experience improvements in development, whereas those in U2 (termed epileptic encephalopathy) have fewer if any developmental impairments despite difficulty in achieving seizure control. This understanding improves prognosis and clinical management and provides a framework to discover mechanisms underlying variability in clinical outcome of patients with SCN8A-related disorders.
Hammer, M. F., Krzyzaniak, C. T., Bahramnejad, E., Smelser, K. J., Hack, J. B., Watkins, J. C., & Ronaldson, P. T. (2024). Sex differences in physiological response to increased neuronal excitability in a knockin mouse model of pediatric epilepsy. Clinical Science. doi:10.1042/cs20231572
Hammer, M., Bahramnejad, E., Watkins, J., & Ronaldson, P. (2024). Candesartan restores blood-brain barrier dysfunction, mitigates aberrant gene expression, and extends lifespan in a knockin mouse model of epileptogenesis. Clinical Science, 138(17). doi:10.1042/CS20240771
Blockade of Angiotensin type 1 receptor (AT1R) has potential therapeutic utility in the treatment of numerous detrimental consequences of epileptogenesis, including oxidative stress, neuroinflammation, and blood-brain barrier (BBB) dysfunction. We have recently shown that many of these pathological processes play a critical role in seizure onset and propagation in the Scn8a-N1768D mouse model. Here we investigate the efficacy and potential mechanism(s) of action of candesartan (CND), an FDA-approved angiotensin receptor blocker (ARB) indicated for hypertension, in improving outcomes in this model of pediatric epilepsy. We compared length of lifespan, seizure frequency, and BBB permeability in juvenile (D/D) and adult (D/+) mice treated with CND at times after seizure onset. We performed RNAseq on hippocampal tissue to quantify differences in genome-wide patterns of transcript abundance and inferred beneficial and detrimental effects of canonical pathways identified by enrichment methods in untreated and treated mice. Our results demonstrate that treatment with CND gives rise to increased survival, longer periods of seizure freedom, and diminished BBB permeability. CND treatment also partially reversed or 'normalized' disease-induced genome-wide gene expression profiles associated with inhibition of NF-κB, TNFα, IL-6, and TGF-β signaling in juvenile and adult mice. Pathway analyses reveal that efficacy of CND is due to its known dual mechanism of action as both an AT1R antagonist and a PPARγ agonist. The robust effectiveness of CND across ages, sexes and mouse strains is a positive indication for its translation to humans and its suitability of use for clinical trials in children with SCN8A epilepsy.
Hara, A., Lu, E., Johnstone, L., Wei, M., Sun, S., Hallmark, B., Watkins, J., Zhang, H., Yao, G., & Chilton, F. (2024). Identification of an Allele-Specific Transcription Factor Binding Interaction that May Regulate PLA2G2A Gene Expression. Bioinformatics and Biology Insights, 18. doi:10.1177/11779322241261427
The secreted phospholipase A2 (sPLA2) isoform, sPLA2-IIA, has been implicated in a variety of diseases and conditions, including bacteremia, cardiovascular disease, COVID-19, sepsis, adult respiratory distress syndrome, and certain cancers. Given its significant role in these conditions, understanding the regulatory mechanisms impacting its levels is crucial. Genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs), including rs11573156, that are associated with circulating levels of sPLA2-IIA. The work in the manuscript leveraged 4 publicly available datasets to investigate the mechanism by which rs11573156 influences sPLA2-IIA levels via bioinformatics and modeling analysis. Through genotype-tissue expression (GTEx), 234 expression quantitative trait loci (eQTLs) were identified for the gene that encodes for sPLA2-IIA, PLA2G2A. SNP2TFBS was used to ascertain the binding affinities between transcription factors (TFs) to both the reference and alternative alleles of identified eQTL SNPs. Subsequently, candidate TF-SNP interactions were cross-referenced with the ChIP-seq results in matched tissues from ENCODE. SP1-rs11573156 emerged as the significant TF-SNP pair in the liver. Further analysis revealed that the upregulation of PLA2G2A transcript levels through the rs11573156 variant was likely affected by tissue SP1 protein levels. Using an ordinary differential equation based on Michaelis-Menten kinetic assumptions, we modeled the dependence of PLA2G2A transcription on SP1 protein levels, incorporating the SNP influence. Collectively, our analysis strongly suggests that the difference in the binding dynamics of SP1 to different rs11573156 alleles may underlie the allele-specific PLA2G2A expression in different tissues, a mechanistic model that awaits future direct experimental validation. 
This mechanism likely contributes to the variation in circulating sPLA2-IIA protein levels in the human population, with implications for a wide range of human diseases.
Lu, E., Hara, A., Sun, S., Hallmark, B., Snider, J., Seeds, M., Watkins, J., McCall, C., Zhang, H., Yao, G., & Chilton, F. (2024). Temporal associations of plasma levels of the secreted phospholipase A2 family and mortality in severe COVID-19. European Journal of Immunology, 54(6). doi:10.1002/eji.202350721
Previous research suggests that group IIA-secreted phospholipase A2 (sPLA2-IIA) plays a role in and predicts lethal COVID-19 disease. The current study reanalyzed a longitudinal proteomic data set to determine the temporal relationship between levels of several members of a family of sPLA2 isoforms and the severity of COVID-19 in 214 ICU patients. The levels of six secreted PLA2 isoforms, sPLA2-IIA, sPLA2-V, sPLA2-X, sPLA2-IB, sPLA2-IIC, and sPLA2-XVI, increased over the first 7 ICU days in those who succumbed to the disease but attenuated over the same time period in survivors. In contrast, a reversed pattern in sPLA2-IID and sPLA2-XIIB levels over 7 days suggests a protective role of these two isoforms. Furthermore, decision tree models demonstrated that sPLA2-IIA outperformed top-ranked cytokines and chemokines as a predictor of patient outcome. Taken together, proteomic analysis revealed temporal sPLA2 patterns that reflect the critical roles of sPLA2 isoforms in severe COVID-19 disease.
Sun, S., Hara, A., Johnstone, L., Hallmark, B., Watkins, J., Thomson, C., Schembre, S., Sergeant, S., Umans, J., Yao, G., Zhang, H., & Chilton, F. (2024). Optimal Pair Matching Combined with Machine Learning Predicts a Significant Reduction in Myocardial Infarction Risk in African Americans Following Omega-3 Fatty Acid Supplementation. Nutrients, 16(17). doi:10.3390/nu16172933
Conflicting clinical trial results on omega-3 highly unsaturated fatty acids (n-3 HUFA) have prompted uncertainty about their cardioprotective effects. While the VITAL trial found no overall cardiovascular benefit from n-3 HUFA supplementation, its substantial African American (AfAm) enrollment provided a unique opportunity to explore racial differences in response to n-3 HUFA supplementation. The current observational study aimed to simulate randomized clinical trial (RCT) conditions by matching 3766 AfAm and 15,553 non-Hispanic White (NHW) individuals from the VITAL trial utilizing propensity score matching to address the limitations related to differences in confounding variables between the two groups. Within matched groups (3766 AfAm and 3766 NHW), n-3 HUFA supplementation’s impact on myocardial infarction (MI), stroke, and cardiovascular disease (CVD) mortality was assessed. A weighted decision tree analysis revealed belonging to the n-3 supplementation group as the most significant predictor of MI among AfAm but not NHW. Further logistic regression using the LASSO method and bootstrap estimation of standard errors indicated n-3 supplementation significantly lowered MI risk in AfAm (OR 0.17, 95% CI [0.048, 0.60]), with no such effect in NHW. This study underscores the critical need for future RCT to explore racial disparities in MI risk associated with n-3 HUFA supplementation and highlights potential causal differences between supplementation health outcomes in AfAm versus NHW populations.
Andrews, J., Galindo, M. K., Hack, J. B., Watkins, J. C., Conecker, G. A., & Hammer, M. F. (2023). The International SCN8A Patient Registry: A Scientific Resource to Advance the Understanding and Treatment of a Rare Pediatric Neurodevelopmental Syndrome. Journal of Registry Management, 50.
Genetic variants in the SCN8A gene underlie a wide spectrum of neurodevelopmental phenotypes that range from severe epileptic encephalopathy to benign familial infantile epilepsy to neurodevelopmental delays with or without seizures. A host of additional comorbidities also contribute to the phenotypic spectrum. As a result of the recent identification of the genetic etiology and the length of time it often takes to diagnose patients, little data are available on the natural history of these conditions. The International SCN8A Patient Registry was developed in 2015 to fill gaps in understanding the spectrum of the disease and its natural history, as well as the lived experiences of individuals with SCN8A syndrome. Another goal of the registry is to collect longitudinal data from participants on a regular basis. In this article, we describe the construction and structure of the International SCN8A Patient Registry, present the type of information available, and highlight particular analyses that demonstrate how registry data can provide insights into the clinical management of SCN8A syndrome.
Bahramnejad, E., Barney, E. R., Lester, S., Hurtado, A., Thompson, T., Watkins, J. C., & Hammer, M. F. (2023). Greater female than male resilience to mortality and morbidity in the Scn8a mouse model of pediatric epilepsy. International Journal of Neuroscience, 1-13. doi:10.1080/00207454.2023.2279497
Aims: Females and males of all ages are affected by epilepsy; however, unlike many clinical studies, most preclinical research has focused on males. Genetic variants in the voltage-gated sodium channel gene, SCN8A, are associated with a broad spectrum of neurological and epileptic syndromes. Here we investigate sex differences in the natural history of the Scn8a-N1768D knockin mouse model of pediatric epilepsy. Methods: We utilize 24/7 video to monitor juveniles and adults of both sexes to investigate variability in seizure activity (e.g., onset and frequency), mortality and morbidity, response to cannabinoids, and mode of death. We also monitor sleep architecture using a non-invasive piezoelectric method in order to identify factors that influence seizure severity and outcome. Results: Both sexes had nearly 100% penetrance in seizure onset and early mortality. However, adult heterozygous (D/+) females were more resilient as exhibited by the ability to tolerate more seizures over a longer lifespan. Homozygous (D/D) juveniles did not exhibit a sex difference in overall survival. Female estrus cycle was disrupted before seizure onset, while sleep was disrupted in both sexes in association with seizure onset. Females typically died while in convulsive status epilepticus, while a high proportion of males died while not experiencing behavioral seizures. Only juvenile and adult males benefited from cannabinoid administration. Conclusions: These results support the hypothesis that factors associated with sexual differentiation play a role in the neurobiology of epilepsy and point to the importance of including both sexes in the design of studies to identify new epilepsy therapies. Key Words: mouse epilepsy model; tonic-clonic seizures; SUDEP; sleep architecture; cannabinoids. Funding: The author(s) reported there is no funding associated with the work featured in this article.
Chung, K. M., Hack, J., Andrews, J., Galindo‐Kelly, M., Schreiber, J., Watkins, J., & Hammer, M. F. (2023). Clinical severity is correlated with age at seizure onset and biophysical properties of recurrent gain of function variants associated with SCN8A‐related epilepsy. Epilepsia, 84, 3365-3376. doi:10.1111/epi.17747
Genetic variants in the SCN8A gene underlie a wide spectrum of neurodevelopmental phenotypes including several distinct seizure types and a host of comorbidities. One of the major challenges facing clinicians and researchers alike is to identify genotype-phenotype (G-P) correlations that may improve prognosis, guide treatment decisions, and lead to precision medicine approaches. We investigated G-P correlations among 270 participants harboring gain-of-function (GOF) variants enrolled in the International SCN8A Registry, a patient-driven online database. We performed correlation analyses stratifying the cohort by clinical phenotypes to identify diagnostic features that differ among patients with varying levels of clinical severity, and that differ among patients with distinct GOF variants. Our analyses confirm positive correlations between age at seizure onset and developmental skills acquisition (developmental quotient), rate of seizure freedom, and percentage of cohort with developmental delays, and identify negative correlations with number of current and weaned antiseizure medications. This set of features is more detrimentally affected in individuals with a priori expectations of more severe clinical phenotypes. Our analyses also reveal a significant correlation between a severity index combining clinical features of individuals with a particular highly recurrent variant and an independent electrophysiological score assigned to each variant based on in vitro testing. This is one of the first studies to identify statistically significant G-P correlations for individual SCN8A variants with GOF properties. The results suggest that individual GOF variants (1) are predictive of clinical severity for individuals carrying those variants and (2) may underlie distinct clinical phenotypes of SCN8A disease, thus helping to explain the wide SCN8A-related epilepsy disease spectrum.
These results also suggest that certain features present at initial diagnosis are predictive of clinical severity, and with more informed treatment plans, may serve to improve prognosis for patients with SCN8A GOF variants.
Hack, J. B., Horning, K. J., Short, D. M., Schreiber, J. M., Watkins, J. C., & Hammer, M. F. (2023). Distinguishing Loss-of-Function and Gain-of-Function SCN8A Variants Using a Random Forest Classification Model Trained on Clinical Features. Neurology: Genetics, 9. doi:10.1212/nxg.0000000000200060
Background and Objectives: Pathogenic variants at the voltage-gated sodium channel gene, SCN8A, are associated with a wide spectrum of clinical disease outcomes. A critical challenge for neurologists is to determine whether patients carry gain-of-function (GOF) or loss-of-function (LOF) variants to guide treatment decisions, yet in vitro studies to infer channel function are often not feasible in the clinic. In this study, we develop a predictive modeling approach to classify variants based on clinical features present at initial diagnosis. Methods: We performed an exhaustive search for individuals deemed to carry SCN8A GOF and LOF variants by means of in vitro studies in heterologous cell systems, or because the variant was classified as truncating, and recorded clinical features. This resulted in a total of 69 LOF variants: 34 missense and 35 truncating variants, including 9 nonsense, 13 frameshift, 6 splice site, 6 indels, and 1 large deletion. We then assembled a truth set of variants with known functional effects, excluding individuals carrying variants at other loci associated with epilepsy. We then trained a predictive model based on random forest using this truth set of 45 LOF variants and 45 GOF variants randomly selected from a set of variants tested by in vitro methods. Results: Phenotypic categories assigned to individuals correlated strongly with GOF or LOF variants. All patients with GOF variants experienced early-onset seizures (mean age at onset = 4.5 ± 3.1 months) while only 64.4% of patients with LOF variants had seizures, most of which were late-onset absence seizures (mean age at onset = 40.0 ± 38.1 months). With high accuracy (95.4%), our model including 5 key clinical features classified individuals with GOF and LOF variants into 2 distinct cohorts differing in age at seizure onset, development of seizures, seizure type, intellectual disability, and developmental and epileptic encephalopathy.
Discussion: The results support the hypothesis that patients with SCN8A GOF and LOF variants represent distinct clinical phenotypes. The clinical model developed in this study has great utility because it provides a rapid and highly accurate platform for predicting the functional class of patient variants during SCN8A diagnosis, which can aid in initial treatment decisions and improve prognosis.
Watkins, J. C., Hack, J. B., Hammer, M. F., Schreiber, J. M., Horning, K., & Juroske Short, D. M. (2023). Distinguishing Loss- and Gain-of-Function SCN8A Variants Using a Random Forest Classification Model Trained on Clinical Features. Neurology Genetics, 15.
Sahneh, F. D., Fries, W., Watkins, J. C., & Lega, J. (2022). Epidemics from the Eye of the Pathogen. SIAM Journal on Applied Mathematics, 82, 2036-2056. doi:10.1137/21m1450719
While a common trend in disease modeling is to develop models of increasing complexity, it was recently pointed out that outbreaks appear remarkably simple when viewed in the incidence vs. cumulative cases (ICC) plane. This article details the theory behind this phenomenon by analyzing the stochastic SIR (Susceptible, Infected, Recovered) model in the cumulative cases domain. We prove that the Markov chain associated with this model reduces, in the ICC plane, to a pure birth chain for the cumulative number of cases, whose limit leads to an independent increments Gaussian process that fluctuates about a deterministic ICC curve. We calculate the associated variance and quantify the additional variability due to estimating incidence over a finite period of time. We also illustrate the universality brought forth by the ICC concept on real-world data for Influenza A and for the COVID-19 outbreak in Arizona.
Watkins, J. C., Chilton, F. H., Yao, G., Zhang, H. H., McCall, C. E., Seeds, M. C., Hallmark, J. M., Hallmark, B., Sun, S., Hara, A., & Lu, E. (2022). Temporal Associations of Plasma Levels of the Secreted Phospholipase A₂ Family and Mortality in Severe COVID-19. Cold Spring Harbor Laboratory - medRxiv. doi:10.1101/2022.11.21.22282595
Previous research suggests that group IIA secreted phospholipase A₂ (sPLA₂-IIA) plays a role in and predicts severe COVID-19 disease. The current study reanalyzed a longitudinal proteomic data set to determine the temporal (days 0, 3 and 7) relationship between the levels of several members of a family of sPLA₂ isoforms and the severity of COVID-19 in 214 ICU patients. The levels of six secreted PLA₂ isoforms, sPLA₂-IIA, sPLA₂-V, sPLA₂-X, sPLA₂-IB, sPLA₂-IIC, and sPLA₂-XVI, increased over the first 7 ICU days in those who succumbed to the disease. sPLA₂-IIA outperformed top ranked cytokines and chemokines as predictors of patient outcome. A decision tree corroborated these results with day 0 to day 3 kinetic changes of sPLA₂-IIA that separated the death and severe categories from the mild category and increases from day 3 to day 7 significantly enriched the lethal category. In contrast, there was a time-dependent decrease in sPLA₂-IID and sPLA₂-XIIB in patients with severe or lethal disease, and these two isoforms were at higher levels in mild patients. Taken together, proteomic analysis revealed temporal sPLA₂ patterns that reflect the critical roles of sPLA₂ isoforms in severe COVID-19 disease.
Gentry, B., Richardson, M., Lopez, D. P., & Watkins, J. (2021). Indigenous Language Migration along the US Southwestern Border - the View from Arizona. Chance, 34(3), 47-55.
Sahneh, F. D., Fries, W., Watkins, J. C., & Lega, J. (2021). Epidemics from the Eye of the Pathogen. arXiv preprint arXiv:2103.12848.
Watkins, J. C., Zhou, J., Zhou, H., Liu, Y., & Zhang, M. (2021). A novel non-linear dimension reduction approach to infer population structure for low-coverage sequencing data. BMC Bioinformatics.
Watkins, J., Gentry, B., Richardson, M., & Lopez, D. P. (2021). Indigenous Language Migration along the U.S. Southwestern Border—the View from Arizona. CHANCE, 34(3), 47-55. doi:10.1080/09332480.2021.1979814
Zhang, M., Liu, Y., Zhou, H., Zhou, J., & Watkins, J. C. (2021). A novel non-linear dimension reduction approach to infer population structure for low-coverage sequencing data. BMC Bioinformatics, 22. doi:10.1186/s12859-021-04265-
Watkins, J. C., Longoria, I. A., Johnson, J. P., Hammer, M. F., & Encinas, A. C. (2020). Variable patterns of mutation density among NaV1.1, NaV1.2 and NaV1.6 point to channel-specific functional differences associated with childhood epilepsy. PloS one, 15(8), e0238121. doi:10.1371/journal.pone.0238121
Variants implicated in childhood epilepsy have been identified in all four voltage-gated sodium channels that initiate action potentials in the central nervous system. Previous research has focused on the functional effects of particular variants within the most studied of these channels (NaV1.1, NaV1.2 and NaV1.6); however, there have been few comparative studies across channels to infer the impact of mutations in patients with epilepsy. Here we compare patterns of variation in patient and public databases to test the hypothesis that regions of known functional significance within voltage-gated sodium (NaV) channels have an increased burden of deleterious variants. We assessed mutational burden in different regions of the NaV channels by (1) performing Fisher exact tests on odds ratios to infer excess variants in domains, segments, and loops of each channel in patient databases versus public "control" databases, and (2) comparing the cumulative distribution of variant sites along DNA sequences of each gene in patient and public databases (i.e., independent of protein structure). Patient variant density was concordant among channels in regions known to play a role in channel function, with statistically significantly higher patient variant density in S4-S6 and DIII-DIV and an excess of public variants in S1-S3, DI-DII, and DII-DIII. On the other hand, channel-specific patterns of patient burden were found in the NaV1.6 inactivation gate and NaV1.1 S5-S6 linkers, while NaV1.2 and NaV1.6 S4-S5 linkers and S5 segments shared patient variant patterns that contrasted with those in NaV1.1. These different patterns may reflect different roles played by the NaV1.6 inactivation gate in action potential propagation, and by NaV1.1 S5-S6 linkers in loss of function and haploinsufficiency.
Interestingly, NaV1.2 and NaV1.6 both lack amino acid substitutions over significantly long stretches in both the patient and public databases suggesting that new mutations in these regions may cause embryonic lethality or a non-epileptic disease phenotype.
Ahmed, R., Angelini, P., Sahneh, F. D., Efrat, A., Glickenstein, D., Gronemann, M., Heinsohn, N., Kobourov, S. G., Spence, R. K., Watkins, J. C., & Wolff, A. (2019). Multi-Level Steiner Trees. Journal of Experimental Algorithmics, 24, 1-22. doi:10.48550/arxiv.1804.02627
In the classical Steiner tree problem, given an undirected, connected graph $G=(V,E)$ with non-negative edge costs and a set of \emph{terminals} $T\subseteq V$, the objective is to find a minimum-cost tree $E' \subseteq E$ that spans the terminals. The problem is APX-hard; the best known approximation algorithm has a ratio of $\rho = \ln(4)+\varepsilon < 1.39$. In this paper, we study a natural generalization, the \emph{multi-level Steiner tree} (MLST) problem: given a nested sequence of terminals $T_{\ell} \subset \dots \subset T_1 \subseteq V$, compute nested trees $E_{\ell}\subseteq \dots \subseteq E_1\subseteq E$ that span the corresponding terminal sets with minimum total cost. The MLST problem and variants thereof have been studied under various names including Multi-level Network Design, Quality-of-Service Multicast tree, Grade-of-Service Steiner tree, and Multi-Tier tree. Several approximation results are known. We first present two simple $O(\ell)$-approximation heuristics. Based on these, we introduce a rudimentary composite algorithm that generalizes the above heuristics, and determine its approximation ratio by solving a linear program. We then present a method that guarantees the same approximation ratio using at most $2\ell$ Steiner tree computations. We compare these heuristics experimentally on various instances of up to 500 vertices using three different network generation models. We also present various integer linear programming (ILP) formulations for the MLST problem, and compare their running times on these instances. To our knowledge, the composite algorithm achieves the best approximation ratio for up to $\ell=100$ levels, which is sufficient for most applications such as network visualization or designing multi-level infrastructure.
Ahmed, R., Watkins, J. C., Wolff, A., Angelini, P., Efrat, A., Gronemann, M., Heinsohn, N., Kobourov, S. G., Spence, R., Sahneh, F. D., & Glickenstein, D. A. (2019). Multi-level Steiner Trees. Journal of Experimental Algorithmics (JEA). doi:10.1145/3368621
Encinas, A. C., Moore, I. K., Watkins, J. C., & Hammer, M. F. (2019). Influence of age at seizure onset on the acquisition of neurodevelopmental skills in an SCN8A cohort. Epilepsia, 60(8), 1711-1720.
The study aimed to characterize a cohort of patients with SCN8A-related epilepsy and to perform analyses to identify correlations involving the acquisition of neurodevelopmental skills.
Hammer, M. F., Watkins, J. C., Encinas, A. C., & Moore, I. K. (2019). Influence of age at seizure onset on the acquisition of neurodevelopmental skills in an SCN8A cohort. Epilepsia, 60(8), 1711-1720. doi:10.1111/epi.16288
Osipova, L. P., Lichman, D. V., Hallmark, B., Karafet, T. M., Hsieh, P. H., Watkins, J. C., & Hammer, M. F. (2019). Genomic evidence of local adaptation to climate and diet in indigenous Siberians. Molecular Biology and Evolution, 36, 315-327. doi:10.18413/2658-6533-2020-6-3-0-4
Sahneh, F. D., Efrat, A., Glickenstein, D., Wolff, A., Watkins, J. C., Spence, R., Kobourov, S. G., Heinsohn, N., Gronemann, M., Angelini, P., & Ahmed, R. (2019). Multi-level Steiner Trees. ACM Journal of Experimental Algorithmics, 24(1), 1-22. doi:10.1145/3368621
In the classical Steiner tree problem, given an undirected, connected graph G=(V,E) with non-negative edge costs and a set of terminals T ⊆ V, the objective is to find a minimum-cost tree E′ ⊆ E that spans the terminals. The problem is APX-hard; the best-known approximation algorithm has a ratio of ρ = ln(4) + ε < 1.39.
Watkins, J. C., Osipova, L. P., Karafet, T. M., Hsieh, P., Hammer, M. F., & Hallmark, B. (2019). Genomic Evidence of Local Adaptation to Climate and Diet in Indigenous Siberians. Molecular biology and evolution, 36(2), 315-327. doi:10.1093/molbev/msy211
The indigenous inhabitants of Siberia live in some of the harshest environments on earth, experiencing extended periods of severe cold temperatures, dramatic variation in photoperiod, and limited and highly variable food resources. While the successful long-term settlement of this area by humans required multiple behavioral and cultural innovations, the nature of the underlying genetic changes has generally remained elusive. In this study, we used a three-part approach to identify putative targets of positive natural selection in Siberians. We first performed selection scans on whole exome and genome-wide single nucleotide polymorphism array data from multiple Siberian populations. We then annotated candidates in the tails of the empirical distributions, focusing on candidates with evidence linking them to biological processes and phenotypes previously identified as relevant to adaptation in circumpolar groups. The top candidates were then genotyped in additional populations to determine their spatial allele frequency distributions and associations with climate variables. Our analysis reveals missense mutations in three genes involved in lipid metabolism (PLA2G2A, PLIN1, and ANGPTL8) that exhibit genomic and spatial patterns consistent with selection for cold climate and/or diet. These variants are unified by their connection to brown adipose tissue and may help to explain previously observed physiological differences in Siberians such as low serum lipid levels and increased basal metabolic rate. These results support the hypothesis that indigenous Siberians have genetically adapted to their local environment by selection on multiple genes.
Ahmed, R., Angelini, P., Sahneh, F. D., Efrat, A., Glickenstein, D., Gronemann, M., Heinsohn, N., Kobourov, S. G., Spence, R., Watkins, J. C., & Wolff, A. (2018). Multi-Level Steiner Trees. Journal of Experimental Algorithmics.
Quinto-Cortés, C. D., Woerner, A. E., Watkins, J. C., & Hammer, M. F. (2018). Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference. Scientific reports, 8(1), 10209.
Single nucleotide polymorphisms (SNPs) in commercial arrays have often been discovered in a small number of samples from selected populations. This ascertainment skews patterns of nucleotide diversity and affects population genetic inferences. We propose a demographic inference pipeline that explicitly models the SNP discovery protocol in an Approximate Bayesian Computation (ABC) framework. We simulated genomic regions according to a demographic model incorporating parameters for the divergence of three well-characterized HapMap populations and recreated the SNP distribution of a commercial array by varying the number of haploid samples and the allele frequency cut-off in the given regions. We then calculated summary statistics obtained from both the ascertained and genomic data and inferred ascertainment and demographic parameters. We implemented our pipeline to study the admixture process that gave rise to the present-day Mexican population. Our estimate of the time of admixture is closer to the historical dates than those in previous works which did not consider ascertainment bias. Although the use of whole genome sequences for demographic inference is becoming the norm, there are still underrepresented areas of the world from where only SNP array data are available. Our inference framework is applicable to those cases and will help with the demographic inference.
Watkins, J. C., Woerner, A. E., Veeramah, K. R., & Hammer, M. F. (2018). The Role of Phylogenetically Conserved Elements in Shaping Patterns of Human Genomic Diversity. Molecular Biology and Evolution, 35(9), 2284-2295. doi:10.1093/molbev/msy145
Woerner, A. E., Veeramah, K. R., Watkins, J. C., & Hammer, M. F. (2018). The role of phylogenetically conserved elements in shaping patterns of human genomic diversity. Molecular biology and evolution.
Evolutionary genetic studies have shown a positive correlation between levels of nucleotide diversity and either rates of recombination or genetic distance to genes. Both positive-directional and purifying selection have been offered as the source of these correlations via genetic hitchhiking and background selection, respectively. Phylogenetically conserved elements (CEs) are short (∼100bp), widely distributed (comprising ∼5% of genome), sequences that are often found far from genes. While the function of many CEs is unknown, CEs also are associated with reduced diversity at linked sites. Using high coverage (>80x) whole genome data from two human populations, the Yoruba and the CEU, we perform fine scale evaluations of diversity, rates of recombination, and linkage to genes. We find that the local rate of recombination has a stronger effect on levels of diversity than linkage to genes, and that these effects of recombination persist even in regions far from genes. Our whole genome modeling demonstrates that, rather than recombination or GC-biased gene conversion, selection on sites within or linked to CEs better explains the observed genomic diversity patterns. A major implication is that very few sites in the human genome are predicted to be free of the effects of selection. These sites, which we refer to as the human "neutralome", comprise only 1.2% of the autosomes and 5.1% of the X chromosome. Demographic analysis of the neutralome reveals larger population sizes and lower rates of growth for ancestral human populations than inferred by previous analyses.
Hammer, M. F., Ishii, A., Johnstone, L., Tchourbanov, A., Lau, B., Sprissler, R., Hallmark, B., Zhang, M., Zhou, J., Watkins, J., & Hirose, S. (2017). Rare variants of small effect size in neuronal excitability genes influence clinical outcome in Japanese cases of SCN1A truncation-positive Dravet syndrome. PloS one, 12(7), e0180485.
Dravet syndrome (DS) is a rare, devastating form of childhood epilepsy that is often associated with mutations in the voltage-gated sodium channel gene, SCN1A. There is considerable variability in expressivity within families, as well as among individuals carrying the same primary mutation, suggesting that clinical outcome is modulated by variants at other genes. To identify modifier gene variants that contribute to clinical outcome, we sequenced the exomes of 22 individuals at both ends of a phenotype distribution (i.e., mild and severe cognitive condition). We controlled for variation associated with different mutation types by limiting inclusion to individuals with a de novo truncation mutation resulting in SCN1A haploinsufficiency. We performed tests aimed at identifying 1) single common variants that are enriched in either phenotypic group, 2) sets of common or rare variants aggregated in and around genes associated with clinical outcome, and 3) rare variants in 237 candidate genes associated with neuronal excitability. While our power to identify enrichment of a common variant in either phenotypic group is limited as a result of the rarity of mild phenotypes in individuals with SCN1A truncation variants, our top candidates did not map to functional regions of genes, or in genes that are known to be associated with neurological pathways. In contrast, we found a statistically-significant excess of rare variants predicted to be damaging and of small effect size in genes associated with neuronal excitability in severely affected individuals. A KCNQ2 variant previously associated with benign neonatal seizures is present in 3 of 12 individuals in the severe category. To compare our results with the healthy population, we performed a similar analysis on whole exome sequencing data from 70 Japanese individuals in the 1000 genomes project. 
Interestingly, the frequency of rare damaging variants in the same set of neuronal excitability genes in healthy individuals is nearly as high as in severely affected individuals. Rather than a single common gene/variant modifying clinical outcome in SCN1A-related epilepsies, our results point to the cumulative effect of rare variants with little to no measurable phenotypic effect (i.e., typical genetic background) unless present in combination with a disease-causing truncation mutation in SCN1A.
Hsieh, P., Hallmark, B., Watkins, J. C., Karafet, T. M., Osipova, L. P., Gutenkunst, R. N., & Hammer, M. F. (2017). Exome sequencing provides evidence of polygenic adaptation to a fat-rich animal diet in indigenous Siberian populations. Molecular Biology and Evolution, 34, 2914.
Hsieh, P., Hallmark, B., Watkins, J., Karafet, T. M., Osipova, L. P., Gutenkunst, R. N., & Hammer, M. F. (2017). Exome Sequencing Provides Evidence of Polygenic Adaptation to a Fat-Rich Animal Diet in Indigenous Siberian Populations. Molecular biology and evolution, 34(11), 2913-2926.
Siberia is one of the coldest environments on Earth and has great seasonal temperature variation. Long-term settlement in northern Siberia undoubtedly required biological adaptation to severe cold stress, dramatic variation in photoperiod, and limited food resources. In addition, recent archeological studies show that humans first occupied Siberia at least 45,000 years ago; yet our understanding of the demographic history of modern indigenous Siberians remains incomplete. In this study, we use whole-exome sequencing data from the Nganasans and Yakuts to infer the evolutionary history of these two indigenous Siberian populations. Recognizing the complexity of the adaptive process, we designed a model-based test to systematically search for signatures of polygenic selection. Our approach accounts for stochasticity in the demographic process and the hitchhiking effect of classic selective sweeps, as well as potential biases resulting from recombination rate and mutation rate heterogeneity. Our demographic inference shows that the Nganasans and Yakuts diverged ∼12,000-13,000 years ago from East-Asian ancestors in a process involving continuous gene flow. Our polygenic selection scan identifies seven candidate gene sets with Siberian-specific signals. Three of these gene sets are related to diet, especially to fat metabolism, consistent with the hypothesis of adaptation to a fat-rich animal diet. Additional testing rejects the effect of hitchhiking and favors a model in which selection yields small allele frequency changes at multiple unlinked genes.
Ishii, A., Kang, J. Q., Schornak, C. C., Hernandez, C. C., Shen, W., Watkins, J. C., Macdonald, R. L., & Hirose, S. (2017). A de novo missense mutation of GABRB2 causes early myoclonic encephalopathy. Journal of medical genetics, 54(3), 202-211.
Early myoclonic encephalopathy (EME), a disease with a devastating prognosis, is characterised by neonatal onset of seizures and massive myoclonus accompanied by a continuous suppression-burst EEG pattern. Three genes are associated with EMEs that have metabolic features. Here, we report a pathogenic mutation of an ion channel as a cause of EME for the first time.
Ishii, A., Watkins, J. C., Chen, D., Hirose, S., & Hammer, M. F. (2017). Clinical implications of SCN1A missense and truncation variants in a large Japanese cohort with Dravet syndrome. Epilepsia, 58(2), 282-290.
Two major classes of SCN1A variants are associated with Dravet syndrome (DS): those that result in haploinsufficiency (truncating) and those that result in an amino acid substitution (missense). The aim of this retrospective study was to describe the first large cohort of Japanese patients with SCN1A mutation-positive DS (n = 285), and investigate the relationship between variant (type and position) and clinical expression and response to treatment.
Alberts, D. S., Watkins, J. C., Patel, C., Glazer, E. S., Zhang, H. H., Hill, K. A., Kha, S. T., Yozwiak, M. L., Bartels, H., Nafissi, N. N., & Krouse, R. S. (2016). Evaluating IPMN and pancreatic carcinoma utilizing quantitative histopathology. Cancer Medicine, 5(10), 2841-2847. doi:10.1002/cam4.923
Glazer, E. S., Zhang, H. H., Hill, K. A., Patel, C., Kha, S. T., Yozwiak, M. L., Bartels, H., Nafissi, N. N., Watkins, J. C., Alberts, D. S., & Krouse, R. S. (2016). Evaluating IPMN and pancreatic carcinoma utilizing quantitative histopathology. Cancer medicine, 5(10), 2841-2847.
Intraductal papillary mucinous neoplasms (IPMN) are pancreatic lesions with uncertain biologic behavior. This study sought objective, accurate prediction tools, through the use of quantitative histopathological signatures of nuclear images, for classifying lesions as chronic pancreatitis (CP), IPMN, or pancreatic carcinoma (PC). Forty-four pancreatic resection patients were retrospectively identified for this study (12 CP; 16 IPMN; 16 PC). Regularized multinomial regression quantitatively classified each specimen as CP, IPMN, or PC in an automated, blinded fashion. Classification certainty was determined by subtracting the smallest classification probability from the largest probability (of the three groups). The certainty function varied from 1.0 (perfectly classified) to 0.0 (random). From each lesion, 180 ± 22 nuclei were imaged. Overall classification accuracy was 89.6% with six unique nuclear features. No CP cases were misclassified, 1/16 IPMN cases were misclassified, and 4/16 PC cases were misclassified. Certainty function was 0.75 ± 0.16 for correctly classified lesions and 0.47 ± 0.10 for incorrectly classified lesions (P = 0.0005). Uncertainty was identified in four of the five misclassified lesions. Quantitative histopathology provides a robust, novel method to distinguish among CP, IPMN, and PC with a quantitative measure of uncertainty. This may be useful when there is uncertainty in diagnosis.
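As an illustrative sketch of the certainty measure described in the abstract above (the largest class probability minus the smallest), the following Python snippet computes it for a three-class prediction. The function name and the example probabilities are hypothetical and are not taken from the paper.

```python
def classification_certainty(probs):
    """Return the largest class probability minus the smallest.

    Follows the verbal definition in the abstract; the name is hypothetical.
    """
    if not probs:
        raise ValueError("probs must contain at least one probability")
    return max(probs) - min(probs)


# Hypothetical class probabilities in the order (CP, IPMN, PC); not study data.
perfectly_classified = [1.0, 0.0, 0.0]
uninformative = [1 / 3, 1 / 3, 1 / 3]

print(classification_certainty(perfectly_classified))  # 1.0
print(classification_certainty(uninformative))         # 0.0
```

The two printed values reproduce the endpoints quoted in the abstract: 1.0 for a perfectly classified lesion and 0.0 when all three classes are equally likely.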
Hammer, M. F., Watkins, J. C., Ishii, A., Chen, D., & Hirose, S. (2016). Clinical implications of SCN1A missense and truncation variants in a large Japanese cohort with Dravet syndrome. Epilepsia, 58(2), 282-290. doi:10.1111/epi.13639
Ishii, A., Kang, J., Schornak, C. C., Hernández, C. C., Shen, W., Watkins, J. C., Macdonald, R. L., & Hirose, S. (2016). A de novo missense mutation of GABRB2 causes early myoclonic encephalopathy. Journal of medical genetics, 54, 202-211. doi:10.1136/jmedgenet-2016-104083
Background: Early myoclonic encephalopathy (EME), a disease with a devastating prognosis, is characterised by neonatal onset of seizures and massive myoclonus accompanied by a continuous suppression-burst EEG pattern. Three genes are associated with EMEs that have metabolic features. Here, we report a pathogenic mutation of an ion channel as a cause of EME for the first time. Methods: Sequencing was performed for 214 patients with epileptic seizures using a gene panel with 109 genes that are known or suspected to cause epileptic seizures. Functional assessments were demonstrated by using electrophysiological experiments and immunostaining for mutant γ-aminobutyric acid-A (GABAA) receptor subunits in HEK293T cells. Results: We discovered a de novo heterozygous missense mutation (c.859A>C [p.Thr287Pro]) in the GABRB2-encoded β2 subunit of the GABAA receptor in an infant with EME. No GABRB2 mutations were found in three other EME cases or in 166 patients with infantile spasms. GABAA receptors bearing the mutant β2 subunit were poorly trafficked to the cell membrane and prevented γ2 subunits from trafficking to the cell surface. The peak amplitudes of currents from GABAA receptors containing only mutant β2 subunits were smaller than those from receptors containing only wild-type β2 subunits. The decrease in peak current amplitude (96.4% reduction) associated with the mutant GABAA receptor was greater than expected, based on the degree to which cell surface expression was reduced (66% reduction). Conclusion: This mutation has complex functional effects on GABAA receptors, including reduction of cell surface expression and attenuation of channel function, which would significantly perturb GABAergic inhibition in the brain.
Bartels, P. H., Zhang, H. H., Yozwiak, M. L., Watkins, J. C., Patel, C., Krouse, R. S., Kha, S. T., Hill, K. A., Glazer, E. S., Bartels, H. G., & Alberts, D. S. (2015). Abstract A83: Nuclear morphometry differentiates chronic pancreatitis, IPMN, and pancreatic carcinoma. Cancer Research, 75. doi:10.1158/1538-7445.panca2014-a83
Background: It can be difficult to distinguish between chronic pancreatitis (CP), IPMN, and pancreatic carcinoma (PC) on tissue biopsy. Nuclear morphometry can measure up to 93 unique nuclear features based on standard histopathology. The goal of this work is to build novel, objective, and accurate prediction tools, based on nuclear morphometric signatures in high resolution images of nuclei of histologic sections, for classifying pancreatic tissues into three distinct groups. Materials & Methods: 44 patients who underwent pancreatic resections were identified. 12 cases of CP, 16 cases of IPMN, and 16 cases of PC were utilized in this pilot study. 180 ± 22 nuclei from each lesion were imaged with high resolution microscopy. Clinicodemographic data was obtained retrospectively from the medical record. Statistically significant nuclear features were determined by a fully automated penalized multinomial regression algorithm in order to determine a multi-class classifier and simultaneously identify important nuclear features. The LASSO penalty function, and associated regularization parameter, is adaptively chosen by cross validation to prevent over-fitting. In order to test the veracity of the automated algorithm, we randomly removed 25% of the cases as a training set and utilized the remaining cases as a test set; this was repeated 10 times. Results: The average age was 64 ± 15 years, with patients in the CP group being slightly younger; 63% were male. Median follow-up time was 3 years in the CP group, 3 years in the IPMN group, and 5 years in the PC group. The method described automatically identified 6 unique and statistically significant nuclear features (corrected overall P Conclusions: Nuclear morphometry classifies pancreatic lesions into CP, IPMN, and PC with 84.5% accuracy using a fully automated algorithm to determine statistically significant and unique nuclear features.
Since the incorrectly classified lesions had a larger proportion of mixed nuclei, diagnostic uncertainty may be determined in a quantitative manner allowing for a confidence probability estimation of whether a given lesion should be classified as a CP, IPMN, or PC. Further studies will validate these results in a resected cohort as well as a cohort based on biopsied specimens alone. Citation Format: Evan S. Glazer, Hao Zhang, Kimberly A. Hill, Charmi Patel, Stephanie T. Kha, Peter H. Bartels, Michael L. Yozwiak, Hubert G. Bartels, Joseph C. Watkins, David S. Alberts, Robert S. Krouse. Nuclear morphometry differentiates chronic pancreatitis, IPMN, and pancreatic carcinoma. [abstract]. In: Proceedings of the AACR Special Conference on Pancreatic Cancer: Innovations in Research and Treatment; May 18-21, 2014; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2015;75(13 Suppl):Abstract nr A83.
Bailey, B. L., Visscher, K., & Watkins, J. (2014). A stochastic model of translation with -1 programmed ribosomal frameshifting. Physical biology, 11(1), 016009.
Many viruses produce multiple proteins from a single mRNA sequence by encoding overlapping genes. One mechanism to decode both genes, which reside in alternate reading frames, is -1 programmed ribosomal frameshifting. Although recognized for over 25 years, the molecular and physical mechanism of -1 frameshifting remains poorly understood. We have developed a mathematical model that treats mRNA translation and associated -1 frameshifting as a stochastic process in which the transition probabilities are based on the energetics of local molecular interactions. The model predicts both the location and efficiency of -1 frameshift events in HIV-1. Moreover, we compute -1 frameshift efficiencies upon mutations in the viral mRNA sequence and variations in relative tRNA abundances, predictions that are directly testable in experiment.
Bartels, P. H., Zhang, H. H., Watkins, J. C., Krouse, R. S., Hill, K. A., Glazer, E. S., & Alberts, D. S. (2014). Abstract 1362: Nuclear morphometry measures progressive atypia in the development of pancreatic carcinoma. Cancer Research, 74, 1362-1362. doi:10.1158/1538-7445.am2014-1362
Pancreatic lesions that are not clearly benign are often treated as malignant despite uncertainty in the true diagnosis due to the nearly universally fatal nature of pancreatic carcinoma (PC). Nuclear morphometry is a technique to quantify nuclear features too complex for the human eye to discern. We hypothesized that nuclear atypia can be quantified with morphometry in order to distinguish between chronic pancreatitis, IPMN, and PC. We retrospectively analyzed 14 specimens of chronic pancreatitis, 16 IPMN lesions, and 19 PC lesions. Clinicopathologic data were obtained. Nuclear morphometry determined overall atypia based on the average nuclear abnormality of 95 distinct nuclear features (a nuclear signature). For PC lesions, 5 nuclear features defined a classification score (CS) representing the proportion of aggressive nuclei in a given PC lesion. Statistical significance was determined with ANOVA and the Kruskal-Wallis test. The average age for all patients was 63 ± 15 years while 62% were male; there were no differences between the 3 groups. The follow-up was approximately 4 years in all groups as well. The average nuclear atypia of chronic pancreatitis was 0.80, 0.99 for IPMN, and 1.08 for PC (P = 0.02). Importantly, based on 5 nuclear features, the CS for PC lesions that recurred was 87% while it was only 56% for those PC lesions that did not recur (P = 0.04). The research describes a precise, accurate, and objective method to distinguish IPMN from PC. The CS and overall atypia score provide a method to not only objectively describe a given lesion, but it importantly describes where in the progression from benign to malignant lesion an unknown lesion exists. Clinically, this may be of great utility in risk stratifying unknown pancreatic lesions at the time of diagnosis or tissue biopsy. Citation Format: Evan S. Glazer, Kimberly A. Hill, Hao (Helen) Zhang, Peter Bartels, Joseph Watkins, David S. Alberts, Robert S. Krouse. Nuclear morphometry measures progressive atypia in the development of pancreatic carcinoma. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 1362. doi:10.1158/1538-7445.AM2014-1362
Veeramah, K. R., Gutenkunst, R. N., Woerner, A. E., Watkins, J. C., & Hammer, M. F. (2014). Evidence for Increased Levels of Positive and Negative Selection on the X Chromosome versus Autosomes in Humans. Molecular Biology and Evolution, 31(9), 2267-2282. doi:10.1093/molbev/msu166
Partially recessive variants under positive selection are expected to go to fixation more quickly on the X chromosome as a result of hemizygosity, an effect known as faster-X. Conversely, purifying selection is expected to reduce substitution rates more effectively on the X chromosome. Previous work in humans contrasted divergence on the autosomes and X chromosome, with results tending to support the faster-X effect. However, no study has yet incorporated both divergence and polymorphism to quantify the effects of both purifying and positive selection, which are opposing forces with respect to divergence. In this study, we develop a framework that integrates previously developed theory addressing differential rates of X and autosomal evolution with methods that jointly estimate the level of purifying and positive selection via modeling of the distribution of fitness effects (DFE). We then utilize this framework to estimate the proportion of nonsynonymous substitutions fixed by positive selection (α) using exome sequence data from a West African population. We find that varying the female to male breeding ratio (β) has minimal impact on the DFE for the X chromosome, especially when compared with the effect of varying the dominance coefficient of deleterious alleles (h). Estimates of α range from 46% to 51% and from 4% to 24% for the X chromosome and autosomes, respectively. While dependent on h, the magnitude of the difference between α values estimated for these two systems is highly statistically significant over a range of biologically realistic parameter values, suggesting faster-X has been operating in humans.
Mendez, F. L., Watkins, J. C., & Hammer, M. F. (2013). Neandertal Origin of Genetic Variation at the Cluster of OAS Immunity Genes. Molecular Biology and Evolution, 30(4), 798-801. doi:10.1093/molbev/mst004
Analyses of ancient DNA from extinct humans reveal signals of at least two independent hybridization events in the history of non-African populations. To date, there are very few examples of specific genetic variants that have been rigorously identified as introgressive. Here, we survey DNA sequence variation in the OAS gene cluster on chromosome 12 and provide strong evidence that a haplotype extending for ∼185 kb introgressed from Neandertals. This haplotype is nearly restricted to Eurasians and is estimated to have diverged from the Neandertal sequence ∼125 kya. Despite the potential for novel functional variation, the observed frequency of this haplotype is consistent with neutral introgression. This is the second locus in the human genome, after STAT2, carrying distinct haplotypes that appear to have introgressed separately from both Neandertals and Denisova.
Mendez, F. L., Watkins, J. C., & Hammer, M. F. (2012). A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea. The American Journal of Human Genetics, 91(2), 265-274. doi:10.1016/j.ajhg.2012.06.015
Signals of archaic admixture have been identified through comparisons of the draft Neanderthal and Denisova genomes with those of living humans. Studies of individual loci contributing to these genome-wide average signals are required for characterization of the introgression process and investigation of whether archaic variants conferred an adaptive advantage to the ancestors of contemporary human populations. However, no definitive case of adaptive introgression has yet been described. Here we provide a DNA sequence analysis of the innate immune gene STAT2 and show that a haplotype carried by many Eurasians (but not sub-Saharan Africans) has a sequence that closely matches that of the Neanderthal STAT2. This haplotype, referred to as N, was discovered through a resequencing survey of the entire coding region of STAT2 in a global sample of 90 individuals. Analyses of publicly available complete genome sequence data show that haplotype N shares a recent common ancestor with the Neanderthal sequence (∼80 thousand years ago) and is found throughout Eurasia at an average frequency of ∼5%. Interestingly, N is found in Melanesian populations at ∼10-fold higher frequency (∼54%) than in Eurasian populations. A neutrality test that controls for demography rejects the hypothesis that a variant of N rose to high frequency in Melanesia by genetic drift alone. Although we are not able to pinpoint the precise target of positive selection, we identify nonsynonymous mutations in ERBB3, ESYT1, and STAT2—all of which are part of the same 250 kb introgressive haplotype—as good candidates.
Mendez, F. L., Watkins, J. C., & Hammer, M. F. (2012). Global Genetic Variation at OAS1 Provides Evidence of Archaic Admixture in Melanesian Populations. Molecular Biology and Evolution, 29(6), 1513-1520. doi:10.1093/molbev/msr301
Recent analysis of DNA extracted from two Eurasian forms of archaic human shows that more genetic variants are shared with humans currently living in Eurasia than with anatomically modern humans in sub-Saharan Africa. Although these genome-wide average measures of genetic similarity are consistent with the hypothesis of archaic admixture in Eurasia, analyses of individual loci exhibiting the signal of archaic introgression are needed to test alternative hypotheses and investigate the admixture process. Here, we provide a detailed sequence analysis of the innate immune gene OAS1, a locus with a divergent Melanesian haplotype that is very similar to the Denisova sequence from the Altai region of Siberia. We resequenced a 7-kb region encompassing the OAS1 gene in 88 individuals from six Old World populations (San, Biaka, Mandenka, French Basque, Han Chinese, and Papua New Guineans) and discovered previously unknown and ancient genetic variation. The 5′ region of this gene has unusual patterns of diversity, including 1) higher levels of nucleotide diversity in Papuans than in sub-Saharan Africans, 2) very deep ancestry with an estimated time to the most recent common ancestor of >3 myr, and 3) a basal branching pattern with Papuan individuals on either side of the rooted network. A global geographic survey of >1,500 individuals showed that the divergent Papuan haplotype is nearly restricted to populations from eastern Indonesia and Melanesia. Polymorphic sites within this haplotype are shared with the draft Denisova genome over a span of ∼90 kb and are associated with an extended block of linkage disequilibrium, supporting the hypothesis that this haplotype introgressed from an archaic source that likely lived in Eurasia.
Hammer, M. F., Woerner, A. E., Mendez, F. L., Watkins, J. C., & Wall, J. D. (2011). Genetic evidence for archaic admixture in Africa. Proceedings of the National Academy of Sciences of the United States of America, 108(37), 15123-8. doi:10.1073/pnas.1109300108
A long-debated question concerns the fate of archaic forms of the genus Homo: did they go extinct without interbreeding with anatomically modern humans, or are their genes present in contemporary populations? This question is typically focused on the genetic contribution of archaic forms outside of Africa. Here we use DNA sequence data gathered from 61 noncoding autosomal regions in a sample of three sub-Saharan African populations (Mandenka, Biaka, and San) to test models of African archaic admixture. We use two complementary approximate-likelihood approaches and a model of human evolution that involves recent population structure, with and without gene flow from an archaic population. Extensive simulation results reject the null model of no admixture and allow us to infer that contemporary African populations contain a small proportion of genetic material (≈ 2%) that introgressed ≈ 35 kya from an archaic population that split from the ancestors of anatomically modern humans ≈ 700 kya. Three candidate regions showing deep haplotype divergence, unusual patterns of linkage disequilibrium, and small basal clade size are identified and the distributions of introgressive haplotypes surveyed in a sample of populations from across sub-Saharan Africa. One candidate locus with an unusual segment of DNA that extends for >31 kb on chromosome 4 seems to have introgressed into modern Africans from a now-extinct taxon that may have lived in central Africa. Taken together our results suggest that polymorphisms present in extant populations introgressed via relatively recent interbreeding with hominin forms that diverged from the ancestors of modern humans in the Lower-Middle Pleistocene.
Veeramah, K. R., Wegmann, D., Woerner, A. E., Mendez, F. L., Watkins, J. C., Destro-Bisol, G., Soodyall, H., Louie, L., & Hammer, M. F. (2011). An Early Divergence of KhoeSan Ancestors from Those of Other Modern Humans Is Supported by an ABC-Based Analysis of Autosomal Resequencing Data. Molecular Biology and Evolution, 29(2), 617-630. doi:10.1093/molbev/msr212
Sub-Saharan Africa has consistently been shown to be the most genetically diverse region in the world. Despite the fact that a substantial portion of this variation is partitioned between groups practicing a variety of subsistence strategies and speaking diverse languages, there is currently no consensus on the genetic relationships of sub-Saharan African populations. San (a subgroup of KhoeSan) and many Pygmy groups maintain hunter-gatherer lifestyles and cluster together in autosomal-based analysis, whereas non-Pygmy Niger-Kordofanian speakers (non-Pygmy NKs) predominantly practice agriculture and show substantial genetic homogeneity despite their wide geographic range throughout sub-Saharan Africa. However, KhoeSan, who speak a set of relatively unique click-based languages, have long been thought to be an early branch of anatomically modern humans based on phylogenetic analysis. To formally test models of divergence among the ancestors of modern African populations, we resequenced a sample of San, Eastern, and Western Pygmies and non-Pygmy NKs individuals at 40 nongenic (∼2 kb) regions and then analyzed these data within an Approximate Bayesian Computation (ABC) framework. We find substantial support for a model of an early divergence of KhoeSan ancestors from a proto-Pygmy-non-Pygmy NKs group ∼110 thousand years ago over a model incorporating a proto-KhoeSan–Pygmy hunter-gatherer divergence from the ancestors of non-Pygmy NKs. The results of our analyses are consistent with previously identified signals of a strong bottleneck in Mbuti Pygmies and a relatively recent expansion of non-Pygmy NKs. We also develop a number of methodologies that utilize “pseudo-observed” data sets to optimize our ABC-based inference. This approach is likely to prove to be an invaluable tool for demographic inference using genome-wide resequencing data.
Hammer, M. F., Woerner, A. E., Mendez, F. L., Watkins, J. C., Cox, M. P., & Wall, J. D. (2010). The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nature genetics, 42(10), 830-1.
The ratio of X-linked to autosomal diversity was estimated from an analysis of six human genome sequences and found to deviate from the expected value of 0.75. However, the direction of this deviation depends on whether a particular sequence is close to or far from the nearest gene. This pattern may be explained by stronger locally acting selection on X-linked genes compared with autosomal genes, combined with larger effective population sizes for females than for males.
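The expected value of 0.75 mentioned in the abstract above can be reproduced from the standard textbook effective-population-size formulas for autosomes and the X chromosome. The following is a minimal sketch under that assumption; the function name and the example population sizes are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def x_to_autosome_ratio(n_females, n_males):
    """Expected neutral X/autosome effective-size ratio.

    Assumes the textbook formulas (not the paper's own method):
      autosomes: N_A = 4 * Nf * Nm / (Nf + Nm)
      X:         N_X = 9 * Nf * Nm / (4 * Nm + 2 * Nf)
    """
    nf, nm = Fraction(n_females), Fraction(n_males)
    n_auto = 4 * nf * nm / (nf + nm)
    n_x = 9 * nf * nm / (4 * nm + 2 * nf)
    return n_x / n_auto


# Hypothetical breeding population sizes, chosen only for illustration.
print(float(x_to_autosome_ratio(1000, 1000)))  # equal numbers of each sex
print(float(x_to_autosome_ratio(2000, 1000)))  # more breeding females than males
```

With equal numbers of breeding females and males the ratio is exactly 3/4; a larger female effective size pushes it above 0.75, in line with the explanation offered in the abstract.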
Marsteller, P., de Pillis, L., Findley, A., Joplin, K., Pelesko, J., Nelson, K., Thompson, K., Usher, D., & Watkins, J. (2010). Toward integration: from quantitative biology to mathbio-biomath? CBE life sciences education, 9(3), 165-71. doi:10.1187/cbe.10-03-0053
In response to the call of BIO2010 for integrating quantitative skills into undergraduate biology education, 30 Howard Hughes Medical Institute (HHMI) Program Directors at the 2006 HHMI Program Directors Meeting established a consortium to investigate, implement, develop, and disseminate best practices resulting from the integration of math and biology. With the assistance of an HHMI-funded mini-grant, led by Karl Joplin of East Tennessee State University, and support in institutional HHMI grants at Emory and University of Delaware, these institutions held a series of summer institutes and workshops to document progress toward and address the challenges of implementing a more quantitative approach to undergraduate biology education. This report summarizes the results of the four summer institutes (2007-2010). The group developed four draft white papers, a wiki site, and a listserv. One major outcome of these meetings is this issue of CBE-Life Sciences Education, which resulted from proposals at our 2008 meeting and a January 2009 planning session. Many of the papers in this issue emerged from or were influenced by these meetings.
Watkins, J. C. (2010). Convergence time to the Ewens sampling formula in the infinite alleles Moran model. Journal of Mathematical Biology, 60(2), 189-206. doi:10.1007/s00285-009-0255-x
PMID: 19288263. In this paper, we establish an upper bound for time to convergence to stationarity for the discrete time infinite alleles Moran model. If M is the population size and μ is the mutation rate, this bound gives a cutoff time of log(M μ)/μ generations. The stationary distribution for this process in the case of sampling without replacement is the Ewens sampling formula. We show that the bound for the total variation distance from the generation t distribution to the Ewens sampling formula is well approximated by one of the extreme value distributions, namely, a standard Gumbel distribution. Beginning with the card shuffling examples of Aldous and Diaconis and extending the ideas of Donnelly and Rodrigues for the two allele model, this model adds to the list of Markov chains that show evidence for the cutoff phenomenon. Because of the broad use of infinite alleles models, this cutoff sets the time scale of applicability for statistical tests based on the Ewens sampling formula and other tests of neutrality in a number of population genetic studies. © Springer-Verlag 2009.
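For a sense of scale, the cutoff expression log(M μ)/μ from the abstract above can be evaluated directly. The sketch below assumes a natural logarithm and uses hypothetical function names and example values; the Gumbel function shown is only the standard distribution function F(x) = exp(-exp(-x)), and the way it enters the paper's bound is not reproduced here.

```python
import math


def cutoff_generations(population_size, mutation_rate):
    """Evaluate log(M * mu) / mu, the cutoff time in generations quoted above.

    The natural logarithm is an assumption of this sketch.
    """
    m_mu = population_size * mutation_rate
    if m_mu <= 0:
        raise ValueError("M * mu must be positive for the logarithm to exist")
    return math.log(m_mu) / mutation_rate


def standard_gumbel_cdf(x):
    """Distribution function of the standard Gumbel law, exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


# Illustrative values only: M = 10,000 individuals, mu = 0.001 per generation.
print(cutoff_generations(10_000, 0.001))
print(standard_gumbel_cdf(0.0))
```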
Watkins, J. C. (2010). On a calculus-based statistics course for life science students. CBE Life Sciences Education, 9(3), 298-310. doi:10.1187/cbe.10-03-0035
PMID: 20810962; PMCID: PMC2931677. The choice of pedagogy in statistics should take advantage of the quantitative capabilities and scientific background of the students. In this article, we propose a model for a statistics course that assumes student competency in calculus and a broadening knowledge in biology. We illustrate our methods and practices through examples from the curriculum. © 2010 The American Society for Cell Biology.
Watkins, J., Marsteller, P., de Pillis, L., Findley, A., Joplin, K., Pelesko, J., Nelson, K., Thompson, K., & Usher, D. (2010). Toward Integration: From Quantitative Biology to Mathbio-Biomath?. CBE—Life Sciences Education, 9(3), 165-171. doi:10.1187/cbe.10-03-0053
Watkins, J. C. (2009). Convergence time to the Ewens sampling formula in the infinite alleles Moran model. Journal of Mathematical Biology, 60, 189-206. doi:10.1007/s00285-009-0255-x
Hallmark, B., Watkins, J. C., Lansing, J. S., Cox, M. P., Karafet, T. M., Sudoyo, H., & Hammer, M. F. (2008). Male dominance rarely skews the frequency distribution of Y chromosome haplotypes in human populations. Proceedings of the National Academy of Sciences, 105(33), 11645-11650. doi:10.1073/pnas.0710158105
Lansing, J. S., Watkins, J. C., Hallmark, B., Cox, M. P., Karafet, T. M., Sudoyo, H., & Hammer, M. F. (2008). Male dominance rarely skews the frequency distribution of Y chromosome haplotypes in human populations. Proceedings of the National Academy of Sciences of the United States of America, 105(33), 11645-50.
A central tenet of evolutionary social science holds that behaviors, such as those associated with social dominance, produce fitness effects that are subject to cultural selection. However, evidence for such selection is inconclusive because it is based on short-term statistical associations between behavior and fertility. Here, we show that the evolutionary effects of dominance at the population level can be detected using noncoding regions of DNA. Highly variable polymorphisms on the nonrecombining portion of the Y chromosome can be used to trace lines of descent from a common male ancestor. Thus, it is possible to test for the persistence of differential fertility among patrilines. We examine haplotype distributions defined by 12 short tandem repeats in a sample of 1269 men from 41 Indonesian communities and test for departures from neutral mutation-drift equilibrium based on the Ewens sampling formula. Our tests reject the neutral model in only 5 communities. Analysis and simulations show that we have sufficient power to detect such departures under varying demographic conditions, including founder effects, bottlenecks, and migration, and at varying levels of social dominance. We conclude that patrilines seldom are dominant for more than a few generations, and thus traits or behaviors that are strictly paternally inherited are unlikely to be under strong cultural selection.
Lansing, J. S., Cox, M. P., Downey, S. S., Gabler, B. M., Hallmark, B., Karafet, T. M., Norquest, P., Schoenfelder, J. W., Sudoyo, H., Watkins, J. C., & Hammer, M. F. (2007). Coevolution of languages and genes on the island of Sumba, eastern Indonesia. Proceedings of the National Academy of Sciences of the United States of America, 104(41), 16022-6.
Numerous studies indicate strong associations between languages and genes among human populations at the global scale, but all broader scale genetic and linguistic patterns must arise from processes originating at the community level. We examine linguistic and genetic variation in a contact zone on the eastern Indonesian island of Sumba, where Neolithic Austronesian farming communities settled and began interacting with aboriginal foraging societies approximately 3,500 years ago. Phylogenetic reconstruction based on a 200-word Swadesh list sampled from 29 localities supports the hypothesis that Sumbanese languages derive from a single ancestral Austronesian language. However, the proportion of cognates (words with a common origin) traceable to Proto-Austronesian (PAn) varies among language subgroups distributed across the island. Interestingly, a positive correlation was found between the percentage of Y chromosome lineages that derive from Austronesian (as opposed to aboriginal) ancestors and the retention of PAn cognates. We also find a striking correlation between the percentage of PAn cognates and geographic distance from the site where many Sumbanese believe their ancestors arrived on the island. These language-gene-geography correlations, unprecedented at such a fine scale, imply that historical patterns of social interaction between expanding farmers and resident hunter-gatherers largely explain community-level language evolution on Sumba. We propose a model to explain linguistic and demographic coevolution at fine spatial and temporal scales.
Norquest, P. K., Lansing, J. S., Hammer, M. F., Watkins, J. C., Karafet, T. M., Cox, M. P., Downey, S. S., Gabler, B. M., Schoenfelder, J. W., & Sudoyo, H. (2007). Coevolution of languages and genes on the island of Sumba, eastern Indonesia. Proceedings of the National Academy of Sciences, 104(41), 16022–16026. doi:10.1073/pnas.0704451104
Numerous studies indicate strong associations between languages and genes among human populations at the global scale, but all broader scale genetic and linguistic patterns must arise from processes originating at the community level. We examine linguistic and genetic variation in a contact zone on the eastern Indonesian island of Sumba, where Neolithic Austronesian farming communities settled and began interacting with aboriginal foraging societies ≈3,500 years ago. Phylogenetic reconstruction based on a 200-word Swadesh list sampled from 29 localities supports the hypothesis that Sumbanese languages derive from a single ancestral Austronesian language. However, the proportion of cognates (words with a common origin) traceable to Proto-Austronesian (PAn) varies among language subgroups distributed across the island. Interestingly, a positive correlation was found between the percentage of Y chromosome lineages that derive from Austronesian (as opposed to aboriginal) ancestors and the retention of PAn cognates. We also find a striking correlation between the percentage of PAn cognates and geographic distance from the site where many Sumbanese believe their ancestors arrived on the island. These language–gene–geography correlations, unprecedented at such a fine scale, imply that historical patterns of social interaction between expanding farmers and resident hunter-gatherers largely explain community-level language evolution on Sumba. We propose a model to explain linguistic and demographic coevolution at fine spatial and temporal scales.
Watkins, J. C. (2007). Microsatellite evolution: Markov transition functions for a suite of models. Theoretical Population Biology, 71(2), 147-159.
PMID: 17123560;Abstract: This paper takes from the collection of models considered by Whittaker et al. [2003. Likelihood-based estimation of microsatellite mutation rates. Genetics 164, 781-787] derived from direct observation of microsatellite mutation in parent-child pairs and provides analytical expressions for the probability distributions for the change in number of repeats over any given number of generations. The mathematical framework for this analysis is the theory of Markov processes. We find these expressions using two approaches, approximating by circulant matrices and solving a partial differential equation satisfied by the generating function. The impact of the differing choice of models is examined using likelihood estimates for time to most recent common ancestor. The analysis presented here may play a role in elucidating the connections between these two approaches and shows promise in reconciling differences between estimates for mutation rates based on Whittaker's approach and methods based on phylogenetic analyses. © 2006 Elsevier Inc. All rights reserved.
Watkins, J. C., Norquest, P., Lansing, J. S., Cox, M. P., Downey, S. S., Gabler, B. M., Hallmark, B., Karafet, T. M., Schoenfelder, J. W., Sudoyo, H., & Hammer, M. F. (2007). Coevolution of languages and genes on the island of Sumba, eastern Indonesia. Proceedings of the National Academy of Sciences, 104(41), 16022-16026. doi:10.1073/pnas.0704451104
Watkins, J., & Watkins, J. C. (2007). Microsatellite evolution: Markov transition functions for a suite of models. Theoretical population biology, 71(2).
This paper takes from the collection of models considered by Whittaker et al. [2003. Likelihood-based estimation of microsatellite mutation rates. Genetics 164, 781-787] derived from direct observation of microsatellite mutation in parent-child pairs and provides analytical expressions for the probability distributions for the change in number of repeats over any given number of generations. The mathematical framework for this analysis is the theory of Markov processes. We find these expressions using two approaches, approximating by circulant matrices and solving a partial differential equation satisfied by the generating function. The impact of the differing choice of models is examined using likelihood estimates for time to most recent common ancestor. The analysis presented here may play a role in elucidating the connections between these two approaches and shows promise in reconciling differences between estimates for mutation rates based on Whittaker's approach and methods based on phylogenetic analyses.
Karafet, T. M., Lansing, J. S., Redd, A. J., Reznikova, S., Watkins, J. C., Surata, K., Arthawiguna, W. A., Mayer, L., Bamshad, M. J., Jorde, L. B., & Hammer, M. F. (2005). Balinese Y-Chromosome Perspective on the Peopling of Indonesia: Genetic Contributions from Pre-Neolithic Hunter-Gatherers, Austronesian Farmers, and Indian Traders. Human Biology, 77, 93-114. doi:10.1353/hub.2005.0030
The island of Bali lies near the center of the southern chain of islands in the Indonesian archipelago, which served as a stepping-stone for early migrations of hunter-gatherers to Melanesia and Australia and for more recent migrations of Austronesian farmers from mainland Southeast Asia to the Pacific. Bali is the only Indonesian island with a population that currently practices the Hindu religion and preserves various other Indian cultural, linguistic, and artistic traditions (Lansing 1983). Here, we examine genetic variation on the Y chromosomes of 551 Balinese men to investigate the relative contributions of Austronesian farmers and pre-Neolithic hunter-gatherers to the contemporary Balinese paternal gene pool and to test the hypothesis of recent paternal gene flow from the Indian subcontinent. Seventy-one Y-chromosome binary polymorphisms (single nucleotide polymorphisms, SNPs) and 10 Y-chromosome-linked short tandem repeats (STRs) were genotyped on a sample of 1,989 Y chromosomes from 20 populations representing Indonesia (including Bali), southern China, Southeast Asia, South Asia, the Near East, and Oceania. SNP genotyping revealed 22 Balinese lineages, 3 of which (O-M95, O-M119, and O-M122) account for nearly 83.7% of Balinese Y chromosomes. Phylogeographic analyses suggest that all three major Y-chromosome haplogroups migrated to Bali with the arrival of Austronesian speakers; however, STR diversity patterns associated with these haplogroups are complex and may be explained by multiple waves of Austronesian expansion to Indonesia by different routes. Approximately 2.2% of contemporary Balinese Y chromosomes (i.e., K-M9*, K-M230, and M lineages) may represent the pre-Neolithic component of the Indonesian paternal gene pool. In contrast, eight other haplogroups (e.g., within H, J, L, and R), making up approximately 12% of the Balinese paternal gene pool, appear to have migrated to Bali from India. 
These results indicate that the Austronesian expansion had a profound effect on the composition of the Balinese paternal gene pool and that cultural transmission from India to Bali was accompanied by substantial levels of gene flow.
Karafet, T. M., Lansing, J. S., Redd, A. J., Reznikova, S., Watkins, J. C., Surata, S. P., Arthawiguna, W. A., Mayer, L., Bamshad, M., Jorde, L. B., & Hammer, M. F. (2005). Balinese Y-chromosome perspective on the peopling of Indonesia: genetic contributions from pre-neolithic hunter-gatherers, Austronesian farmers, and Indian traders. Human biology, 77(1), 93-114.
The island of Bali lies near the center of the southern chain of islands in the Indonesian archipelago, which served as a stepping-stone for early migrations of hunter-gatherers to Melanesia and Australia and for more recent migrations of Austronesian farmers from mainland Southeast Asia to the Pacific. Bali is the only Indonesian island with a population that currently practices the Hindu religion and preserves various other Indian cultural, linguistic, and artistic traditions (Lansing 1983). Here, we examine genetic variation on the Y chromosomes of 551 Balinese men to investigate the relative contributions of Austronesian farmers and pre-Neolithic hunter-gatherers to the contemporary Balinese paternal gene pool and to test the hypothesis of recent paternal gene flow from the Indian subcontinent. Seventy-one Y-chromosome binary polymorphisms (single nucleotide polymorphisms, SNPs) and 10 Y-chromosome-linked short tandem repeats (STRs) were genotyped on a sample of 1,989 Y chromosomes from 20 populations representing Indonesia (including Bali), southern China, Southeast Asia, South Asia, the Near East, and Oceania. SNP genotyping revealed 22 Balinese lineages, 3 of which (O-M95, O-M119, and O-M122) account for nearly 83.7% of Balinese Y chromosomes. Phylogeographic analyses suggest that all three major Y-chromosome haplogroups migrated to Bali with the arrival of Austronesian speakers; however, STR diversity patterns associated with these haplogroups are complex and may be explained by multiple waves of Austronesian expansion to Indonesia by different routes. Approximately 2.2% of contemporary Balinese Y chromosomes (i.e., K-M9*, K-M230, and M lineages) may represent the pre-Neolithic component of the Indonesian paternal gene pool. In contrast, eight other haplogroups (e.g., within H, J, L, and R), making up approximately 12% of the Balinese paternal gene pool, appear to have migrated to Bali from India. 
These results indicate that the Austronesian expansion had a profound effect on the composition of the Balinese paternal gene pool and that cultural transmission from India to Bali was accompanied by substantial levels of gene flow.
Lansing, J. S., Redd, A. J., Karafet, T. M., Watkins, J., Ardika, I. W., Surata, S. P., Schoenfelder, J. S., Campbell, M., Merriwether, A. M., & Hammer, M. F. (2004). An Indian trader in ancient Bali?. Antiquity, 78(300), 287-293.
Abstract: DNA analysis of a tooth found with imported pottery in Bali offers a strong possibility of the presence of a trader of Indian extraction in the late first millennium BC.
Watkins, J. C. (2004). The role of marriage rules in the structure of genetic relatedness. Theoretical Population Biology, 66(1), 13-24.
PMID: 15225572;Abstract: In this work, we take a forward in time approach to compute the probabilities of non-identity by descent for a population consisting of n sections obeying one of a class of marriage rules that is invariant under cyclical relabeling of sections. A perturbation method allows for exact asymptotics using the reciprocal of the section population as a small parameter. The analysis yields relatedness measures that generalize Wright's F-statistics. © 2004 Elsevier Inc. All rights reserved.
Watkins, J., & Watkins, J. C. (2004). The role of marriage rules in the structure of genetic relatedness. Theoretical population biology, 66(1).
In this work, we take a forward in time approach to compute the probabilities of non-identity by descent for a population consisting of n sections obeying one of a class of marriage rules that is invariant under cyclical relabeling of sections. A perturbation method allows for exact asymptotics using the reciprocal of the section population as a small parameter. The analysis yields relatedness measures that generalize Wright's F-statistics.
Anderson, K. R., Mendelson, N. H., & Watkins, J. C. (2000). A New Mathematical Approach Predicts Individual Cell Growth Behavior using Bacterial Population Information. Journal of Theoretical Biology, 202, 87-94. doi:10.1006/jtbi.1999.1051
A theoretical methodology has been developed for studying the growth kinetics of bacterial cells. It utilizes the steady-state cell length distribution in a bacterial population to predict the dependency of growth and division rates on cell length and age. The mathematical model has been applied to the analysis of two bacterial populations, a wild-type strain of Bacillus subtilis, and a minicell-producing strain that carries the divIVB1 mutation. The results show that our model describes the wild-type population very well and that the assumptions typically used in traditional methods are unrealistic. In the case of the minicell-producing mutant we find evidence that the rate of cell division must be a function not only of cell size but also of cell age.
Anderson, K. R., Mendelson, N. H., & Watkins, J. C. (2000). A new mathematical approach predicts individual cell growth behavior using bacterial population information. Journal of Theoretical Biology, 202(1), 87-94.
PMID: 10623502;Abstract: A theoretical methodology has been developed for studying the growth kinetics of bacterial cells. It utilizes the steady-state cell length distribution in a bacterial population to predict the dependency of growth and division rates on cell length and age. The mathematical model has been applied to the analysis of two bacterial populations, a wild-type strain of Bacillus subtilis, and a minicell-producing strain that carries the divIVB1 mutation. The results show that our model describes the wild-type population very well and that the assumptions typically used in traditional methods are unrealistic. In the case of the minicell-producing mutant we find evidence that the rate of cell division must be a function not only of cell size but also of cell age. © 2000 Academic Press.
DeGrandi-Hoffman, G., & Watkins, J. C. (2000). The foraging activity of honey bees Apis mellifera and non-Apis bees on hybrid sunflowers (Helianthus annuus) and its influence on cross-pollination and seed set. Journal of Apicultural Research, 39(1-2), 37-45.
Abstract: The repercussions of concurrent foraging by honey bee (Apis mellifera) and non-Apis bee populations on cross-pollination and seed set in hybrid sunflowers (Helianthus annuus) was investigated. The amount of sunflower pollen on the bodies of honey bees foraging in rows of male-sterile (MS) sunflowers was positively correlated with the size of the non-Apis bee population. The combined population of non-Apis bees and honey bees foraging on male-fertile (MF) and MS sunflowers also was positively correlated to seed set in MS rows. There were more honey bees than non-Apis bees foraging in MF and MS rows, but there was no evidence of competition for resources between the two populations. The size of the honey bee population was positively correlated to the area of open flowers on sunflower capitula, while the non-Apis population remained relatively constant throughout bloom. Results from this study indicate that a combined honey bee and non-Apis bee population might result in better pollination of hybrid sunflowers than either population alone.
DeGrandi-Hoffman, G., Watkins, J., Guerrero, P., & Erickson, E. (2000). Using honey bees to teach mathematics and science to high school students. American Bee Journal, 140(4), 293-295.
Abstract: Honey bees have been an integral part of human civilization for centuries. They pollinate crops and supply us with honey and pollen. In fact, humans use almost everything that honey bees collect or produce, from royal jelly in cosmetics to wax for candles and propolis for finishing the wood of fine musical instruments. Honey bees also have another function. They make great tools for teaching the fundamentals of mathematics and biology to students of all ages.
Watkins, J. C. (2000). Consistency and fluctuation theorems for discrete time structured population models having demographic stochasticity. Journal of Mathematical Biology, 41(3), 253-271.
PMID: 11072758;Abstract: In this paper we prove a consistency theorem (law of large numbers) and a fluctuation theorem (central limit theorem) for structured population processes. The basic assumptions for these theorems are that the individuals have no statistically distinguishing features beyond their class and that the interaction between any two individuals is not too high. We apply these results to density dependent models of Leslie type and to a model for flour beetle dynamics.
Mendelson, N. H., Bourque, A., Wilkening, K., Anderson, K. R., & Watkins, J. C. (1999). Organized cell swimming motions in Bacillus subtilis colonies: Patterns of short-lived whirls and jets. Journal of Bacteriology, 181(2), 600-609.
PMID: 9882676;PMCID: PMC93416;Abstract: The swimming motions of cells within Bacillus subtilis colonies, as well as the associated fluid flows, were analyzed from video films produced during colony growth and expansion on wet agar surfaces. Individual cells in very wet dense populations moved at rates between 76 and 116 μm/s. Swimming cells were organized into patterns of whirls, each approximately 1,000 μm², and jets of about 95 by 12 μm. Whirls and jets were short-lived, lasting only about 0.25 s. Patterns within given areas constantly repeated with a periodicity of approximately 1 s. Whirls of a given direction became disorganized and then re-formed, usually into whirls moving in the opposite direction. Pattern elements were also organized with respect to one another in the colony. Neighboring whirls usually turned in opposite directions. This correlation decreased as a function of distance between whirls. Fluid flows associated with whirls and jets were measured by observing the movement of marker latex spheres added to colonies. The average velocity of markers traveling in whirls was 19 μm/s, whereas those traveling in jets moved at 27 μm/s. The paths followed by markers were aligned with the direction of cell motion, suggesting that cells create flows moving with them into whirls and along jets. When colonies became dry, swimming motions ceased except in regions close to the periphery and in isolated islands where cells traveled in slow whirls at about 4 μm/s. The addition of water resulted in immediate though transient rapid swimming (> 80 μm/s) in characteristic whirl and jet patterns. The rate of swimming decreased to 13 μm/s within 2 min, however, as the water diffused into the agar. Organized swimming patterns were nevertheless preserved throughout this period. These findings show that cell swimming in colonies is highly organized.
Watkins, J. C., Mendelson, N. H., Bourque, A., Wilkening, K., & Anderson, K. R. (1999). Organized Cell Swimming Motions in Bacillus subtilis Colonies: Patterns of Short-Lived Whirls and Jets. Journal of Bacteriology, 181(2), 600-609. doi:10.1128/jb.181.2.600-609.1999
DeGrandi-Hoffman, G., & Watkins, J. C. (1998). Queen development time and the Africanization of European honey bees. American Bee Journal, 138(6), 467-469.
Abstract: Organisms that inhabit an area thrive because they have become well adapted to the environmental conditions that surround them. The general rule is that the distinguishing characteristics of a species change very slowly over time. Suppose an individual of a particular species migrates into a territory in which that same species is established. If the migrant individual is similar to the current inhabitants, then the small differences in traits that the new arrival might bring will mix into the population and the distinguishing traits will rarely, if ever, be seen. On the other hand, if the immigrant has distinctive genetic characteristics, it often cannot compete for survival and reproduction with the resident population. Thus, these distinguishing characteristics are rapidly removed from the population's gene pool.
DeGrandi-Hoffman, G., Watkins, J. C., Collins, A. G., Loper, G. M., Martin, J. H., Arias, M. C., & Sheppard, W. S. (1998). Queen Developmental Time as a Factor in the Africanization of European Honey Bee (Hymenoptera: Apidae) Populations. Annals of the Entomological Society of America, 91, 52-58. doi:10.1093/aesa/91.1.52
The development times of daughter queens from African and European matrilines mated to both African and European drones were recorded. Regardless of the matriline, African patriline queens completed their development and emerged 8–12 h before those with European paternity. A probability distribution function derived from the emergence time data indicated that because of differences in development times between patrilines, the probability that an African patriline queen will emerge 1st can be 2–3 times greater than the proportion of the African patrilines in the colony population. Because the 1st queen to emerge has the best chance of becoming the colony's new queen, differences in queen development times between African and European patrilines might be a factor contributing to the asymmetrical gene flow between African and European honey bee, Apis mellifera L., populations, and the eventual loss of European nuclear markers and behavioral attributes in European honey bee populations where African bees have migrated.
DeGrandi-Hoffman, G., Watkins, J. C., Collins, A. M., Loper, G. M., Martin, J. H., Arias, M. C., & Sheppard, W. S. (1998). Queen developmental time as a factor in the Africanization of European honey bee (Hymenoptera: Apidae) populations. Annals of the Entomological Society of America, 91(1), 52-58.
Abstract: The development times of daughter queens from African and European matrilines mated to both African and European drones were recorded. Regardless of the matriline, African patriline queens completed their development and emerged 8-12 h before those with European paternity. A probability distribution function derived from the emergence time data indicated that because of differences in development times between patrilines, the probability that an African patriline queen will emerge 1st can be 2-3 times greater than the proportion of the African patrilines in the colony population. Because the 1st queen to emerge has the best chance of becoming the colony's new queen, differences in queen development times between African and European patrilines might be a factor contributing to the asymmetrical gene flow between African and European honey bee, Apis mellifera L., populations, and the eventual loss of European nuclear markers and behavioral attributes in European honey bee populations where African bees have migrated.
DeGrandi-Hoffman, G., Watkins, J. C., & DeGrandi-Hoffman, G. (1998). Queen development time and the Africanization of European honey bees. American Bee Journal, 138(6), 467-469.
Watkins, J. C. (1997). Mechanical models for cell movement - Locomotion, translocation, migration. Journal of Applied Probability, 34(4), 827-846.
Abstract: This paper provides a detailed stochastic analysis of leucocyte cell movement based on the dynamics of a rigid body. The cell's behavior is studied in two relevant anisotropic environments displaying adhesion mediated movement (haptotaxis) and stimulus mediated movement (chemotaxis). This behavior is modeled by diffusion processes on three successively longer time scales, termed locomotion, translocation, and migration.
Watkins, J. C. (1996). Review of "Lectures on Random Evolutions," by Mark A. Pinsky. Annals of Probability, 24(3), 1647-1652. doi:10.1214/aop/1065725198
Heubach, S., & Watkins, J. C. (1995). A stochastic model for the movement of a white blood cell. Advances in Applied Probability, 27(2), 443-475. doi:10.2307/1427835
We present a stochastic model for the movement of a white blood cell both in uniform concentration of chemoattractant and in the presence of a chemoattractant gradient. It is assumed that the rotational velocity is proportional to the weighted difference of the occupied receptors in the two halves of the cell and that each of the receptors stays free or occupied for an exponential length of time. We define processes corresponding to a cell with 2nP + 1 receptors (receptor sites). In the case of constant concentration, we show that the limiting process for the rotational velocity is an Ornstein-Uhlenbeck process. Its drift coefficient depends on the parameters of the exponential waiting times, and its diffusion coefficient depends, in addition, on the weight function. In the inhomogeneous case, the velocity process has a diffusion limit with drift coefficient depending on the concentration gradient and diffusion coefficient depending on the concentration and the weight function.
Watkins, J. C., & Woessner, B. (1991). Diffusion models for chemotaxis: a statistical analysis of noninteractive unicellular movement. Mathematical Biosciences, 104(2), 271-303.
PMID: 1804464;Abstract: A program is developed for applying stochastic differential equations to models for chemotaxis. First a few of the experimental and theoretical models for chemotaxis both for swimming bacteria and for cells migrating along a substrate are reviewed. In physical and biological models of deterministic systems, finite difference equations are often replaced by a limiting differential equation in order to take advantage of the ease in the use of calculus. A similar but more intricate methodology is developed here for stochastic models for chemotaxis. This exposition is possible because recent work in probability theory gives ease in the use of the stochastic calculus for diffusions and broad applicability in the convergence of stochastic difference equations to a stochastic differential equation. Stochastic differential equations suggest useful data for the model and provide statistical tests. We begin with phenomenological considerations as we analyze a one-dimensional model proposed by Boyarsky, Noble, and Peterson in their study of human granulocytes. In this context, a theoretical model consists in identifying which diffusion best approximates a model for cell movement based upon theoretical considerations of cell physiology. Such a diffusion approximation theorem is presented along with discussion of the relationship between autocovariance and persistence. Both the stochastic calculus and the diffusion approximation theorem are described in one dimension. Finally, these tools are extended to multidimensional models and applied to a three-dimensional experimental setup of spherical symmetry. © 1991.
Watkins, J. C. (1990). A remark on Kunita's decomposition theorem. Stochastic Processes and their Applications, 35(1), 81-85.
Abstract: We use Michel Emery's stability theorem for stochastic differential equations to give a short proof for explicit solutions to linear stochastic differential equations over a solvable Lie group. © 1990.
Watkins, J. C. (1989). Donsker's Invariance Principle for Lie Groups. Annals of Probability, 17(3), 1220-1242. doi:10.1214/aop/1176991265
This paper establishes a functional central limit theorem for Lie groups under a mixing hypothesis. The main theorem generalizes results by Patrick Billingsley for Euclidean space and the author for the general linear group.
Watkins, J. C. (1987). A Companion to the Oseledec Multiplicative Ergodic Theorem. Proceedings of the American Mathematical Society, 99, 772-776. doi:10.2307/2046491
Let $F_1, F_2, \ldots$ be a stationary sequence of continuously differentiable mappings from $[0,1]$ into the set of $d \times d$ matrices. Assume $F_k(0) = I$ for each $k$ and $E[\sup_{0 \leq p \leq 1} \|F'_k(p)\|] < \infty$. Let $\mathcal{I}$ denote the invariant sigma field for the sequence. Then \[ \lim_{n \to \infty} F_n\left(\frac{1}{n}\right) \cdots F_2\left(\frac{1}{n}\right) F_1\left(\frac{1}{n}\right) = \exp E[F'_1(0) \mid \mathcal{I}] \] with probability one.
Watkins, J. C. (1987). A companion to the oseledec multiplicative ergodic theorem. Proceedings of the American Mathematical Society, 99(4), 772-776. doi:10.1090/s0002-9939-1987-0877055-7
Watkins, J. C. (1987). Functional central limit theorems and their associated large deviation principles for products of random matrices. Probability Theory and Related Fields, 76(2), 133-166.
Abstract: This paper establishes a functional central limit theorem for a product of random matrices. The sequence of matrices forms a stationary process that is φ-mixing. The individual matrices in the product become closer and closer to the identity matrix with longer and longer products. In addition, these perturbations from the identity matrix have mean zero. A large deviation principle for the limit process is proved. © 1987 Springer-Verlag.
Watkins, J. C. (1986). Limit theorems for products of random matrices: a comparison of two points of view. Random matrices and their applications, 50, 5-22. doi:10.1090/conm/050/841078
Watkins, J. C. (1985). A STOCHASTIC INTEGRAL REPRESENTATION FOR RANDOM EVOLUTIONS. Annals of Probability, 13(2), 531-557. doi:10.1214/aop/1176993007
Watkins, J. C. (1985). Limit theorems for stationary random evolutions. Stochastic Processes and their Applications, 19(2), 189-224.
Abstract: On a separable Banach space, let $A(\xi_1), A(\xi_2), \ldots$ be a strictly stationary sequence of infinitesimal operators, centered so that $E A(\xi_i) = 0$, $i = 1, 2, \ldots$. This paper characterizes the limit of the random evolutions $Y_n(t) = \exp\left(\frac{1}{n} A(\xi_{[n^2 t]})\right) \cdots \exp\left(\frac{1}{n} A(\xi_2)\right) \exp\left(\frac{1}{n} A(\xi_1)\right) Y_n(0)$ as the solution to a martingale problem. This work is a direct extension of previous work on i.i.d. random evolutions. © 1985.
Watkins, J. C. (1984). A Central Limit Problem in Random Evolutions. Annals of Probability, 12(2), 480-513. doi:10.1214/aop/1176993302
Let $\{T_n\}_{n \geq 1}$ be a sequence of independent and identically distributed strongly continuous semigroups on a separable Banach space. The corresponding generators $\{A_n\}_{n \geq 1}$ satisfy $E[A_n] = 0$. Conditions are given to guarantee that the weak limit $Y(t) = \lim_{n \to \infty} \prod_{i=1}^{[n^2 t]} T_i(1/n) Y_n(0)$ exists, and is characterized as the unique solution of a martingale problem. Transport phenomena, random classical mechanics, and families of bounded operators are the featured examples.

Proceedings Publications

Mak, J., Marquez, A., Watkins, J., Higgins, M., & Wilch, M. (2023). Culturally Sustaining-Revitalizing Approaches to Computer Science Education in Southern AZ. In RESPECT.
Taking a culturally sustaining-revitalizing approach to supporting computer science (CS) education initiatives is essential to upholding tribal identity and sovereignty. This project addresses a critical gap in predominantly Western-centered CS education approaches by infusing and centering Indigenous perspectives, cultures, ideas, and goals into teaching computer science within two southern Arizona tribal communities. We started the CS co-construction process with listening circles, using an assets and community-based approach to understand the unique perspectives for their youth to build these into the co-developed CS course that will reflect and actualize their community's values, lived experiences, and future hopes.
Ahmed, R., Angelini, P., Efrat, A., Glickenstein, D., Gronemann, M., Heinsohn, N., Kobourov, S. G., Sahneh, F. D., Spence, R., Watkins, J. C., & Wolff, A. (2018). Multi-Level Steiner Trees. In Symposium on Experimental Algorithms.
In the classical Steiner tree problem, one is given an undirected, connected graph $G=(V,E)$ with non-negative edge costs and a set of terminals $T \subseteq V$. The objective is to find a minimum-cost edge set $E' \subseteq E$ that spans the terminals. The problem is APX-hard; the best known approximation algorithm has a ratio of $\rho = \ln(4) + \varepsilon < 1.39$. In this paper, we study a natural generalization, the multi-level Steiner tree (MLST) problem: given a nested sequence of terminals $T_1 \subset \cdots \subset T_k \subseteq V$, compute nested edge sets $E_1 \subseteq \cdots \subseteq E_k \subseteq E$ that span the corresponding terminal sets with minimum total cost. The MLST problem and variants thereof have been studied under names such as Quality-of-Service Multicast tree, Grade-of-Service Steiner tree, and Multi-Tier tree. Several approximation results are known. We first present two natural heuristics with approximation factor $O(k)$. Based on these, we introduce a composite algorithm that requires $2^k$ Steiner tree computations. We determine its approximation ratio by solving a linear program. We then present a method that guarantees the same approximation ratio and needs at most $2k$ Steiner tree computations. We compare five algorithms experimentally on several classes of graphs using four types of graph generators. We also implemented an integer linear program for MLST to provide ground truth. Our combined algorithm outperforms the others both in theory and in practice when the number of levels is small (k

Presentations

Watkins, J. C. (2022, April). Epidemics from the Eye of the Pathogen. Mathematical Biology Seminar. virtual: Arizona State University.
Watkins, J. C. (2022, January/March). Data Sciences Academy at the University of Arizona. Academic Data Science Alliance Annual Conference. Online/Irvine, California: Academic Data Science Alliance.
Watkins, J. C., Gentry, B., & Richardson, M. (2020, August). Indigenous Language Migration along the US Southern Border, the View from Arizona. Joint Statistical Meetings. virtual: American Statistical Association.
Panel presentation for the ASA Committee on Human Rights
Watkins, J. C. (2018, January). Data and the Human Condition. AAAS. AAAS Washington DC: AAAS.
Watkins, J. C. (2018, October). Multilevel Steiner Trees. TRIPODS Conference. Santa Clara, California: University of California, Santa Cruz.
Watkins, J. C. (2017, September). On the Human Condition. Biostatistics Seminar. University of Arizona: College of Public Health.
Watkins, J. C. (2014, June). BEEPOP: The Population Dynamics of the Honey Bee in the Hive and in the Wild. BioQuest/HHMI. University of Delaware: BioQuest/HHMI.
Workshop on the Native American Summer Program at the University of Arizona
Watkins, J. C. (2014, September). Curriculum & Classroom Strategies in a Statistics Course for Life Sciences Majors/Math Minors. High Performance Computing in Undergraduate Quantitative Biology. Cold Spring Harbor, New York: Cold Spring Harbor.

Poster Presentations

Watkins, J. C. (2018, October). A novel non-linear dimension reduction approach to infer population structure for low-coverage sequencing data. American Society of Human Genetics Annual Meeting. San Diego, California: American Society of Human Genetics.
Bender, C., Watkins, J. C., & Tolbert, L. P. (2012, October). Assessing Undergraduate Research and BioMath Efforts at the University of Arizona. Howard Hughes Medical Institute Program Directors' Meeting. Chevy Chase, Maryland: Howard Hughes Medical Institute.
