Lingling An
- Associate Professor, Agricultural-Biosystems Engineering
- Associate Professor, Statistics-GIDP
- Associate Professor, BIO5 Institute
- Associate Professor, Public Health
- Member of the Graduate Faculty
Contact
- (520) 621-1248
- Shantz, Rm. 403
- Tucson, AZ 85721
- anling@arizona.edu
Degrees
- Ph.D. Statistics
- Purdue University, West Lafayette, Indiana
Work Experience
- University of Arizona, Tucson, Arizona (2015 - Ongoing)
- University of Arizona, Tucson, Arizona (2008 - 2015)
Awards
- Top 5 downloaded paper, BMC Bioinformatics, Fall 2014
Interests
Research
Statistical genomics/metagenomics; Bioinformatics; Data mining and pattern recognition.
Teaching
Design and Analysis of Experiments; Statistical Bioinformatics; Biostatistics
Courses
2023-24 Courses
- Dissertation, STAT 920 (Spring 2024)
- Applied Biostatistics, BAT 413 (Fall 2023)
- Applied Biostatistics, BAT 513 (Fall 2023)
- Applied Biostatistics, BE 413 (Fall 2023)
- Applied Biostatistics, BE 513 (Fall 2023)
- Applied Biostatistics, EIS 513 (Fall 2023)
- Applied Biostatistics, RNR 513 (Fall 2023)
- Dissertation, STAT 920 (Fall 2023)
- Research, BIOS 900 (Fall 2023)
- Thesis, BE 910 (Fall 2023)
- Thesis, STAT 910 (Fall 2023)
2022-23 Courses
- Design of Experiments, MATH 571B (Spring 2023)
- Design of Experiments, STAT 571B (Spring 2023)
- Dissertation, STAT 920 (Spring 2023)
- Research, BIOS 900 (Spring 2023)
- Research, STAT 900 (Spring 2023)
- Thesis, STAT 910 (Spring 2023)
- Applied Biostatistics, BAT 413 (Fall 2022)
- Applied Biostatistics, BE 413 (Fall 2022)
- Applied Biostatistics, BE 513 (Fall 2022)
- Applied Biostatistics, EIS 513 (Fall 2022)
- Dissertation, STAT 920 (Fall 2022)
- Independent Study, BE 599 (Fall 2022)
- Independent Study, BIOS 699 (Fall 2022)
- Independent Study, STAT 599 (Fall 2022)
- Thesis, BE 910 (Fall 2022)
- Thesis, STAT 910 (Fall 2022)
2021-22 Courses
- Internship, BE 593 (Summer I 2022)
- Design of Experiments, MATH 571B (Spring 2022)
- Design of Experiments, STAT 571B (Spring 2022)
- Dissertation, BE 920 (Spring 2022)
- Dissertation, STAT 920 (Spring 2022)
- Independent Study, BIOS 699 (Spring 2022)
- Thesis, BE 910 (Spring 2022)
- Thesis, STAT 910 (Spring 2022)
- Applied Biostatistics, BAT 413 (Fall 2021)
- Applied Biostatistics, BE 413 (Fall 2021)
- Applied Biostatistics, BE 513 (Fall 2021)
- Applied Biostatistics, EIS 513 (Fall 2021)
- Applied Biostatistics, RNR 513 (Fall 2021)
- Dissertation, BE 920 (Fall 2021)
- Dissertation, STAT 920 (Fall 2021)
- Independent Study, BIOS 699 (Fall 2021)
- Internship, BE 693 (Fall 2021)
- Thesis, BE 910 (Fall 2021)
2020-21 Courses
- Design of Experiments, MATH 571B (Spring 2021)
- Design of Experiments, STAT 571B (Spring 2021)
- Dissertation, BE 920 (Spring 2021)
- Dissertation, STAT 920 (Spring 2021)
- Internship, BE 493 (Spring 2021)
- Research, STAT 900 (Spring 2021)
- Thesis, BE 910 (Spring 2021)
- Applied Biostatistics, BE 413 (Fall 2020)
- Applied Biostatistics, BE 513 (Fall 2020)
- Applied Biostatistics, EIS 513 (Fall 2020)
- Dissertation, BE 920 (Fall 2020)
- Dissertation, STAT 920 (Fall 2020)
- Thesis, BE 910 (Fall 2020)
2019-20 Courses
- Design of Experiments, MATH 571B (Spring 2020)
- Design of Experiments, STAT 571B (Spring 2020)
- Dissertation, BE 920 (Spring 2020)
- Dissertation, STAT 920 (Spring 2020)
- Research, STAT 900 (Spring 2020)
- Thesis, STAT 910 (Spring 2020)
- Applied Biostatistics, BE 413 (Fall 2019)
- Applied Biostatistics, BE 513 (Fall 2019)
- Applied Biostatistics, EIS 513 (Fall 2019)
- Applied Biostatistics, RNR 513 (Fall 2019)
- Dissertation, BE 920 (Fall 2019)
- Dissertation, STAT 920 (Fall 2019)
- Thesis, STAT 910 (Fall 2019)
2018-19 Courses
- Design of Experiments, MATH 571B (Spring 2019)
- Design of Experiments, STAT 571B (Spring 2019)
- Dissertation, BE 920 (Spring 2019)
- Dissertation, BIOS 920 (Spring 2019)
- Dissertation, STAT 920 (Spring 2019)
- Independent Study, STAT 599 (Spring 2019)
- Applied Biostatistics, ABE 513 (Fall 2018)
- Computation in Biomedicine, ACBS 567 (Fall 2018)
- Dissertation, ABE 920 (Fall 2018)
- Dissertation, BIOS 920 (Fall 2018)
- Dissertation, STAT 920 (Fall 2018)
- Thesis, STAT 910 (Fall 2018)
2017-18 Courses
- Design of Experiments, MATH 571B (Spring 2018)
- Design of Experiments, STAT 571B (Spring 2018)
- Dissertation, ABE 920 (Spring 2018)
- Dissertation, BIOS 920 (Spring 2018)
- Dissertation, STAT 920 (Spring 2018)
- Stat Bioinfo+Genomic Anl, ABE 516A (Spring 2018)
- Stat Bioinfo+Genomic Anl, GENE 516A (Spring 2018)
- Stat Bioinfo+Genomic Anl, MCB 416A (Spring 2018)
- Stat Bioinfo+Genomic Anl, MCB 516A (Spring 2018)
- Computation in Biomedicine, ACBS 567 (Fall 2017)
- Dissertation, ABE 920 (Fall 2017)
- Dissertation, BIOS 920 (Fall 2017)
- Dissertation, STAT 920 (Fall 2017)
- Thesis, STAT 910 (Fall 2017)
2016-17 Courses
- Thesis, STAT 910 (Summer I 2017)
- Design of Experiments, MATH 571B (Spring 2017)
- Design of Experiments, STAT 571B (Spring 2017)
- Dissertation, EPID 920 (Spring 2017)
- Dissertation, STAT 920 (Spring 2017)
- Honors Thesis, CSC 498H (Spring 2017)
- Thesis, STAT 910 (Spring 2017)
- Dissertation, ABE 920 (Fall 2016)
- Dissertation, EPID 920 (Fall 2016)
- Dissertation, STAT 920 (Fall 2016)
- Honors Thesis, CSC 498H (Fall 2016)
- Internship, ABE 393 (Fall 2016)
- Thesis, STAT 910 (Fall 2016)
2015-16 Courses
- Dissertation, EPID 920 (Summer I 2016)
- Design of Experiments, MATH 571B (Spring 2016)
- Design of Experiments, STAT 571B (Spring 2016)
- Honors Thesis, PSIO 498H (Spring 2016)
- Internship, ABE 493 (Spring 2016)
- Stat Bioinfo+Genomic Anl, ABE 516A (Spring 2016)
- Stat Bioinfo+Genomic Anl, GENE 516A (Spring 2016)
- Stat Bioinfo+Genomic Anl, MCB 416A (Spring 2016)
- Stat Bioinfo+Genomic Anl, MCB 516A (Spring 2016)
Scholarly Contributions
Chapters
- Du, R., An, L., & Fang, Z. (2018). Performance evaluation of normalization approaches for metagenomic compositional data on differential abundance analysis. In Frontiers of Biostatistics and Bioinformatics. Springer. doi:10.1007/978-3-319-99389-8
Journals/Publications
- An, L., Bhadani, R., & Chen, Z. (2023). Attention-Based Graph Neural Network for Label Propagation in Single-Cell Omics. Genes, 14(2), 506. doi:10.3390/genes14020506. Abstract: Single-cell data analysis has been at the forefront of development in biology and medicine since sequencing data have been made available. An important challenge in single-cell data analysis is the identification of cell types. Several methods have been proposed for cell-type identification. However, these methods do not capture the higher-order topological relationship between different samples. In this work, we propose an attention-based graph neural network that captures the higher-order topological relationship between different samples and performs transductive learning for predicting cell types. The evaluation of our method on both simulation and publicly available datasets demonstrates the superiority of our method, scAGN, in terms of prediction accuracy. In addition, our method works best for highly sparse datasets in terms of F1 score, precision score, recall score, and Matthews correlation coefficient as well. Further, our method's runtime is consistently shorter than that of other methods.
- Lu, Y., Chen, Q. M., & An, L. (2023). Semi-reference based cell type deconvolution with application to human metastatic cancers. NAR Genomics and Bioinformatics, 5(4), lqad109.
- Luo, Q., Lu, M., Zhang, M., Jiang, H., & An, L. (2023). A Regression-Based Approach for Accurate Source Tracking Using Microbial Communities. International Journal of Forensic Sciences, 8(4), 000338. doi:10.23880/ijfsc-16000338
- Lu, Y., An, L., Taylor, M. R., & Chen, Q. M. (2022). Nrf2 signaling in heart failure: expression of Nrf2, Keap1, antioxidant, and detoxification genes in dilated or ischemic cardiomyopathy. Physiological Genomics, 54(3), 115-127. Abstract: Increased levels of oxidative stress have been found with heart failure. Whether failing hearts express antioxidant and detoxification enzymes has not been addressed systematically. The Nrf2 gene encodes a transcription factor that regulates the expression of antioxidant and detoxification genes. Using an RNA-Seq data set from explanted hearts of 37 patients with dilated cardiomyopathy (DCM), 13 patients with ischemic cardiomyopathy (ICM), and 14 nonfailure (NF) donors as controls, we addressed whether failing hearts change the expression of Nrf2, its negative regulator Keap1, and antioxidant or detoxification genes. Significant increases in the ratio of Nrf2 to Keap1 were found to be associated with DCM or ICM. Several antioxidant genes showed decreased expression in both types of heart failure. The detoxification enzymes GCLM and EPHX1 also showed decreased expression, whereas the CYP1B1 transcript was elevated in both DCM and ICM. The genes encoding the metal-binding protein ferritin were decreased, whereas five out of 12 metallothionein genes showed elevated expression. Our finding on Nrf2 gene expression has been validated by meta-analysis of seven independent microarray or RNA-Seq data sets for differential gene expression in DCM and ICM relative to NF controls. In conclusion, minor elevation of Nrf2 gene expression is not coupled to increases in antioxidant and detoxification genes, supporting an impairment of Nrf2 signaling in patients with heart failure. Decreases in multiple antioxidant and detoxification genes are consistent with the observed increases of oxidative stress in failing hearts.
- Luo, D., Liu, W., Chen, T., & An, L. (2022). A Distribution-Free Model for Longitudinal Metagenomic Count Data. Genes, 13(7). Abstract: Longitudinal metagenomics has been widely studied in the recent decade to provide valuable insight into microbial dynamics. Correlation within each subject can be observed across repeated measurements. However, previous methods that assume independence among observations may yield incorrect inferences. In addition, methods that do account for intra-sample correlation may not be applicable to count data. We proposed a distribution-free approach, namely CorrZIDF, which extends the current method to model correlated zero-inflated metagenomic count data, offering a powerful and accurate solution for detecting significant features. This method can handle different working correlation structures without specifying each marginal distribution of the count data. Through simulation studies, we have shown the robustness of CorrZIDF when selecting a working correlation structure for repeated measures studies to enhance the efficiency of estimation. We also compared four methods using two real datasets, and the newly proposed method identified more unique features that had been reported previously in the relevant research.
- Mallick, H., An, L., Chen, M., Wang, P., & Zhao, N. (2022). Editorial: Methods for Single-Cell and Microbiome Sequencing Data. Frontiers in Genetics, 13, 920191.
- Zhang, X., Chen, Z., Bhadani, R., Cao, S., Lu, M., Lytal, N., Chen, Y., & An, L. (2022). NISC: Neural Network-Imputation for Single-Cell RNA Sequencing and Cell Type Clustering. Frontiers in Genetics, 13, 847112. Abstract: Single-cell RNA sequencing (scRNA-seq) reveals the transcriptome diversity in heterogeneous cell populations as it allows researchers to study gene expression at single-cell resolution. The latest advances in scRNA-seq technology have made it possible to profile tens of thousands of individual cells simultaneously. However, the technology also increases the number of missing values, i.e., dropouts, from technical constraints, such as amplification failure during the reverse transcription step. The resulting sparsity of scRNA-seq count data can be very high, with greater than 90% of data entries being zeros, which becomes an obstacle for clustering cell types. Current imputation methods are not robust in the case of high sparsity. In this study, we develop a Neural Network-based Imputation for scRNA-seq count data, NISC. It uses an autoencoder, coupled with a weighted loss function and regularization, to correct the dropouts in scRNA-seq count data. A systematic evaluation shows that NISC is an effective imputation approach for handling sparse scRNA-seq count data, and its performance surpasses existing imputation methods in cell type identification.
- Dai, Z., Chen, J., Liu, B., Yi, D., Feng, A., Wang, T., Gao, C., An, L., Wang, Y., Zhu, M., Zhang, X., & Zhang, Y. (2021). Loss of Endothelial Hypoxia Inducible Factor-Prolyl Hydroxylase 2 Induces Cardiac Hypertrophy and Fibrosis. Journal of the American Heart Association, 10, e022077. doi:10.1161/JAHA.121.022077
- Fabiano-Smith, L., Privette, C., & An, L. (2021). Phonological Measures for Bilingual Spanish-English Speaking Preschoolers: The Language Combination Effect. Journal of Speech, Language, and Hearing Research, 64, 3942-3968. doi:10.1044/2021_JSLHR-21-00008
- Khan, S. A., Campbell, A. M., Lu, Y., An, L., & Chen, Q. (2021). N-Acetylcysteine for Cardiac Protection during Coronary Artery Reperfusion: A Comprehensive Systematic Review and Meta-Analysis of Twenty Eight Randomized Controlled Trials. Frontiers in Cardiovascular Medicine, 8, 752939. doi:10.3389/fcvm.2021.752939
- Li, H., Baldwin, E., Zhang, X., Kenost, C., Luo, W., Calhoun, E., An, L., Bennett, C., & Lussier, Y. (2021). Comparison and impact of COVID-19 for patients with cancer: a survival analysis of fatality rate controlling for age, sex and cancer type. BMJ Health & Care Informatics, 28, e100341. doi:10.1136/bmjhci-2021-100341
- An, L., Lytal, N., Ran, D., & Zhang, S. (2020). scDoc: Correcting Drop-out Events in Single-cell RNA-seq Data. Bioinformatics, 36(15), 4233-4239. doi:10.1093/bioinformatics/btaa283. Abstract: Single-cell RNA sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of “drop-out” events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this paper, we present a novel Single-Cell RNA-seq Drop-Out Correction (scDoc) method, imputing drop-out events by borrowing information for the same gene from highly similar cells. scDoc is the first method that involves drop-out information to account for cell-to-cell similarity estimation, which is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc can impute the drop-out events more accurately and robustly; specifically, it outperforms all available imputation methods in terms of data visualization, cell subpopulation identification, and differential expression detection in scRNA-seq data.
- Baldwin, E., Han, J., Luo, W., Zhou, J., An, L., Liu, J., Zhang, H., & Li, H. (2020). On fusion methods for knowledge discovery from multi-omics datasets. Computational and Structural Biotechnology Journal, 18, 509–517. doi:10.1016/j.csbj.2020.02.011
- Carter, K., Lu, M., Jiang, H., & An, L. (2020). An Information-based Approach for Mediation Analysis on High-dimensional Metagenomic Data. Frontiers in Genetics, 11, 148. doi:10.3389/fgene.2020.00148
- Carter, K., Lu, M., Luo, Q., Jiang, H., & An, L. (2020). Microbial community dissimilarity for source tracking with application in forensic studies. PLoS ONE. doi:10.1371/journal.pone.0236082. Senior/corresponding author.
- Lytal, N., Ran, D., & An, L. (2020). Normalization methods on single-cell RNA-seq data: an empirical survey. Frontiers in Genetics, 11, 41. doi:10.3389/fgene.2020.00041. Senior/corresponding author.
- Ran, D., Zhang, S., Lytal, N., & An, L. (2020). scDoc: correcting drop-out events in single-cell RNA-seq data. Bioinformatics, 36(15), 4233-4239. doi:10.1093/bioinformatics/btaa283. Senior/corresponding author.
- Sun, X., Liu, Y., & An, L. (2020). Ensemble dimensionality reduction and feature gene extraction for single-cell RNA-seq data. Nature Communications, 11, 5853. doi:10.1038/s41467-020-19465-7. Corresponding author.
- Klug, K. E., Jennings, C. M., Lytal, N., An, L., & Yoon, J. (2019). Mie Scattering and Microparticle-Based Characterization of Heavy Metal Ions and Classification by Statistical Inference Methods. Royal Society Open Science, 6(5), 190001. doi:10.1098/rsos.190001. Abstract: A straightforward method for classifying heavy metal ions in water is proposed using statistical classification and clustering techniques from non-specific microparticle scattering data. A set of carboxylated polystyrene microparticles of sizes 0.91, 0.75 and 0.40 µm was mixed with the solutions of nine heavy metal ions and two control cations, and scattering measurements were collected at two angles optimized for scattering from non-aggregated and aggregated particles. Classification of these observations was conducted and compared among several machine learning techniques, including linear discriminant analysis, support vector machine analysis, K-means clustering and K-medians clustering. This study found the highest classification accuracy using the linear discriminant and support vector machine analysis, each reporting high classification rates for heavy metal ions with respect to the model. This may be attributed to moderate correlation between detection angle and particle size. These classification models provide reasonable discrimination between most ion species, with the highest distinction seen for Pb(II), Cd(II), Ni(II) and Co(II), followed by Fe(II) and Fe(III), potentially due to its known sorption with carboxyl groups. The support vector machine analysis was also applied to three different mixture solutions representing leaching from pipes and mine tailings, and showed good correlation with single-species data, specifically with Pb(II) and Ni(II). With more expansive training data and further processing, this method shows promise for low-cost and portable heavy metal identification and sensing.
- Zhang, S., Wang, D., Zhang, H., Skaggs, M., Lloyd, A., Ran, D., An, L., Schumaker, K., Drews, G., & Yadegari, R. (2018). FERTILIZATION-INDEPENDENT SEED-Polycomb Repressive Complex 2 plays a dual role in regulating type I MADS-box genes in early endosperm development. Plant Physiology, 177(1), 285-299. doi:10.1104/pp.17.00534
- Zhu, L., An, L., Ran, D., Lizzarraga, R., Bondy, C., Zhou, X., Harper, R., Liao, S., & Chen, Y. (2018). The Club Cell Marker SCGB1A Downstream of FOXA1 is Reduced in Asthma. American Journal of Respiratory Cell and Molecular Biology. doi:10.1165/rcmb.2018-0199OC
- Hernandez, E., Giacomelli, G., Lewis, M., & An, L. (2017). Evaluation of season, cultivar, and aeration on biomass production of greenhouse hydroponic lettuce. ISHS Acta Horticulturae. doi:10.17660/ActaHortic.2017.1170.77
- Luo, D., Ziebell, S., & An, L. (2017). An Informative Approach on Differential Abundance Analysis for Time-course Metagenomic Sequencing Count Data. Bioinformatics. doi:10.1093/bioinformatics/btw828
- Pena, E., Wu, W., Piegorsch, W., West, R., & An, L. (2016). Model Selection and Estimation with Quantal-Response Data in Benchmark Risk Assessment. Risk Analysis. doi:10.1111/risa.12644
- Zhang, Y., Kacira, M., & An, L. (2016). A CFD study on improving air flow uniformity in indoor plant factory system. Biosystems Engineering, 147, 193-205.
- An, L. (2015). Classification, predictive modelling, and statistical analysis of cancer data. Cancer Informatics.
- An, L. (2015). Investigating microbial co-occurrence patterns based on meta-genomic compositional data. Bioinformatics.
- Drewry, J., Choi, C., An, L., & Gharagozloo, P. (2015). A Computational Fluid Dynamics Model of Algal Growth: Development and Validation. Transactions of the ASABE.
- Sohn, M., Du, R., & An, L. (2015). A robust approach for identifying differentially abundant features in metagenomic samples. Bioinformatics.
- Yigiter, A., An, L., Chen, J., & Danacioglu, N. (2015). An on-line CNV detection method for short sequencing reads. Journal of Applied Statistics.
- An, L., Pookhao, N., Jiang, H., & Xu, J. (2014). A Statistical approach for profiling functionality of a microbial community. PLoS ONE, 9(9): e106588.
- Du, R., Mercante, D., An, L., & Fang, Z. (2014). A statistical approach to correcting cross-annotations in a metagenomic functional profile. Journal of Biometrics & Biostatistics.
- Jiang, H., An, L., Baladandayuthapani, V., & Auer, P. (2014). Classification, predictive modelling, and statistical analysis of cancer data. Cancer informatics.
- Pookhao, N., Sohn, M., Jenkins, I., Du, R., Jiang, H., & An, L. (2014). A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes. Bioinformatics.
- Sohn, M., An, L., Pookhao, N., & Li, Q. (2014). Accurate Genome Relative Abundance Estimation for Closely Related Species in a Metagenomic Sample. BMC Bioinformatics.
- An, L., Piegorsch, W. W., Wickens, A., West, W., Pena, E., & Wu, W. (2013). Information-theoretic model-averaged benchmark dose analysis in environmental risk assessment. Environmetrics, 24, 143-157. W. Piegorsch and L. An are co-first authors.
- Kadiyala, V., Patrick, N. M., Mathieu, W., Jaime-Frias, R., Pookhao, N., An, L., & Smith, C. L. (2013). Class I lysine deacetylases facilitate glucocorticoid-induced transcription. Journal of Biological Chemistry, 288(40), 28900-28912. PMID: 23946490; PMCID: PMC3789985. Abstract: Background: KDACis impair GR transactivation of the MMTV promoter, but their impact on cellular target genes is unknown. Results: KDACi or KDAC depletion suppresses transactivation of about 50% of GR target genes. Conclusion: KDAC1 is required for efficient GR transactivation in a gene-selective fashion. Significance: Because KDACs facilitate GR transactivation, clinical KDACi use may have a major impact on GR signaling. © 2013 by The American Society for Biochemistry and Molecular Biology, Inc.
- Tamimi, E., Kacira, M., Choi, C., & An, L. (2013). Analysis of microclimate uniformity in a naturally vented greenhouse with a high-pressure fogging system. Transactions of the ASABE, 56, 1241-1254.
- An, L., & Doerge, R. (2012). Dynamic Clustering of Gene Expression. ISRN Bioinformatics, Article ID 537217. doi:10.5402/2012/53721
- Jiang, H., An, L., Lin, S. M., Feng, G., & Qiu, Y. (2012). A Statistical Framework for Accurate Taxonomic Assignment of Metagenomic Sequencing Reads. PLoS ONE, 7(10). PMID: 23049702; PMCID: PMC3462201. Abstract: The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics, which studies genetic material recovered directly from an environment. Characterization of the genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby reducing the accuracy of relative abundance estimates for the multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/~hji403/MetaR.htm. © 2012 Jiang et al.
- West, W., Piegorsch, W. W., Pena, E., An, L., Wu, W., Wickens, A., Xiong, H., & Chen, W. (2012). The impact of model uncertainty on benchmark dose estimation. Environmetrics, 23, 706-716.
- Niu, Y. S., Hao, N., & An, L. (2011). Detection of rare functional variants using group ISIS. BMC Proceedings, 5 Suppl 9. Abstract: Genome-wide association studies have been firmly established in investigations of the associations between common genetic variants and complex traits or diseases. However, a large portion of complex traits and diseases cannot be explained well by common variants. Detecting rare functional variants has become a trend and a necessity. Because rare variants have such a small minor allele frequency (e.g.,
- McDowell, E., Kapteyn, J., Schmidt, A., Li, C., Kang, J., Descour, A., Shi, F., Larson, M., Schilmiller, A., An, L., Jones, A., Pichersky, E., Soderlund, C., & David, G. (2011). Comparative functional genomic analysis of Solanum glandular trichome types. Plant Physiology, 155, 524-539.
- Zeng, L., An, L., & Wu, X. (2011). Modeling Drug-Carrier Interaction in the Drug Release from Nanocarriers. Journal of Drug Delivery, Article ID 370308.
- Liu, S. S., Kim, H. T., Chen, J., & An, L. (2010). Visualizing desirable patient healthcare experiences. Health Marketing Quarterly, 27(1), 116-130. PMID: 20155554. Abstract: High healthcare cost has drawn much attention and healthcare service providers (HSPs) are expected to deliver high-quality and consistent care. Therefore, an intimate understanding of the most desirable experience from a patient's and/or family's perspective as well as effective mapping and communication of such findings should facilitate HSPs' efforts in attaining sustainable competitive advantage in an increasingly discerning environment. This study describes (a) the critical quality attributes (CQAs) of the experience desired by patients and (b) the application of two visualization tools that are relatively new to the healthcare sector, namely the "spider-web diagram" and "promotion and detraction matrix." The visualization tools are tested with primary data collected from telephone surveys of 1,800 patients who had received care during calendar year 2005 at 6 of 61 hospitals within St. Louis, Missouri-based, Ascension Health. Five CQAs were found by factor analysis. The spider-web diagram illustrates that communication and empowerment and compassionate and respectful care are the most important CQAs, and accordingly, the promotion and detraction matrix shows those attributes that have the greatest effect for creating promoters, preventing detractors, and improving consumer's likelihood to recommend the healthcare provider. © Taylor & Francis Group, LLC.
- Long, A. A., Mahapatra, C. T., A., E., Rohrbough, J., Leung, H., Shino, S., An, L., Doerge, R. W., Metzstein, M. M., Pak, W. L., & Broadie, K. (2010). The nonsense-mediated decay pathway maintains synapse architecture and synaptic vesicle cycle efficacy. Journal of Cell Science, 123(19), 3303-3315. PMID: 20826458; PMCID: PMC2939802. Abstract: A systematic Drosophila forward genetic screen for photoreceptor synaptic transmission mutants identified no-on-and-no-off transient C (nonC) based on loss of retinal synaptic responses to light stimulation. The cloned gene encodes phosphatidylinositol-3-kinase-like kinase (PIKK) Smg1, a regulatory kinase of the nonsense-mediated decay (NMD) pathway. The Smg proteins act in an mRNA quality control surveillance mechanism to selectively degrade transcripts containing premature stop codons, thereby preventing the translation of truncated proteins with dominant-negative or deleterious gain-of-function activities. At the neuromuscular junction (NMJ) synapse, an extended allelic series of Smg1 mutants show impaired structural architecture, with decreased terminal arbor size, branching and synaptic bouton number. Functionally, loss of Smg1 results in a ∼50% reduction in basal neurotransmission strength, as well as progressive transmission fatigue and greatly impaired synaptic vesicle recycling during high-frequency stimulation. Mutation of other NMD pathway genes (Upf2 and Smg6) similarly impairs neurotransmission and synaptic vesicle cycling. These findings suggest that the NMD pathway acts to regulate proper mRNA translation to safeguard synapse morphology and maintain the efficacy of synaptic function.
- Riddle, N. C., Jiang, H., An, L., Doerge, R. W., & Birchler, J. A. (2010). Gene expression analysis at the intersection of ploidy and hybridity in maize. Theoretical and Applied Genetics, 120(2), 341-353. PMID: 19657617. Abstract: Heterosis and polyploidy are two important aspects of plant evolution. To examine these issues, we conducted a global gene expression study of a maize ploidy series as well as a set of tetraploid inbred and hybrid lines. This gene expression analysis complements an earlier phenotypic study of these same materials. We find that ploidy change affects a large fraction of the genome, albeit at low levels; gene expression changes rarely exceed 2-fold and are typically not statistically significant. The most common gene expression profile we detected is greater than linear increase from monoploid to diploid, and reductions from diploid to triploid and from triploid to tetraploid, a trend that mirrors plant stature. When examining heterosis in tetraploid maize lines, we found a large fraction of the genome impacted but the majority of changes were not statistically significant at 2-fold or less. Non-additive expression was common in the hybrids, and the extent of non-additivity increased both in number and magnitude from duplex to quadruplex hybrids. Overall, we find that gene expression trends mirror observations from the phenotypic studies; however, obvious mechanistic connections remain unknown. © Springer-Verlag 2009.
- Riddle, N., Jiang, H., An, L., Doerge, R., & Birchler, J. (2009). Gene expression analysis at the intersection of ploidy and hybridity in maize. Theoretical and Applied Genetics, 120(2), 341-353.
- Leung, H., Tseng-Crank, J., Kim, E., Mahapatra, C., Shino, S., Zhou, Y., An, L., Doerge, R. W., & Pak, W. L. (2008). DAG Lipase Activity Is Necessary for TRP Channel Regulation in Drosophila Photoreceptors. Neuron, 58(6), 884-896. PMID: 18579079; PMCID: PMC2459341. Abstract: In Drosophila, a phospholipase C-mediated signaling cascade links photoexcitation of rhodopsin to the opening of the TRP/TRPL channels. A lipid product of the cascade, diacylglycerol (DAG) and its metabolite(s), polyunsaturated fatty acids (PUFAs), have both been proposed as potential excitatory messengers. A crucial enzyme in the understanding of this process is likely to be DAG lipase (DAGL). However, DAGLs that might fulfill this role have not been previously identified in any organism. In this work, the Drosophila DAGL gene, inaE, has been identified from mutants that are defective in photoreceptor responses to light. The inaE-encoded protein isoforms show high sequence similarity to known mammalian DAG lipases, exhibit DAG lipase activity in vitro, and are highly expressed in photoreceptors. Analyses of norpA inaE double mutants and severe inaE mutants show that normal DAGL activity is required for the generation of physiologically meaningful photoreceptor responses. © 2008 Elsevier Inc. All rights reserved.
- Long, A. A., Kim, E., Leung, H., III, E. W., Lingling, A. n., Doerge, R. W., Pak, W. L., & Broadie, K. (2008). Presynaptic calcium channel localization and calcium-dependent synaptic vesicle exocytosis regulated by the fuseless protein. Journal of Neuroscience, 28(14), 3668-3682.More infoPMID: 18385325;PMCID: PMC2769928;Abstract: A systematic forward genetic Drosophila screen for electroretinogram mutants lacking synaptic transients identified the fuseless (fusl) gene, which encodes a predicted eight-pass transmembrane protein in the presynaptic membrane. Null fusl mutants display >75% reduction in evoked synaptic transmission but, conversely, an approximately threefold increase in the frequency and amplitude of spontaneous synaptic vesicle fusion events. These neurotransmission defects are rescued by a wild-type fusl transgene targeted only to the presynaptic cell, demonstrating a strictly presynaptic requirement for Fusl function. Defects in FM dye turnover at the synapse show a severely impaired exo-endo synaptic vesicle cycling pool. Consistently, ultrastructural analyses reveal accumulated vesicles arrested in clustered and docked pools at presynaptic active zones. In the absence of Fusl, calcium-dependent neurotransmitter release is dramatically compromised and there is little enhancement of synaptic efficacy with elevated external Ca2+ concentrations. These defects are causally linked with severe loss of the Cacophony voltage-gated Ca2+ channels, which fail to localize normally at presynaptic active zone domains in the absence of Fusl. These data indicate that Fusl regulates assembly of the presynaptic active zone Ca 2+ channel domains required for efficient coupling of the Ca 2+ influx and synaptic vesicle exocytosis during neurotransmission. Copyright © 2008 Society for Neuroscience.
- Zhao, J., Wang, J., Lingling, A. n., Doerge, R. W., Chen, Z. J., Grau, C. R., Meng, J., & Osborn, T. C. (2007). Analysis of gene expression profiles in response to Sclerotinia sclerotiorum in Brassica napus. Planta, 227(1), 13-24.More infoPMID: 17665211;Abstract: Sclerotinia sclerotiorum is a necrotrophic plant pathogen which causes serious disease in agronomically important crop species. The molecular basis of plant defense to this pathogen is poorly understood. We investigated gene expression changes associated with S. sclerotiorum infection in a partially resistant and a susceptible genotype of oilseed Brassica napus using a whole genome microarray from Arabidopsis. A total of 686 and 1,547 genes were found to be differentially expressed after infection in the resistant and susceptible genotypes, respectively. The number of differentially expressed genes increased over infection time with the majority being up-regulated in both genotypes. The putative functions of the differentially expressed genes included pathogenesis-related (PR) proteins, proteins involved in the oxidative burst, protein kinase, molecule transporters, cell maintenance and development, abiotic stress, as well as proteins with unknown functions. The gene regulation patterns indicated that a large part of the defense response exhibited as a temporal and quantitative difference between the two genotypes. Genes associated with jasmonic acid (JA) and ethylene signal transduction pathways were induced, but no salicylic acid (SA) responsive genes were identified. Candidate defense genes were identified by integration of the early response genes in the partially resistant line with previously mapped quantitative trait loci (QTL). Expression levels of these genes were verified by Northern blot analyses. 
These results indicate that genes encoding various proteins involved in diverse roles, particularly WRKY transcription factors and plant cell wall related proteins may play an important role in the defense response to S. sclerotiorum disease. © 2007 Springer-Verlag.
Presentations
- An, L. (2023, June). Accurate estimation in microbial source tracking. Western North American Region (WNAR) of the International Biometric Society. Anchorage, Alaska.
- An, L. (2023, March). Practical Challenges for Single-cell Multi-omics Analysis. MCB bioinformatics seminar, UA.
- An, L. (2022). Practical Challenges for Single-Cell Multi-omics Analysis. WNAR. Virtual.
- An, L., Bhadani, R., & Chen, Z. (2022, Aug). Attention-based Graph Neural Network for Label Transfer in Single Cell Multiomics Data. Joint Statistical Meetings. Washington DC.
- An, L. (2021, Aug). Information Based Mediation Analysis on High-dimensional Metagenomic Data. Joint Statistical Meetings. Virtual: American Statistical Association.
- Zhang, X., Cao, S., Lu, M., & An, L. (2020, Aug). NISC: Accurate clustering through neural network-imputation for single-cell RNA-seq data with high sparsity. Joint Statistical Meetings. Online: American Statistical Association. Senior/corresponding author.
- An, L. (2019, July). Accurate correction on dropout events in single-cell RNA-seq data. Joint Statistical Meetings. Denver, CO: American Statistical Association.
- An, L. (2019, June). Accurate Trace Evidence Using Regression Approaches in Microbial Forensic Studies. 3rd International Conference on Econometrics and Statistics. Taiwan: National Chung Hsing University.
- An, L. (2019, March). Mediation Analysis on High-dimensional Microbiome and Host Genome Data. Eastern North American Region (ENAR) Spring Meeting of the International Biometric Society. Philadelphia, PA: American Statistical Association.
- An, L. (2019, Nov). A Neural network based imputation method for single-cell RNA-seq data with high sparsity. TRIPODS (Transdisciplinary Research in the Principles of Data Science) seminar. U of A: Math department.
- An, L., & Lu, W. (2018, March). A Novel Approach on Differential Abundance Analysis for Matched Metagenomic Samples. ENAR (Eastern North American Region of International Biometric Society) meeting. Atlanta, GA.
- An, L., Carter, K., & Lu, M. (2018, July). Nonparametric Mediation Analysis for Investigating the Role of Microbiota in Human Health. Joint Statistical Meetings. Vancouver, Canada.
- An, L., Lytal, N., & Ran, D. (2018, July). Single-Cell RNA Sequencing: Dropout Imputation and Normalization with Spike-in Genes. Joint Statistical Meetings. Vancouver, Canada.
- An, L., Ran, D., Zhang, S., & Lytal, N. (2018, July). Single-cell gene set analysis with applications in tumor heterogeneity. Joint Statistical Meetings. Vancouver, Canada: American Statistical Association.
- An, L. (2017, Aug). Differential Abundance Analysis on Longitudinal Metagenomic Count Data. Joint Statistical Meetings.
- An, L. (2017, Feb). Trace evidence in forensic study using microbial source tracking. American Academy of Forensic Sciences annual meeting.
- An, L. (2017, June). Mediation Analysis of High Dimensional Microbiome and Host Genome Data. Applied Statistics Symposium of International Chinese Statistical Association.
- An, L. (2017, March). Differential abundance analysis on longitudinal metagenomic count data. ENAR annual meeting.
- An, L. (2016, Aug). A Robust Approach to Identifying Differentially Abundant Features in Metagenomic Samples. Joint Statistical Meetings. Chicago.
- An, L. (2016, June). Informative Approach on Differential Analysis for Time-course Metagenomic Sequencing Data. ICSA Applied Statistics Symposium. Atlanta.
- An, L. (2016, March). A Novel Normalization Method for Time Series Metagenomic Count Data. Eastern North American Region (ENAR) meeting, International Biometric Society. Austin, TX.
- An, L. (2015, Aug). A Robust Approach For Identifying Differentially Abundant Features In Metagenomic Samples. NSF workshop. Arlington, VA: NSF.
- An, L. (2014, April). A Two-Stage Statistical Procedure For Feature Selection And Feature Comparison In Functional Analysis of Metagenomes. Bioinformatics Seminar, Purdue University. W. Lafayette: Dept. Statistics.
- An, L. (2014, April). A Two-Stage Statistical Procedure For Feature Selection And Feature Comparison In Functional Analysis of Metagenomes. Statistics Seminar, Northwestern University. Evanston, IL: Dept. of Statistics.
- An, L., Sohn, M., Pookhao, D., & Li, Q. (2014, Aug). Accurate Estimation of Genome Relative Abundance for Closely Related Species in a Metagenomic Sample. Joint Statistical Meetings. Boston.
- Du, R., & An, L. (2014, Aug). A New Normalization Method on Metagenomic Sequencing Data. Joint Statistical Meetings.
- An, L. (2013, April). Introduction to metagenomics. Seminar of Next-generation Sequencing (SONGS). Tucson: GIDP in Statistics.
- An, L. (2013, April). Statistical methods on functional metagenomics. Seminar at Department of Math & Statistics (UMKC). Kansas City: University of Missouri at Kansas City.
- An, L. (2013, June). Statistical methods on functional metagenomics. The International Biometric Society (IBS) - WNAR annual meeting. Los Angeles: IBS Western North American Region.
- An, L. (2013, March). Statistical Methods for Functional Metagenomic Analysis Based on Next Generation Sequencing Data. The International Biometric Society (IBS) - ENAR annual meeting. Orlando, FL: IBS Eastern North American Region.
- Pena, E., Wu, W., Piegorsch, W. W., West, W., & An, L. (2013, March). Model Selection and BMD Estimation with Quantal-Response Data. The International Biometric Society (IBS) - ENAR Spring meeting. Orlando, FL: IBS Eastern North American Region.
- An, L. (2012, February). Statistical and Computational Challenges for Metagenomics Analysis Based on Next-generation Sequencing Data. Quantitative Biology Colloquium. Tucson, AZ: Math Dept, UA.
- Jiang, H., An, L., Lin, S., Feng, G., & Qiu, Y. (2011, June). Estimating relative abundance of multiple genomes in a metagenomic sample. Statistical Methods for Very Large Datasets Conference. Baltimore, Maryland.
Poster Presentations
- Ran, D., & An, L. (2018, March). Statistical Method for Gene Set Analysis of Single-Cell RNA-seq Data. ENAR meeting. Atlanta, GA.
- An, L. (2016, July). A Powerful Approach In Differential Analysis For Time Series Microbial Studies. International Biometric Conference. Victoria, Canada.
- An, L. (2016, Nov). A Powerful Approach in Differential Abundance Analysis For Time Course Microbial Studies. International Human Microbiome Consortium Congress 2016. Houston, TX.
- An, L. (2015, March). A Robust Approach For Identifying Differentially Abundant Features In Metagenomic Samples. International Human Microbiome Congress. Luxembourg.
- An, L. (2015, March). Investigating microbial co-occurrence patterns based on metagenomics compositional data. International Human Microbiome Congress. Luxembourg.
- An, L., Sohn, M., Pookhao, N., & Li, Q. (2014, March). Accurate genome relative abundance estimation for closely related species in a metagenomic sample. Algorithms for Threat Detection Workshop (NSF). Boulder, CO: NSF.
- An, L., & Jiang, H. (2012, March). Statistical Identification of Multiple Genomes in a Metagenomic Sample. IBE 2012 Annual Conference. Indianapolis, IN.
- An, L., & Jiang, H. (2012, May). Statistical Identification of Multiple Genomes in a Metagenomic Sample. New Statistical Methods for Next Generation Sequencing Data Analysis. Des Moines, IA.
- An, L., & Jiang, H. (2012, November). TAMER: an R package for accurate taxonomic assignment of metagenomic sequencing reads. Algorithms for Threat Detection. San Diego, CA.
- Zhang, S., Wang, D., Zhang, H., Lloyd, A., Skaggs, M. I., An, L., Drews, G. N., Schumaker, K. S., & Yadegari, R. (2012, July). Polycomb repressive complex 2 regulates type I MADS-box gene expression during endosperm development in Arabidopsis. Plant Biology 2012. Austin, TX. Also an oral presentation by Shanshan Zhang in a minisymposium.