- Associate Professor, Mathematics
- Associate Professor, Statistics-GIDP
- Ph.D. Operations Research and Financial Engineering
- Princeton University, Princeton, New Jersey, United States
- NSF-AWM Travel Grant
- National Science Foundation and Association for Women in Mathematics, Spring 2013
Intro to Statistical Computing, DATA 375 (Spring 2021)
Intro to Statistical Computing, DATA 375 (Fall 2020)
Theory of Statistics, MATH 466 (Spring 2019)
Theory of Probability, MATH 564 (Fall 2018)
Theory of Probability, STAT 564 (Fall 2018)
Theory of Statistics, MATH 466 (Spring 2018)
Theory of Probability, MATH 564 (Fall 2017)
Theory of Probability, STAT 564 (Fall 2017)
Theory of Statistics, MATH 466 (Spring 2017)
Theory of Probability, MATH 564 (Fall 2016)
Theory of Probability, STAT 564 (Fall 2016)
Theory of Statistics, MATH 466 (Fall 2016)
Thesis, STAT 910 (Fall 2016)
Theory of Statistics, MATH 466 (Spring 2016)
- Hao, N., Niu, Y., Xiao, F., & Zhang, H. (2020). A super scalable algorithm for short segment detection. Statistics in Biosciences.
- Xiao, F., Luo, X., Hao, N., Niu, Y., Xiao, X., Cai, G., Amos, C. I., & Zhang, H. (2019). An Accurate and Powerful Method for Copy Number Variation Detection. Bioinformatics, 35(17), 2891-2898. doi:https://doi.org/10.1093/bioinformatics/bty1041
- Niu, Y., Hao, N., & Dong, B. (2018). A New Reduced-Rank Linear Discriminant Analysis Method and Its Applications. Statistica Sinica, 28, 189-202. doi:https://doi.org/10.5705/ss.202015.0387
- Niu, Y., Hao, N., & Zhang, H. (2018). Interaction Screening by Partial Correlation. Statistics and Its Interface, 11(2), 317-325. doi:http://dx.doi.org/10.4310/SII.2018.v11.n2.a9
- Niu, Y., Suh, J. H., Wang, Z., Gmitter Jr., F., & Wang, Y. (2018). Metabolic analysis reveals altered long-chain fatty acid metabolism in the host by Huanglongbing disease. Journal of Agricultural and Food Chemistry, 66, 1296-1304. doi:10.1021/acs.jafc.7b05273
- Suh, J. H., Niu, Y. S., Hung, W., Ho, C., & Wang, Y. (2017). Lipidomic analysis for carbonyl species derived from fish oil using liquid chromatography-tandem mass spectrometry. Talanta, 168, 31-42. doi:http://dx.doi.org/10.1016/j.talanta.2017.03.023
- Xiao, F., Niu, Y., Hao, N., Xu, Y., Jin, Z., & Zhang, H. (2017). modSaRa: a computationally efficient R package for CNV identification. Bioinformatics.
- Hao, N., Niu, Y. S., & Zhang, H. (2013). Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica, 23(4), 1553-1572. Abstract: Let Y1, ..., Yn be a sequence whose underlying mean is a step function with an unknown number of steps and unknown change points. The detection of the change points, namely the positions where the mean changes, is an important problem in such fields as engineering, economics, climatology and bioscience. This problem has attracted a lot of attention in statistics, and a variety of solutions have been proposed and implemented. However, there is scant literature on the theoretical properties of those algorithms. Here, we investigate a recently developed algorithm called the Screening and Ranking algorithm (SaRa). We characterize the theoretical properties of SaRa and show its superiority over other commonly used algorithms. In particular, we develop a false discovery rate approach to the multiple change-point problem and show a strong sure coverage property for the SaRa.
- Niu, Y., Hao, N., & Zhang, H. (2016). Multiple Change-Point Detection, a Selective Overview. Statistical Science, 31(4), 611-623.
- Niu, Y. S., & Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations. Annals of Applied Statistics, 6(3), 1306-1326. Abstract: DNA copy number variation (CNV) has recently gained considerable interest as a source of genetic variation that likely influences phenotypic differences. Many statistical and computational methods have been proposed and applied to detect CNVs based on data generated by genome analysis platforms. However, most algorithms are computationally intensive, with complexity at least O(n^2), where n is the number of probes in the experiments. Moreover, the theoretical properties of those existing methods are not well understood. A faster and better characterized algorithm is desirable for the ultra high throughput data. In this study, we propose the Screening and Ranking algorithm (SaRa), which can detect CNVs fast and accurately with complexity down to O(n). In addition, we characterize theoretical properties and present numerical analysis for our algorithm. © Institute of Mathematical Statistics 2012.
- Bailey-Wilson, J. E., Brennan, J. S., Bull, S. B., Culverhouse, R., Kim, Y., Jiang, Y., Jung, J., Li, Q., Lamina, C., Liu, Y., Mägi, R., Niu, Y. S., Simpson, C. L., Wang, L., Yilmaz, Y. E., Zhang, H., & Zhang, Z. (2011). Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini-exome data. Genetic Epidemiology, 35(SUPPL. 1), S92-S100. PMID: 22128066; PMCID: PMC3360949. Abstract: Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors. © 2011 Wiley Periodicals, Inc.
- Fan, J., Feng, Y., & Niu, Y. S. (2010). Nonparametric estimation of genewise variance for microarray data. Annals of Statistics, 38(5), 2723-2750. Abstract: Estimation of genewise variance arises from two important applications in microarray data analysis: selecting significantly differentially expressed genes and validation tests for normalization of microarray data. We approach the problem by introducing a two-way nonparametric model, which is an extension of the famous Neyman-Scott model and is applicable beyond microarray data. The problem itself poses interesting challenges because the number of nuisance parameters is proportional to the sample size and it is not obvious how the variance function can be estimated when measurements are correlated. In such a high-dimensional nonparametric problem, we proposed two novel nonparametric estimators for genewise variance function and semiparametric estimators for measurement correlation, via solving a system of nonlinear equations. Their asymptotic normality is established. The finite sample property is demonstrated by simulation studies. The estimators also improve the power of the tests for detecting statistically differentially expressed genes. The methodology is illustrated by the data from microarray quality control (MAQC) project. © Institute of Mathematical Statistics, 2010.
- Fan, J., & Niu, Y. (2007). Selection and validation of normalization methods for c-DNA microarrays using within-array replications. Bioinformatics, 23(18), 2391-2398. PMID: 17660210. Abstract: Motivation: Normalization of microarray data is essential for multiple-array analyses. Several normalization protocols have been proposed based on different biological or statistical assumptions. A fundamental problem arises whether they have effectively normalized arrays. In addition, for a given array, the question arises how to choose a method to most effectively normalize the microarray data. Results: We propose several techniques to compare the effectiveness of different normalization methods. We approach the problem by constructing statistics to test whether there are any systematic biases in the expression profiles among duplicated spots within an array. The test statistics involve estimating the genewise variances. This is accomplished by using several novel methods, including empirical Bayes methods for moderating the genewise variances and the smoothing methods for aggregating variance information. P-values are estimated based on a normal or χ² approximation. With estimated P-values, we can choose a most appropriate method to normalize a specific array and assess the extent to which the systematic biases due to the variations of experimental conditions have been removed. The effectiveness and validity of the proposed methods are convincingly illustrated by a carefully designed simulation study. The method is further illustrated by an application to human placenta cDNAs comprising a large number of clones with replications, a customized microarray experiment carrying just a few hundred genes on the study of the molecular roles of Interferons on tumor, and the Agilent microarrays carrying tens of thousands of total RNA samples in the MAQC project on the study of reproducibility, sensitivity and specificity of the data. © The Author 2007. Published by Oxford University Press. All rights reserved.
- Niu, Y. (2019, July). Variance estimation for change-point model. 2019 ICSA China Conference. Tianjin, China.
- Niu, Y. (2019, June). Variance estimation for change-point model. EcoSta 2019. Taiwan, China.
- Niu, Y. (2019, March). A super scalable algorithm for short segment detection. Statistics and Data Science Seminar. Chicago, IL: Dept of Mathematics, Statistics and Computer Science, UIC.
- Niu, Y. (2019, May). A Super Scalable Algorithm for Short Segment Detection. 2019 Hangzhou International Conference on Frontiers of Data Science. Hangzhou, China.
- Niu, Y. (2018, July). A super scalable algorithm for short segment detection. 2018 ICSA China Conference with the Focus on Data Science. Qingdao, China: ICSA.
- Niu, Y. (2018, July). Reduced-Rank Linear Discriminant Analysis. MJU First International workshop on data science. Minjiang University, Fuzhou, China: Minjiang University.
- Niu, Y. (2018, June). A super scalable algorithm for short segment detection. EcoSta 2018. Hong Kong, China.
- Niu, Y. (2018, June). A super scalable algorithm for short segment detection. Statistics Colloquium. Nankai University, Tianjin, China: Institute of Statistics.
- Niu, Y. (2018, June). Reduced-Rank Linear Discriminant Analysis. Statistics Colloquium. Shanghai University of Finance and Economics, Shanghai, China: School of Statistics and Management.
- Niu, Y. (2016, June). Reduced-Rank Linear Discriminant Analysis. 2016 ICSA Applied Statistics Symposium. Atlanta, GA: ICSA.
- Niu, Y. (2015, July). Reduced Ranked Linear Discriminant Analysis. Invited seminar talk. Tianjin, China: Nankai University, Institute of Statistics.
- Niu, Y. (2015, June). Screening Interaction Effects in Quadratic Regression Model. ISBS/DIA Symposium on Biopharmaceutical Statistics. Beijing, China: The International Society for Biopharmaceutical Statistics (ISBS).
- Niu, Y. (2015, May). Reduced Ranked Linear Discriminant Analysis. Invited seminar talk. Statistics Department: Oregon State University.
- Xiao, F., Luo, X., Hao, N., Niu, Y., Xiao, X., Cai, G., Amos, C. I., & Zhang, H. (2018). R Package: modSaRa2. https://publichealth.yale.edu/c2s2/software/modSaRa2/.