In genetic association studies of a binary or ordered categorical phenotype, standard logistic regression (LG) or ordered logistic regression (oLG) is commonly used to identify genetic associations. However, these approaches can lose power or fail to control the type I error rate when the phenotype is obtained by dichotomizing or categorizing a normally distributed continuous phenotype, when it arises from complicated unobservable or unobserved continuous variables, or when the genetic mutations are rare. We propose a set-valued (SV) system model, which generalizes LG, oLG, probit (PRB) regression, and ordered probit (oPRB) regression, as a method for discovering genetic variants, especially rare variants in next-generation sequencing studies. We also propose a new set-valued system identification (SVSI) method to estimate all the key underlying system parameters of the SV model, and we compare it with LG in genetic association studies of a binary phenotype and with LG (on a regrouped phenotype), oLG, and oPRB for an ordered categorical phenotype.
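The setting described above, a binary phenotype obtained by thresholding a normally distributed continuous trait, can be illustrated with a small simulation. The sketch below is not the SVSI estimator, which is not specified here. It only generates data under an assumed latent threshold model and evaluates the implied case probability. The function names, the additive 0/1/2 genotype coding, and the parameters `beta0`, `beta1`, `threshold`, and `sigma` are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def simulate_binary_phenotype(n, maf, beta0, beta1, threshold, sigma=1.0, seed=0):
    """Simulate genotypes and a binary phenotype from a thresholded latent trait.

    Assumed (hypothetical) model: latent = beta0 + beta1 * g + N(0, sigma^2),
    and the observed phenotype is 1 when latent > threshold, else 0.
    """
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    for _ in range(n):
        # Additive coding: count of minor alleles (0, 1, or 2), drawing the
        # two alleles independently with minor allele frequency `maf`.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        latent = beta0 + beta1 * g + rng.gauss(0.0, sigma)
        genotypes.append(g)
        phenotypes.append(1 if latent > threshold else 0)
    return genotypes, phenotypes


def case_probability(g, beta0, beta1, threshold, sigma=1.0):
    """P(phenotype = 1 | genotype = g) implied by the latent threshold model."""
    z = (threshold - beta0 - beta1 * g) / sigma
    return 1.0 - NormalDist().cdf(z)
```

Comparing the observed case fraction within each genotype group against `case_probability` shows that the dichotomized trait follows a probit-type relationship with genotype rather than a logistic one, which is the kind of mismatch the abstract refers to.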
2007-2009: Postdoctoral Fellow, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
HONORS AND AWARDS:
2003-2006 Research Assistant Fellowship, Chinese Academy of Science (China)
2008 Spring 2008 Career Enhancement Award, The University of Alabama at Birmingham (USA)
2008 Travel fellowship from the International Genetic Epidemiology Society (USA)
2009 The Science Unbound Foundation Best Paper Award in Statistical Genetics Research (USA)
2010 Genetic Analysis Workshop 17 travel award (USA)
2012 The Science Unbound Foundation Best Paper Award in General Statistics Research (USA)
2014 Travel awards from the 3rd Workshop on Biostatistics and Bioinformatics held at Georgia State University (USA)
2014 2014 National Institute of General Medical Sciences (NIGMS) Bursary Award
2014 Travel fellowship from the 4th NIGMS-funded Short Course on Statistical Genetics
RESEARCH INTERESTS:
Statistical Genetics/Genomics—statistical methods for studying susceptibility genes for complex traits. Specific topics include: study designs, association analysis, multiple testing/false discovery rate, gene- or region-based association tests, gene-gene interactions, combining information from linkage and association, population stratification/substructure, admixture mapping, pathway analysis, copy number variation (CNV) analysis, next generation sequencing (NGS) data analysis, gene-environment (G-E) interaction analysis.
Systems Biology/Genetic Genomics—eQTL mapping and gene network inference by incorporating genetic marker information into gene expression data via information theoretic approaches; Large-scale integrative analysis of SNP, gene expression and phenotype data.
Bioinformatics/Modeling for Complex Data—microarray data analysis, NGS data analysis, ChIP-seq and RNA-seq data analysis, machine learning and its application to biology and medicine.
Clinical trials—design and analysis of phase I, II and III clinical trials in children with hematological diseases and in bone marrow transplantation and cellular therapy; validation experiments and clinical trials in large-scale genomic studies.