Set-valued system identification approach to identifying genetic variants in sequencing studies----Key Laboratory of Systems and Control, Chinese Academy of Sciences

Academic Seminar

Date: 2015-06-26


Title: Set-valued system identification approach to identifying genetic variants in sequencing studies

Speaker: Dr. Guolian Kang (St. Jude Children's Research Hospital)

Time and Venue: Jun 25, 9:00-10:00, N514

Abstract:

For genetic association studies that involve a binary or an ordered categorical phenotype, the standard logistic regression (LG) or ordered LG (oLG) model is commonly used to identify genetic associations. However, these approaches can lose power or fail to control the type I error rate in three situations: when the phenotype is derived by dichotomizing or categorizing a continuous phenotype that follows a normal distribution, when it arises from complicated unobservable or unobserved continuous variables, or when the genetic mutations are rare. We propose a set-valued (SV) system model, which is a generalized form of LG, oLG, probit (PRB) regression, and ordered probit (oPRB) regression, as a method for discovering genetic variants, especially rare genetic variants in next generation sequencing studies. We also propose a new set-valued system identification (SVSI) method to estimate all the underlying key system parameters of the SV model. In the setting of genetic association studies, we compare it with LG for a binary phenotype, and with LG (applied to a regrouped phenotype), oLG, and oPRB for an ordered categorical phenotype.
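As a rough illustration of how one thresholded latent-variable model can cover both logistic and probit regression, the sketch below dichotomizes a hypothetical continuous phenotype. It is not the SVSI estimator from the talk, which the abstract does not specify. The function names, the parameters `beta0`, `beta1`, and `threshold`, and the additive 0/1/2 genotype coding are all assumptions made for illustration.

```python
import math
import random

def draw_noise(rng, dist):
    """Draw one noise value: 'logistic' matches LG, 'normal' matches probit."""
    if dist == "logistic":
        u = rng.random()
        while u == 0.0:          # avoid log(0); random() never returns 1.0
            u = rng.random()
        return math.log(u / (1.0 - u))   # inverse CDF of the standard logistic
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    raise ValueError(f"unknown noise distribution: {dist}")

def simulate_binary_phenotype(n, maf, beta0, beta1, threshold=0.0,
                              dist="logistic", seed=0):
    """Return n (genotype, phenotype) pairs from a thresholded latent model.

    Hypothetical setup: genotype = number of minor alleles (0, 1, or 2),
    latent = beta0 + beta1 * genotype + noise, phenotype = 1 if latent > threshold.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = sum(rng.random() < maf for _ in range(2))   # two independent alleles
        latent = beta0 + beta1 * g + draw_noise(rng, dist)
        data.append((g, 1 if latent > threshold else 0))
    return data

def case_probability(g, beta0, beta1, threshold=0.0, dist="logistic"):
    """P(phenotype = 1 | genotype g) implied by the model above."""
    z = beta0 + beta1 * g - threshold
    if dist == "logistic":
        return 1.0 / (1.0 + math.exp(-z))               # logistic regression form
    if dist == "normal":
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # probit form
    raise ValueError(f"unknown noise distribution: {dist}")
```

With logistic noise the implied case probability is exactly the logistic regression form, and with normal noise it is the probit form; only the binary (set-valued) outcome is observed, never the latent variable.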

For a binary phenotype, simulations showed that the SV method maintained type I error control and had similar or greater power than the LG method, a result that was robust to different noise distributions: logistic, normal, or t. Additionally, the SV association parameter estimate was 2.7- to 46.8-fold less variable than the LG log-odds ratio association parameter estimate. Less variability in the association parameter estimate translates to greater power and robustness across the spectrum of minor allele frequencies (MAFs), and these advantages are most pronounced for rare variants. For instance, in a simulation that generated data from an additive logistic model with an odds ratio of 7.4 for a rare single nucleotide polymorphism with a MAF of 0.005 and a sample size of 2300, the SV method had 60% power whereas the LG method had 25% power at the α=10^-6 level. Consistent with these simulation results, the set of variants identified by the LG method was a subset of those identified by the SV method in two example analyses.

For an ordered categorical phenotype, simulations and two examples showed that SV and LG accurately controlled the type I error rate even at a significance level of 10^-6, whereas oLG and oPRB failed to do so in some cases. LG had significantly smaller power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. Thus, we recommend that the SV model with SVSI be used in SNP-based genetic association studies, especially for detecting rare variants or when the sample size is small, as in some rare pediatric cancer genomics projects.
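To give a sense of scale for the rare-variant scenario quoted above (MAF 0.005, sample size 2300, odds ratio 7.4), the short calculation below works out how few minor-allele carriers such a sample is expected to contain. It assumes Hardy-Weinberg proportions, which the abstract does not state, and the variable names are illustrative only.

```python
import math

n = 2300            # sample size from the simulation scenario
maf = 0.005         # minor allele frequency from the simulation scenario
odds_ratio = 7.4    # per-allele odds ratio under the additive logistic model

# Each person carries two alleles, so the expected minor-allele count is 2*n*maf.
expected_minor_alleles = 2 * n * maf

# Assumption: Hardy-Weinberg proportions, so P(at least one minor allele)
# is 1 - (1 - maf)^2.
carrier_prob = 1.0 - (1.0 - maf) ** 2
expected_carriers = n * carrier_prob

# The LG association parameter is the log of the odds ratio.
log_odds_ratio = math.log(odds_ratio)

print(f"expected minor alleles: {expected_minor_alleles:.1f}")
print(f"expected carriers:      {expected_carriers:.1f}")
print(f"log-odds ratio:         {log_odds_ratio:.3f}")
```

Under these assumptions only about 23 of the 2300 subjects are expected to carry the variant, so the association estimate rests on very few informative observations. That is the regime in which the variability of the estimate matters most.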

CV:

2007-2009: Postdoctoral Fellow, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
2009-2011: Postdoctoral Researcher, Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania
2011-present: Assistant Member, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee

HONORS AND AWARDS:
2003-2006 Research Assistant Fellowship, Chinese Academy of Sciences (China)
2008 Spring 2008 Career Enhancement Award, The University of Alabama at Birmingham (USA)
2008 Travel fellowship from the International Genetic Epidemiology Society (USA)
2009 The Science Unbound Foundation Best Paper Award in Statistical Genetics Research (USA)
2010 Genetic Analysis Workshop 17 travel award (USA)
2012 The Science Unbound Foundation Best Paper Award in General Statistics Research (USA)
2014 Travel award from the 3rd Workshop on Biostatistics and Bioinformatics held at Georgia State University (USA)
2014 2014 National Institute of General Medical Sciences (NIGMS) Bursary Award
2014 Travel fellowship from the 4th NIGMS-funded Short Course on Statistical Genetics

RESEARCH INTERESTS:
Statistical Genetics/Genomics: statistical methods for studying susceptibility genes for complex traits. Specific topics include study designs, association analysis, multiple testing/false discovery rate, gene- or region-based association tests, gene-gene interactions, combining information from linkage and association, population stratification/substructure, admixture mapping, pathway analysis, copy number variation (CNV) analysis, next generation sequencing (NGS) data analysis, and gene-environment (G-E) interaction analysis.
Systems Biology/Genetic Genomics: eQTL mapping and gene network inference by incorporating genetic marker information into gene expression data via information theoretic approaches; large-scale integrative analysis of SNP, gene expression, and phenotype data.
Bioinformatics/Modeling for Complex Data: microarray data analysis, NGS data analysis, ChIP-seq and RNA-seq data analysis, machine learning and its application to biology and medicine.
Clinical Trials: design and analysis of phase I, II, and III clinical trials in children with hematological diseases and in bone marrow transplantation and cellular therapy; validation experiments and clinical trials in large-scale genomic studies.
