今回は産業技術総合研究所 生命情報工学研究センター・David duVerle先生を講師としてお迎えし、「高次元データにおける遺伝子の組み合わせ相互作用の発見」について講演していただきます。
日時：平成26年2月28 日(金) 17：00‐18：30
演題：Discovering Combinatorial Gene Interactions in High-Dimensional Data
講師：Dr. David duVerle（産業技術総合研究所 生命情報工学研究センター）
・要旨：In the past decade or so, new technologies in biotech have meant an explosion in the availability of high-dimensional genomic data (microarrays, SNP data, RNA-Seq…): their dimension and noise levels making it necessary to rely on machine learning techniques and statistical models to extract meaningful signal and narrow down the field for further experimental research. In this presentation, I will try to give a very general overview of some of the statistical techniques commonly used to treat high-dimensional data, as well as a more detailed illustration of the technique we developed to identify combinatorial interaction effects in such data.
A crucial aspect of machine-learning when dealing with high-dimensional data, is the concept of sparsity: how much of the input’s variables find their way in the model. By using regularisation techniques (the addition of a tailored penalisation component), it is possible to ensure certain properties of the statistical model (size, elimination of collinear variables…). Another, is the fitting of complex statistical models that cannot be solved analytically, usually requiring optimising a non-linear objective function (e.g to maximise likelihood or minimise empirical error). While relatively simple in application, both techniques require some understanding of the underlying statistical assumption and information theoretic implications, in order to obtain satisfying results.
After giving a brief overview of regularisation techniques and their use in typical regression problems encountered in bioinformatics, I will introduce our recent work, which combines them with data-mining (itemset mining) and fractional programming techniques to fit complex statistical models over (non-linear) combinations of heterogeneous input variables, allowing for example to identify sets of genes (up- or down-regulated) that drive complex phenotypes or clinical observations.
This work was in particular successfully applied to a combination of cDNA microarray and gene mutation copy number data paired with (right-censored) survival data, to identify interactions (and potential synthetic lethals) playing a role in neuroblastoma and breast-cancer.
Copyright(C) Tohoku University Tohoku Medical Megabank Organization All Rights Reserved.