Detecting novel associations in large data sets
Added by szh123 on 2012-3-19 22:52
Authors
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC
Abstract
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
Details
- Keywords: Algorithms; Animals; Baseball/statistics & numerical data; *Data Interpretation, Statistical; Female; Gene Expression; Genes, Fungal; Genomics/methods; Humans; Intestines/microbiology; Male; Metagenome; Mice; Obesity; Saccharomyces cerevisiae/genetics
- Publication type: Journal Article
- Journal name: Science (New York, N.Y.)
- Journal abbreviation: Science
- Year, volume, issue, pages: 2011, vol. 334, no. 6062, pp. 1518-1524
- Address: Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. dnreshef@mit.edu
- ISSN: 0036-8075