TY - JOUR
T1 - An effective statistical evaluation of ChIPseq dataset similarity
AU - Chikina, Maria D.
AU - Troyanskaya, Olga G.
N1 - Funding Information: This research was supported by NSF CAREER award DBI-0546275, NIH grant R01 GM071966, NIH grant R01 HG005998, and partially supported by NIGMS Center of Excellence grant P50 GM071508 and NIH grant T32 HG003284.
PY - 2012/3
Y1 - 2012/3
N2 - Motivation: ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation. Results: We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions that conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly.
AB - Motivation: ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation. Results: We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions that conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly.
UR - http://www.scopus.com/inward/record.url?scp=84857822152&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857822152&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bts009
DO - 10.1093/bioinformatics/bts009
M3 - Article
C2 - 22262674
AN - SCOPUS:84857822152
SN - 1367-4803
VL - 28
SP - 607
EP - 613
JO - Bioinformatics
JF - Bioinformatics
IS - 5
M1 - bts009
ER -