TY - JOUR
T1 - GenomeVIP
T2 - A cloud platform for genomic variant discovery and interpretation
AU - Mashl, R. Jay
AU - Scott, Adam D.
AU - Huang, Kuan Lin
AU - Wyczalkowski, Matthew A.
AU - Yoon, Christopher J.
AU - Niu, Beifang
AU - DeNardo, Erin
AU - Yellapantula, Venkata D.
AU - Handsaker, Robert E.
AU - Chen, Ken
AU - Koboldt, Daniel C.
AU - Ye, Kai
AU - Fenyö, David
AU - Raphael, Benjamin J.
AU - Wendl, Michael C.
AU - Ding, Li
N1 - Publisher Copyright:
© 2017 Mashl et al.
PY - 2017/8
Y1 - 2017/8
N2 - Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional "download and analyze" paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.
UR - http://www.scopus.com/inward/record.url?scp=85026655131&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026655131&partnerID=8YFLogxK
U2 - 10.1101/gr.211656.116
DO - 10.1101/gr.211656.116
M3 - Article
C2 - 28522612
AN - SCOPUS:85026655131
SN - 1088-9051
VL - 27
SP - 1450
EP - 1459
JO - Genome Research
JF - Genome Research
IS - 8
ER -