TY - JOUR
T1 - Computational pan-genomics
T2 - Status, promises and challenges
AU - The Computational Pan-Genomics Consortium
AU - Marschall, Tobias
AU - Marz, Manja
AU - Abeel, Thomas
AU - Dijkstra, Louis
AU - Dutilh, Bas E.
AU - Ghaffaari, Ali
AU - Kersey, Paul
AU - Kloosterman, Wigard P.
AU - Mäkinen, Veli
AU - Novak, Adam M.
AU - Paten, Benedict
AU - Porubsky, David
AU - Rivals, Eric
AU - Alkan, Can
AU - Baaijens, Jasmijn A.
AU - De Bakker, Paul I.W.
AU - Boeva, Valentina
AU - Bonnal, Raoul J.P.
AU - Chiaromonte, Francesca
AU - Chikhi, Rayan
AU - Ciccarelli, Francesca D.
AU - Cijvat, Robin
AU - Datema, Erwin
AU - Van Duijn, Cornelia M.
AU - Eichler, Evan E.
AU - Ernst, Corinna
AU - Eskin, Eleazar
AU - Garrison, Erik
AU - El-Kebir, Mohammed
AU - Klau, Gunnar W.
AU - Korbel, Jan O.
AU - Lameijer, Eric Wubbo
AU - Langmead, Benjamin
AU - Martin, Marcel
AU - Medvedev, Paul
AU - Mu, John C.
AU - Neerincx, Pieter
AU - Ouwens, Klaasjan
AU - Peterlongo, Pierre
AU - Pisanti, Nadia
AU - Rahmann, Sven
AU - Raphael, Ben
AU - Reinert, Knut
AU - de Ridder, Dick
AU - de Ridder, Jeroen
AU - Schlesner, Matthias
AU - Schulz-Trieglaff, Ole
AU - Sanders, Ashley D.
AU - Sheikhizadeh, Siavash
AU - Shneider, Carl
N1 - Publisher Copyright:
© The Author 2016.
PY - 2018/1/1
Y1 - 2018/1/1
AB - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
KW - Data structures
KW - Haplotypes
KW - Pan-genome
KW - Read mapping
KW - Sequence graph
UR - http://www.scopus.com/inward/record.url?scp=85041170263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041170263&partnerID=8YFLogxK
U2 - 10.1093/bib/bbw089
DO - 10.1093/bib/bbw089
M3 - Article
C2 - 27769991
AN - SCOPUS:85041170263
SN - 1467-5463
VL - 19
SP - 118
EP - 135
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbw089
ER -