A machine-compiled database of genome-wide association studies

Volodymyr Kuleshov, Jialin Ding, Christopher Vo, Braden Hancock, Alexander Ratner, Yang Li, Christopher Ré, Serafim Batzoglou, Michael Snyder

Research output: Contribution to journal, Article, peer-review

18 Scopus citations

Abstract

Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60–80% and with an estimated precision of 78–94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.
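
As a rough illustration of what "measured relative to existing manually curated knowledge bases" can mean, the sketch below computes set-based precision and recall for extracted (variant, phenotype) pairs against a reference set. It is not the authors' evaluation code: the function name `precision_recall` and every identifier and trait label in the example data are hypothetical and invented for illustration.

```python
def precision_recall(extracted, reference):
    """Set-based precision and recall of extracted pairs against a reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Guard against empty inputs so the ratios are always defined.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (variant, phenotype) pairs; these are NOT real associations.
reference = {
    ("rs0001", "trait A"),
    ("rs0002", "trait B"),
    ("rs0003", "trait C"),
    ("rs0004", "trait D"),
}
extracted = {
    ("rs0001", "trait A"),
    ("rs0002", "trait B"),
    ("rs0003", "trait C"),
    ("rs0009", "trait A"),  # absent from the reference set
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this simplified setup, an extracted pair that is missing from the reference set counts as an error, even though it could be a valid association that the curators did not record. Exact pair matching is also an assumption; a real comparison would likely need to normalize phenotype names first.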

Original language: English (US)
Article number: 3341
Journal: Nature Communications
Volume: 10
Issue number: 1
State: Published - Dec 1 2019
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Chemistry
  • General Biochemistry, Genetics and Molecular Biology
  • General
  • General Physics and Astronomy
