High-throughput quantitative DNA sequencing enables the parallel phenotyping of pools of thousands of mutants. However, the analytical methods and experimental designs that maximize the efficiency of these screens while maintaining statistical power are currently unknown. Here, we have used Bar-seq analysis of the Saccharomyces cerevisiae yeast deletion library to systematically test the effect of experimental design parameters and sequence read depth on experimental results. We present computational methods that efficiently and accurately estimate effect sizes and their statistical significance by adapting existing methods for RNA-seq analysis. Using simulated variation of experimental designs, we found that biological replicates are critical for statistical analysis of Bar-seq data, whereas technical replicates are of less value. By subsampling sequence reads, we found that when using four-fold biological replication, 6 million reads per condition achieved 96% power to detect a change of two-fold or more at a 5% false discovery rate. Our guidelines for experimental design and computational analysis enable the study of the yeast deletion collection in up to 30 different conditions in a single sequencing lane. These findings are relevant to a variety of pooled genetic screening methods that use high-throughput quantitative DNA sequencing, including Tn-seq.
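The multiplexing figures quoted above imply a simple read budget, sketched below as a rough illustration. The function name `lane_budget` is hypothetical, and the even split of each condition's reads across its four biological replicates is an assumption that the abstract does not state. The lane total is only the product of the two quoted figures, not a measured lane yield.

```python
def lane_budget(n_conditions, reads_per_condition, n_replicates):
    """Return (total reads needed, reads per replicate sample).

    Assumes reads for a condition are divided evenly among its
    biological replicates (an assumption, not stated in the abstract).
    """
    total_reads = n_conditions * reads_per_condition
    reads_per_replicate = reads_per_condition / n_replicates
    return total_reads, reads_per_replicate


# Figures quoted in the abstract: 30 conditions, 6 million reads per
# condition, four-fold biological replication.
total_reads, reads_per_replicate = lane_budget(30, 6_000_000, 4)
print(f"Reads needed per lane: {total_reads:,}")
print(f"Reads per replicate sample: {reads_per_replicate:,.0f}")
```

Under these assumptions, the 30-condition design calls for 180 million reads in the lane, or 1.5 million reads for each of the 120 replicate samples.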
All Science Journal Classification (ASJC) codes
- Molecular Biology