Cell-lineage-specific transcripts are essential for differentiated tissue function, implicated in hereditary organ failure, and mediate acquired chronic diseases. However, experimental identification of cell-lineage-specific genes in a genome-scale manner is infeasible for most solid human tissues. We developed the first genome-scale method to identify genes with celllineage-specific expression, even in lineages not separable by experimental microdissection. Our machine-learning-based approach leverages high-throughput data from tissue homogenates in a novel iterative statistical framework. We applied this method to chronic kidney disease and identified transcripts specific to podocytes, key cells in the glomerular filter responsible for hereditary and most acquired glomerular kidney disease. In a systematic evaluation of our predictions by immunohistochemistry, our in silico approach was significantly more accurate (65% accuracy in human) than predictions based on direct measurement of in vivo fluorescence-tagged murine podocytes (23%). Our method identified genes implicated as causal in hereditary glomerular disease and involved in molecular pathways of acquired and chronic renal diseases. Furthermore, based on expression analysis of human kidney disease biopsies, we demonstrated that expression of the podocyte genes identified by our approach is significantly related to the degree of renal impairment in patients. Our approach is broadly applicable to define lineage specificity in both cell physiology and human disease contexts. We provide a user-friendly website that enables researchers to apply this method to any cell-lineage or tissue of interest. Identified cell-lineage-specific transcripts are expected to play essential tissue-specific roles in organogenesis and disease and can provide starting points for the development of organ-specific diagnostics and therapies.
All Science Journal Classification (ASJC) codes