TY - JOUR
T1 - Joint analysis of expression levels and histological images identifies genes associated with tissue morphology
AU - Ash, Jordan T.
AU - Darnell, Gregory
AU - Munro, Daniel
AU - Engelhardt, Barbara E.
N1 - Funding Information:
B.E.E. was supported by NIH R01 HL133218, a Sloan Faculty Fellowship, NSF CAREER AWD1005627, CZI AWD1005664, and CZI AWD1005667. D.M. was funded by the NSF Graduate Research Fellowship Program under Grant No. DGE 1148900. The input data reported in this paper are archived at The Cancer Genome Atlas (TCGA), study accession IDs TCGA-BRCA (BRCA) and TCGA-LGG (LGG), and dbGaP phs000424.v6 (GTEx). We acknowledge the kind help of Phil Branton, the pathologist for the GTEx project.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.
AB - Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.
UR - http://www.scopus.com/inward/record.url?scp=85102503517&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102503517&partnerID=8YFLogxK
U2 - 10.1038/s41467-021-21727-x
DO - 10.1038/s41467-021-21727-x
M3 - Article
C2 - 33707455
AN - SCOPUS:85102503517
SN - 2041-1723
VL - 12
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 1609
ER -