TY - JOUR
T1 - Mapping between fMRI responses to movies and their natural language annotations
AU - Vodrahalli, Kiran
AU - Chen, Po Hsuan
AU - Liang, Yingyu
AU - Baldassano, Christopher
AU - Chen, Janice
AU - Yong, Esther
AU - Honey, Christopher
AU - Hasson, Uri
AU - Ramadge, Peter Jeffrey
AU - Norman, Kenneth Andrew
AU - Arora, Sanjeev
N1 - Publisher Copyright:
© 2017 The Authors
PY - 2018/10/15
Y1 - 2018/10/15
N2 - Several research groups have shown how to map fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBC's Sherlock (Chen et al., 2017), and learn bidirectional mappings between fMRI responses and natural language representations. By leveraging data from multiple subjects watching the same movie, we were able to perform scene classification with 72% accuracy (random guessing would give 4%) and scene ranking with average rank in the top 4% (random guessing would give 50%). The key ingredients underlying this high level of performance are (a) the use of the Shared Response Model (SRM) and its variant SRM-ICA (Chen et al., 2015; Zhang et al., 2016) to aggregate fMRI data from multiple subjects, both of which are shown to be superior to standard PCA in producing low-dimensional representations for the tasks in this paper; (b) a sentence embedding technique adapted from the natural language processing (NLP) literature (Arora et al., 2017) that produces semantic vector representations of the annotations; (c) the use of previous timestep information in the featurization of the predictor data. These optimizations in how we featurize the fMRI data and text annotations provide a substantial improvement in classification performance, relative to standard approaches.
AB - Several research groups have shown how to map fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBC's Sherlock (Chen et al., 2017), and learn bidirectional mappings between fMRI responses and natural language representations. By leveraging data from multiple subjects watching the same movie, we were able to perform scene classification with 72% accuracy (random guessing would give 4%) and scene ranking with average rank in the top 4% (random guessing would give 50%). The key ingredients underlying this high level of performance are (a) the use of the Shared Response Model (SRM) and its variant SRM-ICA (Chen et al., 2015; Zhang et al., 2016) to aggregate fMRI data from multiple subjects, both of which are shown to be superior to standard PCA in producing low-dimensional representations for the tasks in this paper; (b) a sentence embedding technique adapted from the natural language processing (NLP) literature (Arora et al., 2017) that produces semantic vector representations of the annotations; (c) the use of previous timestep information in the featurization of the predictor data. These optimizations in how we featurize the fMRI data and text annotations provide a substantial improvement in classification performance, relative to standard approaches.
KW - fMRI
KW - Multi-modal model
KW - Natural language processing
KW - Natural movie stimulus
KW - Shared response model
KW - Text annotations
UR - http://www.scopus.com/inward/record.url?scp=85023200931&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85023200931&partnerID=8YFLogxK
U2 - 10.1016/j.neuroimage.2017.06.042
DO - 10.1016/j.neuroimage.2017.06.042
M3 - Article
C2 - 28648889
AN - SCOPUS:85023200931
SN - 1053-8119
VL - 180
SP - 223
EP - 231
JO - NeuroImage
JF - NeuroImage
ER -