TY - JOUR
T1 - Autoencoding Galaxy Spectra. II. Redshift Invariance and Outlier Detection
AU - Liang, Yan
AU - Melchior, Peter
AU - Lu, Sicong
AU - Goulding, Andy
AU - Ward, Charlotte
N1 - Funding Information:
The authors wish to thank Michael Strauss for his swift help with the visual inspection and classification of the outliers discussed in this paper. The authors would also like to thank Jiaxuan Li for his help in setting up the normalizing flow. This work was supported by the AI Accelerator program of the Schmidt Futures Foundation. The authors are pleased to acknowledge that the work reported on in this paper was substantially performed using the Princeton Research Computing resources at Princeton University which is a consortium of groups led by the Princeton Institute for Computational Science and Engineering (PICSciE) and Office of Information Technology’s Research Computing.
Publisher Copyright:
© 2023. The Author(s). Published by the American Astronomical Society.
PY - 2023/8/1
Y1 - 2023/8/1
N2 - We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.
AB - We present an unsupervised outlier detection method for galaxy spectra based on the spectrum autoencoder architecture spender, which reliably captures spectral features and provides highly realistic reconstructions for SDSS galaxy spectra. We interpret the sample density in the autoencoder latent space as a probability distribution, and identify outliers as low-probability objects with a normalizing flow. However, we found that the latent-space position is not, as expected from the architecture, redshift invariant, which introduces stochasticity into the latent space and the outlier detection method. We solve this problem by adding two novel loss terms during training, which explicitly link latent-space distances to data-space distances, preserving locality in the autoencoding process. Minimizing the additional losses leads to a redshift-invariant, nondegenerate latent-space distribution with clear separations between common and anomalous data. We inspect the spectra with the lowest probability and find them to include blends with foreground stars, extremely reddened galaxies, galaxy pairs and triples, and stars that are misclassified as galaxies. We release the newly trained spender model and the latent-space probability for the entire SDSS-I galaxy sample to aid further investigations.
UR - http://www.scopus.com/inward/record.url?scp=85166421897&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166421897&partnerID=8YFLogxK
U2 - 10.3847/1538-3881/ace100
DO - 10.3847/1538-3881/ace100
M3 - Article
AN - SCOPUS:85166421897
SN - 0004-6256
VL - 166
JO - Astronomical Journal
JF - Astronomical Journal
IS - 2
M1 - 75
ER -