Abstract
We present the Multimodal Universe, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, the Multimodal Universe contains hundreds of millions of astronomical observations, constituting 100 TB of multi-channel and hyper-spectral images, spectra, multivariate time series, as well as a wide variety of associated scientific measurements and “metadata”. In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics. This massive dataset will enable the development of large multimodal models specifically targeted towards scientific applications. All code used to compile the Multimodal Universe and a description of how to access the data are available at https://github.com/MultimodalUniverse/MultimodalUniverse.
| Original language | English (US) |
|---|---|
| Journal | Advances in Neural Information Processing Systems |
| Volume | 37 |
| State | Published - 2024 |
| Event | 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada; Duration: Dec 9 2024 → Dec 15 2024 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Information Systems
- Computer Networks and Communications