PolymathicAI / AstroCLIP

Multimodal contrastive pretraining for astronomical data

AstroCLIP

Official PyTorch implementation and pre-trained models for the paper AstroCLIP: A Cross-Modal Foundation Model for Galaxies.

AstroCLIP is a novel, cross-modal, self-supervised foundation model that creates a shared embedding space for multi-band imaging and optical spectra of galaxies. These embeddings encode meaningful physical information shared between both modalities, and can be used as the basis for competitive zero- and few-shot learning on a variety of downstream tasks, including similarity search, redshift estimation, galaxy property prediction, and morphology classification.

Web App

Check out our interactive similarity search app, enabling both in-modal and cross-modal search for galaxies: https://astroclip.streamlit.app/

Installation

The training and evaluation code requires PyTorch 2.0. Additionally, an up-to-date eventlet is required for wandb. Note that the code has only been tested with the specified versions and expects a Linux environment. To install the AstroCLIP package and its dependencies, run the following commands from the repository root:

pip install --upgrade pip
pip install --upgrade eventlet torch "lightning[extra]"
pip install -e .

NOTE Once installed, the package provides three command-line shortcuts: astroclip_trainer and spectrum_trainer, which both link to astroclip/trainer.py, and image_trainer, which links to astroclip/astrodino/trainer.py. The shortcuts are defined in the project.scripts section of the pyproject.toml file.

Handling roots

The package expects to load models and data by default from

{ASTROCLIP_ROOT}

You can configure ASTROCLIP_ROOT, as well as the Weights & Biases entity (WANDB_ENTITY_NAME) under which runs are saved, by creating a .env file in the root of astroclip with the following content:

ASTROCLIP_ROOT="/mnt/ceph/users/polymathic/astroclip"
WANDB_ENTITY_NAME="flatiron-scipt"

If ASTROCLIP_ROOT is not specified, the default path at Flatiron is assumed.
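As an illustration of how such a lookup can work, here is a minimal sketch using only the standard library. The helper names read_env_file and resolve_root are hypothetical and are not part of the astroclip package, and the precedence shown (process environment first, then the .env file, then a caller-supplied default) is an assumption rather than the package's documented behavior.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY="value" lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_root(env_file=".env", default_root=None):
    """Return ASTROCLIP_ROOT from the environment, else from the .env file, else the default."""
    from_environment = os.environ.get("ASTROCLIP_ROOT")
    if from_environment:
        return from_environment
    return read_env_file(env_file).get("ASTROCLIP_ROOT", default_root)


if __name__ == "__main__":
    # Prints None when neither the environment nor a local .env file sets the variable.
    print(resolve_root())
```

The default is passed in by the caller here because this sketch does not assume what the built-in Flatiron path is.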

Pretrained Models

We provide the pretrained AstroCLIP model on the Hugging Face model hub for easy access, along with the pretrained single-modal models for galaxy images and spectra. Model details, checkpoints, configs, and logs are listed below.

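| Model Name       | Pretraining     | # Params. | Download           |
|------------------|-----------------|-----------|--------------------|
| AstroCLIP        | CLIP            | 370M      | ckpt, config, logs |
| Image Encoder    | DINOv2          | 302M      | ckpt, config, logs |
| Spectrum Encoder | Masked Modeling | 43M       | ckpt, config, logs |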

Loading the Pretrained Models

The pretrained AstroCLIP model can be loaded using the following:

from astroclip.models import AstroClipModel

# Replace the placeholder with the path to a downloaded checkpoint file
model = AstroClipModel.load_from_checkpoint(
    checkpoint_path="path_to_model.ckpt",
)

High-Level Performance Overview

Below, we include a high-level performance overview of our models on a variety of downstream tasks. This is non-exhaustive, and we refer the reader to the paper for the full details.

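| Source     | Model             | Type       | Redshift | Properties | Morphology |
|------------|-------------------|------------|----------|------------|------------|
| Image      | AstroCLIP*        | Zero-Shot  | 0.79     | 0.47       | 0.76       |
| Image      | Image Encoder*    | Zero-Shot  | 0.63     | 0.37       | 0.78       |
| Image      | Stein, et al.     | Zero-Shot  | 0.36     | 0.26       | 0.76       |
| Image      | ResNet18          | Supervised | 0.77     | 0.43       | -          |
| Image      | ZooBot [1]        | Supervised | -        | -          | 0.88       |
| Spectrum   | AstroCLIP*        | Zero-Shot  | 0.99     | 0.63       | -          |
| Spectrum   | Spectrum Encoder* | Zero-Shot  | 0.99     | 0.64       | -          |
| Spectrum   | Conv+Att [2]      | Supervised | 0.99     | 0.60       | -          |
| Photometry | MLP               | Supervised | 0.68     | 0.42       | -          |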

We report R-squared metrics on redshift and galaxy property estimation (averaged across all properties) and accuracy on galaxy morphology classification (averaged across all labels). Our models are marked with an asterisk (*). [1] We use the results reported from Walmsley, et al. (2021). [2] We use the encoder from Melchior, et al. (2022).
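For readers unfamiliar with the metric, the following is a minimal sketch of the coefficient of determination and of averaging it across several properties. The function names and the toy numbers are illustrative assumptions; this is not the evaluation code used for the table, which lives in the repository and the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def mean_r_squared(targets_by_property, predictions_by_property):
    """Average R-squared over properties; both arguments map a property name to a list."""
    scores = [
        r_squared(targets_by_property[name], predictions_by_property[name])
        for name in targets_by_property
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy values only; not real galaxy measurements.
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A score of 1 means perfect prediction, and a score of 0 means the model does no better than always predicting the mean of the targets.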

Data Access

The AstroCLIP model is trained on the cross-matched sample containing optical spectra from the Dark Energy Spectroscopic Instrument (DESI) Early Data Release (EDR) and multi-band images (g,r,z) from the DESI Legacy Survey prepared by Stein, et al. (2022). We provide the dataset as a Hugging Face dataset, which can be accessed directly using:

from datasets import load_dataset

# This downloads about 60 GB of data
dset = load_dataset('astroclip/data/dataset.py')

For reproducibility, we include the scripts and a brief description of how to generate the cross-matched dataset in astroclip/data/crossmatch.

Image Pretraining Dataset

While the AstroCLIP and Spectrum Encoder models are trained on the image-spectrum dataset, we pretrain the galaxy image model separately on the full Stein, et al. (2022) image dataset, which consists of 76M galaxy images. This dataset can be accessed through the following Globus endpoint:

https://app.globus.org/file-manager?origin_id=9fb0fc0e-e760-11ec-9bd2-2d2219dcc1fa&origin_path=%2F

The directory is organized into south and north surveys, where each survey is split into chunks of 1,000,000 galaxies (sorted by decreasing z-band flux) and saved in hdf5 format. For more details, see here.
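To make the chunk layout concrete, here is a small sketch of the index arithmetic it implies. It assumes zero-based indices within a single survey and that chunks are filled in order of decreasing z-band flux; the function names are hypothetical, and the actual file names inside the endpoint are not assumed here.

```python
CHUNK_SIZE = 1_000_000  # galaxies per chunk, as stated above


def locate_galaxy(global_index, chunk_size=CHUNK_SIZE):
    """Map a zero-based index within one survey to (chunk_number, row_in_chunk)."""
    if global_index < 0:
        raise ValueError("global_index must be non-negative")
    return divmod(global_index, chunk_size)


def num_chunks(num_galaxies, chunk_size=CHUNK_SIZE):
    """Number of chunks needed for num_galaxies; the last chunk may be partial."""
    return -(-num_galaxies // chunk_size)  # ceiling division


if __name__ == "__main__":
    chunk, row = locate_galaxy(2_500_000)
    print(chunk, row)
```

Under these assumptions, lower chunk numbers hold the galaxies with the highest z-band flux within each survey.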

Pretraining

AstroCLIP is trained using a two-step process:

  1. We pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately.
  2. We CLIP-align these two encoders on a paired image-spectrum dataset.

Single-Modal Pretraining

Image Pretraining - DINOv2 ViT:

AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the DINOv2 package, which combines self-distillation, masked-modeling, and contrastive objectives. Overall, we use largely the same training regime; however, we modify some of the contrastive augmentations to suit an astrophysics context. Model training can be launched with the following command:

image_trainer -c astroclip/astrodino/config.yaml

We train the model using 20 A100 GPUs (on 5 nodes) for 250k steps, which takes roughly 46 hours.

Spectrum Pretraining - Masked Modeling Transformer:

AstroCLIP uses a 1D Transformer to encode galaxy spectra. Pretraining is performed using a masked-modeling objective, whereby the 1D spectrum is split into contiguous, overlapping patches. Model training can be launched with the following command:

spectrum_trainer fit -c config/specformer.yaml

We train the model using 4 A100 GPUs (on 1 node) for 30k steps, which takes roughly 12 hours.
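The patching step described above can be sketched as follows. This is an illustration only: the patch size, stride, choice of masked patches, and constant fill value are hypothetical, and the settings actually used are those in config/specformer.yaml. In masked modeling, the model is then trained to reconstruct the original contents of the masked patches.

```python
def split_into_patches(spectrum, patch_size, stride):
    """Slice a 1D sequence into contiguous patches; stride < patch_size makes them overlap."""
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    last_start = len(spectrum) - patch_size
    return [spectrum[start:start + patch_size] for start in range(0, last_start + 1, stride)]


def mask_patches(patches, masked_indices, fill_value=0.0):
    """Return a copy in which the chosen patches are replaced by a constant fill value."""
    masked = set(masked_indices)
    return [
        [fill_value] * len(patch) if index in masked else list(patch)
        for index, patch in enumerate(patches)
    ]


if __name__ == "__main__":
    toy_spectrum = [float(i) for i in range(10)]  # stand-in for flux values
    patches = split_into_patches(toy_spectrum, patch_size=4, stride=2)
    model_input = mask_patches(patches, masked_indices=[1])
    # The reconstruction target for the masked position is patches[1].
    print(model_input)
```

With a patch size of 4 and a stride of 2, each patch shares half of its samples with its neighbor, which is what makes the patches overlapping.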

CLIP Alignment:

Once the encoders are pretrained, we align the image and spectrum encoders using cross-attention projection heads, maximizing the similarity between cross-modal embeddings that correspond to the same galaxy while minimizing the similarity between cross-modal embeddings that correspond to different galaxies. Model training can be launched with the following command:

astroclip_trainer fit -c config/astroclip.yaml

We train the model using 4 A100 GPUs (on 1 node) for 25k steps or until the validation loss has not decreased for a fixed number of steps. This takes roughly 12 hours.
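One common way to express the alignment objective described above is a symmetric InfoNCE loss over a batch of paired embeddings. The sketch below uses plain Python so that the arithmetic is explicit; it is not the repository's training code, it omits the cross-attention projection heads, and the temperature value is a hypothetical choice.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against the index of the correct match."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_embeddings, spectrum_embeddings, temperature=0.1):
    """Symmetric InfoNCE; entry i of each list is assumed to describe the same galaxy."""
    images = [normalize(v) for v in image_embeddings]
    spectra = [normalize(v) for v in spectrum_embeddings]
    n = len(images)
    # logits[i][j]: cosine similarity of image i and spectrum j, divided by the temperature
    logits = [
        [sum(a * b for a, b in zip(image, spectrum)) / temperature for spectrum in spectra]
        for image in images
    ]
    image_to_spectrum = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    spectrum_to_image = sum(
        cross_entropy_row([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (image_to_spectrum + spectrum_to_image)


if __name__ == "__main__":
    paired = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_loss(paired, paired))
```

The loss is small when each image embedding is most similar to its own spectrum embedding and large when it is more similar to the spectrum of a different galaxy.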

Downstream Tasks

We demonstrate that AstroCLIP can be used to easily perform a variety of downstream tasks. In particular, we demonstrate its ability to perform:

  1. In-modal and cross-modal similarity search
  2. Photometric redshift prediction
  3. Physical property estimation from images
  4. Physical property estimation from spectra
  5. Morphology classification from images

The details of these downstream tasks and the results in our paper can be found in astroclip/downstream_tasks.
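As a rough illustration of the first task, similarity search in a shared embedding space can be sketched as a nearest-neighbor lookup. The example below assumes cosine similarity as the ranking score and uses tiny made-up vectors; the function names are hypothetical, and the procedure actually used is documented in astroclip/downstream_tasks and the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, database, k=3):
    """Return (index, similarity) pairs for the k database embeddings closest to the query."""
    scored = [(index, cosine_similarity(query, emb)) for index, emb in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up 2D vectors standing in for galaxy embeddings.
    spectrum_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    image_query = [1.0, 0.1]
    print(top_k_similar(image_query, spectrum_embeddings, k=2))
```

Because both modalities share one embedding space, the query and the database may come from the same modality (in-modal search) or from different ones (cross-modal search), as in the image-to-spectrum lookup shown in the usage lines.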

Acknowledgements

This repository uses datasets and contrastive augmentations from Stein, et al. (2022). The image pretraining is built on top of the DINOv2 framework; we also thank Piotr Bojanowski for valuable conversations around image pretraining.

License

AstroCLIP code and model weights are released under the MIT license. See LICENSE for additional details.

Citation

@article{Parker_2024,
  title={AstroCLIP: a cross-modal foundation model for galaxies},
  volume={531},
  ISSN={1365-2966},
  url={http://dx.doi.org/10.1093/mnras/stae1450},
  DOI={10.1093/mnras/stae1450},
  number={4},
  journal={Monthly Notices of the Royal Astronomical Society},
  publisher={Oxford University Press (OUP)},
  author={Parker, Liam and Lanusse, Francois and Golkar, Siavash and Sarra, Leopoldo and Cranmer, Miles and Bietti, Alberto and Eickenberg, Michael and Krawezik, Geraud and McCabe, Michael and Morel, Rudy and Ohana, Ruben and Pettee, Mariel and Régaldo-Saint Blancard, Bruno and Cho, Kyunghyun and Ho, Shirley},
  year={2024},
  month=jun,
  pages={4990–5011}
}

Contributors ✨

Thanks go to these wonderful people (emoji key):

Liam Parker: 💻
Francois Lanusse: 💻 🔣
Siavash Golkar: 💻
Leopoldo Sarra: 💻 🔧
Shirley Ho: 🤔 🔍
Miles Cranmer: 🤔 🎨

This project follows the all-contributors specification. Contributions of any kind welcome!
