Code to shrink Gaia down to a single DB, for specific purposes (eg. astrometry)
Install the Python requirements:
pip install -r requirements.txt
Install the indexing requirements: first make sure that you have make
and gcc
installed and in your path. Then:
git clone https://github.com/Caltech-IPAC/SpatialIndex.git
cd SpatialIndex
make
python setup.py install
You'll now have the indexing executable SpatialIndex/bin/sptIndx
in your path and the Python spatial-index
installed.
First, edit the config_astrom.py
file to make sure that you are setting the parameters you want to save in the minified DB and any boundaries on parameters that you want to enforce (e.g. mag < 18.0).
Make the DB:
python ./minifyGaia.py make_db --clobber=True
Ingest the DB:
python ./minifyGaia.py ingest_all --source_list=gaia_source.list
This will download the Gaia DR3 *.csv.gz files as they load into the DB.
In Python, connect to the database:
import duckdb
from config_astrom import default_filename, default_table_name
conn = duckdb.connect(default_filename, read_only=True)
Let's say we want to issue a query around a certain point:
import healpy
from ADQL.adql import ADQL
from spatial_index import SpatialIndex
from healpy.pixelfunc import vec2ang
import numpy as np
from astropy.table import Table
ra_source, dec_source = 45.00432028915398, 0.0210477637811747
max_distance_deg = 0.05
adql_string = f"""
SELECT x, y, z, phot_rp_mean_mag, pmra, pmdec, parallax
FROM {default_table_name}
WHERE 1=CONTAINS(
POINT('ICRS', ra, dec),
CIRCLE('ICRS', {ra_source} , {dec_source}, {max_distance_deg}))
adql = ADQL(dbms='sqlite', level=20,
racol='ra', deccol='dec',
xcol = 'x', ycol='y', zcol='z', indxcol='htm20',
mode=SpatialIndex.HTM, encoding=SpatialIndex.BASE10)
q = adql.sql(adql_string)
Now we have our ADQL query converted into a duckDB query. Get the results and add in ra dec colunms
ret = conn.execute(q).fetchall()
radec = np.array([vec2ang(np.array([row[0], row[1], row[2]]),lonlat=True) for row in ret]).squeeze()
df = pd.DataFrame(ret, columns = ["x", "y", "z",
"phot_rp_mean_mag", "pmra", "pmdec", "parallax"])
df["ra"], df["dec"] = radec[:,0] ,radec[:,1]
Now we can filter this as an astropy Table, sorting by distance from the source position:
import astropy.units as u
from astropy.coordinates import SkyCoord
c1 = SkyCoord(ra_source*u.deg, dec_source*u.deg)
c2 = SkyCoord(df["ra"]*u.deg, df["dec"]*u.deg)
df["dist"] = c1.separation(c2).arcsec
del df["x"]
del df["y"]
del df["z"]
r = Table.from_pandas(df)
filter_mask = (
(r["phot_rp_mean_mag"] < 18)
& (r["phot_rp_mean_mag"] > 10)
& (r["parallax"] < 250)
)
r[filter_mask]
r.sort("dist")