Currently focusing on broad classification of astronomical objects. Project in hibernation as of May 2018. Like a bear.
- Python 3
- SQLite command line tool (optional)
- Set the `LCML` environment variable to the repo checkout's path (e.g., `export LCML=/Users/*/code/light_curve_ml`)
- Run `cd $LCML && pip install -e . --user`
See instructions in conf/dev/ubuntu_install.txt
Supervised and unsupervised machine learning pipelines are run via the `run_pipeline.py` entry point. It expects the path to a job (config) file and a file name for logger output. For example:

```shell
python3 lcml/pipeline/run_pipeline.py --path conf/local/supervised/macho.json --logFileName super_macho.log
```
The pipeline expects a job file (`macho.json` in the example above) specifying the configuration of the pipeline and a detailed declaration of experiment parameters. The specified job file supersedes and overrides the default job file, `conf/common/pipeline.json`, on a per-field basis, recursively. Any, or none, of the default fields may be overridden.
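The recursive per-field override can be illustrated with a small sketch (the function name and the example dicts below are hypothetical, not the pipeline's actual merge code):

```python
def recursive_merge(defaults, overrides):
    """Return a dict where fields in `overrides` replace matching fields
    in `defaults`, descending recursively into nested dicts."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical example: a job file overriding one nested default field
defaults = {"loadData": {"skip": False, "params": {"limit": 1000}}}
job = {"loadData": {"params": {"limit": 50}}}
print(recursive_merge(defaults, job))
# {'loadData': {'skip': False, 'params': {'limit': 50}}}
```

Untouched default fields (`skip` above) survive the merge, so a job file only needs to state what differs from the defaults.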
Job files have the following structure:
- `globalParams` - parameters used across multiple pipeline stages
- `database` - all db config and table names
- `loadData` - stage converting raw data into coherent light curves
- `preprocessData` - stage cleaning and preprocessing light curves
- `extractFeatures` - stage extracting features from cleaned light curves
- `postprocessFeatures` - stage further processing extracted features
- `modelSearch` - stage testing several ML models with differing hyperparameters
  - `function` - search function name
  - `model` - ML model spec including non-searched parameters
  - `params` - parameters controlling the model search
- `serialization` - stage persisting the ML model and metadata to disk
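Put together, a job file is a JSON object keyed by these sections. The sketch below shows the overall shape only; the field values are illustrative placeholders, not defaults from `conf/common/pipeline.json`:

```json
{
  "globalParams": {},
  "database": {},
  "loadData": {},
  "preprocessData": {},
  "extractFeatures": {},
  "postprocessFeatures": {},
  "modelSearch": {
    "function": "gridSearch",
    "model": {},
    "params": {}
  },
  "serialization": {}
}
```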
Pipeline 'stages' are customizable processors. Each stage definition has the following components:
- `skip` - boolean determining whether the stage should execute
- `params` - stage-specific parameters
- `writeTable` - name of the db table to which output is written
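For example, a single stage section might look like the following (the parameter and table names are illustrative assumptions, not values from the shipped job files):

```json
{
  "loadData": {
    "skip": false,
    "params": {"sampleLimit": 100},
    "writeTable": "raw_light_curves"
  }
}
```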
Some representative job files provided in this repo include:
- `local/supervised/fast_macho.json` - runs a tiny portion of the MACHO dataset through all supervised stages; useful for pipeline debugging and integration testing
- `local/supervised/macho.json` - full supervised learning pipeline for the MACHO dataset; uses the `feets` library for feature extraction and random forests for classification
- `local/supervised/ogle3.json` - ditto for OGLE3
- `local/unsupervised/macho.json` - unsupervised learning pipeline for MACHO focused on Mini-batch KMeans and Agglomerative clustering
- `lcml.data.acquisistion` - scripts used to acquire and/or process various datasets including MACHO, OGLE3, Catalina, and Gaia
- `lcml.poc` - one-off proof-of-concept scripts for various libraries
The `LoggingManager` class allows for convenient customization of Python `Logger` objects. The default logging config is specified in `conf/common/logging.json`.
This config should contain the following main keys:
- `basicConfig` - values passed to `logging.basicConfig`
- `handlers` - handler definitions with a `type` attribute, which may be either `stream` or `file`
- `modules` - list of module-specific logger level settings
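A minimal config following this structure might look like the sketch below; the exact field names inside each handler and module entry are assumptions for illustration:

```json
{
  "basicConfig": {"level": "INFO"},
  "handlers": [
    {"type": "stream"},
    {"type": "file", "filename": "pipeline.log"}
  ],
  "modules": [
    {"name": "lcml.pipeline", "level": "DEBUG"}
  ]
}
```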
Main modules should initialize the manager by invoking `LoggingManager.initLogging` at the start of execution, before any logger objects have been created.
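The initialize-before-creating-loggers pattern can be sketched with the standard library alone. This is an illustration of the idea, not `LoggingManager`'s actual API; the `init_logging` helper and the config fields it reads are assumptions:

```python
import json
import logging

def init_logging(config_path):
    """Configure logging from a JSON config before any module loggers
    are created, so every later getLogger() call sees the settings."""
    with open(config_path) as f:
        config = json.load(f)
    # Root-level settings, analogous to the `basicConfig` key
    logging.basicConfig(
        level=config.get("basicConfig", {}).get("level", "INFO"))
    # Per-module level overrides, analogous to the `modules` key
    for module in config.get("modules", []):
        logging.getLogger(module["name"]).setLevel(module["level"])

# Usage at the top of a main module, before other imports create loggers:
# init_logging("conf/common/logging.json")
logger = logging.getLogger(__name__)
```

Initializing first matters because loggers created beforehand may attach default handlers or cache effective levels that the later configuration will not replace.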