This repository contains the draft code used to explore and analyze the data in the 12/2020 "Technical Scenario" document for VAULT. It is organized into a set of Jupyter notebooks runnable on any Linux or Mac system. Notebooks without interactive plots are provided with their output embedded, so that the results can be seen without having to set up and execute the code. Notebooks provided without output are meant to be viewed "live", with a running Python server, so that the data can be fully explored interactively. PDF copies of all notebooks are provided for quick skimming, or in case the notebook code or data is not available for running. Where appropriate, you can also visit a deployed version of the code.
To understand our algorithm and approach, please see our write-up at High Performance Hit Finder.
To get started with this codebase, see the Quickstart.
You can access deployed versions of the notebooks and dashboard at http://bit.ly/attvault, though these will be taken down at some point after the demo presentation.
To obtain the data used by the notebooks, see Downloading Data.
The notebooks fall into the following categories:
These notebooks start with raw data where possible, with a goal of revealing it as it is, with as little cleanup as possible, so that the same process can be applied to new data. They are primarily self-contained, relying only on the packages installed in the Python environment and not on external scripts or modules in this repository.
- Viewing_AIS: Basic rendering of location data from sets of AIS pings. (PDF)
- Viewing_AIS_Categorical: Breakdown of AIS location data by vessel type. (PDF)
- Viewing_TLEs: Basic rendering of earth-centered satellite location at epoch time from sets of TLEs. (PDF)
These notebooks also focus on data, but on derived or computed values.
- Viewing_AIS_Gaps: Visualizing unusually large gaps between AIS pings. (PDF)
- Viewing_Tracks: Visualizing computed satellite tracks (derived from TLE records). (PDF)
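As a rough illustration of what "unusually large gaps between AIS pings" means, the sketch below flags consecutive pings from one vessel that are further apart than a threshold. It is not the code used in Viewing_AIS_Gaps; the function name `find_gaps`, the one-hour threshold, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(pings, threshold=timedelta(hours=1)):
    """Return (start, end, duration) for consecutive pings further apart than threshold.

    `pings` is an iterable of datetime objects for a single vessel.
    The default threshold is an arbitrary choice for this sketch.
    """
    ordered = sorted(pings)
    gaps = []
    # Compare each ping with the one that follows it in time.
    for earlier, later in zip(ordered, ordered[1:]):
        duration = later - earlier
        if duration > threshold:
            gaps.append((earlier, later, duration))
    return gaps


# Hypothetical ping times for one vessel (not taken from the VAULT data).
pings = [
    datetime(2017, 1, 1, 0, 0),
    datetime(2017, 1, 1, 0, 5),
    datetime(2017, 1, 1, 6, 0),  # long silence before this ping
    datetime(2017, 1, 1, 6, 2),
]
gaps = find_gaps(pings)
```

In practice the threshold would likely depend on vessel type and reporting interval, which this sketch ignores.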
These files start with processed/prepared data, and approximate an end-user task (e.g. hit detection).
- Hit_Finder: Notebook for calculating vessels viewable by a satellite over a date/time range. (PDF)
- Hit_Dashboard: End-user app for showing tracks and vessels viewable by a satellite over a date/time range. (PDF)
- DOD_anomaly: Case study provided by H2O for Pinnacle Use Case: Classify Suspicious Activity from AIS Data. (PDF)
- PrepareDataForMachineLearning: Curate and Prepare Data for various Pinnacle Use cases. (PDF)
- AIS_Analyze_Vessel_Cluster: EDA, Stats, K-Means clustering, Plots for a given vessel. (PDF)
- AIS_Anomaly_Detection: Collect stats and flag anomalous vessel coordinates. (PDF)
These files start with raw data and create cleaned/consolidated/computed data for use in the other categories. Many of these rely on scripts in scripts/, where you can see the detailed computations involved.
- AIS_Parser: Parse the 2015-2017 flat csv files and transform data into Vessel, Broadcast, and Voyage files to be uniform with the GDB Exported Data. (PDF)
- AIS_Validation: Combine all vessels' data and generate clean consolidated files. (PDF)
- TLE_Parser: Validate or correct the TLE data, producing gridded data for ingestion into the compute engine. (PDF)
- TLE_precompute_checks: Various sanity checks on the TLE data. (PDF)
- TLE_to_pytables: Converting TLE data into h5 format. (PDF)
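One basic check that can be applied when validating TLE records is the standard per-line checksum: the last character of each line is the sum of all digits on the line, with each minus sign counted as 1, modulo 10. The sketch below shows that check only; it is an assumption that TLE_Parser or TLE_precompute_checks perform something similar, and the function names are hypothetical. The sample lines are a commonly cited ISS element set, not data from this repository.

```python
DIGITS = "0123456789"


def tle_checksum(line):
    """Compute the modulo-10 checksum over all but the last character of a TLE line."""
    total = 0
    for ch in line[:-1]:
        if ch in DIGITS:
            total += int(ch)
        elif ch == "-":
            total += 1  # minus signs count as 1; all other characters count as 0
    return total % 10


def tle_line_is_valid(line):
    """Return True if the final character matches the computed checksum."""
    line = line.rstrip()
    if not line or line[-1] not in DIGITS:
        return False
    return tle_checksum(line) == int(line[-1])


# Commonly cited ISS example element set (illustrative only).
LINE1 = "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927"
LINE2 = "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537"

ok1 = tle_line_is_valid(LINE1)
ok2 = tle_line_is_valid(LINE2)
# Changing a single digit should make the checksum fail.
corrupted = tle_line_is_valid(LINE1.replace("25544", "25545"))
```

A checksum match does not guarantee that the orbital elements are physically sensible, so it complements rather than replaces the other sanity checks.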
These are all in the scripts/ subdirectory. Most print useful help when given the --help option, or document their usage in their file docstrings.
- hit_finder.py: CLI tool for computing all vessels & times visible to a satellite
- reverse-hit-finder.py: CLI tool for computing all satellites visible to a particular vessel MMSI
- intersect.py: The core logic of the visibility intersection algorithm
- interpolate_ais.py: Generates HDF5 files with synthetic interpolated points for vessel motion
- build_index_parallel.sh: Parallel driver for sathelpers.py, to precompute satellite trajectories.
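To give a sense of what "synthetic interpolated points for vessel motion" can look like, here is a minimal sketch that linearly interpolates positions between two pings at a fixed time step. It is not the implementation in interpolate_ais.py; the function name, the `(t, lat, lon)` tuple layout, and the sample values are hypothetical, and naive linear interpolation in latitude/longitude ignores great-circle geometry and antimeridian crossings.

```python
def interpolate_track(p0, p1, step_seconds):
    """Linearly interpolate between two pings given as (t, lat, lon) tuples.

    `t` is a time in seconds. Returns the synthetic points strictly
    between the two pings, spaced `step_seconds` apart.
    """
    t0, lat0, lon0 = p0
    t1, lat1, lon1 = p1
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    points = []
    t = t0 + step_seconds
    while t < t1:
        frac = (t - t0) / (t1 - t0)  # fraction of the way from p0 to p1
        points.append((t, lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)))
        t += step_seconds
    return points


# Two hypothetical pings 300 seconds apart, filled in every 150 seconds.
synthetic = interpolate_track((0, 10.0, 20.0), (300, 11.0, 22.0), 150)
```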
This page serves as the main instruction index. From here, you can navigate to the various resources, deliverables, and documentation for the project.
- Public GitHub – All code/doc/Instructions
- Main Repos: https://github.com/att-vault/vault
- API Repos: https://github.com/att-vault/vault-apis
- Public vault-data-corpus on S3: http://vault-data-corpus.s3-website.us-east-2.amazonaws.com/ (a subset of which is provided at vault-data-minimal, sufficient for running the code)
- Satellite Data - Contains all TLE related data snapshots from various EDA/Curation processes
- Vessel Data - Contains all AIS related data snapshots from various EDA/Curation processes
- Docker Images - Contains the latest Docker images for the API and Interactive UI App; you can also use our Jenkins pipeline to build and deploy new Docker images.
- Deployed apps at http://bit.ly/attvault, though these will be taken down at some point after the demo presentation.
DoD/government documents:
- 2020-09-30 Executive Summary: DoD Data
- 2020-12-17 Technical Scenario
- 2020-05-01 Space Force Data Management strategy
- 2020-12-30 Example of an open data catalog
- 2020-12-30 Example of an open dataset
- 2014-11-06 Open data standard 1.1
- 2014-11-06 Open data standard 1.1 field Mappings
Data files:
General background: