bzambri / GCLAP

[Insight Data Engineering project] A climate data warehouse that enables the immediate analysis of large climate datasets without data preparation.

Home Page: http://engineeringhub.me


Global Climate data Lake and Analysis Platform (GCLAP)

Overview

As a climate scientist, every time I wanted to answer a new research question, I spent hours finding, downloading, and cleaning new data. According to a study by CrowdFlower (now Figure Eight), data scientists spend about 80% of their time preparing and managing data for analysis. In the same study, 76% of data scientists named data preparation as the least enjoyable part of their work.

The Global Climate data Lake and Analysis Platform (GCLAP) is a climate data warehouse that takes care of the painful process of preparing and managing data. With its user interface, data scientists simply choose a variable and the analysis they want to perform, and GCLAP does the rest. Users can download the resulting visualization almost instantly, or download the underlying output and plot it themselves with another popular geophysical data visualization tool (e.g., NCL).

Data

Monthly output from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis product is available in NetCDF format through the ECMWF API.
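As an illustration of what such an API request might look like, here is a minimal sketch that assumes the `cdsapi` client for the Copernicus Climate Data Store and the monthly-means single-level dataset. The dataset name, the request keys, and the helper names `build_monthly_request` and `download` are assumptions for illustration, not taken from this repository. Request key names have changed between versions of the Climate Data Store, so check the request shown on the dataset's download form before relying on them.

```python
from typing import Dict, List, Union

# Dataset name as listed in the Climate Data Store catalogue
# (assumed here; the project may use a different ERA5 product).
DATASET = "reanalysis-era5-single-levels-monthly-means"


def build_monthly_request(
    variable: str, years: List[int]
) -> Dict[str, Union[str, List[str]]]:
    """Build a request for monthly-mean ERA5 fields in NetCDF format."""
    return {
        "product_type": "monthly_averaged_reanalysis",
        "variable": variable,
        "year": [str(y) for y in years],
        "month": [f"{m:02d}" for m in range(1, 13)],  # "01" .. "12"
        "time": "00:00",
        "format": "netcdf",  # assumed key name; may differ by API version
    }


def download(variable: str, years: List[int], target: str) -> None:
    """Fetch the data to a local file.

    Requires the cdsapi package and API credentials in ~/.cdsapirc.
    """
    import cdsapi  # imported lazily so the request builder works without it

    client = cdsapi.Client()
    client.retrieve(DATASET, build_monthly_request(variable, years), target)
```

Calling `download("2m_temperature", [2018, 2019], "t2m.nc")` would write a local NetCDF file; copying that file to S3 is a separate step not shown here.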

Installation

Here is an overview of the steps required to set up the cluster:

  1. Download the ERA5 reanalysis data to S3.
  2. Set up a CockroachDB cluster.
  3. Set up a Spark cluster for data processing.
  4. Calculate desired tables using Spark and save to CockroachDB.
  5. Set up a web server with Apache and UI with Plotly/Dash.
  6. Set up Airflow to run the data pipeline and refresh the data and UI monthly.
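The recurring part of the steps above (download, compute, save, refresh) forms a small dependency chain that a scheduler such as Airflow would run each month. The sketch below models only that ordering, using the standard-library `graphlib` module (Python 3.9+) rather than Airflow itself. The task names are hypothetical and do not come from this repository.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
MONTHLY_PIPELINE = {
    "download_era5_to_s3": set(),
    "compute_tables_with_spark": {"download_era5_to_s3"},
    "save_tables_to_cockroachdb": {"compute_tables_with_spark"},
    "refresh_dash_ui": {"save_tables_to_cockroachdb"},
}


def run_order(pipeline):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for task in run_order(MONTHLY_PIPELINE):
        print(task)
```

In an actual Airflow deployment each entry would become an operator in a DAG with a monthly schedule; the one-time cluster setup in steps 2, 3, and 5 sits outside this recurring chain.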

Architecture

Tech Stack


Languages

Python 69.9%, Shell 30.1%