os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation


OS-Climate Data Commons

OS-Climate Data Commons is a unified, open Multimodal Data Processing platform used by OS-Climate members to collect, normalize and integrate climate and ESG data from public and private sources, in support of:

  • Corporates in efficiently disclosing and managing their own climate and ESG data, including correcting, reporting and confirming the information in an auditable and secure manner.
  • Data scientists in collaboratively solving data collection, cleaning and normalization issues, based on shared modeling standards, tooling and community development, following a data-pipeline-as-code approach.
  • Decision makers such as investors, financial institutions and regulators in integrating new or existing scenario-based predictive analytics with an open repository of trustworthy climate data.

Overview

OS-C Data Commons Platform Overview

The Data Commons platform aims to bridge climate-related data gaps across three dimensions:

  1. Data Availability: The platform supports data availability through data democratization, offering self-service data infrastructure as a platform. A self-service platform is fundamental to a successful data mesh architecture, in which existing data sources are federated and can easily be made discoverable and shareable across an organization and its ecosystem, using open tools and readily available infrastructure for data creation, storage, transformation and distribution.

  2. Data Comparability: The platform supports data comparability through domain-oriented, decentralized data ownership and architecture, i.e. data is treated as a product. The goal is to stop the proliferation of data puddles and instead “connect” the data with proper reference data and relevant industry identifiers, so that collections of data are aligned with business goals.

  3. Data Reliability: The platform supports data reliability through federated data access, data lifecycle management, security and compliance. This supports a data-as-code approach in which the data pipeline code, the data itself and the data schema are all versioned to provide transparency and reproducibility (a "time machine"), while enforcing the authentication and authorization required for data access management, with consistent policies across the platform and throughout the data lineage.
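As a rough illustration of the data-as-code idea in the third dimension, the sketch below derives a single fingerprint from a pipeline version, a schema and a set of records, so that a change to any of the three yields a new, traceable version. It is not taken from the Data Commons codebase: the `fingerprint` function, the column names and the sample values are all hypothetical, and a real deployment would rely on the platform's own versioning tooling.

```python
import hashlib
import json


def fingerprint(pipeline_version: str, schema: dict, rows: list) -> str:
    """Return a deterministic SHA-256 digest over pipeline version, schema and data.

    Hypothetical helper for illustration only; not part of the platform.
    """
    # sort_keys and fixed separators make the serialization stable, so the
    # same inputs always hash to the same value regardless of dict order.
    payload = json.dumps(
        {"pipeline": pipeline_version, "schema": schema, "rows": rows},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up schema and records, used only to exercise the helper.
schema = {"company_id": "string", "year": "int", "scope1_tco2e": "float"}
rows = [{"company_id": "ACME", "year": 2020, "scope1_tco2e": 1250.0}]

v1 = fingerprint("pipeline-1.0.0", schema, rows)
v1_again = fingerprint("pipeline-1.0.0", schema, rows)

# Appending a record changes the data, and therefore the fingerprint.
more_rows = rows + [{"company_id": "ACME", "year": 2021, "scope1_tco2e": 1190.0}]
v2 = fingerprint("pipeline-1.0.0", schema, more_rows)

# Changing only the pipeline version also produces a distinct fingerprint.
v3 = fingerprint("pipeline-1.1.0", schema, rows)

print(v1 == v1_again, v1 != v2, v1 != v3)
```

Storing such a fingerprint alongside each published dataset is one simple way to check later that a result was produced from exactly the same code, schema and data.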

For more information on this and on how Data Commons fits into the picture, good introductions include the official Data Commons page on the OS-Climate website and the video recording of the Data Commons Platform Overview at COP26 in Glasgow. Detailed platform documentation maintained by our community is available in this repository and accessible through the links below.

Architecture

Data Commons Architecture Blueprint

Developer Resources

Data Commons Developer Guide

License: Apache License 2.0