Feathub

Feathub is a feature store that facilitates feature development and deployment to achieve the following objectives:

  • Reduce duplication of data engineering efforts by allowing new ML projects to reuse and share a library of curated production-ready features already registered by existing projects in the same organization.
  • Simplify feature management by allowing users to specify feature definitions and feature processing jobs as code using a declarative framework.
  • Facilitate feature development-to-deployment iteration by allowing users to use the same declarative feature definitions across training and serving, online and offline, without training-serving skew. Feathub takes care of compiling feature definitions into efficient processing jobs and executing those jobs in a distributed cluster.
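
As an illustration of the last objective, the sketch below shows how one declarative definition can drive both the batch (training) path and the per-request (serving) path. It is not Feathub's API: `FeatureSpec`, `compute_offline`, and `compute_online` are hypothetical names introduced only for this example, and the lambda stands in for a compiled feature expression.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical declarative feature definition (not a Feathub class)."""

    name: str
    transform: Callable[[Row], Any]  # stands in for a feature expression


def compute_online(spec: FeatureSpec, row: Row) -> Row:
    """Serving path: compute the feature for a single incoming row."""
    return {**row, spec.name: spec.transform(row)}


def compute_offline(spec: FeatureSpec, rows: List[Row]) -> List[Row]:
    """Training path: reuse exactly the same logic over a batch of rows."""
    return [compute_online(spec, row) for row in rows]


# One definition, declared once and shared by both paths.
fare_per_mile = FeatureSpec(
    name="fare_per_mile",
    transform=lambda r: r["fare"] / r["distance"] if r["distance"] else None,
)

history = [{"fare": 10.0, "distance": 2.0}, {"fare": 9.0, "distance": 0.0}]
training_rows = compute_offline(fare_per_mile, history)
serving_row = compute_online(fare_per_mile, {"fare": 10.0, "distance": 2.0})

# Identical input yields identical feature values on both paths.
assert training_rows[0]["fare_per_mile"] == serving_row["fare_per_mile"]
```

Because both paths call the same transform, there is no second implementation that could drift from the first, which is the usual source of training-serving skew.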

Feathub provides an SDK and infrastructure that enable the following capabilities:

  • Define a feature view (a group of related features) as transformations and joins of existing feature views and data sources.
  • Register and retrieve feature views by name from the feature registry.
  • Transform and materialize features from a feature view into feature stores for a given time range and/or set of keys, applying transformations to the source dataset with point-in-time correctness.
  • Fetch online features by joining features from the online feature store with on-demand transformations.
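
Point-in-time correctness means that each training row only sees feature values that were already available at that row's timestamp. The following is a minimal pure-Python sketch of that idea, not Feathub's implementation; `point_in_time_lookup`, `join_point_in_time`, and the sample data are hypothetical and exist only for illustration.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Optional, Tuple

# Feature history per key; each list is sorted by timestamp: (timestamp, value).
FeatureLog = Dict[str, List[Tuple[int, Any]]]


def point_in_time_lookup(log: FeatureLog, key: str, ts: int) -> Optional[Any]:
    """Return the latest value for `key` with timestamp <= ts, or None."""
    history = log.get(key, [])
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # count of entries with timestamp <= ts
    return history[idx - 1][1] if idx > 0 else None


def join_point_in_time(
    events: List[Dict[str, Any]], log: FeatureLog, feature_name: str
) -> List[Dict[str, Any]]:
    """Attach to each event the feature value that was known at event time."""
    return [
        {**e, feature_name: point_in_time_lookup(log, e["key"], e["ts"])}
        for e in events
    ]


# Hypothetical sample data: a driver rating that changes over time.
rating_log: FeatureLog = {"driver_1": [(100, 3.5), (200, 4.0)]}
events = [
    {"key": "driver_1", "ts": 150},  # sees the value written at ts=100
    {"key": "driver_1", "ts": 200},  # sees the value written at ts=200
    {"key": "driver_1", "ts": 50},   # no value existed yet
    {"key": "driver_2", "ts": 150},  # unknown key
]
joined = join_point_in_time(events, rating_log, "rating")
print([row["rating"] for row in joined])  # [3.5, 4.0, None, None]
```

A naive join on the key alone would attach the rating 4.0 to the event at ts=150, leaking a value from the future into the training data.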

Architecture

The figures above show the Feathub architecture. Please check out the Feathub architecture document for more details on these components.

Getting Started

Prerequisites

Prerequisites for building the Python packages:

  • Unix-like operating system (e.g. Linux, Mac OS X)
  • Python 3.7
  • Java 8
  • Maven >= 3.1.1
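
Before building, it can help to confirm that the tools listed above are reachable. The sketch below is a hypothetical helper, not part of Feathub: it only checks that the `java` and `mvn` executables are on `PATH` and that the running interpreter is Python 3.7. It does not verify the Java or Maven versions.

```python
import shutil
import sys
from typing import Dict, Optional, Sequence, Tuple


def find_build_tools(tools: Sequence[str] = ("java", "mvn")) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path on PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def python_version_matches(
    required: Tuple[int, int] = (3, 7),
    actual: Optional[Sequence[int]] = None,
) -> bool:
    """Compare major.minor of `actual` (default: this interpreter) to `required`."""
    found = tuple(sys.version_info[:2]) if actual is None else tuple(actual[:2])
    return found == tuple(required)


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
    print(f"Python 3.7 interpreter: {python_version_matches()}")
```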

Install Feathub

Run the following commands to install Feathub from source.

# Build Java dependencies for Feathub 
$ cd java
$ mvn clean package
$ cd ..

# Install Feathub
$ python -m pip install ./python

Quickstart

Quickstart with local processor

Execute the following command to run the nyc_taxi.py demo, which demonstrates the capabilities described above.

$ python python/feathub/examples/nyc_taxi.py

Quickstart with Flink processor

If you are interested in computing Feathub features with a local Flink cluster, you can follow the Flink Processor Quickstart.

Additional Resources

  • This tutorial provides more details on how to define, extract and serve features using Feathub.
  • This document explains the Feathub expression language.
  • This document introduces the Flink processor that computes the features with Flink.

Developer Guidelines

Install development dependencies

$ python -m pip install -r python/dev-requirements.txt

Running All Tests

$ pytest -W ignore::DeprecationWarning

Code Formatting

Feathub uses Black to format Python code, flake8 to check Python code style, and mypy to check type annotations.

Run the following commands to format code, check code style, and check type annotations before uploading PRs for review.

# Format python code
$ python -m black python

# Check python code style
$ python -m flake8 --config=python/setup.cfg python

# Check python type annotation
$ python -m mypy --config-file python/setup.cfg python

About

License: Apache License 2.0