andyk / lineapy

Data engineering, simplified. LineaPy creates a frictionless path for taking your data science artifact from development to production.

Home Page: https://lineapy.org

Supercharge your data science workflow with LineaPy! With just two lines of code, LineaPy captures, analyzes, and transforms Python code to extract production data pipelines in minutes.

Why Use LineaPy?

Going from development to production is full of friction. Data engineering is a manual and time-consuming process. A proliferation of libraries, tools, and technologies means data teams spend countless hours managing infrastructure and repeating tasks. This drastically reduces the team's ability to deliver actionable insights in real time.

LineaPy creates a frictionless path for taking your data science work from development to production, backed by a decade of research and industry expertise tackling hyperscale data challenges.

Data engineering, simplified. Your data science artifact works; now comes the cleanup. LineaPy extracts essential operations from messy development code in minutes, not days, simplifying data engineering with just two lines of code.

You analyze, we productionize. Productionization is manual and messy, and it requires software engineering expertise to create clean, reproducible code. LineaPy automatically handles lineage and refactoring so you can focus on experimenting, analyzing, and modeling.

Move fast from prototype to pipeline. LineaPy automates code translations to save you time and help you stay focused. Rapidly create analytics pipelines with a simple API — no refactoring or new tools needed. Go from your Jupyter notebook to an Airflow pipeline in minutes.

Getting Started

Installation

To install LineaPy, run:

$ pip install lineapy

Or, if using poetry, run:

$ poetry add lineapy

Or, if you want the latest version of LineaPy directly from the source, run:

$ python -m pip install git+https://github.com/LineaLabs/lineapy.git
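
To confirm that the installation landed in the environment you expect, a short check with the standard library's importlib.metadata can help. This is a minimal sketch; the helper name installed_version is hypothetical and not part of LineaPy.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("lineapy")
if version:
    print(f"lineapy {version} is installed")
else:
    print("lineapy is not installed in this environment")
```

If the second message appears after a successful install, the interpreter running the check is probably not the one pip installed into.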

Interfaces

Jupyter and IPython

To use LineaPy in an interactive computing environment such as Jupyter Notebook/Lab or IPython, launch the environment with the lineapy command, like so:

$ lineapy jupyter notebook
$ lineapy jupyter lab
$ lineapy ipython

This will automatically load the LineaPy extension in the corresponding interactive shell application.

CLI

We can also use LineaPy as a CLI command. Run:

$ lineapy python --help

to see available options.

Quick Start

Once you have LineaPy installed, you are ready to start using the package. We can start with a simple example that demonstrates how to use LineaPy to store a variable's history. The lineapy.save() function removes extraneous code to give you the simplest version of a variable's history.

Suppose our development code looks like this:

import lineapy

# Define text to display in page heading
text = "Greetings"

# Some irrelevant operation
num = 1 + 2

# Change heading text
text = "Hello"

# Another irrelevant operation
num_squared = num**2

# Augment heading text
text = text + " World!"

# Try an alternative display
alt_text = text.split()

Now suppose we have reached the end of our development session and like what we see when we print(text). As shown above, text has gone through several modifications, and it might not be clear how it reached its final state, especially given the extraneous operations between these modifications. We can cut through this by running:

# Store the variable's history or "lineage"
lineapy.save(text, "text_for_heading")

# Retrieve the stored "artifact"
artifact = lineapy.get("text_for_heading")

# Obtain the simplest version of a variable's history
print(artifact.code)

which will print:

text = "Hello"
text = text + " World!"

Note that these are the minimal essential steps to get to the final state of the variable text. That is, LineaPy has performed code cleanup on our behalf.
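
To build intuition for what this cleanup involves, here is a toy sketch of the underlying idea (a backward slice over variable dependencies) written with Python's standard ast module. It is not LineaPy's implementation: the function name backward_slice is hypothetical, it requires Python 3.9+ for ast.unparse, and it only handles straight-line code made of simple assignments. It does not track mutation through method calls, control flow, or side effects.

```python
import ast


def backward_slice(source: str, target: str) -> str:
    """Keep only the top-level assignments that the final value of `target` depends on.

    Toy illustration: non-assignment statements are ignored entirely.
    """
    statements = ast.parse(source).body
    needed = {target}  # variables whose defining statements we still need
    kept = []
    # Walk backwards so that each needed variable is resolved by its latest definition.
    for stmt in reversed(statements):
        if not isinstance(stmt, ast.Assign):
            continue
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        written = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        read = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        if written & needed:
            kept.append(stmt)
            # The statement defines what we needed; now we need its inputs instead.
            needed = (needed - written) | read
    return "\n".join(ast.unparse(s) for s in reversed(kept))


DEV_CODE = '''
text = "Greetings"
num = 1 + 2
text = "Hello"
num_squared = num**2
text = text + " World!"
alt_text = text.split()
'''

print(backward_slice(DEV_CODE, "text"))
```

Running this prints the same two statements as above (ast.unparse normalizes the string quotes to single quotes). The first assignment to text is dropped because it is overwritten before being read, and the num statements are dropped because text never depends on them.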

Usage Reporting

LineaPy collects anonymous usage data that helps our team improve the product. Only LineaPy's API calls and CLI commands are reported. We strip out as much potentially sensitive information as possible, and we will never collect user code, data, variable names, or stack traces.

You can opt out of usage tracking by setting an environment variable:

$ export LINEAPY_DO_NOT_TRACK=true
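
When a shell export is inconvenient, for example inside a notebook that is already running, the same variable can be set from Python. This sketch assumes LineaPy reads the variable from the process environment when it reports usage, so it is safest to set it before importing lineapy; that ordering is an assumption, not documented behavior.

```python
import os

# Assumption: the variable must be present in this process's environment
# before LineaPy reports anything, so set it ahead of `import lineapy`.
os.environ["LINEAPY_DO_NOT_TRACK"] = "true"

# import lineapy  # import only after the variable is set

print(os.environ["LINEAPY_DO_NOT_TRACK"])
```

Note that this affects only the current process and any child processes it starts; it does not persist across sessions the way a shell profile entry would.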

What Next?

To learn more about LineaPy, please check out the project documentation, which contains many examples you can follow along with. Some key resources include:

Resource        Description
Docs            This is our knowledge hub. When in doubt, start here!
Concepts        Learn about key concepts underlying LineaPy!
Tutorials       These notebook tutorials will help you better understand core functionalities of LineaPy
Use Cases       These domain examples illustrate how LineaPy can help in real-world applications
API Reference   Need more technical details? This reference may help!
Contribute      Want to contribute? These instructions will help you get set up!
Slack           Have questions or issues unresolved? Join our community and ask away!

About

License: Apache License 2.0


Languages

Python 93.4%, Jupyter Notebook 6.1%, Makefile 0.3%, Jinja 0.1%, Dockerfile 0.1%, Shell 0.0%