DataLearns/proton_db_for_ETL

Introduction · Architecture · Get Started · What's next · Documentation · Contributing · Need help?

Introduction

Proton is a unified streaming and historical data processing engine in a single binary. It helps data engineers and platform engineers solve complex real-time analytics use cases, and powers the Timeplus streaming analytics platform.

Proton extends the historical data, storage, and computing functionality of the popular ClickHouse project with streaming and OLAP data processing.

Why use Proton?

A unified, lightweight engine to connect streaming and historical data processing tasks with efficiency and robust performance.
A smooth developer experience with powerful streaming and analytical functionality.
Flexible deployments with Proton's single binary and no external service dependencies.
Low total cost of ownership compared to other analytical frameworks.

Plus built-in support for powerful streaming and analytical functionality:

Functionality	Description
Data transformation	Scrub sensitive fields, derive new columns from raw data, or convert identifiers to human-readable information.
Joining streams	Combine data from different sources to add freshness to the resulting stream.
Aggregating streams	Developer-friendly functions to derive insights from streaming and historical data.
Windowed stream processing (tumble / hop / session)	Collect insightful snapshots of streaming data.
Substreams	Maintain separate watermarks and streaming windows.
Data revision processing (changelog)	Create and manage non-append streams with primary keys and change data capture (CDC) semantics.
Federated streaming queries	Query streaming data in external systems (e.g. Kafka) without duplicating it.
Materialized views	Create long-running, internally stored queries.
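
To make the windowing row above concrete, here is a minimal Python sketch of what a tumbling (fixed-size, non-overlapping) window aggregation computes. It is plain Python, not Proton SQL and not Proton's implementation; the function name `tumble_max`, the sample events, and the 5-second window size are hypothetical choices made for illustration.

```python
from collections import defaultdict

def tumble_max(events, window_seconds):
    """Assign (timestamp, device, temperature) events to fixed,
    non-overlapping windows and return the max temperature per
    (window_start, device) pair."""
    result = defaultdict(lambda: float("-inf"))
    for ts, device, temperature in events:
        # Each timestamp belongs to exactly one window.
        window_start = ts - (ts % window_seconds)
        key = (window_start, device)
        result[key] = max(result[key], temperature)
    return dict(result)

# Hypothetical sample events: (seconds since start, device, temperature).
events = [
    (0, "device0", 21.5),
    (3, "device0", 25.0),
    (4, "device1", 19.0),
    (5, "device0", 22.0),
    (9, "device1", 30.5),
]

windows = tumble_max(events, window_seconds=5)
for (window_start, device), max_temp in sorted(windows.items()):
    print(window_start, device, max_temp)
```

A hopping window differs in that consecutive windows overlap, so one event can fall into several windows, while a session window stays open until a gap of inactivity closes it.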

Architecture

See our architecture doc for technical details and the FAQ for more information on the various editions of Proton, how it's related to ClickHouse, and why we chose Apache License 2.0.

Get started

With Docker engine installed on your local machine, pull and run the latest version of the Proton Docker image.

(Mac and Linux users can also download the single binary or install Proton with Homebrew to run it without Docker.)

docker run -d --pull always --name proton ghcr.io/timeplus-io/proton:latest

Then run the proton-client tool inside the proton container to connect to the local Proton server:

docker exec -it proton proton-client -n

If you stop the container and want to start it again, run docker start proton.

Query a test stream

From proton-client, run the following SQL to create a stream of random data:

-- Create a stream with random data.
CREATE RANDOM STREAM devices(device string default 'device'||to_string(rand()%4), temperature float default rand()%1000/10);

-- Run the long-running stream query.
SELECT device, count(*), min(temperature), max(temperature) FROM devices GROUP BY device;

You should see data like the following:

┌─device──┬─count()─┬─min(temperature)─┬─max(temperature)─┐
│ device0 │    2256 │                0 │             99.6 │
│ device1 │    2260 │              0.1 │             99.7 │
│ device3 │    2259 │              0.3 │             99.9 │
│ device2 │    2225 │              0.2 │             99.8 │
└─────────┴─────────┴──────────────────┴──────────────────┘
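
The value ranges in this output follow from the column defaults: `rand()%4` yields four device names, and `rand()%1000/10` yields temperatures from 0 to 99.9 in steps of 0.1. The Python sketch below mimics that arithmetic and the GROUP BY aggregation on a finite batch of rows. It is an illustration only, not how Proton generates or processes data; `random_row`, `aggregate`, the batch size, and the fixed seed are hypothetical, and `getrandbits(32)` stands in for `rand()` on the assumption that `rand()` returns an unsigned 32-bit integer.

```python
import random

def random_row(rng):
    """Mimic the stream's column defaults for a single row."""
    # Assumption: rand() behaves like an unsigned 32-bit random integer.
    device = "device" + str(rng.getrandbits(32) % 4)
    temperature = (rng.getrandbits(32) % 1000) / 10
    return device, temperature

def aggregate(rows):
    """Compute (count, min, max) of temperature per device."""
    stats = {}
    for device, temperature in rows:
        count, lo, hi = stats.get(device, (0, temperature, temperature))
        stats[device] = (count + 1, min(lo, temperature), max(hi, temperature))
    return stats

rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = [random_row(rng) for _ in range(9000)]
stats = aggregate(rows)

for device in sorted(stats):
    count, lo, hi = stats[device]
    print(device, count, lo, hi)
```

Unlike this finite batch, the Proton query keeps running and continues to emit updated aggregates as new rows arrive.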

What's next?

Now that you're running Proton and have created your first stream and query, you can explore reading and writing data from Apache Kafka with External Streams, or view the Proton documentation to explore additional capabilities.

To see more examples of using Proton, check out the examples folder.

The following drivers are available:

Java (JDBC and other Java clients) https://github.com/timeplus-io/proton-java-driver
Go https://github.com/timeplus-io/proton-go-driver
Python https://github.com/timeplus-io/proton-python-driver

Integrations with other systems:

Grafana https://github.com/timeplus-io/proton-grafana-source
Metabase https://github.com/timeplus-io/metabase-proton-driver
Pulse UI https://github.com/timeplus-io/pulseui/tree/proton
Homebrew https://github.com/timeplus-io/homebrew-timeplus
dbt https://github.com/timeplus-io/dbt-proton

Get more with Timeplus

To access more features, such as sources, sinks, dashboards, alerts, and data lineage, create a workspace at Timeplus Cloud or try the live demo with pre-built live data and dashboards.

Documentation

We publish full documentation for Proton at docs.timeplus.com alongside documentation for the Timeplus (Cloud and Enterprise) platform.

We also have an FAQ that explains why we chose Apache License 2.0, how Proton is related to ClickHouse, what features are available in Proton versus Timeplus, and more.

Contributing

We welcome your contributions! If you are looking for issues to work on, try looking at the issue list.

Please see the wiki for more details, and BUILD.md for instructions on compiling Proton on different platforms.

We also encourage you to join the Timeplus Community Slack to ask questions and meet other active contributors from Timeplus and beyond.

Need help?

Join the Timeplus Community Slack to connect with Timeplus engineers and other Proton users.

For filing bugs, suggesting improvements, or requesting new features, see the open issues here on GitHub.

Licensing

Proton uses Apache License 2.0. See the LICENSE file for details.
