vthacker / kaldb

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

KalDB

KalDB is a cloud-native search engine for log management & analytics. It is designed to be very cost-effective, easy to operate, and scale to petabytes.

πŸ’‘ Features

  • Index data persisted on object storage(S3).
  • No delayed logs: Prioritizes ingesting fresh data over older data.
  • Ingest JSON documents with or without a strict schema
  • Plug-in for Grafana UI.
  • Designed to elastically scale to handle log spikes automatically.
  • Based on Apache Lucene.
  • Works out of the box with sensible defaults
  • Built-in multi-tenancy so you don't have to run multiple clusters.
  • Co-ordination free ingestion, so failure of a single node doesn't stop ingestion.
  • Designed for zero data loss.
  • Distributed search
  • Cloud-native: Kubernetes ready
  • Add and remove nodes in seconds. Elastically scale capacity up and down based on load.
  • Decoupled compute & storage
  • Ingest your documents with exactly-once semantics
  • Kafka-native ingestion

πŸ”Ž Uses & Limitations

βœ…   When to use ❌ When not to use
Your documents are immutable: application logs, system logs, access logs, user actions logs, audit trail (logs), etc. Your documents are mutable.
Your data has a time component. KalDB includes optimizations and design choices specifically related to time. You need a low-latency search for e-commerce websites.
You want a full-text search in a multi-tenant environment. You provide a public-facing search with high QPS.
You want to index directly from Kafka. You want to re-score documents at query time.
You ingest a tremendous amount of logs and don't want to pay huge bills.
You ingest a tremendous amount of data and you don't want to waste your precious time babysitting your ElasticSearch cluster.

🎨 Design

KalDB Architecture

πŸ’¬ Community

Talks

KalDB: A k8s native log search platform

πŸ™‹ FAQ

How is KalDB different from traditional search engines like Elasticsearch or Solr?

The core difference and advantage of KalDB is its architecture that is built from the ground up for cloud and log management. KalDB prioritizes indexing fresh logs over older logs, so your logs are never behind. Thanks to the true decoupled compute and storage, search instances are stateless, it is possible to auto-scale capacity within seconds. Last but not least, we implemented a highly-reliable distributed search and exactly-once semantics during indexing so that all engineers can sleep at night. To is cheaper than ES since we implement a tiered storage API, and on-demand restores.

⚑ Getting Started

To build the binary: mvn clean package

Local development

IntelliJ: Import the project as a Maven project.

IntelliJ run configs are provided for all node types, and execute using the provided config/config.yaml. These configurations are stored in the .run folder and should automatically be detected by IntelliJ upon importing the project.

To start the application dependencies (Zookeeper, Kafka, S3) you can use the provided docker compose file:

docker-compose up

About

License:MIT License


Languages

Language:Java 99.1%Language:Go 0.9%Language:Shell 0.0%Language:Dockerfile 0.0%Language:Makefile 0.0%