Kenia Narsus's starred repositories

spark

Apache Spark - A unified analytics engine for large-scale data processing

Language: Scala | License: Apache-2.0 | Stargazers: 38971 | Issues: 2031 | Issues: 0

skywalking

APM, Application Performance Monitoring System

Language: Java | License: Apache-2.0 | Stargazers: 23542 | Issues: 840 | Issues: 5255

horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Language: Python | License: NOASSERTION | Stargazers: 14075 | Issues: 335 | Issues: 2237

infracost

Cloud cost estimates for Terraform in pull requests 💰📉 Shift FinOps Left!

Language: Go | License: Apache-2.0 | Stargazers: 10636 | Issues: 73 | Issues: 889

soar

SQL Optimizer And Rewriter

Language: Go | License: Apache-2.0 | Stargazers: 8631 | Issues: 279 | Issues: 237

keda

KEDA is a Kubernetes-based Event-Driven Autoscaling component. It provides event-driven scaling for any container running in Kubernetes.

Language: Go | License: Apache-2.0 | Stargazers: 8091 | Issues: 94 | Issues: 2168
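KEDA works alongside the Kubernetes Horizontal Pod Autoscaler, which for an average-value target sizes a workload at roughly ceil(metric / target) replicas, while KEDA itself handles scaling to and from zero. The Python below is a minimal sketch of that arithmetic for a queue backlog, assuming a fixed number of pending events per replica; `desired_replicas` and its parameters are hypothetical and are not KEDA's API.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Hypothetical helper: replica count for a queue-driven workload (sketch only)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # No pending events: allow scaling down to min_replicas, which may be 0.
    if queue_length <= 0:
        return min_replicas
    # One replica per `target_per_replica` pending events, rounded up.
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp into the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for backlog in (0, 1, 25, 1000):
        print(backlog, desired_replicas(backlog, target_per_replica=10))
```

With a target of 10 events per replica, a backlog of 25 yields 3 replicas, and a backlog of 1000 is capped at `max_replicas`.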

skypilot

SkyPilot: Run LLMs, AI, and batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution, all with a simple interface.

Language: Python | License: Apache-2.0 | Stargazers: 6297 | Issues: 71 | Issues: 1641

arrow-datafusion

Apache DataFusion SQL Query Engine

Language: Rust | License: Apache-2.0 | Stargazers: 4928 | Issues: 101 | Issues: 4110

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Language: Scala | License: Apache-2.0 | Stargazers: 3189 | Issues: 81 | Issues: 333
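Deequ itself runs on Spark and is used from Scala. As a language-neutral illustration of what a "unit test for data" means, here is a minimal plain-Python sketch that declares column constraints (completeness, uniqueness, non-negativity) and reports which ones a small in-memory dataset violates. The helper names are hypothetical and do not mirror Deequ's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


def is_complete(column: str) -> Check:
    """Passes when no row has a missing (None) value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def is_unique(column: str) -> Check:
    """Passes when every value in `column` occurs exactly once."""
    def check(rows: List[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


def is_non_negative(column: str) -> Check:
    """Passes when every non-missing value in `column` is >= 0."""
    return lambda rows: all(row[column] >= 0 for row in rows
                            if row.get(column) is not None)


def verify(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run each named check against the dataset and report pass/fail."""
    return {name: check(rows) for name, check in checks.items()}


if __name__ == "__main__":
    data = [
        {"id": 1, "views": 10},
        {"id": 2, "views": None},
        {"id": 2, "views": 5},
    ]
    report = verify(data, {
        "id is complete": is_complete("id"),
        "id is unique": is_unique("id"),
        "views is complete": is_complete("views"),
        "views is non-negative": is_non_negative("views"),
    })
    for name, passed in report.items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

On the sample rows, the duplicate `id` and the missing `views` value make the uniqueness and completeness checks fail, while the non-negativity check passes because missing values are skipped.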

AutoSpotting

Saves up to 90% of AWS EC2 costs by automating the use of spot instances on existing AutoScaling groups. Installs in minutes using CloudFormation or Terraform. Convenient to deploy at scale using StackSets. Uses tagging to avoid launch configuration changes. Automated spot termination handling. Reliable fallback to on-demand instances.

Language: Go | License: OSL-3.0 | Stargazers: 2308 | Issues: 58 | Issues: 244

TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Language: Scala | License: BSD-3-Clause | Stargazers: 2228 | Issues: 148 | Issues: 143

crane

Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users manage cloud costs more easily but also to ensure the quality of applications.

Language: Go | License: Apache-2.0 | Stargazers: 1831 | Issues: 40 | Issues: 334

gctoolkit

Tool for parsing GC logs

Language: Java | License: MIT | Stargazers: 1238 | Issues: 41 | Issues: 140
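gctoolkit is a Java library with its own parsing model. As a much smaller illustration of what parsing a GC log involves, the Python sketch below extracts pause events from JDK unified-logging lines of the kind produced by `-Xlog:gc` with G1. The sample lines and the regular expression are assumptions that cover only this one line shape, not every collector or log decorator; `GcPause`, `parse_line`, and `parse_log` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed line shape (illustrative), e.g.:
# [0.011s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 23M->4M(256M) 3.456ms
PAUSE_RE = re.compile(
    r"\[(?P<uptime>[\d.]+)s\].*?GC\((?P<gc_id>\d+)\) (?P<cause>.+?) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<heap>\d+)M\) (?P<pause>[\d.]+)ms"
)


@dataclass
class GcPause:
    uptime_s: float
    gc_id: int
    cause: str
    heap_before_mb: int
    heap_after_mb: int
    heap_total_mb: int
    pause_ms: float


def parse_line(line: str) -> Optional[GcPause]:
    """Return a GcPause for a matching pause line, or None for anything else."""
    m = PAUSE_RE.search(line)
    if m is None:
        return None
    return GcPause(
        uptime_s=float(m["uptime"]),
        gc_id=int(m["gc_id"]),
        cause=m["cause"],
        heap_before_mb=int(m["before"]),
        heap_after_mb=int(m["after"]),
        heap_total_mb=int(m["heap"]),
        pause_ms=float(m["pause"]),
    )


def parse_log(lines: Iterable[str]) -> List[GcPause]:
    """Keep only the lines that parse as pause events."""
    return [p for p in map(parse_line, lines) if p is not None]


if __name__ == "__main__":
    sample = [
        "[0.011s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 23M->4M(256M) 3.456ms",
        "[0.012s][info][gc,heap] unrelated line",
        "[1.250s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 30M->6M(256M) 2.100ms",
    ]
    pauses = parse_log(sample)
    total = sum(p.pause_ms for p in pauses)
    print(f"{len(pauses)} pauses, {total:.3f} ms total")
```

Lines that do not match the assumed shape are silently skipped, which keeps the sketch short but means a real log with other event types needs additional patterns.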

gluten

Gluten: Plugin to Double SparkSQL's Performance

Language: Scala | License: Apache-2.0 | Stargazers: 920 | Issues: 31 | Issues: 1303

SparkCTR

CTR prediction model based on Spark (LR, GBDT, DNN)

Language: Scala | License: Apache-2.0 | Stargazers: 900 | Issues: 54 | Issues: 15

blaze

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.

Language: Rust | License: Apache-2.0 | Stargazers: 812 | Issues: 23 | Issues: 65

metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Language: Scala | License: MIT | Stargazers: 578 | Issues: 52 | Issues: 136

katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.

Language: Go | License: Apache-2.0 | Stargazers: 394 | Issues: 15 | Issues: 51

sparklint

A tool for monitoring and tuning Spark jobs for efficiency.

Language: Scala | License: Apache-2.0 | Stargazers: 356 | Issues: 35 | Issues: 46

flink-sql-lineage

The Lineage Analysis system for FlinkSQL supports advanced syntax such as Watermark, UDTF, CEP, Windowing TVFs, and CTAS.

Language: Java | License: Apache-2.0 | Stargazers: 350 | Issues: 13 | Issues: 27

delight

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

Language: Scala | License: NOASSERTION | Stargazers: 341 | Issues: 16 | Issues: 13

compass

Compass is a task diagnosis platform for big data.

Language: Java | License: Apache-2.0 | Stargazers: 329 | Issues: 17 | Issues: 135

gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Language: Scala | License: Apache-2.0 | Stargazers: 256 | Issues: 19 | Issues: 550

spark-metrics

Spark metrics related custom classes and sinks (e.g. Prometheus)

Language: Scala | License: Apache-2.0 | Stargazers: 172 | Issues: 9 | Issues: 50

eks-spark-benchmark

Performance optimization for Spark running on Kubernetes

Language: Scala | License: Apache-2.0 | Stargazers: 84 | Issues: 11 | Issues: 7

bestconf

A tool that automatically improves the performance of large-scale systems by finding better configuration settings.

Language: Java | License: Apache-2.0 | Stargazers: 58 | Issues: 6 | Issues: 2

spark-memory

A tool to get better debug info on Spark's memory usage

sql-calculator

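A toolset built on the TiDB MySQL syntax parser. It supports: 1. SQL fingerprinting (sql fingerprint); 2. database schema comparison (sql diff): compares the databases and tables of two databases and generates the difference (DDL) statements needed to bring the source database in line with the target.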

Language: Go | License: Apache-2.0 | Stargazers: 21 | Issues: 1 | Issues: 2
Language: Python | Stargazers: 6 | Issues: 0 | Issues: 0
Language: Java | Stargazers: 1 | Issues: 0 | Issues: 0
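sql-calculator derives its SQL fingerprints from a real parse tree via the TiDB parser. The sketch below approximates the same idea with regular expressions instead, which is a swapped-in, cruder technique: string and numeric literals become `?`, placeholder lists collapse to `(?)`, and whitespace and case are normalized, so queries that differ only in literal values map to the same fingerprint. `fingerprint` is a hypothetical helper, and the regex approach will mishandle cases a parser gets right (for example, quotes inside comments).

```python
import re


def fingerprint(sql: str) -> str:
    """Rough, regex-based SQL fingerprint (sketch; not sql-calculator's implementation)."""
    s = sql.strip().rstrip(";")
    # Replace single-quoted string literals ('' is an escaped quote) with a placeholder.
    s = re.sub(r"'(?:[^']|'')*'", "?", s)
    # Replace standalone numeric literals; digits inside identifiers like t1 are kept.
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)
    # Collapse placeholder lists such as IN (?, ?, ?) to a single (?).
    s = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", s)
    # Normalize runs of whitespace and letter case.
    s = re.sub(r"\s+", " ", s)
    return s.lower()


if __name__ == "__main__":
    queries = [
        "SELECT * FROM users WHERE id = 42 AND name = 'bob'",
        "select *   from users\nwhere id = 7 and name = 'o''brien';",
        "SELECT id FROM t1 WHERE id IN (1, 2, 3)",
    ]
    for q in queries:
        print(fingerprint(q))
```

The first two queries produce identical fingerprints, which is what lets a fingerprinting tool group slow-query logs by statement shape rather than by literal values.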