sathya-reddy-m's repositories
cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
CommonDataModel
Definition and DDLs for the OMOP Common Data Model (CDM)
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
xbar
Put the output from any script or program into your macOS Menu Bar (the BitBar reboot)
meltano
Your open source DataOps Platform Infrastructure to let you manage all the data tools in your stack in one place, and turn them into your ideal end-to-end data platform
CDM
The Common Data Model (CDM) is a standard and extensible collection of schemas (entities, attributes, relationships) that represents business concepts and activities with well-defined semantics, to facilitate data interoperability. Examples of entities include: Account, Contact, Lead, Opportunity, Product, etc.
fastapi-lakehouse
Connect FastAPI to a Databricks Lakehouse
aws-cloud-mindmaps
Mindmaps about AWS based on public information
superset
Apache Superset is a Data Visualization and Data Exploration Platform
spark-data-standardization
Schema validation and transformation for streaming
aws-athena-query-federation
The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.
the-book-of-secret-knowledge
A collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.
redpanda
Redpanda is a streaming data platform for developers. Kafka API compatible, 10x faster, ZooKeeper free, JVM free! See more at redpanda.com
dbt-databricks
A dbt adapter for Databricks.
sql-style-guide
An opinionated guide for writing clean, maintainable SQL.
ra_data_warehouse
This dbt package contains a set of pre-built, pre-integrated Load and Transform dbt models for common SaaS applications.
Miscellaneous
Scripts and code examples. Includes Spark notes, Jupyter notebook examples for Spark, Impala and Oracle.
data_engineering_tools
data transformation functions and snippets
God-Of-BigData
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
ckan
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
iceberg
Apache Iceberg
apicurio-registry
An API/Schema registry - stores APIs and Schemas.
aim42
public repository for the "architecture improvement method reference"
prefect
The easiest way to automate your data
spark-metrics
Spark metrics related custom classes and sinks (e.g. Prometheus)
synth
The Declarative Data Generator
snapflow
Functional reactive data pipelines
spec
CloudEvents Specification