Nishanth Kumar's repositories
airflow-pyspark-reddit
Example of using Airflow to schedule downloading data from S3 and launching Spark jobs
awesome-crawler
A collection of awesome web crawlers and spiders in different languages
cp-all-in-one
docker-compose.yml files for cp-all-in-one, cp-all-in-one-community, and cp-all-in-one-cloud
news-graph
Key information extraction from text and graph visualization
ssis-queries
A set of queries for easily extracting monitoring and package performance data from the SSISDB database
PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
PySpark-Cookbook
PySpark Cookbook, published by Packt
pyspark-example-project
Example project implementing best practices for PySpark ETL jobs and applications.
RetailRhythm
A dynamic data pipeline for real-time sales metric simulation and analysis for big-box retailers. It integrates Kafka, Flink, and DuckDB in a Dockerized environment, with Metabase providing dashboard visualizations and actionable insights.
spacy-course
👩‍🏫 Advanced NLP with spaCy: A free online course
SQLServerMetadata
SQL Server Metadata Toolkit