Pham Anh Tuan (tuanpa2295)

Location: Hanoi

Pham Anh Tuan's starred repositories

Language: Python | Stargazers: 2 | Issues: 0
Language: Jupyter Notebook | Stargazers: 1 | Issues: 0
Language: Jupyter Notebook | Stargazers: 1 | Issues: 0

YoutubeAnalytics

An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlDB. The processed analytics data is then sent to Telegram for real-time notifications.

Language: HTML | Stargazers: 5 | Issues: 0
Language: TypeScript | Stargazers: 2 | Issues: 0

Face-Anonymizer

This repository contains different algorithms and methods to anonymize faces in images by blurring or pixelating them, using OpenCV and MTCNN in Python.

Language: Jupyter Notebook | License: MIT | Stargazers: 3 | Issues: 0

FootballDataEngineering

An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow, and saves it to Azure Data Lake. Further processing takes place in Azure Data Factory, Azure Synapse, and Tableau.

Language: Python | Stargazers: 10 | Issues: 0

e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

Language: Python | Stargazers: 116 | Issues: 0

Japan-visa-data-engineering

This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark clusters are set up within a Docker container on Azure.

Language: HTML | Stargazers: 7 | Issues: 0

RedditDataEngineering

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

Language: Python | Stargazers: 31 | Issues: 0

AlphaTeam

Complex Network Analysis Using Machine Learning

Language: HTML | Stargazers: 4 | Issues: 0

EMR-for-data-engineers

This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command-line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.

Language: Python | Stargazers: 3 | Issues: 0

ApacheFlink-SalesAnalytics

This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.

Language: Java | Stargazers: 7 | Issues: 0

changecapture-e2e

This project shows how to capture changes from a Postgres database and stream them into Kafka.

Language: Python | Stargazers: 19 | Issues: 0

FlinkCommerce

This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.

Language: Java | Stargazers: 19 | Issues: 0
Language: Python | Stargazers: 3 | Issues: 0

realtime-voting-data-engineering

This repository contains the code for a real-time election voting system built with Python, Kafka, Spark Streaming, Postgres, and Streamlit. Docker Compose is used to spin up the required services in Docker containers.

Language: Python | Stargazers: 16 | Issues: 0

dbt-bigquery-crash-course

A deep dive into the powerful combination of DBT and BigQuery, the game-changers in modern data engineering.

Stargazers: 2 | Issues: 0

modern-data-eng-dbt-databricks-azure

In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as our cloud provider.

Stargazers: 12 | Issues: 0

cicd_for_data_engineering

This project showcases how to integrate DevOps practices, focusing on Continuous Integration (CI) and Continuous Deployment (CD), with modern data engineering, using Terraform and Azure as the case study.

Language: HCL | Stargazers: 7 | Issues: 0

RealtimeStreamingEngineering

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, producing to a Kafka topic, and connecting to Elasticsearch.

Language: Python | Stargazers: 16 | Issues: 0

Kubernetes-For-DataEngineering

This repository contains the configuration files and DAGs (Directed Acyclic Graphs) needed to set up a data engineering environment using Kubernetes and Apache Airflow.

Language: Python | Stargazers: 10 | Issues: 0

SparkingFlow

This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala, and Java as examples.

Language: Java | Stargazers: 13 | Issues: 0

data-engineering-zoomcamp

Free Data Engineering course!

Language: Jupyter Notebook | Stargazers: 23211 | Issues: 0

MLOps-NLP-with-disaster-tweets

MLOps Implementation for Disaster Tweets Classifier Application

Language: Jupyter Notebook | License: MIT | Stargazers: 21 | Issues: 0
Language: HTML | Stargazers: 67 | Issues: 0

awesome-system-design-resources

Learn System Design concepts and prepare for interviews using free resources.

License: GPL-3.0 | Stargazers: 12496 | Issues: 0

developer-roadmap

Interactive roadmaps, guides and other educational content to help developers grow in their careers.

Language: TypeScript | License: NOASSERTION | Stargazers: 278103 | Issues: 0

Fast-Ansible

This repo covers Ansible with LABs: Multipass, Commands, Modules, Playbooks, Tags, Managing Files and Servers, Users, Roles, Handlers, Host Variables, Templates, and further details.

Language: Jinja | License: MIT | Stargazers: 596 | Issues: 0

Fast-Kubeflow

This repo covers the Kubeflow environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.

Language: Python | Stargazers: 72 | Issues: 0