Joyce's repositories
CodingChallenge
BankruptcyWatch Coding Challenge
Medical_ChatBot
This project builds a chatbot that answers users' health questions. It is a retrieval-augmented generation (RAG) implementation built on an open-source stack.
evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
BookRecommendationSystem
This project builds a recommendation engine that uses collaborative filtering to recommend books to users.
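As a rough illustration of the collaborative filtering idea named above, here is a minimal user-based sketch in plain Python. It is not taken from the repository: the `ratings` data, the `cosine` and `recommend` helpers, and the choice of a similarity-weighted sum as the score are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {book title: rating on a 1-5 scale}
ratings = {
    "ann": {"Dune": 5, "Emma": 1, "Ulysses": 4},
    "bob": {"Dune": 4, "Emma": 2, "Ulysses": 5, "Neuromancer": 5},
    "cat": {"Dune": 1, "Emma": 5, "Neuromancer": 2, "Persuasion": 5},
}


def cosine(a, b):
    """Cosine similarity computed over the books both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=3):
    """Rank books the user has not rated by a similarity-weighted sum of others' ratings."""
    seen = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, theirs)
        for title, score in theirs.items():
            if title not in seen:
                scores[title] = scores.get(title, 0.0) + sim * score
    # Highest accumulated score first
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("ann", ratings))
```

Books rated highly by users whose tastes resemble the target user's rise to the top. A real engine would typically mean-centre the ratings and work on a sparse matrix rather than nested dictionaries.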
MovieSpider
This project crawls movie data from IMDb. It uses the Scrapy framework to extract fields such as movie title, datePublished, summary, genres, and director.
US-immigrations-data-warehouse
A data warehouse to perform analytics on the immigration trends in the US.
SQL-Data-with-Danny-Case-Studies
Case study solutions for #8WeekSQLChallenge at https://8weeksqlchallenge.com
Reddit_Data_Pipeline
This project builds a data pipeline that extracts data from the Reddit API, plus a dashboard to analyse it. Data from the subreddit r/Python is extracted daily, uploaded to S3 buckets, and copied to Redshift. The dashboard is built with Google Data Studio.
pyspark_bigdata
Getting started with PySpark for big data analysis
Data-Warehouse-AWS
A music streaming startup, Sparkify, has grown its user base and song database and wants to move its processes and data onto the cloud. The data resides in S3, in a directory of JSON logs on user activity on the app, as well as a directory with JSON metadata on the songs in the app. The objective of the project is to create an ETL pipeline to build a data warehouse. We extract data from S3, stage it in Redshift, and transform it into a set of dimensional tables so the analytics team can continue finding insights into what songs users are listening to.
Data-Modeling-With-Postgres
The main focus of the project is data modeling with Postgres and building an ETL pipeline using Python. The first step is to define fact and dimension tables for a star schema for a particular analytic focus. The second step is to write an ETL pipeline that transfers data from files in different directories into these tables in Postgres using Python and SQL.
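The two steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: it uses Python's built-in `sqlite3` as an in-memory stand-in for Postgres so that it runs anywhere, and the table names (`dim_user`, `dim_song`, `fact_play`), the record fields, and the `load` helper are all hypothetical.

```python
import json
import sqlite3

# Step 1: a star schema with one fact table and two dimension tables (names are hypothetical).
SCHEMA = """
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_song (
    song_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    artist  TEXT NOT NULL,
    UNIQUE (title, artist)
);
CREATE TABLE fact_play (
    play_id INTEGER PRIMARY KEY,
    ts      INTEGER NOT NULL,
    user_id INTEGER NOT NULL REFERENCES dim_user (user_id),
    song_id INTEGER NOT NULL REFERENCES dim_song (song_id)
);
"""

# Hypothetical JSON log lines; in the project these would be read from files on disk.
raw_events = [
    '{"user_id": 1, "name": "Ada", "song": "Blue", "artist": "Kite", "ts": 1000}',
    '{"user_id": 1, "name": "Ada", "song": "Red", "artist": "Kite", "ts": 1060}',
    '{"user_id": 2, "name": "Bo", "song": "Blue", "artist": "Kite", "ts": 1100}',
]


def load(conn, lines):
    """Step 2: parse each record, upsert the dimensions, then insert the fact row."""
    for line in lines:
        rec = json.loads(line)
        # INSERT OR IGNORE is SQLite syntax; Postgres would use ON CONFLICT DO NOTHING.
        conn.execute(
            "INSERT OR IGNORE INTO dim_user (user_id, name) VALUES (?, ?)",
            (rec["user_id"], rec["name"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO dim_song (title, artist) VALUES (?, ?)",
            (rec["song"], rec["artist"]),
        )
        (song_id,) = conn.execute(
            "SELECT song_id FROM dim_song WHERE title = ? AND artist = ?",
            (rec["song"], rec["artist"]),
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_play (ts, user_id, song_id) VALUES (?, ?, ?)",
            (rec["ts"], rec["user_id"], song_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, raw_events)

# An analytic query joins the fact table to a dimension.
plays_per_user = conn.execute(
    "SELECT u.name, COUNT(*) FROM fact_play f "
    "JOIN dim_user u ON u.user_id = f.user_id "
    "GROUP BY u.name ORDER BY u.name"
).fetchall()
print(plays_per_user)
```

Keeping descriptive attributes in the dimension tables and only keys and measures in the fact table is what lets analytic queries stay as simple joins and aggregations.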
networkx
Network Analysis in Python
Kaggle
This repo contains my relevant Python notebooks from Kaggle. Some of them have an accompanying detailed article on Medium.