Francis Joseph's starred repositories
the-algorithm-ml
Source code for Twitter's Recommendation Algorithm
comprehensive-rust
This is the Rust course used by the Android team at Google. It provides you with the material to quickly teach Rust.
learn-regex
Learn regex the easy way
scala-algorithms
Algorithms and Data Structures in Scala
GoogleSummerOfCode
Ideas list for GSoC 2024, mentored by the Scala Center
blockchain-documentation-project
Blockchain written in Python - A Documentation Project
full-blockchain-solidity-course-py
Ultimate Solidity, Blockchain, and Smart Contract - Beginner to Expert Full Course | Python Edition
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
ethereum-etl-airflow
Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery: https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
kafka-crypto-questdb
Using Kafka to track cryptocurrency price trends
Production-of-Cryptocurrency-Data-Lake-Using-Spark-
This project is an ETL pipeline that processes structured financial data and unstructured social media data related to cryptocurrencies (a dataset with millions of records), preparing it for exploring the relationship between the price trends of cryptocurrency assets and the sentiment on their social media platforms. It uses Python, Spark, the Binance API, etc. to extract trade data from the cryptocurrency exchange platform, transform it into market data on AWS EMR, and store it in an AWS S3 bucket. It uses Python, Spark, the Twitter API, etc. to extract tweets from Twitter, then transform and store them in an AWS S3 bucket. It performs data quality checks on the tweets and market data and persists them to AWS S3. Utilizes: Python, PySpark, Spark, SQL, AWS, Amazon S3, AWS EMR, Binance API, Twitter API, data quality, structured data, unstructured data, data lake, ETL, big data, Hadoop.
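The trade-data transform step described above (raw exchange trades aggregated into market data before landing in S3) can be sketched in plain Python. This is a hedged illustration, not the project's actual Spark job: the field names `symbol`, `price`, and `qty` are assumptions standing in for whatever schema the Binance API extract produces, and the repo itself runs the equivalent logic as a PySpark job on AWS EMR.

```python
from collections import defaultdict

def trades_to_market_data(trades):
    """Aggregate raw trade records into per-symbol market data (OHLC + volume).

    `trades` is a list of dicts with hypothetical fields `symbol`, `price`,
    and `qty`, assumed to be ordered by trade time. This stands in for the
    Spark transform the project runs on AWS EMR.
    """
    by_symbol = defaultdict(list)
    for trade in trades:
        by_symbol[trade["symbol"]].append(trade)

    market = {}
    for symbol, symbol_trades in by_symbol.items():
        prices = [t["price"] for t in symbol_trades]
        market[symbol] = {
            "open": prices[0],          # first trade in the window
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],        # last trade in the window
            "volume": sum(t["qty"] for t in symbol_trades),
        }
    return market
```

In the real pipeline this aggregation would be a `groupBy`/`agg` over a Spark DataFrame so it scales to the millions of records mentioned above; the per-symbol dictionary here just makes the shape of the transform concrete.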
PySpark-Confluent-Kafka-Apache-Drill-
A code-based tutorial for production-level data streaming with PySpark, plus Optimus for data cleaning, Confluent Kafka, and Apache Drill, using Docker and Cassandra (a NoSQL DB) for storage; this allows for fast feature engineering and data cleaning.
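The cleaning step in a pipeline like that can be sketched per-record in plain Python. This is only an illustration of the idea: the field names (`user`, `amount`) and cleaning rules are assumptions, not the tutorial's actual schema, and in the tutorial itself this kind of normalization would be done by Optimus on a PySpark DataFrame before the rows reach Cassandra.

```python
def clean_record(record):
    """Normalize one raw streamed record before storage, or drop it.

    Mimics the kind of cleaning Optimus performs on a PySpark DataFrame:
    trim and lowercase string keys, coerce numeric fields, and discard rows
    that cannot be repaired. Field names are hypothetical.
    """
    cleaned = {}

    # Drop records without a usable string key.
    user = record.get("user")
    if not isinstance(user, str) or not user.strip():
        return None
    cleaned["user"] = user.strip().lower()

    # Coerce the numeric field; drop the record if coercion fails.
    try:
        cleaned["amount"] = float(record.get("amount"))
    except (TypeError, ValueError):
        return None

    return cleaned
```

Applied over a Kafka consumer loop (or as a UDF / Optimus transform in Spark), a function like this is what makes the downstream feature engineering "fast": by the time rows land in Cassandra they are already typed and normalized.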
pyspark-boilerplate-mehdio
PySpark boilerplate for running a production-ready data pipeline
Spark-Programming-In-Python
Apache Spark 3 - Spark Programming in Python for Beginners
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
pyspark-pictures
Learn the pyspark API through pictures and simple examples
pyspark-examples
PySpark RDD, DataFrame, and Dataset examples in Python
pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.