Ramses Alexander Coraspe Valdez's repositories
apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
data-engineer-challenge
Data Engineer Challenge
pyspark-on-aws-emr
The goal of this project is to offer a ready-to-use AWS EMR template that combines Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
Dropout-Students-Prediction
The goal of this project is to identify students at risk of dropping out of school
data-engineering-challenge-th
Dockerizing a Python web-scraping script and consuming the scraped data through FastAPI (www.metroscubicos.com)
recommendation-system
Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)
text-analysis-speeches-amlo
Text analysis of the speeches, conferences, and interviews of Mexican president Andrés Manuel López Obrador (AMLO)
dataengineering-assignment
Prescreening Tasks for Data Engineer
Huffman-decoding
A New Approach for Efficient Sequential Decoding of Static Huffman Codes
Moving-Average-Spark
How to Compute Moving Average with Spark
distance-metrics
Distance metrics are among the most important components of many machine learning algorithms, both supervised and unsupervised. They let us calculate and measure the similarity between numerical values expressed as data points.
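As a rough illustration of what "measuring similarity between data points" means, here is a minimal sketch of three common distance metrics in plain Python. The function names and the choice of metrics are assumptions made for this example; they are not taken from the repository.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of the absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the two vectors.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(euclidean([0, 0], [3, 4]))   # 5.0
print(manhattan([0, 0], [3, 4]))   # 7
```

Smaller values mean the points are more similar. Cosine distance ignores vector length and compares direction only, which is why it is often preferred for text features.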
Contextual-Data-Transforms
This repository contains the most important contextual data transformation algorithms, which help improve the compression rate achieved by statistical encoders. Ramses Alexander Coraspe Valdez
MachineLearning
This repository contains basic experiments using machine learning algorithms with Python
Computer-Vision-and-Deep-Learning
This repository contains information on the basic techniques and algorithms used in computer image processing, in addition to some projects related to pattern recognition using deep learning.
Data-Analytics-with-R
Repository for data analytics course using R
GPU-Programming-with-Python
With GPU programming in Python, you can take advantage of the computing power of your graphics processing unit (GPU). We will work with NVIDIA's CUDA library.
optimizing-public-transportation
Streaming event pipeline built around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, we will construct an event pipeline around Kafka that allows us to simulate and display the status of train lines in real time.
SparkSQL-with-Python
This repository has some examples of using Spark and SparkSQL with Python through PySpark
burrows-wheeler-transform
Implementation of the Burrows-Wheeler Transform algorithm in Python for data compression
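For readers unfamiliar with the transform, the following is a minimal sketch of the textbook version: sort all rotations of the input and keep the last column. It is not the repository's implementation. The `$` sentinel and the function names are assumptions for this example, and the quadratic-space approach is only suitable for short inputs.

```python
def bwt(text, sentinel="$"):
    # The sentinel marks the end of the input so the transform can be inverted.
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel
    # Build every rotation, sort them, and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(last_column, sentinel="$"):
    # Rebuild the sorted rotation table one column at a time.
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]

print(bwt("banana"))                 # annb$aa
print(inverse_bwt(bwt("banana")))    # banana
```

The output tends to group identical characters together, which is what makes it a useful preprocessing step before a statistical encoder.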
Multiprocessing
Improving the Performance in the Statistical Redistribution of Message Symbols using Architectural patterns for Parallel Programming
Python-recursion
This repository shows the implementation of the most common recursive algorithms
wittline.github.io
My GitHub profile
dag-example
Directed acyclic graph
document-clustering
Agglomerative Hierarchical Document Clustering
move-to-front
Implementation of the Move-to-Front algorithm in Python for data compression
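The idea behind Move-to-Front can be sketched in a few lines. This is an illustrative version that assumes a byte-oriented alphabet of 256 symbols; the function names are hypothetical and the repository may structure its code differently.

```python
def mtf_encode(data):
    # Start with the alphabet in natural byte order.
    alphabet = list(range(256))
    indices = []
    for byte in data:
        index = alphabet.index(byte)
        indices.append(index)
        # Move the symbol just seen to the front of the list.
        alphabet.pop(index)
        alphabet.insert(0, byte)
    return indices

def mtf_decode(indices):
    # Mirror the encoder: the same list updates recover the original bytes.
    alphabet = list(range(256))
    output = bytearray()
    for index in indices:
        byte = alphabet.pop(index)
        output.append(byte)
        alphabet.insert(0, byte)
    return bytes(output)

print(mtf_encode(b"aaa"))    # [97, 0, 0]
print(mtf_encode(b"abab"))   # [97, 98, 1, 1]
```

Recently seen symbols get small indices, so runs of repeated characters turn into runs of zeros that a statistical encoder can compress well.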
python-driver
Teradata SQL Driver for Python
SparkInternals
Notes on the design and implementation of Apache Spark