Kauvin Lucas's repositories
maven-unicorn-challenge
A Python web app consisting of a dashboard, submitted to the "Maven Unicorn Challenge", a visualization challenge by Maven Analytics
spark-kubernetes
This repository contains files used to build images to deploy Spark clusters on Kubernetes
Optimizing-a-Pipeline-in-Azure
The main goal of this project was to build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn logistic regression model to solve a classification problem. HyperDrive was used to tune the model's hyperparameters, and the result was compared to an Azure AutoML run to see which approach produced the better-tuned model.
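HyperDrive's random parameter sampling is conceptually a random search over a hyperparameter space. A minimal plain-Python sketch of that idea follows; the search space, objective function, and trial budget are illustrative assumptions, not taken from the project:

```python
import random

# Hypothetical search space mimicking logistic-regression hyperparameters
# one might tune with HyperDrive (names and ranges are assumptions).
search_space = {
    "C": lambda: 10 ** random.uniform(-3, 2),    # inverse regularization strength
    "max_iter": lambda: random.choice([50, 100, 200]),
}

def evaluate(params):
    # Stand-in for training the Scikit-learn model and returning a metric;
    # this toy objective simply peaks near C = 1.0.
    return 1.0 / (1.0 + abs(params["C"] - 1.0))

def random_search(n_trials=20, seed=42):
    random.seed(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one configuration at random, score it, keep the best.
        params = {name: sample() for name, sample in search_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search()
print(best_params, best_score)
```

HyperDrive parallelizes these trials across compute targets and supports early termination, but the core sampling loop is the same.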
Spark-StudyClub
#DataEngineeringLATAM
big-data-science-notes
My notes for each module of Big Data Science, an online course offered by Semantix Brasil
DataCamp-Projects
Notebooks of DataCamp projects
dio-analise-de-dados-com-pandas
This repository presents notebooks of exploratory data analysis and data visualization done in Python with the Pandas and Matplotlib libraries. It answers a challenge from the Digital Innovation One platform.
dio-google-cloud-dataproc
This repository contains the word-count files generated on Google Cloud by a Python script, inside a cloud-managed Big Data ecosystem called Google Dataproc. It answers a challenge from the Digital Innovation One platform.
docker-bigdata
Big Data Ecosystem Docker
fifa18-all-player-statistics
A complete catalog of all the players in FIFA 18 and their full statistics.
jupyter-spark-enem-2019
In this project, I analyzed the scores of ENEM 2019, a standardized test used for admission to Brazilian universities, in the context of socioeconomic disparities between participants. PySpark was used for data ingestion and transformation; Pandas, Statsmodels, Matplotlib/Seaborn/Folium, and Scikit-learn were used for descriptive analysis and data visualization.
Predicting_car_accident_severity
Final project submission for the IBM Data Science Professional Certificate specialization
pyspark-stateful-processing-with-twitter-kafka
A simple project consisting of a stream-processing pipeline built with Apache Kafka, PySpark, and the Twitter Streaming API. It is meant to illustrate the concepts behind stateful processing and event-time processing with Spark Streaming.
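The two concepts this project explores can be sketched without Spark: stateful processing keeps running state (here, per-word counts) across micro-batches, and event-time processing uses a watermark to drop events that arrive too late. The batch contents and watermark lag below are hypothetical stand-ins for the Kafka/Twitter stream:

```python
from collections import defaultdict

# Hypothetical micro-batches of (event_time_seconds, word) pairs,
# standing in for tweets consumed from a Kafka topic.
batches = [
    [(10, "spark"), (12, "kafka"), (11, "spark")],
    [(20, "spark"), (5, "late"), (22, "kafka")],  # (5, "late") arrives late
]

WATERMARK_LAG = 10  # seconds behind the max event time seen so far

def run(batches):
    counts = defaultdict(int)  # state carried across batches, as Spark does
    max_event_time = 0
    for batch in batches:
        # Advance the watermark from the newest event time observed.
        for event_time, _ in batch:
            max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - WATERMARK_LAG
        for event_time, word in batch:
            if event_time >= watermark:
                counts[word] += 1   # within the watermark: update state
            # else: too late, dropped, like a watermarked streaming query
    return dict(counts)

print(run(batches))  # → {'spark': 3, 'kafka': 2}
```

In Spark Streaming the same behavior comes from a watermarked, windowed aggregation; the loop above just makes the bookkeeping explicit.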