Josu Alonso's repositories
employee_attrition_analysis
Code in R to describe and analyse employee attrition and predict it using statistical models.
credit-card-fraud-detection
Code to detect credit card fraud based on the data located at https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
daily-climate-ts
Forecasting of the dataset located at https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data
fake-news-detector
REST API for detecting whether a text corresponds to fake news or to legitimate news. It's served using Flask and uses a fine-tuned BERT model.
movielens-recommender
MovieLens 100K dataset exploration and recommender system building
twitter-sentiment-analysis
ML web app that can be deployed through Docker to analyse the sentiment of tweets.
airlines-sentiment-analysis
Analysis of the information located at https://www.kaggle.com/crowdflower/twitter-airline-sentiment combined with the official dataset at https://www.transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ
automobile_cleaning
Cleaning, preparation and analysis of the data from https://archive.ics.uci.edu/ml/datasets/Automobile
classic-ml-sentiment-analysis
Exercise using simple ML models to perform sentiment analysis and show that they are still relevant to the task
CovidBCN
Project that gathers data from public sources and combines it into a single database to explore
darts-forecasting
General project to learn about the Darts module in Python
dogs-vs-cats
Solution to the classification problem stated at https://www.kaggle.com/uysimty/keras-cnn-dog-or-cat-classification using Keras and ConvNets
helm-charts
Community Helm Charts
house-prices-regression-kaggle
Attempt at the Kaggle competition based on https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview
hr-analytics-ds
Data discovery and modeling of the dataset located at https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists
josumsc.github.io
Personal blog about data- and AI-related topics
m1_max_testing
TensorFlow testing on an M1 Mac
multi30k-machine-translation
Training of a machine translation system from English to German from the ground up. It uses the dataset contained at https://github.com/multi30k/dataset/tree/master/data/task1/raw
names-analysis-ny
Analysis to discover patterns in baby names in New York State. The data is located at https://health.data.ny.gov/Health/Baby-Names-Beginning-2007/jxy9-yhdk
retrocket-implicit-recommender
Implicit Collaborative Filtering trained on the RetailRocket dataset based on the ideas from arXiv:2009.08950
search_with_machine_learning_course
Public repository for the Search with Machine Learning course taught by Daniel Tunkelang and Grant Ingersoll. Available at https://corise.com/course/search-with-machine-learning?utm_source=daniel.
time-series-datasets-forecasting
Quick and dirty forecasting of the time series datasets located at https://www.kaggle.com/shenba/time-series-datasets?select=sales-of-shampoo-over-a-three-ye.csv
titanic_ml_python
Attempt at the Kaggle competition https://www.kaggle.com/c/titanic/ based on the famous Titanic dataset
web_scrapping_datosmacro
Scrapes the website https://datosmacro.expansion.com/ to get a dataset describing the evolution of the 3 main indicators for the petroleum price between 2000-01 and 2020-10.