Dan Tetrick (dantetrick)


Geek Repo

Company: Tetrick Family

Location: Data Scientist at Microsoft

Home Page: https://github.com/dantetrick/


Dan Tetrick's repositories

US-Pollution-R-Server-Demo

This is an end-to-end data science solution that focuses on R Server and SQL Server 2016 components. The project uses the US Pollution data set found at https://www.kaggle.com/sogun3/uspollution. After cloning the project, download this file and place it in the Input Data/Cleaned Data folder.

Language: R | Stargazers: 3 | Issues: 0

Transience_Databricks_LightGBMClassifier

This notebook is designed to determine whether new information security assets will have a lifespan greater than 3 days, using a LightGBM binary classifier. Using PySpark, the notebook ingests curated data from Azure Data Lake and builds features for the data model. ML pipelines are used to vectorize and one-hot encode 10 categorical variables, then over 40 additional numeric variables are added to create the final data model. The data model is randomly split 70/30 into training and test data sets, and LightGBM is used to fit a classifier on the training data. Both the training and test data are evaluated with the area under the precision/recall curve, and model stats are preserved using MLflow. Final predictions on new data are written back to Azure Data Lake. Please forgive the multiple spelling and grammar errors you will undoubtedly find. Notebook cells (Cmds) contain commented code along with markdown to help the reader understand the details of the processing.

Language: HTML | Stargazers: 1 | Issues: 0
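
The Transience notebook description mentions three generic steps: one-hot encoding categorical variables, a random 70/30 train/test split, and evaluation by area under the precision/recall curve. The following is only a minimal standard-library sketch of those ideas, not the notebook's actual PySpark or LightGBM code. All function names, the seed, and the toy inputs are assumptions made for illustration, and the area is approximated by average precision.

```python
import random


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def average_precision(labels, scores):
    """Approximate the area under the precision/recall curve.

    Ranks examples by descending score and averages the precision
    observed at each true positive (the "average precision" estimate).
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / total_positives


if __name__ == "__main__":
    # Hypothetical category values, for illustration only.
    print(one_hot("server", ["laptop", "server", "vm"]))
    train, test = train_test_split(range(10))
    print(len(train), len(test))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In a Spark setting these steps would normally be handled by the ML pipeline stages and an evaluator rather than hand-written functions; the sketch is only meant to make the arithmetic behind each step concrete.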

COVID19_NewArticles_RNotebook

Dataset 3: English news articles that mention "corona virus" or "coronavirus" or "COVID" (by Webhose.io)
Link: https://webhose.io/free-datasets/news-articles-that-mention-corona-virus/
Format: JSON | Size: 13.7GB | Crawled: Dec, 2019 - Mar, 2020
Access: Free, but you have to create a profile on webhose.io
Main variables: Social media shares and likes; Site name; Site section; Section title; Country; Entities; Participants count; Replies count; Spam score; Performance score; Text; External links

Language: HTML | Stargazers: 0 | Issues: 0

HIBPwned

R package 📦 for using the HaveIBeenPwned.com API 😱

Language: R | License: NOASSERTION | Stargazers: 0 | Issues: 0

XGBoost_Binary_Regression_POC

XGBoost_Binary_Regression_POC

Language: R | Stargazers: 0 | Issues: 0