Nemshan Alharthi's repositories

predicting-Paid-amount-for-Claims-Data

Introduction: The context is the 2016 public use NH medical claims files obtained from NH CHIS (Comprehensive Health Care Information System). The dataset contains commercial insurance claims and a small fraction of Medicaid and Medicare payments for dually eligible people. The primary purpose of this assignment is to test machine learning (ML) skills in a real case analysis setting. You are expected to clean and process the data and then apply various ML techniques, both linear and nonlinear, such as regularized regression, MARS, and partitioning methods. You are expected to use at least two of R, Python, and JMP.

Data details: The medical claims file for 2016 contains ~17 million rows and ~60 columns of data, covering ~6.5 million individual medical claims. These are all commercial claims filed by healthcare providers in 2016 in the state of NH. About 88% of the claims were for NH residents; the remainder were for out-of-state visitors who sought care in NH. Each claim consists of one or more line items, each indicating a procedure done during the doctor's visit. Two columns, the Billed amount and the Paid amount for the care provided, are of primary interest. The main objective is to predict "Paid amount per procedure" from the many features available in the dataset. You are also expected to create new features from the existing ones or from external data sources.

Objectives:
Step 1: Take a random sample of 1 million unique claims, such that all line items related to each claim are included in the sample. This will result in a little less than 3 million rows of data.
Step 2: Clean up the data, understand the distributions, and create new features if necessary.
Step 3: Run predictive models using a validation method of your choice.
Step 4: Write a descriptive report (less than 10 pages) describing the process and your findings.
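The claim-level sampling in Step 1 can be sketched as follows. This is a minimal illustration rather than the repository's actual code: the function name `sample_claims` and the column names `claim_id`, `line`, and `paid` are hypothetical, and the toy rows are made up. The idea is to sample unique claim identifiers first and then keep every line item that belongs to a sampled claim.

```python
import random
from collections import defaultdict

def sample_claims(rows, claim_key, n_claims, seed=0):
    """Return all line items for a random sample of n_claims unique claims."""
    # Group line items by claim so a claim is always kept or dropped as a whole.
    by_claim = defaultdict(list)
    for row in rows:
        by_claim[row[claim_key]].append(row)
    rng = random.Random(seed)
    # Sample claim IDs, not rows; sorting makes the draw reproducible for a given seed.
    chosen = rng.sample(sorted(by_claim), n_claims)
    return [row for claim_id in chosen for row in by_claim[claim_id]]

# Toy data; the column names are hypothetical, not the CHIS schema.
rows = [
    {"claim_id": "A", "line": 1, "paid": 40.0},
    {"claim_id": "A", "line": 2, "paid": 15.5},
    {"claim_id": "B", "line": 1, "paid": 120.0},
    {"claim_id": "C", "line": 1, "paid": 0.0},
    {"claim_id": "C", "line": 2, "paid": 60.0},
    {"claim_id": "C", "line": 3, "paid": 22.0},
]
sample = sample_claims(rows, "claim_id", n_claims=2, seed=42)
```

Sampling rows directly would split claims across the sample boundary, which is why the draw is made over claim identifiers. For a file of ~17 million rows, the same idea would more likely be applied with a data-frame library or in chunks than with in-memory lists.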

Language: Jupyter Notebook | Stargazers: 20 | Issues: 3 | Issues: 2

Predicting-Oil-Price-With-Time-Series-

The goal of our analysis was to use different time series methods to predict the oil price for the last 6 months of the data, September 2017 through February 2018, and determine the best prediction model for this data.

Language: R | Stargazers: 4 | Issues: 2 | Issues: 0

K-NN-Algorithm-From-Scratch

The challenge: write a KNN algorithm that (1) accepts both numeric and categorical features; (2) performs at least classification (regression is optional); (3) uses Gower distance (Minkowski for continuous features and Jaccard for categorical features); and (4) uses the Titanic data to predict survival and the Iris data to predict type.
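A minimal sketch of such a classifier is shown below; it is an illustration under stated assumptions, not the repository's implementation. The function names, the feature layout, and the toy rows are hypothetical. Numeric features contribute a Minkowski distance over range-scaled differences, categorical features contribute a 0/1 mismatch (the Jaccard distance when each value is treated as a one-element set), and the two parts are averaged over the number of features. That way of combining them is an assumption; with `p=1` it reduces to the usual Gower dissimilarity.

```python
from collections import Counter

def gower_distance(a, b, numeric_idx, categorical_idx, ranges, p=1):
    """Gower-style distance between two rows with mixed feature types."""
    # Numeric part: Minkowski distance on differences scaled by each feature's range.
    numeric = sum((abs(a[i] - b[i]) / ranges[i]) ** p for i in numeric_idx) ** (1 / p)
    # Categorical part: 1 per mismatching feature (Jaccard distance of singleton sets).
    categorical = sum(1 for i in categorical_idx if a[i] != b[i])
    return (numeric + categorical) / (len(numeric_idx) + len(categorical_idx))

def knn_predict(train_X, train_y, query, k, numeric_idx, categorical_idx, p=1):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Range of each numeric feature in the training data; a zero range falls back to 1.
    ranges = {
        i: (max(r[i] for r in train_X) - min(r[i] for r in train_X)) or 1.0
        for i in numeric_idx
    }
    neighbours = sorted(
        ((gower_distance(row, query, numeric_idx, categorical_idx, ranges, p), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Made-up Titanic-style rows: (age, fare, sex, embarked); label 1 = survived.
train_X = [
    (22.0, 7.25, "male", "S"),
    (38.0, 71.28, "female", "C"),
    (26.0, 7.92, "female", "S"),
    (35.0, 53.10, "female", "S"),
    (35.0, 8.05, "male", "S"),
    (54.0, 51.86, "male", "S"),
]
train_y = [0, 1, 1, 1, 0, 0]
query = (30.0, 10.0, "male", "S")
prediction = knn_predict(train_X, train_y, query, k=3,
                         numeric_idx=[0, 1], categorical_idx=[2, 3])
```

Regression, which the challenge lists as optional, could reuse the same neighbour search and replace the majority vote with the mean of the neighbours' target values.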

Language: Jupyter Notebook | Stargazers: 1 | Issues: 3 | Issues: 0

Naive-Bayes-algorithm-From-Scratch

Replicating the Naive Bayes algorithm from scratch, with a function that: is restricted to categorical features only; gives an error message if continuous features are provided; takes at least three parameters (Train, Test, Classification variable); outputs the predicted probability and classification; and uses the Titanic data to predict survival.
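The requirements above can be sketched roughly as follows. This is an illustrative version, not the repository's code: the function name, the dict-based row format, and the toy rows are hypothetical. Two further assumptions are made: a feature is treated as continuous if any of its values is a `float`, and Laplace smoothing (`alpha`) is added so that unseen combinations do not produce zero probabilities.

```python
from collections import Counter, defaultdict

def naive_bayes(train, test, target, alpha=1.0):
    """Categorical Naive Bayes; returns (predicted class, class probabilities) per test row."""
    features = [col for col in train[0] if col != target]
    # Reject continuous features (assumption: float values mark a continuous feature).
    for row in list(train) + list(test):
        for f in features:
            if isinstance(row[f], float):
                raise ValueError(f"feature {f!r} looks continuous; only categorical features are supported")
    class_counts = Counter(row[target] for row in train)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of each level
    levels = defaultdict(set)            # feature -> levels seen in training
    for row in train:
        for f in features:
            value_counts[(f, row[target])][row[f]] += 1
            levels[f].add(row[f])
    results = []
    for row in test:
        scores = {}
        for cls, n_cls in class_counts.items():
            score = n_cls / len(train)  # class prior
            for f in features:
                # Smoothed conditional probability P(feature = level | class).
                score *= (value_counts[(f, cls)][row[f]] + alpha) / (n_cls + alpha * len(levels[f]))
            scores[cls] = score
        total = sum(scores.values())
        probs = {cls: s / total for cls, s in scores.items()}
        results.append((max(probs, key=probs.get), probs))
    return results

# Made-up Titanic-style rows; column names are hypothetical.
train = [
    {"sex": "female", "pclass": "1", "survived": "yes"},
    {"sex": "female", "pclass": "2", "survived": "yes"},
    {"sex": "female", "pclass": "3", "survived": "no"},
    {"sex": "male", "pclass": "1", "survived": "yes"},
    {"sex": "male", "pclass": "3", "survived": "no"},
    {"sex": "male", "pclass": "3", "survived": "no"},
    {"sex": "male", "pclass": "2", "survived": "no"},
]
test = [{"sex": "female", "pclass": "1"}, {"sex": "male", "pclass": "3"}]
predictions = naive_bayes(train, test, "survived")
```

Each result pairs the predicted class with the normalized class probabilities, which covers the "predicted probability and classification" output the description asks for.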

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 1

Sentiment-Analysis-Project

The goal of this project is to apply various natural language processing and text mining techniques to tweets from the hashtag #facebook, using word clouds and the Bing and NRC lexicons.

Language: R | Stargazers: 1 | Issues: 0 | Issues: 0

stanford-cs-229-machine-learning

VIP cheatsheets for Stanford's CS 229 Machine Learning

License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

cheatsheet-translation

Translation of VIP cheatsheets https://stanford.edu/~shervine/teaching/cs-229.html

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

decaNLP

The Natural Language Decathlon: A Multitask Challenge for NLP

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

Learn_Blockchain_in_2_months

This is the code for "Learn Blockchain in 2 Months" by Siraj Raval on YouTube

Stargazers: 0 | Issues: 0 | Issues: 0

pulling-tweets

Pulling tweets from Twitter

Language: Jupyter Notebook | Stargazers: 0 | Issues: 1 | Issues: 0

r2d3-part-1-data

Dataset of homes in San Francisco and New York

Stargazers: 0 | Issues: 0 | Issues: 0