TejasSutar01

Tejas Vasant Sutar's repositories

Sentiment-Analysis

Daily analysis of a product: extract sentiments, emotions, etc. from Amazon data and correlate them with the NSE or BSE stock market over the past 3 months.

Language:Python2 20

The objective of this analysis is to provide a reliable regression model that predicts the price of a car from the provided variables as accurately as possible. The model is intended to be used for any new cars added to the dataset going forward.

Language:Jupyter Notebook1 10

DECISION_TREE_COMPANY

About the data: Consider a Company dataset with around 10 variables and 400 records. The attributes are as follows:
Sales -- Unit sales (in thousands) at each location
Competitor Price -- Price charged by competitor at each location
Income -- Community income level (in thousands of dollars)
Advertising -- Local advertising budget for company at each location (in thousands of dollars)
Population -- Population size in region (in thousands)
Price -- Price company charges for car seats at each site
Shelf Location at stores -- A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
Age -- Average age of the local population
Education -- Education level at each location
Urban -- A factor with levels No and Yes to indicate whether the store is in an urban or rural location
US -- A factor with levels No and Yes to indicate whether the store is in the US or not
Problem Statement: A cloth manufacturing company wants to know which segments or attributes lead to high sales.
Approach: A decision tree can be built with Sales as the target variable (first converted to a categorical variable) and all other variables as independent variables.
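The conversion of Sales to a categorical target could look roughly like the sketch below. The description does not say where to cut, so the median threshold, the "High"/"Low" labels, and the function name `categorize_sales` are all assumptions made for illustration.

```python
from statistics import median


def categorize_sales(sales, threshold=None):
    """Label each sales value "High" or "Low" relative to a threshold.

    Assumption: when no threshold is given, the median of the values is used.
    The repository description does not specify the actual cut-off.
    """
    if threshold is None:
        threshold = median(sales)
    return ["High" if value > threshold else "Low" for value in sales]


# Made-up unit sales (in thousands), for illustration only.
example_sales = [9.5, 11.2, 4.1, 7.4, 10.8, 3.2]
labels = categorize_sales(example_sales)
print(labels)
```

The resulting labels can then serve as the class column for a decision tree, with the remaining attributes as inputs.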

Language:Python1 20

DECISION_TREE_FRAUD_DATA

Use decision trees to build a model on fraud data, treating those with taxable_income <= 30000 as "Risky" and all others as "Good".
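The labelling rule above can be written as a small helper. This is only a sketch of the target-creation step; the function name `label_income` is hypothetical, and the sample incomes are made up.

```python
def label_income(taxable_income, cutoff=30000):
    """Return "Risky" for incomes at or below the cutoff, "Good" otherwise."""
    return "Risky" if taxable_income <= cutoff else "Good"


# Made-up taxable incomes, for illustration only.
incomes = [25000, 30000, 30001, 82000]
targets = [label_income(income) for income in incomes]
print(targets)
```

Note that the boundary value 30000 itself falls in the "Risky" class, because the rule uses `<=`.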

Language:Python1 20

Forecast-for-PM2.5

Forecast for PM2.5

Language:Jupyter Notebook1 20

FORECASTING_AIRLINES_DATA

Forecast the Airlines Passengers data set. Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model you will use for forecasting.
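As a rough illustration of the two quantities mentioned above, the sketch below builds monthly dummy variables and computes RMSE. Dropping one month as a baseline (leaving 11 dummies) is a common convention and an assumption here, not something the description states; the helper names are hypothetical.

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_dummies(month, baseline="Jan"):
    """Return 0/1 indicators for every month except the baseline month."""
    return [1 if month == m else 0 for m in MONTHS if m != baseline]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


print(month_dummies("Mar"))        # 11 indicators, one of them set
print(rmse([3, 3], [0, 0]))        # error of 3 on every point
```

Computing `rmse` on the same held-out period for each candidate model gives a like-for-like basis for choosing the final one.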

Language:Python1 20

FORECASTING_COCACOLA_SALES

Forecast the CocaCola prices data set. Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model you will use for forecasting.

Language:Python1 20

FORECASTING_COCASALES_DATADRIVEN_MODEL

1 20

FORECASTING_PLASTIC_SALES_DATA

Forecast the Plastic sales data set. Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model you will use for forecasting.

Language:Python1 20

KNN_GLASS_DATA

Prepare a model for glass classification using KNN.
Data Description:
RI: Refractive index
Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10)
Mg: Magnesium
Al: Aluminum
Si: Silicon
K: Potassium
Ca: Calcium
Ba: Barium
Fe: Iron
Type: Type of glass (class attribute)
1 -- building_windows_float_processed
2 -- building_windows_non_float_processed
3 -- vehicle_windows_float_processed
4 -- vehicle_windows_non_float_processed (none in this database)
5 -- containers
6 -- tableware
7 -- headlamps
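For readers unfamiliar with KNN, here is a minimal pure-Python sketch of the idea: a query point takes the majority class among its k nearest training points. The two-feature toy data is made up and is not the glass data set; `knn_predict` is a hypothetical helper, not the repository's code.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature points with class labels 1 and 7, for illustration only.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
train_y = [1, 1, 1, 7, 7, 7]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 5.1)))
```

Because KNN relies on Euclidean distance, features on very different scales (such as refractive index versus oxide weight percentages) would normally be standardized first.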

Language:Python1 20

KNN_IRIS_DATA

Implement a KNN model to classify the species into categories.

Language:Python1 20

KNN_ZOO_DATA

Implement a KNN model to classify the animals into categories.

Language:Python1 20

LOGISTIC_REGRESSION_AFFAIRS_DATA

I have a dataset containing family information on married couples, with around 10 variables and 600+ observations. The independent variables include gender, age, years married, children, religion, etc. The single response variable is the number of extramarital affairs. I want to know which factors influence the chance of an extramarital affair. Treating the response as binary (a person either has had an affair or has not), we can fit a logistic regression model to predict the probability of an extramarital affair. The data is available in R: install.packages('AER') data(Affairs,package="AER")
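Since the raw response is a count, it has to be collapsed to 0/1 before logistic regression applies. The sketch below shows that step plus the logistic (sigmoid) function that maps a linear score to a probability. The helper names are hypothetical, and the rule "any count above zero becomes 1" is an assumption about how the binary target is formed.

```python
import math


def to_binary_affair(affairs_count):
    """Collapse a count of affairs to 1 (at least one) or 0 (none)."""
    return 1 if affairs_count > 0 else 0


def sigmoid(z):
    """Numerically stable logistic function, mapping any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up counts, for illustration only.
counts = [0, 0, 3, 12, 0, 1]
binary_target = [to_binary_affair(c) for c in counts]
print(binary_target)
print(sigmoid(0.0))
```

A fitted logistic regression would compute a weighted sum of the predictors and pass it through `sigmoid` to obtain the predicted probability.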

Language:Python1 20

LOGISTIC_REGRESSION_BANK_DATA

Output variable: y, whether the client has subscribed to a term deposit or not. Binomial ("yes" or "no").

Language:Python1 20

Multiple-Linear-Regression_50_Startups

Prepare a prediction model for the profit in the 50_startups data. Apply transformations to get better predictions of profit, and make a table containing the R^2 value for each prepared model.
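The R^2 value for each model can be computed from actual and predicted values as sketched below. The numbers and model names are made up purely to show the shape of the comparison table; they are not results from the repository.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only.
actual_profit = [1.0, 2.0, 3.0]
hypothetical_predictions = {
    "perfect fit": [1.0, 2.0, 3.0],
    "mean-only model": [2.0, 2.0, 2.0],
}

# Print one table row per model.
for name, predicted in hypothetical_predictions.items():
    print(f"{name:<16} R^2 = {r_squared(actual_profit, predicted):.3f}")
```

A model that always predicts the mean scores 0, which gives a useful floor when comparing transformed models.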

Language:Python1 20

Multiple-Linear-Regression_ToyotaCorolla

Consider only the columns below and prepare a model for predicting Price. Corolla<-Corolla[c("Price","Age_08_04","KM","HP","cc","Doors","Gears","Quarterly_Tax","Weight")]

Language:Python1 20

NB_SALARY_DATA

Prepare a classification model using Naive Bayes for salary data.

Language:Python1 20

NB_SMS_DATA

Build a naive Bayes model on the data set for classifying messages as ham or spam.

Language:Python1 20

RANDOM_FOREST_COMPANY_DATA

A cloth manufacturing company wants to know which segments or attributes lead to high sales. Approach: A Random Forest can be built with Sales as the target variable (first converted to a categorical variable) and all other variables as independent variables.

Language:Python1 20

RANDOM_FOREST_FRAUD_DATA

Use Random Forest to build a model on fraud data, treating those with taxable_income <= 30000 as "Risky" and all others as "Good".

Language:Python1 20

Automatic_Candidates_CV_Recommendation

This project fetches the candidates' CVs already present in the database and recommends suitable candidates based on their skills.

Language:Python000

Chronic_Diseases_Prediction

The data was collected over a 2-month period in India and has 25 features (e.g., red blood cell count, white blood cell count). The target is 'classification', which is either 'ckd' (chronic kidney disease) or 'notckd'. There are 400 rows.

Language:Jupyter Notebook020

Classifier-Model

Worked on the Iris data set to classify the class of any new data fed to the model. I built a Decision Tree model that gave good accuracy, around 91%. The Decision Tree was built with both “Entropy” and “Gini Index”.

Language:Jupyter Notebook000

GRIP_The-Sparks-Foundation_Data-Science-Internship

Tasks received for The Sparks Foundation internship.

Language:Jupyter Notebook000

Health-Insurance---JOB-A-THON---Analytics-Vidhya

Your client FinMan is a financial services company that provides various financial services such as loans, investment funds, and insurance to its customers. FinMan wishes to cross-sell health insurance to existing customers who may or may not hold insurance policies with the company. The company recommends health insurance to its customers based on their profile once these customers land on the website. Customers might browse the recommended health insurance policy and then fill in a form to apply. When these customers fill in the form, their Response towards the policy is considered positive and they are classified as a lead. Once these leads are acquired, the sales advisors approach them to convert, so the company can sell the proposed health insurance to these leads more efficiently. Now the company needs your help in building a model to predict whether a person will be interested in its proposed health plan/policy, given the following information:
Demographics (city, age, region, etc.)
Information regarding the policies held by the customer
Recommended Policy Information

Language:Jupyter Notebook020

Keras-Tuner

000

Marksheet_OCR

Extracting the entities from marksheet using generative AI

000

Pima-Indians-Diabetes-Database

Context
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
Content
The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Acknowledgements
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.
Inspiration
Can you build a machine learning model to accurately predict whether or not the patients in the dataset have diabetes?

Language:Jupyter Notebook020

test

Language:HTML010

Twitter-Tweet-Analysis

Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter (i.e. disaster relief organizations and news agencies). But it’s not always clear whether a person’s words are actually announcing a disaster.

Language:Jupyter Notebook010