Vaitybharati's repositories

Assignment-11-Text-Mining-01-Elon-Musk

Assignment-11-Text-Mining-01-Elon-Musk. Perform sentiment analysis on the Elon Musk tweets (Exlon-musk.csv). Text preprocessing: strip leading and trailing characters, remove empty strings (they evaluate to False in Python), join the list into one string/text, remove Twitter username handles (@usernames) from the tweet text, join the list into one string/text again, remove punctuation, remove https/URLs within the text, convert the text into tokens (tokenization), remove stopwords, normalize the data, stemming (optional), lemmatization. Feature extraction: Bag-of-Words CountVectorizer, CountVectorizer with n-grams (bigrams and trigrams), TF-IDF vectorizer. Generate a word cloud, Named Entity Recognition (NER), emotion mining / sentiment analysis.
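A minimal sketch of the cleaning and feature-extraction steps above; the column name "Text" is an assumption, and the file name is taken from the description as written.

```python
import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = pd.read_csv("Exlon-musk.csv")          # file name as given in the description
texts = tweets["Text"].astype(str)              # hypothetical column name

def clean(t):
    t = re.sub(r"@\w+", "", t)                  # remove @username handles
    t = re.sub(r"http\S+|www\.\S+", "", t)      # remove URLs
    t = re.sub(r"[^\w\s]", "", t)               # remove punctuation
    return t.strip().lower()

cleaned = [clean(t) for t in texts if t.strip()]  # drop empty strings

# TF-IDF features with unigrams and bigrams
tfidf = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = tfidf.fit_transform(cleaned)
print(X.shape)
```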

Language: Jupyter Notebook · Stargazers: 5 · Issues: 1

Assignment-07-Clustering-Hierarchical-Airlines-

Assignment-07-Clustering-Hierarchical-Airlines. Perform hierarchical clustering on the airlines data to obtain the optimum number of clusters, and draw inferences from the clusters obtained. Data description: the file EastWestAirlines contains information on passengers who belong to an airline's frequent flier program. For each passenger the data include their mileage history and the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers with similar characteristics in order to target different segments with different types of mileage offers.
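A hedged sketch of the hierarchical clustering workflow with scipy; the file name and the complete-linkage / four-cluster choices are illustrative assumptions, not values from the notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.preprocessing import MinMaxScaler

air = pd.read_excel("EastWestAirlines.xlsx")               # assumed file name
X = MinMaxScaler().fit_transform(air.select_dtypes("number"))

Z = linkage(X, method="complete")                          # complete-linkage example
dendrogram(Z)                                              # inspect to pick a cut height
plt.show()

air["cluster"] = fcluster(Z, t=4, criterion="maxclust")    # e.g. cut into 4 clusters
print(air["cluster"].value_counts())
```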

Language: Jupyter Notebook · Stargazers: 3 · Issues: 1

Assignment-1-Q20-Basic-Statistics-Level-1-

Data set: Cars.csv. Calculate the probability of MPG of cars for the cases below (MPG <- Cars$MPG): a. P(MPG > 38), b. P(MPG < 40), c. P(20 < MPG < 50).
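A sketch of the three probability calculations, assuming MPG is treated as approximately normal with the sample mean and standard deviation (the usual approach for this exercise).

```python
import pandas as pd
from scipy import stats

cars = pd.read_csv("Cars.csv")
mpg = cars["MPG"]
mu, sd = mpg.mean(), mpg.std()

p_gt_38 = 1 - stats.norm.cdf(38, mu, sd)                            # P(MPG > 38)
p_lt_40 = stats.norm.cdf(40, mu, sd)                                # P(MPG < 40)
p_20_50 = stats.norm.cdf(50, mu, sd) - stats.norm.cdf(20, mu, sd)   # P(20 < MPG < 50)
print(p_gt_38, p_lt_40, p_20_50)
```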

Language: Jupyter Notebook · Stargazers: 3 · Issues: 2

P27.-Supervised-ML---Multiple-Linear-Regression---Toyoto-Cars

Supervised-ML---Multiple-Linear-Regression---Toyota-Cars. EDA, correlation analysis, model building, model testing, model validation techniques, collinearity problem check, residual analysis, model deletion diagnostics (checking for outliers or influencers) using two techniques: 1. Cook's distance and 2. leverage value; improving the model, model re-build, re-check and re-improve (rounds 2 and 3), final model, model predictions.
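A rough sketch of the model-building and deletion-diagnostics steps with statsmodels; the file name and the formula's column names are placeholders for whatever the Toyota Corolla data actually uses.

```python
import pandas as pd
import statsmodels.formula.api as smf

toyota = pd.read_csv("ToyotaCorolla.csv")                  # assumed file name
model = smf.ols("Price ~ Age_08_04 + KM + HP + Weight",    # placeholder predictors
                data=toyota).fit()
print(model.summary())

influence = model.get_influence()
cooks_d = influence.cooks_distance[0]      # Cook's distance for each observation
leverage = influence.hat_matrix_diag       # leverage values
print(cooks_d.argmax(), cooks_d.max())     # most influential observation
```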

Language: Jupyter Notebook · Stargazers: 3 · Issues: 1

P36.-Supervised-ML---Decision-Tree---C5.0-Entropy-Iris-Flower-

Supervised-ML-Decision-Tree-C5.0-Entropy-Iris-Flower. Classification model using the entropy criterion. Import libraries and data set, EDA, apply label encoding, model building: build/train a Decision Tree classifier (C5.0) using the entropy criterion, then validate and test the Decision Tree (C5.0) model.
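A minimal sketch using scikit-learn's DecisionTreeClassifier with the entropy criterion as a stand-in for C5.0 (scikit-learn implements CART rather than C5.0 itself); the train/test split parameters are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)
print(accuracy_score(y_test, tree.predict(X_test)))
```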

Language: Jupyter Notebook · Stargazers: 3 · Issues: 1

Assignment-06-Logistic-Regression

Assignment-06-Logistic-Regression. Output variable y: whether the client has subscribed to a term deposit or not, binomial ("yes" or "no"). Attribute information for the bank dataset (a minimal modelling sketch follows the list):

Bank client data:
1 - age (numeric)
2 - job: type of job (categorical: "admin.", "unknown", "unemployed", "management", "housemaid", "entrepreneur", "student", "blue-collar", "self-employed", "retired", "technician", "services")
3 - marital: marital status (categorical: "married", "divorced", "single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown", "secondary", "primary", "tertiary")
5 - default: has credit in default? (binary: "yes", "no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes", "no")
8 - loan: has personal loan? (binary: "yes", "no")

Related to the last contact of the current campaign:
9 - contact: contact communication type (categorical: "unknown", "telephone", "cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)

Other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed after the client was last contacted from a previous campaign (numeric; -1 means the client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown", "other", "failure", "success")

Output variable (desired target):
17 - y: has the client subscribed to a term deposit? (binary: "yes", "no")

Missing attribute values: none.
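A minimal modelling sketch, assuming a semicolon-separated bank.csv as in the UCI bank-marketing data; the file name, separator and preprocessing choices are illustrative rather than taken from the repo.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

bank = pd.read_csv("bank.csv", sep=";")                  # assumed file/separator
X = pd.get_dummies(bank.drop(columns="y"), drop_first=True)   # encode categoricals
y = (bank["y"] == "yes").astype(int)                     # binomial target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```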

Language: Jupyter Notebook · Stargazers: 2 · Issues: 3

Assignment-08-PCA-Data-Mining-Wine-

Assignment-08-PCA-Data-Mining-Wine data. Perform principal component analysis and carry out clustering on the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve), obtain the optimum number of clusters, and check whether we get the same number of clusters as in the original data (the class column, ignored at the beginning, shows that there are 3 clusters).
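A sketch of PCA followed by k-means on the first three principal component scores; the file name "wine.csv" and the class column name "Type" are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

wine = pd.read_csv("wine.csv")
X = StandardScaler().fit_transform(wine.drop(columns="Type"))   # ignore the class column

scores = PCA(n_components=3).fit_transform(X)                   # first 3 PC scores
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(scores)
print(pd.Series(labels).value_counts())                         # compare with the 3 known classes
```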

Language: Jupyter Notebook · Stargazers: 2 · Issues: 1

P34.-Unsupervised-ML---t-SNE-Data-Mining-Cancer-

Unsupervised-ML-t-SNE-Data-Mining-Cancer. Import Libraries, Import Dataset, Convert data to array format, Separate array into input and output components, TSNE implementation, Cluster Visualization
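A hedged t-SNE sketch using scikit-learn's built-in breast-cancer data as a stand-in for the cancer dataset referenced above.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

X, y = load_breast_cancer(return_X_y=True)
emb = TSNE(n_components=2, random_state=0).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=10)   # colour by known label to inspect clusters
plt.show()
```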

Language: Jupyter Notebook · Stargazers: 2 · Issues: 1

vaitybharati

Config files for my GitHub profile.

Assignment-07-DBSCAN-Clustering-Crimes-

Assignment-07-DBSCAN-Clustering-Crimes. Perform DBSCAN clustering on the crime data, identify the number of clusters formed and draw inferences.
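An illustrative DBSCAN run; the file name "crime_data.csv" and the eps/min_samples values are assumptions to be tuned.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

crime = pd.read_csv("crime_data.csv")                        # assumed file name
X = StandardScaler().fit_transform(crime.select_dtypes("number"))

labels = DBSCAN(eps=1.0, min_samples=4).fit_predict(X)       # illustrative parameters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)   # -1 marks noise points
print(n_clusters, "clusters,", (labels == -1).sum(), "noise points")
```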

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-07-K-Means-Clustering-Airlines-

Assignment-07-K-Means-Clustering-Airlines. Perform k-means clustering on the airlines data to obtain the optimum number of clusters, and draw inferences from the clusters obtained. The file EastWestAirlines contains information on passengers who belong to an airline's frequent flier program. For each passenger the data include their mileage history and the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers with similar characteristics in order to target different segments with different types of mileage offers.
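A sketch of fitting k-means on the scaled airline features and profiling the segments; the file name is assumed and k=4 is illustrative (in the notebook the value of k would come from the elbow curve).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

air = pd.read_excel("EastWestAirlines.xlsx")                  # assumed file name
X = MinMaxScaler().fit_transform(air.select_dtypes("number"))

km = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)   # k=4 is illustrative
air["cluster"] = km.labels_
print(air.groupby("cluster").mean(numeric_only=True))         # profile each segment
```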

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-09-Association-Rules-Data-Mining-Books-

Association-Rules-Data-Mining-Books. Apriori algorithm; association rules with 10% support and 70% confidence, with 20% support and 60% confidence, and with 5% support and 80% confidence; visualization of the obtained rules.
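A sketch with mlxtend for the 10% support / 70% confidence case, assuming book.csv is already a one-hot (0/1) transaction matrix.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

books = pd.read_csv("book.csv")                              # assumed one-hot format
frequent = apriori(books.astype(bool), min_support=0.10, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.70)
print(rules.sort_values("lift", ascending=False).head())
```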

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-09-Association-Rules-Data-Mining-Groceries-

Association Rules Data Mining (Groceries). Convert the data frame into a list of lists, use TransactionEncoder to transform this dataset into a logical data frame, build the data frame (rows are logical and columns are the items that have been purchased), print the column names, drop the NaN column from the data frame, most popular items, top 10 popular items, bar-plot visualization of popular items. Apriori algorithm: association rules with 5% support and 70% confidence, association rules with 1% support and 80% confidence, visualization of the obtained rules.
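A sketch of the list-of-lists to logical data frame step with TransactionEncoder; the assumption here is that groceries.csv holds one comma-separated basket per line.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# read one comma-separated basket per line (assumed layout)
with open("groceries.csv") as f:
    baskets = [line.strip().split(",") for line in f if line.strip()]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(baskets), columns=te.columns_)
print(onehot.sum().sort_values(ascending=False).head(10))    # top 10 popular items
```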

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-09-Association-Rules-Data-Mining-my_movies-

Assignment-09-Association-Rules-Data-Mining-my_movies. Apriori algorithm. Association rules with 10% support and 70% confidence. Association rules with 5% support and 90% confidence. A lift ratio > 1 indicates an influential rule for selecting the associated transactions. Visualization of the obtained rules.
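A short sketch of keeping only the influential rules (lift > 1); the assumption is that my_movies.csv contains one-hot 0/1 item columns alongside any label columns.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

movies = pd.read_csv("my_movies.csv")                        # assumed one-hot item columns
onehot = movies.select_dtypes("number").astype(bool)

freq = apriori(onehot, min_support=0.10, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.70)
print(rules[rules["lift"] > 1][["antecedents", "consequents", "lift"]])
```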

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-10-Recommendation-System-Data-Mining-books-

Assignment-10-Recommendation-System-Data-Mining-books. Recommend the best books based on the ratings: sort by user IDs, number of unique users in the dataset, number of unique books in the dataset, convert long data into wide data using a pivot table, replace the index values with unique user IDs, impute the NaNs with 0, calculate cosine similarity between users on the array data, store the results in a data frame, set the index and column names to user IDs, nullify the diagonal values, find the most similar users, extract the books which userId 162107 & 276726 have read, extract the books which userId 276729 & 276726 have read.
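A sketch of the user-user similarity computation; the file name and the column names ("User.ID", "Book.Title", "Book.Rating") are assumptions about the ratings data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.read_csv("book.csv")                             # assumed ratings file
wide = ratings.pivot_table(index="User.ID", columns="Book.Title",
                           values="Book.Rating").fillna(0)    # long -> wide, NaN -> 0

sim = cosine_similarity(wide.values)
sim_df = pd.DataFrame(sim, index=wide.index, columns=wide.index)
np.fill_diagonal(sim_df.values, 0)                            # nullify self-similarity
print(sim_df.idxmax().head())                                 # most similar user for each user
```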

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-11-Text-Mining-02-Amazon-Product-Reviews

NLP: Sentiment Analysis or Emotion Mining on Amazon Product Reviews - Part-1. Learn the NLP techniques to perform sentiment analysis or emotion mining on product reviews extracted from Amazon. Part-1 covers text preprocessing and feature extraction; the next part covers sentiment analysis or emotion mining on the text corpus. https://medium.com/@vaitybharati/nlp-sentiment-analysis-or-emotion-mining-on-amazon-product-reviews-part-1-428d43112027
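A hedged sketch of the sentiment-scoring step that the follow-up part builds towards, using NLTK's VADER; the file name "reviews.csv" and column name "review_text" are illustrative placeholders.

```python
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")                  # one-time lexicon download
reviews = pd.read_csv("reviews.csv")            # hypothetical extracted-reviews file

sia = SentimentIntensityAnalyzer()
reviews["compound"] = reviews["review_text"].astype(str).map(
    lambda t: sia.polarity_scores(t)["compound"])   # -1 (negative) .. +1 (positive)
print(reviews["compound"].describe())
```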

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Assignment-11-Text-Mining-Amazon-Reviews-using-Scrapy

Text-Mining-Amazon-Reviews-using-Scrapy. Ever wondered how much easier life would be if there were a way to know how well your product performs and what people feel about it? The solution: text mining techniques. https://medium.com/@vaitybharati/text-mining-how-to-extract-amazon-reviews-using-scrapy-5bd709cb826c
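A very rough Scrapy spider sketch; the start URL and CSS selectors are hypothetical placeholders, not the selectors used in the linked article. It could be run with something like `scrapy runspider spider.py -o reviews.csv`.

```python
import scrapy

class ReviewSpider(scrapy.Spider):
    name = "amazon_reviews"
    start_urls = ["https://www.example.com/product-reviews/"]    # placeholder URL

    def parse(self, response):
        # hypothetical selectors; real pages need their own inspection
        for review in response.css("div.review"):
            yield {
                "title": review.css("a.review-title::text").get(),
                "body": review.css("span.review-text::text").get(),
            }
```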

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P26.-Supervised-ML---Multiple-Linear-Regression---Cars-dataset

Supervised-ML---Multiple-Linear-Regression---Cars-dataset. Model the MPG of a car based on the other variables. EDA, correlation analysis, model building, model testing, model validation techniques, collinearity problem check, residual analysis, model deletion diagnostics (checking for outliers or influencers) using two techniques: 1. Cook's distance and 2. leverage value; improving the model, model re-build, re-check and re-improve (rounds 2 and 3), final model, model predictions.
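A sketch of the collinearity check via variance inflation factors (VIF); treating every numeric column other than MPG as a predictor is an assumption for illustration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

cars = pd.read_csv("Cars.csv")
X = sm.add_constant(cars.drop(columns="MPG").select_dtypes("number"))

vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                index=X.columns)
print(vif)    # values well above ~10 suggest a collinearity problem
```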

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P28.-Supervised-ML---Logistic-Regression---Appointing-Attorney-or-not

Supervised-ML---Logistic-Regression---Appointing-Attorney-or-not. EDA, Model Building, Model Predictions, Testing Model Accuracy, ROC Curve plotting and finding AUC value.
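A sketch of the ROC-curve and AUC step on held-out data; the file name "claimants.csv" and the target column "ATTORNEY" are assumptions about this dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

df = pd.read_csv("claimants.csv").dropna()                    # assumed file name
X, y = df.drop(columns="ATTORNEY"), df["ATTORNEY"]            # assumed target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
fpr, tpr, _ = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.legend(); plt.show()
```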

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P29.-Unsupervised-ML---Hierarchical-Clustering-Univ.-

Unsupervised-ML---Hierarchical-Clustering-University Data. Import libraries, Import dataset, Create Normalized data frame (considering only the numerical part of data), Create dendrograms, Create Clusters, Plot Clusters.
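An alternative hedged sketch with scikit-learn's AgglomerativeClustering on the normalized numeric columns; "Universities.csv" and the three-cluster / complete-linkage choices are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import AgglomerativeClustering

univ = pd.read_csv("Universities.csv")                       # assumed file name
X = MinMaxScaler().fit_transform(univ.select_dtypes("number"))

univ["cluster"] = AgglomerativeClustering(n_clusters=3, linkage="complete").fit_predict(X)
print(univ.groupby("cluster").mean(numeric_only=True))       # profile the clusters
```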

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P30.-Unsupervised-ML---K-Means-Clustering-Non-Hierarchical-Clustering-Univ.-

Unsupervised-ML---K-Means-Clustering-Non-Hierarchical-Clustering-Univ. Use the elbow graph to find the optimum number of clusters (K value) from a range of K values. The k-means algorithm aims to choose centroids that minimise the inertia, i.e. the within-cluster sum-of-squares criterion (WCSS). Plot the K-value range vs WCSS to get the elbow graph for choosing K (number of clusters).
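A sketch of the elbow method: compute WCSS (KMeans inertia_) over a range of k values and plot it. The file name is assumed, as before.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

univ = pd.read_csv("Universities.csv")                        # assumed file name
X = MinMaxScaler().fit_transform(univ.select_dtypes("number"))

ks = range(1, 11)
wcss = [KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).inertia_ for k in ks]

plt.plot(ks, wcss, "o-")
plt.xlabel("K (number of clusters)"); plt.ylabel("WCSS")      # look for the elbow
plt.show()
```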

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P31.-Unsupervised-ML---DBSCAN-Clustering-Wholesale-Customers-

Unsupervised-ML---DBSCAN-Clustering-Wholesale-Customers. Import libraries, import dataset, normalize the heterogeneous numerical data by applying a StandardScaler fit-transform to the dataset, DBSCAN clustering (noisy samples are given the label -1), add the clusters to the dataset.
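A sketch of scaling with StandardScaler and attaching the DBSCAN labels back to the data frame; the file name and the eps/min_samples values are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

ws = pd.read_csv("Wholesale customers data.csv")              # assumed file name
X = StandardScaler().fit_transform(ws.select_dtypes("number"))

ws["cluster"] = DBSCAN(eps=0.8, min_samples=5).fit_predict(X) # -1 = noisy samples
print(ws["cluster"].value_counts())
```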

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P32.-Unsupervised-ML---Association-Rules-Data-Mining-Titanic-

Unsupervised-ML---Association-Rules-Data-Mining-Titanic. Data preprocessing: as the data is in categorical format, One-Hot Encoding is used to convert it into numerical format. Apriori algorithm: frequent item sets and association rules. A leverage value of 0 indicates independence; its range is [-1, 1]. A high conviction value means the consequent depends strongly on the antecedent; its range is [0, inf). A lift ratio > 1 indicates an influential rule for selecting the associated transactions.
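A sketch of one-hot encoding the categorical Titanic columns and reading the lift/leverage/conviction metrics off the association_rules output; "Titanic.csv" is an assumed file name and the thresholds are illustrative.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

titanic = pd.read_csv("Titanic.csv")                          # assumed file name
onehot = pd.get_dummies(titanic.astype(str)).astype(bool)     # categorical -> 0/1 columns

freq = apriori(onehot, min_support=0.05, use_colnames=True)
rules = association_rules(freq, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "lift", "leverage", "conviction"]].head())
```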

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P33.-Unsupervised-ML---PCA-Data-Mining-Univ-

Unsupervised-ML---PCA-Data-Mining-Univ. Import dataset, convert the data to a numpy array, normalize the numerical data, apply a PCA fit-transform to the dataset, PCA components (loadings) matrix, variance explained by each principal component, final data frame, visualization of the PCAs, eigenvectors and eigenvalues for a given matrix.
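A sketch of the PCA summary quantities, with the eigen-decomposition of the covariance matrix shown alongside for comparison; "Universities.csv" is an assumed file name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale
from sklearn.decomposition import PCA

univ = pd.read_csv("Universities.csv")                        # assumed file name
X = scale(univ.select_dtypes("number"))                       # normalize numeric data

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)      # variance explained by each PC
print(pca.components_)                    # loadings (components matrix)

eig_vals, eig_vecs = np.linalg.eig(np.cov(X.T))   # same directions, up to sign/order
print(eig_vals)
```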

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

P35.-Unsupervised-ML---Recommendation-System-Data-Mining-Movies-

Unsupervised-ML-Recommendation-System-Data-Mining-Movies. Recommend movies based on the ratings: sort by user IDs, number of unique users in the dataset, number of unique movies in the dataset, impute the NaNs with 0, calculate cosine similarity between users on the array data, store the results in a data frame, set the index and column names to user IDs, slice the first 5 rows and first 5 columns, nullify the diagonal values, find the most similar users, extract the movies which userId 6 & 168 have watched.
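A short sketch of the final step, extracting the movies that two similar users have rated; the file name and the "userId"/"movie" column names are assumptions, while user ids 6 and 168 come from the description above.

```python
import pandas as pd

ratings = pd.read_csv("movie_ratings.csv")                    # assumed long-format file
user_a = set(ratings.loc[ratings["userId"] == 6, "movie"])
user_b = set(ratings.loc[ratings["userId"] == 168, "movie"])

print("both watched:", user_a & user_b)
print("only user 6 watched:", user_a - user_b)                # candidates to recommend to 168
```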

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1