Ashish Kumar Yadav's repositories
ALL-Hypothesis-testing
Hypothesis testing using the t-test, ANOVA, and the chi-square test.
DBSCAN-Clustering
Performing DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering. As the name suggests, it is especially suited to handling noisy data or outliers in a dataset.
Detect_Parkinson_XGBOOSTCLASSIFIER
Detecting Parkinson's disease using the extreme gradient boosting (XGBoost) algorithm.
EDA_on_HousePrice
In this repository I have performed exploratory data analysis on the dataset commonly known as House Price Prediction.
EDA_on_onlineretails
This is another project in which I have performed exploratory data analysis on a dataset about online retailers.
EDA_TitanicSurvivors
In this repository we have performed exploratory data analysis to visualise and clean the data. We then built two models, a Logistic Regression model and an XGBClassifier model, to predict survival. Finally, we computed the accuracy of both models and the classification report for the Logistic Regression model.
Encoding_categorical-variables
The most commonly used encoding techniques for categorical variables are performed here.
Exploratory_data_analysis3
In this repository I have performed exploratory data analysis on the dataset student_performance.csv, in which I have tried to detect outliers, missing values, relationships among and across features, and categorical versus continuous/numerical data.
FE_categorical_missing_values
This code shows how to handle missing values in the categorical features of a dataset.
FULL-Feature-Transformations
In this project we have performed all types of feature transformation on the Titanic dataset and used a Q-Q plot to check whether a feature is normally (Gaussian) distributed.
GenChatAssitantBotOAI
This is a plain chatbot developed using the OpenAI API. It uses the following libraries: langchain, openai, huggingface_hub, python-dotenv, streamlit, pandas.
Handle-missing-numerical-values
In this code, missing numerical values in a feature are handled using various techniques, which are described in the code itself.
Hierarchical-Clustering
Performing Hierarchical clustering.
Kmeans_Implementation
KMeans algorithm using an arbitrarily chosen K value of 2.
KNN-Algorithm
Performing the K-Nearest-Neighbor Algorithm.
LInear-Ridge-Lasso-Regression
Performing all three regressions, i.e. Linear, Ridge, and Lasso, on a dataset.
MAchineLearning_FeatureEngineering1
In this repository I have performed complete feature engineering, from handling null values and categorical features through to feature scaling on the test_data and train_data.
ML-FeatureSelection1
In this repository I have performed feature selection on a dataset with 81 features. After feature selection, the 81 features were reduced to 21 for modelling.
Multicollinearity-in-Regression
Showing how to identify multicollinearity in a regression problem using OLS (Ordinary Least Squares) and a correlation chart, and finally removing it.
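The correlation side of that check can be sketched without any libraries. This is a minimal illustration, not the repository's code: the feature names, the values, and the 0.8 cutoff are all assumptions made up for the example, and the OLS step is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlated_pairs(features, cutoff=0.8):
    """Return (name_a, name_b, r) for feature pairs with |r| above the cutoff.

    The 0.8 default is an assumed rule of thumb, not a fixed standard.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > cutoff:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictors, invented for illustration.
features = {
    "age": [25, 32, 47, 51, 62],
    "years_experience": [2, 9, 24, 28, 40],
    "commute_km": [12, 3, 18, 7, 9],
}
flagged = correlated_pairs(features)
```

Here only the age / years_experience pair is flagged, so one of the two would be a candidate for removal before fitting the regression.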
Multiple-Linear-Regression
Performing multiple linear regression on a simple dataset.
One-hot-Encoding_AllTypes
In this repository I have performed simple one-hot encoding for categorical features, as well as one-hot encoding restricted to the top ten/twenty most frequent categories of a feature.
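The "most frequent categories" variant can be sketched in plain Python as follows. The function name, the sample column, and the choice of k=2 are hypothetical; the repository itself may well use pandas for this.

```python
from collections import Counter


def one_hot_top_k(values, k):
    """One-hot encode only the k most frequent categories of a feature.

    Rarer categories get all-zero rows, so the encoded width stays at k.
    """
    # most_common orders by count; equal counts keep first-seen order.
    top = [category for category, _ in Counter(values).most_common(k)]
    rows = [[1 if value == category else 0 for category in top] for value in values]
    return top, rows


# Hypothetical feature column, invented for illustration.
colors = ["red", "blue", "red", "green", "blue", "red", "pink"]
columns, encoded = one_hot_top_k(colors, k=2)
```

With this input the kept columns are "red" and "blue", and the "green" and "pink" rows are encoded as all zeros. Capping the width this way keeps high-cardinality features from producing one column per rare value.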
Optimal-threshold-for-classification
Choosing the optimal threshold value for classification algorithms in machine learning use cases.
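One simple way to pick such a threshold is to scan candidate cut-offs and keep the one with the best score. The sketch below assumes accuracy as the criterion and uses invented labels and probabilities; the repository may use a different criterion (for example one based on the ROC curve).

```python
def accuracy_at(threshold, y_true, y_prob):
    """Accuracy when probabilities >= threshold are predicted as class 1."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for pred, truth in zip(predictions, y_true) if pred == truth)
    return correct / len(y_true)


def best_threshold(y_true, y_prob):
    """Return the candidate threshold with the highest accuracy.

    Candidates are the distinct predicted probabilities; ties go to the
    lowest threshold because candidates are scanned in ascending order.
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: accuracy_at(t, y_true, y_prob))


# Hypothetical labels and predicted probabilities, invented for illustration.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.35, 0.40, 0.45, 0.80, 0.90]
threshold = best_threshold(y_true, y_prob)
```

On this toy data the selected threshold is 0.40, which classifies five of the six samples correctly.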
OptimalK-in-KMeans_Clustering
Finding the optimal k for the KMeans clustering algorithm. Two methods for finding the optimal k are discussed: the Elbow Curve method and the Silhouette Analysis method.
ParkinsonDetection_LogisticRegression
This is the same problem solved in the https://github.com/ashishyadav24092000/Detect_Parkinson_XGBOOSTCLASSIFIER project, but here we have used Logistic Regression instead of XGBClassifier to classify the status as 0 or 1, i.e. Parkinson positive or negative. Accuracy dropped from 95% to 84% when moving from XGBClassifier to Logistic Regression.
PCA_dimension_reduction_Technique
Performing PCA (an unsupervised learning technique) for dimensionality reduction.
RandomForest-Algorithm
Performing the Random Forest classifier algorithm on the famous Iris dataset.
Seaborn_visualisations
Here we take two datasets from the seaborn library itself, i.e. the tips and iris datasets, to perform continuous and categorical data visualisations.
Silhouette-Score-In-Clustering
Evaluating the quality of KMeans clustering using the Silhouette Coefficient (Silhouette Score).
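For reference, the silhouette score itself can be computed in a few lines of plain Python. This is a sketch under assumptions: Euclidean distance, at least two clusters, and invented sample points and labels. It takes cluster labels as given rather than running KMeans, and the repository presumably relies on a library implementation instead.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    coefficient is (b - a) / max(a, b). Singleton clusters score 0.
    Assumes at least two distinct labels.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D points, invented for illustration: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels split each group
```

A score near 1 means points sit well inside their own cluster, while a negative score means points are on average closer to another cluster; here the first labelling scores close to 1 and the second scores below 0.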
UnivariateAndBivariateAndMultivariate-Analysis
Univariate, bivariate, and multivariate analysis.