Tech-with-Vidhya's starred repositories
copilot-codespaces-vscode
Develop with AI-powered code suggestions using GitHub Copilot and VS Code
productionized_docker_ML_model_application_into_kubernetes_cluster_using_AWS_EKS_CloudFormation_EMR
This project covers the end-to-end deployment and productionization of a dockerized machine learning Python Flask application on a Kubernetes cluster, using AWS Elastic Kubernetes Service (EKS), AWS Serverless Fargate instances, AWS CloudFormation and the AWS Elastic Container Registry (ECR) service. The machine learning business case is a bank note authentication binary classifier built with a Random Forest, which classifies a bank note as either fake (Label 0) or genuine (Label 1).

Implementation Steps:
1. Built an end-to-end machine learning solution covering the full ML life cycle: data exploration, feature selection, model training, model validation and model testing on unseen production data.
2. Saved the finalised model as a pickle file.
3. Created a Python Flask API to serve the model's inferences to end users.
4. Verified and tested the Flask API on localhost.
5. Wrote a Dockerfile (the instructions for building the docker image) for the Flask-based bank note authentication application embedding the Random Forest classifier.
6. Created IAM service roles with the policies needed to access AWS ECR, AWS EKS and AWS CloudFormation.
7. Launched a new EC2 Linux server instance in AWS and copied the application's directories and files onto it over SFTP.
8. Installed Docker and the supporting Python libraries listed in "requirements.txt" on the EC2 instance.
9. Built the Docker image from the Dockerfile and ran it as a container, using the docker build and run commands.
10. Created a Docker repository in AWS ECR and pushed the application image to it with AWS CLI commands.
11. Created the cloud stack, with private and public subnets, using AWS CloudFormation and appropriate IAM roles and policies.
12. Created the Kubernetes cluster with AWS EKS, with appropriate IAM roles and policies, and linked it to the CloudFormation stack.
13. Created the AWS Serverless Fargate profile and Fargate nodes.
14. Created and configured the "deployment.yaml" and "service.yaml" manifests.
15. Applied "deployment.yaml", with its pod replica configuration, to the EKS Fargate nodes using kubectl.
16. Applied "service.yaml" using kubectl to expose the application publicly, creating the production endpoint.
17. Verified and tested the inferences of the productionized application against the Fargate endpoint in the EKS cluster.

Tools & Technologies: Python, Flask, AWS, AWS EC2, Linux Server, Linux Commands, Command Line Interface (CLI), Docker, Docker Commands, AWS ECR, AWS IAM, AWS CloudFormation, AWS EKS, Kubernetes, Kubernetes kubectl Commands.
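The inference API described in the early steps can be sketched as a minimal Flask app. This is an illustrative reconstruction, not the project's actual code: the `/predict` route, the four bank-note features (variance, skewness, curtosis, entropy) and the synthetically trained stand-in model are all assumptions.

```python
# Hypothetical sketch of the Flask inference API; the real project loads a
# pickled model, which this tiny classifier trained on synthetic data stands in for.
from flask import Flask, request, jsonify
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))      # variance, skewness, curtosis, entropy
y = (X[:, 0] > 0).astype(int)      # 0 = fake note, 1 = genuine note
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    features = [[payload["variance"], payload["skewness"],
                 payload["curtosis"], payload["entropy"]]]
    label = int(model.predict(features)[0])
    return jsonify({"label": label, "meaning": "genuine" if label else "fake"})

# Inside the container the server would be started with, e.g.,
# app.run(host="0.0.0.0", port=5000)
```

The same app can be exercised locally (step 4) with Flask's built-in test client before any Docker or AWS step is attempted.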
productionized_docker_ML_model_application_into_AWS_EC2_Linux
This project covers the end-to-end deployment and productionization of a dockerized machine learning Python Flask application on an AWS Elastic Compute Cloud (EC2) instance, using the AWS Elastic Container Registry (ECR) service. The machine learning business case is a bank note authentication binary classifier built with a Random Forest, which classifies a bank note as either fake (Label 0) or genuine (Label 1).

Implementation Steps:
1. Built an end-to-end machine learning solution covering the full ML life cycle: data exploration, feature selection, model training, model validation and model testing on unseen production data.
2. Saved the finalised model as a pickle file.
3. Created a Python Flask API to serve the model's inferences to end users.
4. Verified and tested the Flask API on localhost.
5. Wrote a Dockerfile (the instructions for building the docker image) for the Flask-based bank note authentication application embedding the Random Forest classifier.
6. Created IAM service roles with the policies needed to access AWS ECR and AWS EC2.
7. Launched a new EC2 Linux server instance in AWS and copied the application's directories and files onto it over SFTP.
8. Installed Docker and the supporting Python libraries listed in "requirements.txt" on the EC2 instance.
9. Built the Docker image from the Dockerfile and ran it as a container, using the docker build and run commands.
10. Created a Docker repository in AWS ECR and pushed the application image to it with AWS CLI commands.
11. Deployed the containerized Flask ML application on the EC2 Linux instance, creating the production endpoint.
12. Verified and tested the inferences of the productionized application against the EC2 endpoint.

Tools & Technologies: Python, Flask, AWS, AWS EC2, Linux Server, Linux Commands, Command Line Interface (CLI), Docker, Docker Commands, AWS ECR, AWS IAM
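Step 2 above — persisting the finalised model as a pickle file so the Flask API can load it at startup — can be sketched as follows. The synthetic training data and the `classifier.pkl` file name are illustrative assumptions, not the project's actual artifacts.

```python
# Minimal sketch of pickling a trained Random Forest classifier (step 2).
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the bank note training data.
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 4))
y = (X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Serialize the finalised model; the API process later deserializes it once.
with open("classifier.pkl", "wb") as f:
    pickle.dump(model, f)

with open("classifier.pkl", "rb") as f:
    restored = pickle.load(f)
```

A pickled model is only safe to load from trusted sources, which is why it travels inside the Docker image rather than being fetched at runtime.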
Automated_ETL_Finance_Data_Pipeline_with_AWS_Lambda_Spark_Transformation_Job_Python
This project implements an automated ETL data pipeline for financial stock trade transactions using Python and AWS services, with a Spark transformation job. The pipeline is automated with an AWS Lambda function and a trigger: whenever a new file lands in the AWS S3 bucket, the Lambda function fires and starts the AWS Glue ETL Spark transformation job. The transformation, implemented with PySpark, reads the trade transaction data from the S3 bucket and filters it down to the subset of trades in which 100 or fewer shares were transacted. Tools & Technologies: Python, Boto3 SDK, PySpark, AWS CLI, AWS Virtual Private Cloud (VPC), AWS VPC Endpoint, AWS S3, AWS Glue, AWS Glue Crawler, AWS Glue Jobs, AWS Athena, AWS Lambda, Spark
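The trigger described above can be sketched as a small Lambda handler: parse the S3 put event and start the Glue job with the new file as an argument. The job name, argument key and bucket layout are assumptions, not the project's actual configuration.

```python
# Hedged sketch of the S3-triggered Lambda function described above.
def extract_s3_object(event):
    """Pull (bucket, key) out of a standard S3 put-event payload."""
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

def lambda_handler(event, context):
    import boto3  # available by default in the AWS Lambda Python runtime
    bucket, key = extract_s3_object(event)
    glue = boto3.client("glue")
    # Kick off the Spark transformation job, pointing it at the new file.
    run = glue.start_job_run(
        JobName="trades-spark-transform",                 # hypothetical job name
        Arguments={"--input_path": f"s3://{bucket}/{key}"},
    )
    return {"job_run_id": run["JobRunId"]}
```

Inside the Glue job itself, the described filter would amount to something like `df.filter(df["shares"] <= 100)` on the PySpark DataFrame.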
Bitcoin_Network_Analytics_using_Python_NetworkX_and_Gephi
This four-member group project was delivered as part of the "Digital Media and Social Network" module of my Masters in Big Data Science (MSc BDS) programme at Queen Mary University of London (QMUL), London, United Kingdom. It applies network analysis to 4 different problem statements and use cases, using the Python NetworkX package, the Gephi network analysis tool and Microsoft Excel.

Dataset: Bitcoin trade transactions from 2011 to 2016, each with the attributes Rater, Ratee, Rating and Timestamp.
Network formation: for every trade between 2 users in the Bitcoin network, a rating is recorded with its timestamp, giving a directed network.
Size of the network: 5,881 users (nodes) and 35,592 transactions (edges). Ratings range from -10 (lowest) to +10 (highest).
The analysis covers basic network statistics along with the four use cases and their objectives.
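The directed trust network described above can be sketched with NetworkX as follows. The sample edges are made up for illustration; the real dataset has 5,881 nodes and 35,592 edges.

```python
# Illustrative sketch: build the rater -> ratee network and compute basic stats.
import networkx as nx

G = nx.DiGraph()
# Each trade: rater -> ratee, annotated with the rating (-10..+10) and timestamp.
edges = [
    (1, 2, {"rating": 8,  "timestamp": 1289241600}),
    (2, 3, {"rating": -4, "timestamp": 1289328000}),
    (3, 1, {"rating": 10, "timestamp": 1289414400}),
    (1, 3, {"rating": 2,  "timestamp": 1289500800}),
]
G.add_edges_from(edges)

# Basic network statistics of the kind reported in the project.
stats = {
    "nodes": G.number_of_nodes(),
    "edges": G.number_of_edges(),
    "density": nx.density(G),
    "avg_in_degree": sum(d for _, d in G.in_degree()) / G.number_of_nodes(),
}
```

The same edge list, exported as CSV, can be loaded directly into Gephi for the visual analyses.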
ETL_Finance_Data_Pipeline_Python_AWS_CLI_S3_Glue_Athena
This project covers the implementation of an ETL data pipeline for financial stock trade transactions using Python and AWS services. Tools & Technologies: Python, Boto3 SDK, AWS CLI, AWS Virtual Private Cloud (VPC), AWS VPC Endpoint, AWS S3, AWS Glue, AWS Glue Crawler, AWS Athena, AWS Redshift
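The consumption end of such a pipeline — querying the Glue-catalogued trades table through Athena — can be sketched as below. The database, table and column names and the output location are hypothetical, not taken from the project.

```python
# Hedged sketch: build and submit an Athena query over the crawled trades table.
def build_trades_query(database, table, ticker):
    # Athena reads the table that the Glue Crawler registered over the S3 data.
    return f"SELECT * FROM \"{database}\".\"{table}\" WHERE ticker = '{ticker}'"

def run_query(query, output_s3):
    import boto3
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": output_s3},  # S3 path for results
    )
    return resp["QueryExecutionId"]
```

For example, `run_query(build_trades_query("finance_db", "trades", "AAPL"), "s3://my-results-bucket/athena/")` would start an asynchronous query whose results land in the given S3 prefix.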
MLOps_AWS_LoadBalancing_Docker_Flask_Terraform_Banking_Customers_Churn_Prediction_Ensemble_Technique
This is an AWS MLE and MLOps Bank Customers Churn Prediction Project.
AWS_SageMaker_Bank_Marketing_Predictions_using-XGBoost_Model
This is an AWS SageMaker Bank Marketing Prediction Machine Learning Project.
bank_credit_card_customers_segmentation_using_unsupervised_k_means_clustering_analysis
This project segments and groups bank credit card customers using the unsupervised K-Means clustering algorithm. The implementation life cycle covers the following steps:
1. Data exploration, analysis and visualisation
2. Data cleaning
3. Data pre-processing and scaling
4. Model fitting
5. Model validation using the performance quality metrics WCSS, the elbow method and the silhouette coefficient/score
6. Selection of the optimal model, with the appropriate number of clusters, based on those metrics
7. Insights and interpretations for 2 different business scenarios, with supporting visualisations
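Steps 3–6 above can be sketched in a few lines of scikit-learn. Synthetic blob data stands in for the credit card features; WCSS is K-Means's `inertia_`, and the silhouette score guides the choice of k.

```python
# Minimal sketch: scale, fit K-Means for several k, compare WCSS and silhouette.
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

# Synthetic stand-in for the customer features.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)          # step 3: scaling

scores = {}
for k in range(2, 7):                          # candidate cluster counts
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = {"wcss": km.inertia_,          # within-cluster sum of squares
                 "silhouette": silhouette_score(X, km.labels_)}

# Pick the k with the best silhouette; the elbow on WCSS gives a second opinion.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
```

Plotting `wcss` against k produces the elbow curve mentioned in step 5.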
Building_ETL_Data_Pipeline_on-AWS_EMR_Cluster_Hive_Tables_Tableau_Visualisation
This project covers the implementation of an ETL batch data pipeline for sales data using Python and AWS services. The persisted batch sales data is stored in an AWS S3 bucket and ingested into an AWS Elastic MapReduce (EMR) cluster, where it is transformed using Apache Hive tables; the results are then consumed by Tableau to display a dashboard of sales visualisations. Tools & Technologies: Python, Boto3, AWS CLI, AWS S3, AWS EMR, Apache Hive, Tableau
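Submitting the Hive transformation to a running EMR cluster can be sketched with Boto3 as below. The step name, cluster id and script path are illustrative assumptions; the pattern of running a Hive script through `command-runner.jar` is standard EMR practice.

```python
# Hedged sketch: add a Hive-script step to an existing EMR cluster via Boto3.
def hive_step(script_s3_path):
    """Build an EMR step definition that runs a Hive script stored in S3."""
    return {
        "Name": "sales-hive-transform",        # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", script_s3_path],
        },
    }

def submit(cluster_id, script_s3_path):
    import boto3
    emr = boto3.client("emr")
    resp = emr.add_job_flow_steps(JobFlowId=cluster_id,
                                  Steps=[hive_step(script_s3_path)])
    return resp["StepIds"][0]
```

The Hive script itself would create the external tables over the S3 sales data and write out the aggregates that Tableau visualises.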
anomaly-detection-proximity-based-method-knn
This project was delivered as part of the "Data Mining" module of my Masters in Big Data Science (MSc BDS) programme at Queen Mary University of London (QMUL), London, United Kingdom. It implements outlier detection on the "house prices" dataset using the proximity-based k-nearest-neighbours method to compute outlier scores, with z-score normalisation and PCA dimensionality reduction as data pre-processing steps. The implementation uses the Python libraries pandas, numpy, matplotlib, sklearn and scipy. The solution computes Euclidean distances to detect the top 3 outlier houses, those priced furthest from the average house price. **NOTE:** To comply with QMUL's data privacy and data protection policies for students, the dataset and the solution code are not published in this public GitHub profile.
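Since the coursework code is not public, here is a generic sketch of the described approach on synthetic data: z-score normalisation, PCA, then the Euclidean distance to the k-th nearest neighbour as the outlier score. The planted outliers and parameter choices are assumptions.

```python
# Generic sketch of proximity-based (k-NN) outlier detection, as described above.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# 97 ordinary houses plus 3 planted extreme outliers (rows 97-99).
prices = np.concatenate([rng.normal(300, 30, size=(97, 3)),
                         rng.normal(900, 10, size=(3, 3))])

X = StandardScaler().fit_transform(prices)   # z-score normalisation
X = PCA(n_components=2).fit_transform(X)     # dimensionality reduction

# Outlier score = Euclidean distance to the k-th nearest neighbour.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own NN
dists, _ = nn.kneighbors(X)
scores = dists[:, k]

top3 = np.argsort(scores)[-3:]   # indices of the 3 strongest outliers
```

Points in dense regions have small k-th-neighbour distances; isolated points score high, which is exactly what singles out the three extreme houses.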
apache-spark-rdd-computations-E2E-implementation-with-transformations-and-actions-gutenberg-data
This project was delivered as part of the "Big Data Processing" module of my Masters in Big Data Science (MSc BDS) programme at Queen Mary University of London (QMUL), London, United Kingdom. It develops Spark RDD computations from scratch, using Python's pyspark package and regular-expression functions, over the private "Gutenberg" data files: hundreds of books downloaded from Project Gutenberg, written in different languages. The work applies basic transformations (flatMap, map, reduceByKey) and actions on the RDDs, with Spark jobs submitted to the cluster, to answer:
1. What is the total number of words?
2. How many occurrences are there of each unique word?
3. What are the top 10 words, computed with Spark's takeOrdered function?
**NOTE:** To comply with QMUL's data privacy and data protection policies for students, the dataset and the solution code are not published in this public GitHub profile.
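As the coursework code is not public, the pipeline described above can be sketched generically: a regular-expression tokenizer feeding the classic flatMap → map → reduceByKey → takeOrdered chain. The word regex and function names are assumptions.

```python
# Generic sketch of the RDD word-count pipeline described above.
import re

WORD_RE = re.compile(r"[a-z']+")   # assumed tokenization rule

def tokenize(line):
    """Lower-case a line and split it into words with a regular expression."""
    return WORD_RE.findall(line.lower())

def top_words(sc, path, n=10):
    """sc is an existing pyspark.SparkContext; path points at the text files."""
    counts = (sc.textFile(path)
                .flatMap(tokenize)                    # line -> words
                .map(lambda w: (w, 1))                # word -> (word, 1)
                .reduceByKey(lambda a, b: a + b))     # sum counts per word
    total_words = counts.map(lambda kv: kv[1]).sum()  # question 1
    # Questions 2 and 3: per-word counts, and the n most frequent words.
    return total_words, counts.takeOrdered(n, key=lambda kv: -kv[1])
```

`takeOrdered` with a negated count key returns the most frequent words first, which answers the top-10 question in a single action.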
AWS_SageMaker_TensorFlow_Keras_CNN_Model_Fashion_MNIST
This is an AWS SageMaker TensorFlow Keras CNN Machine Learning Project.
bank_credit_card_transactions_fraud_detection_using_unsupervised_DBSCAN_clustering
This project detects and groups fraudulent bank credit card transactions using the unsupervised Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The implementation life cycle covers the following steps:
1. Data exploration and analysis
2. Data pre-processing, scaling and normalisation
3. Dimensionality reduction using Principal Component Analysis (PCA)
4. Model fitting
5. Model hyper-parameter tuning
6. Model validation using the performance quality metrics silhouette coefficient/score and homogeneity score
7. Selection of the optimal model, with the appropriate number of clusters, based on those metrics
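The pipeline above can be sketched with scikit-learn on synthetic data. The `eps` and `min_samples` values stand in for the hyper-parameters tuned in step 5; the blob data is a stand-in for the transaction features.

```python
# Minimal sketch: scale, reduce with PCA, fit DBSCAN, score the clustering.
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score, homogeneity_score
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# eps and min_samples are the hyper-parameters tuned in step 5 (assumed values).
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

n_clusters = len(set(labels) - {-1})          # DBSCAN marks noise points as -1
sil = silhouette_score(X, labels)             # step 6: silhouette coefficient
hom = homogeneity_score(y_true, labels)       # step 6: homogeneity score
```

Unlike K-Means, DBSCAN infers the number of clusters from the density structure, so tuning `eps` and `min_samples` replaces choosing k directly.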
Coursera-Deep-Learning-Specialization-2021
Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks and Deep Learning; (ii) Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization; (iii) Structuring Machine Learning Projects; (iv) Convolutional Neural Networks; (v) Sequence Models.
Coursera-Deep-Learning-Specialization-2023
Contains solutions to the Deep Learning Specialization on Coursera.
Coursera-Machine-Learning-Specialization-2023
Contains solutions and notes for the Machine Learning Specialization by Stanford University and DeepLearning.AI on Coursera (2022), taught by Prof. Andrew Ng.
data-warehousing-OLAP-implementation-of-IBRD-balance-sheet-data
This project was delivered as part of the "Data Mining" module of my Masters in Big Data Science (MSc BDS) programme at Queen Mary University of London (QMUL), London, United Kingdom. It implements the data warehousing and On-Line Analytical Processing (OLAP) concepts of data cubes, data cube measures, OLAP operations and data cube computations on the private International Bank for Reconstruction and Development (IBRD) balance sheet dataset. The implementation uses Python and its packages cubes and sqlalchemy. The solution indexes the OLAP data with bitmap indices, creates the base tables and the bitmap index tables, defines the data cube model (with aggregate functions) in JSON, builds the data cube, and computes the results for the aggregate measures defined in the cube. **NOTE:** To comply with QMUL's data privacy and data protection policies for students, the dataset and the solution code are not published in this public GitHub profile.
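Since the coursework code is not public, the bitmap-indexing idea mentioned above can be illustrated with a toy example: each distinct value of a categorical column maps to a bit vector marking the rows where it occurs. The column values here are invented.

```python
# Toy sketch of a bitmap index over a categorical column, as described above.
def build_bitmap_index(values):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index

def rows_matching(index, value):
    """Decode a bit vector back into the list of matching row numbers."""
    bits, rows, row = index.get(value, 0), [], 0
    while bits:
        if bits & 1:
            rows.append(row)
        bits >>= 1
        row += 1
    return rows

# Hypothetical balance-sheet line-item categories.
categories = ["assets", "liabilities", "assets", "equity", "assets"]
index = build_bitmap_index(categories)
```

Bitmap indices make the AND/OR row selections behind OLAP slice-and-dice operations cheap bitwise operations, which is why the project applies them before building the cube.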
MLOps_AWS_Docker_Gunicorn_Flask_NLP_LDA_Topic_Modeling_sklearn_Framework
This is an AWS MLE and MLOps NLP LDA Topic Modeling Project.
MLOps_AWS_Kubernetes_LoadBalancing_Docker_Flask_Banking_Customers_Digital_Transformation_Classifier
This is an AWS MLE and MLOps Bank Customers Digital Transformation Project.
MLOps_AWS_Lightsail_Docker_Flask_ARCH_GARCH_Time_Series_Modeling_Statistical_Framework
This is an AWS MLE and MLOps ARCH and GARCH Time Series Forecasting Statistical Modeling Project.
MLOps_AWS_Lightsail_Docker_Flask_Gaussian_Based_Time_Series_Modeling_Framework
This is an AWS MLE and MLOps Time Series Forecasting Modeling Project.
MLOps_AWS_Lightsail_Docker_Flask_Multi-Linear_Regression_Time_Series_Modeling_sklearn_Framework
This is an AWS MLE and MLOps Time Series Forecasting Project using Multiple Linear Regression Model.