WeebMogul / Kaggle-Notebooks

Tutorials on diverse data science topics in Python and R, covering a wide range of data science methodology. Learn different types of supervised, unsupervised, and other machine learning algorithms.

Introduction

Hello! I am a Kaggle Notebooks Master. With these notebooks you can learn to plot, build intelligent models, and much more. The code and workflow are explained using Jupyter Markdown. All source code is available on GitHub as well as on Kaggle; please use the links provided below for the data.

This repo contains projects from a wide variety of fields, including Machine Learning, Deep Learning, Business Intelligence, Big Data Analytics, and many more.

22. Complete Data Visualization Tutorial Seaborn

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. More than 100 plots are explained in this tutorial.
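A minimal sketch of that high-level interface, using a tiny made-up DataFrame rather than one of the tutorial's Kaggle datasets (assumes seaborn, pandas, and matplotlib are installed):

```python
# One high-level Seaborn call draws a complete statistical graphic.
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small illustrative dataset (hypothetical values).
df = pd.DataFrame({
    "glucose": [85, 168, 183, 89, 137, 116, 78, 115],
    "outcome": ["neg", "pos", "pos", "neg", "pos", "pos", "neg", "neg"],
})

ax = sns.boxplot(data=df, x="outcome", y="glucose")
ax.set_title("Glucose by outcome")
plt.savefig("boxplot.png")
```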

21. Inferential Statistics on Diabetes

Learn how to make inferences about a population. We always work with a sample of data, so when we make inferences about the population we should always account for the estimated standard error. In this notebook I find the mean and proportion of different variables with 95% confidence intervals.
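A minimal sketch of the idea: a 95% confidence interval for a sample mean via the normal approximation, using only the standard library (the glucose-like values below are made up, not from the diabetes dataset):

```python
# 95% confidence interval for a sample mean (normal approximation).
import math
import statistics

sample = [148, 85, 183, 89, 137, 116, 78, 115, 197, 125]

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # estimated standard error
z = 1.96                                                # z* for 95% confidence
ci = (mean - z * se, mean + z * se)
print(f"mean = {mean:.1f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```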

20. Top Machine Learning Algorithms in R

Learn how to build machine learning models such as Linear Regression, Logistic Regression, tree-based models, Neural Networks, Cluster Analysis, Association Rules, and many more in the R programming language.

19. Natural Language Processing with Python

Natural language processing (NLP) is about developing applications and services that can understand human language. Some practical examples of NLP are speech recognition (e.g. Google Voice Search), understanding what content is about, and sentiment analysis.
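One building block behind several of those applications is turning raw text into token counts. A standard-library-only sketch (not the notebook's actual pipeline):

```python
# Bag-of-words: lowercase, extract word tokens, count frequencies.
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Return a frequency count of the lowercase word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = bag_of_words("The movie was great. Great acting, great story!")
print(counts.most_common(2))
```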

18. Predictive Modelling (KNN,ANN,XGBoost)

Predictive modeling uses statistics to predict outcomes. Most often the event one wants to predict is in the future, but predictive modelling can be applied to any type of unknown event, regardless of when it occurred.
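As one concrete illustration, here is a toy k-nearest-neighbours classifier in plain Python, in the spirit of the KNN model in the notebook (the 2-D points and labels are made up):

```python
# Toy KNN: classify a query point by the majority label of its k nearest neighbours.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label). Return the majority label of the k nearest points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]
print(knn_predict(train, (2, 2)))   # nearest three points are all labelled "A"
```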

17. Multivariate Statistical Analysis on Diabetes

Multivariate analysis is based on the principles of multivariate statistics, which involves observation and analysis of more than one statistical outcome variable at a time.

16. COVID19 India Report (EDA + Statistical Test)

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people who fall sick with COVID-19 will experience mild to moderate symptoms and recover without special treatment.

15. Univariate Statistical Analysis on Diabetes

Univariate analysis is perhaps the simplest form of statistical analysis. Like other forms of statistics, it can be inferential or descriptive. The key fact is that only one variable is involved. Univariate analysis can yield misleading results in cases in which multivariate analysis is more appropriate.
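A minimal sketch of descriptive univariate analysis on a single made-up variable, using only the standard library:

```python
# Descriptive statistics for one variable (illustrative ages, not real data).
import statistics

ages = [29, 35, 41, 52, 33, 47, 38, 60, 26, 44]

summary = {
    "n": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),
    "min": min(ages),
    "max": max(ages),
}
for name, value in summary.items():
    print(f"{name}: {value:.1f}" if isinstance(value, float) else f"{name}: {value}")
```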

14. Learn Wide Deep Neural Network!!

The Wide & Deep neural network is an interesting model architecture for ranking and recommendation, developed by Google Research. It combines logistic regression (the wide part) and deep learning (the deep part) in a single model.

13. (99% Acc.) Inception V3 on Rock, Paper & Scissor

I will use the pretrained Inception network to train my model. Of course, because we need to go deeper :) Inception v3 is a convolutional neural network for assisting in image analysis and object detection, and got its start as a module for GoogLeNet. It is the third edition of Google's Inception convolutional neural network, originally introduced during the ImageNet Recognition Challenge. Just as ImageNet can be thought of as a database of classified visual objects, Inception helps classify objects in the world of computer vision. One such use is in the life sciences, where it aids leukemia research. It was "codenamed 'Inception' after the film of the same name".

12. Time Series Descriptive Statistics and Tests

There are different forecasting models such as ARMA, ARIMA, Seasonal ARIMA, and others. Each model addresses a different type of time series, so in order to select an appropriate model we need to know something about the data. In this section we'll learn how to determine whether a time series is stationary, whether it's independent, and whether two series demonstrate correlation and/or causality.
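One such diagnostic, lag-1 autocorrelation, can be sketched from scratch in the standard library (the two toy series are made up; the notebook itself uses proper statistical tests):

```python
# Lag-1 autocorrelation: correlation of a series with itself shifted one step.
import statistics

def lag1_autocorr(series):
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

trend = [1, 2, 3, 4, 5, 6, 7, 8]        # strongly trending -> high positive autocorrelation
noise = [3, -1, 2, -4, 1, 0, -2, 5]     # oscillating -> negative autocorrelation
print(lag1_autocorr(trend), lag1_autocorr(noise))
```

A trending (non-stationary) series shows strong positive autocorrelation, while an oscillating series does not, which is why such checks help in choosing between models.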

11. Extracting Dominant Color of an Image

Clustering is used in many real-world applications; one such example is extracting the dominant colors from an image. An image consists of pixels, and each pixel represents a dot in the image. A pixel contains three values, each ranging from 0 to 255, representing the amounts of the red, green, and blue components; their combination forms the actual color of the pixel. To find the dominant colors, k-means clustering is used. Another important use of k-means clustering is segmenting satellite images to identify surface features.
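A toy sketch of that idea on a handful of hand-picked RGB "pixels" (the notebook uses a real image and a library implementation): each pixel is assigned to its nearest centroid, centroids are re-averaged, and the largest final cluster gives the dominant colour.

```python
# Minimal k-means on RGB triples, pure Python.
import math

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(ch) / len(cluster) for ch in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

pixels = [(250, 10, 10), (245, 20, 5), (240, 15, 12),   # reddish
          (10, 10, 240), (5, 20, 250)]                  # bluish
centroids, clusters = kmeans(pixels, centroids=[(255, 0, 0), (0, 0, 255)])
dominant = max(zip(clusters, centroids), key=lambda pair: len(pair[0]))[1]
print(dominant)   # centroid of the largest (reddish) cluster
```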

10. Facebook Prophet, RNN and EWMA on COVID19 IND

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. In this notebook I use time series analysis to forecast confirmed cases and analyse different aspects of COVID-19 in India.

9. End to End Machine Learning !!!

The California Housing Prices dataset from the StatLib repository. This dataset is based on data from the 1990 California census.

Here are the main steps you will go through:

- Get the data.
- Discover and visualize the data to gain insights.
- Prepare the data for Machine Learning algorithms.
- Select a model and train it.
- Fine-tune your model.
- Present your solution.
- Launch, monitor, and maintain your system.

8. Mall Customer Segmentation

This data contains information about customers of a mall. There are 200 observations of 5 variables: 'CustomerID', 'Gender', 'Age', 'Annual.Income..k..', and 'Spending.Score..1.100.'. The data types are integer or factor.

7. PostGraduate Admission Analysis

This dataset contains information on the criteria for postgraduate admissions from an Indian perspective. It contains several parameters which are considered important when applying to Masters programs.

6. Resampling Credit Card Data

There are 284,807 observations of 31 variables. Class is the target variable, whereas the others are predictor variables. The information in the data is sensitive, so it appears to have been preprocessed with a technique such as PCA or factor analysis, which means we need not put extra effort into data cleaning and wrangling. Out of the 284,807 observations, only 492 are labelled fraud, so this data is highly imbalanced; we will use different sampling techniques to improve model performance.
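A minimal sketch of one such technique, random oversampling of the minority class, with toy labels standing in for the 492-vs-284,807 fraud split (the notebook explores several samplers):

```python
# Random oversampling: duplicate minority-class examples until classes balance.
import random

random.seed(0)  # reproducible sampling

labels = ["legit"] * 95 + ["fraud"] * 5
majority = [y for y in labels if y == "legit"]
minority = [y for y in labels if y == "fraud"]

# Sample the minority class with replacement until the classes are balanced.
oversampled = minority + random.choices(minority, k=len(majority) - len(minority))
balanced = majority + oversampled
print(balanced.count("legit"), balanced.count("fraud"))
```

In practice the same resampling is applied to the feature rows, not just the labels, and only on the training split so the test set keeps its original imbalance.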

5. Cardiovascular Disease Analysis

This data contains information on factors responsible for heart attacks. We need to analyse the trends in the heart data to predict certain cardiovascular events or find clear indications of heart health. We will build a Logistic Regression machine learning model to predict future events.

4. Speed Up Your Neural Network!!

What if you need to tackle a very complex problem, such as detecting hundreds of types of objects in high-resolution images? You may need to train a much deeper DNN, perhaps with (say) 10 layers, each containing hundreds of neurons, connected by hundreds of thousands of connections. First, you would be faced with the tricky vanishing gradients problem (or the related exploding gradients problem) that affects deep neural networks and makes lower layers very hard to train. Second, with such a large network, training would be extremely slow. Third, a model with millions of parameters would severely risk overfitting the training set.

In this Notebook, I will go through each of these problems in turn and present techniques to solve them.

3. Machine Learning at Scale with PySpark

PySpark is a Python API for Spark released by the Apache Spark community to support Python with Spark. Using PySpark, one can easily integrate and work with RDDs in Python programming language too. There are numerous features that make PySpark such an amazing framework when it comes to working with huge datasets. Whether it is to perform computations on large datasets or to just analyze them, Data Engineers are switching to this tool.

2. Student Performance Correlation

A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.
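The most common such measure, the Pearson correlation coefficient, can be computed from scratch; the study-hours vs. exam-score pairs below are hypothetical, not the notebook's student data:

```python
# Pearson r = covariance / (product of the deviations' magnitudes).
import math

hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"r = {pearson_r(hours, scores):.3f}")
```

A value near +1 (as here) indicates a strong positive linear relationship between the two columns.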

1. Essential Numpy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
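A minimal taste of those two features, the n-dimensional array and vectorised operations (assumes NumPy is installed):

```python
# Build a 3x4 array and compute per-column means with one vectorised call.
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3x4 matrix holding 0..11
col_means = a.mean(axis=0)        # mean of each column, no Python loop
print(a.shape, col_means)
```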



Languages

Language: Jupyter Notebook 100.0%