goodpupil's repositories

Credit-Default-via-Deep-Learning

Using DNN/LSTM/1D-CNN to analyze a credit default problem

Language: Jupyter Notebook · Stargazers: 4 · Issues: 0

SubstancePredication

# Packages installed

1. Anaconda (conda environment with Python 3.6)
2. Keras (`conda install -c conda-forge keras`)
3. scikit-learn
4. Pandas
5. Matplotlib
6. NumPy
7. NLTK
8. Wordcloud

# Approach

I implemented two approaches, both using Keras: a linear neural network model (`Sequential`, model 1) and a convolutional neural network (CNN, model 2). I have commented the code wherever needed and explained the different strategies I tried during the course of this exercise.

I prepared three datasets:

1. Just the sentence and label columns
2. Subject + Predicate + Object and label columns
3. Both of the above combined

I ran all three datasets, and #3 performed better than the other two.

Data preparation differed between the two models. For the neural network, I tokenized the text and vectorized the tokens using word2vec embeddings from the Google News dataset (one 1x300 vector per word), which carry information about words being contextually related. The vectors were then weighted by term frequency-inverse document frequency (tf-idf), summed, and divided by the word count; this "mean" (so to speak) of all the words in a sentence formed that sentence's vector. Extending this to all rows, the input had shape (number of input rows x word2vec vector dimensionality).

For the CNN, the input was tokenized with the Keras `Tokenizer`, fit on the sentences (rows) iteratively. Instead of passing the mean of the word vectors, I passed the token sequence of each sentence, zero-padded so that every sentence matched the length of the longest sentence in the input set. Hence the input had shape (number of input rows x length of the longest sentence). The embedding matrix was the word2vec mapping of all tokens in the input corpus, so its shape was (number of unique tokens x word2vec vector dimensionality). A sketch of both preparations appears at the end of this README.

In both cases, the data was split 80% train / 20% test.

# Results

I plotted the history of accuracy and loss for the model predictions. Both models yielded around 60% (+/- 3%) accuracy in training and testing, with no apparent overfitting or underfitting. Metrics on the test sets were as follows:

### Model 1 (Sequential)

1. Accuracy: 0.5833
2. Precision: 0.5548
3. Recall: 0.8571
4. F-score: 0.6736

### Model 2 (CNN)

1. Accuracy: 0.6233
2. Precision: 0.6056
3. Recall: 0.7143
4. F-score: 0.6555

These results are not terrible, but there is room for improvement through hyperparameter tuning and design tweaks. Increasing the data volume may also yield better metrics, and transfer learning may be a good option for data this small.

# Limitations

The metrics could likely be improved by using more data and tuning hyperparameters. One strategy I skipped is k-fold cross-validation; according to one study it can suffer from high variability, and the authors suggest a new technique called J-K-fold cross-validation to reduce variance during training (https://www.aclweb.org/anthology/C18-1252.pdf). Another strategy I skipped was a grid search to arrive at optimized hyperparameter values, something that was done by Yoon Kim et al. Training deep learning classifiers with a small dataset may not be reliable; transfer learning may be a better option. Other libraries, such as fastai (https://docs.fast.ai/), a wrapper over PyTorch, could be an alternative. fastai implements sophisticated techniques like the LR Finder, which helps users make informed decisions when choosing learning rates for optimizers (SGD, Adam, or RAdam); it supports transfer learning, in which a classifier already trained on a variety of corpora is reused, which proves effective; and it implements advanced recurrent neural network (RNN) strategies. This could be explored in future work.
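A minimal sketch of the two data preparations described under Approach, assuming `sentences` is a list of input strings and `w2v` is a gensim `KeyedVectors` loaded from the GoogleNews word2vec binary (these names are assumptions for illustration; the notebook's actual code may differ):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

DIM = 300  # word2vec (GoogleNews) vector dimensionality

def mean_tfidf_vectors(sentences, w2v):
    """Model 1 input: tf-idf-weighted mean word2vec vector per sentence."""
    tfidf = TfidfVectorizer().fit(sentences)
    idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))  # sklearn >= 1.0
    X = np.zeros((len(sentences), DIM))
    for i, sent in enumerate(sentences):
        tokens = [t for t in sent.lower().split() if t in w2v and t in idf]
        if tokens:
            # weight each word vector by its idf, then divide by the word count
            X[i] = sum(w2v[t] * idf[t] for t in tokens) / len(tokens)
    return X  # shape: (number of rows, DIM)

def padded_sequences(sentences, w2v):
    """Model 2 input: zero-padded token-index sequences plus embedding matrix."""
    tok = Tokenizer()
    tok.fit_on_texts(sentences)
    seqs = pad_sequences(tok.texts_to_sequences(sentences))  # pad to longest sentence
    emb = np.zeros((len(tok.word_index) + 1, DIM))  # row 0 reserved for padding
    for word, idx in tok.word_index.items():
        if word in w2v:
            emb[idx] = w2v[word]
    return seqs, emb  # shapes: (rows, max_len) and (unique tokens + 1, DIM)
```

For model 2, `seqs` would feed a Keras `Embedding` layer initialized with `emb` as its (frozen or trainable) weight matrix.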

Language: Jupyter Notebook · Stargazers: 1 · Issues: 0

Alzheimers-DL-Network

A CNN-LSTM deep learning model for prognostic prediction and classification of Alzheimer's MRI neuroimages.

Stargazers: 0 · Issues: 0

AutomaticWeightedLoss

Multi-task learning using uncertainty to weigh losses, as in "Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics" and "Auxiliary Tasks in Multi-task Learning"
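For context, a minimal sketch of the uncertainty-based weighting from the first paper (Kendall et al., CVPR 2018); the class name and parameterization are illustrative, not this repo's actual API:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Sketch of L = sum_i L_i / (2 * sigma_i^2) + log(sigma_i),
    parameterized as s_i = log(sigma_i^2) for numerical stability.
    Hypothetical names; see the repo for its actual implementation."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # s_i, learned

    def forward(self, *losses):
        total = 0.0
        for loss, s in zip(losses, self.log_vars):
            # 0.5 * exp(-s) = 1 / (2 * sigma^2);  0.5 * s = log(sigma)
            total = total + 0.5 * torch.exp(-s) * loss + 0.5 * s
        return total
```

The `log_vars` are optimized jointly with the network weights, so tasks with higher homoscedastic uncertainty are automatically down-weighted.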

License: Apache-2.0 · Stargazers: 0 · Issues: 0

CNN-binaryClassification-UCI-datset

Build a CNN model using a UCI dataset, then evaluate it with k-fold cross-validation

License: Unlicense · Stargazers: 0 · Issues: 0

cnn-lstm

A CNN-LSTM architecture implemented in PyTorch for video classification

License: MIT · Stargazers: 0 · Issues: 0

CNN_system

Keywords: CNN, fully connected neural network, SFEW dataset, image preprocessing, data augmentation, Leaky ReLU, k-fold cross-validation, Casper. In this project, I build my own CNN system with image preprocessing and data augmentation, chosen based on the available computational resources and the characteristics of the dataset. The project is implemented with PyTorch.
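As an illustration of the kind of preprocessing and augmentation pipeline the description refers to (a hypothetical sketch; the repo's actual transforms and input size for SFEW may differ):

```python
import torchvision.transforms as T

train_tf = T.Compose([
    T.Resize((224, 224)),             # preprocessing: normalize image size
    T.RandomHorizontalFlip(p=0.5),    # augmentation: mirrored faces
    T.RandomRotation(10),             # augmentation: small rotations
    T.ColorJitter(brightness=0.2),    # augmentation: lighting variation
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```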

Stargazers: 0 · Issues: 0

Deep-Learning-with-TensorFlow-book

An open-source introductory deep learning book, with hands-on case studies based on the TensorFlow 2.0 framework.

Stargazers: 0 · Issues: 0

DeepLearningPractice

A collection of deep learning projects

Stargazers: 0 · Issues: 0

DeepLearningTutorial

Talk is cheap, show me the code! Deep learning, learning deep, have fun!

License: MIT · Stargazers: 0 · Issues: 0

DrumClassifer-CNN-LSTM

Classifies percussion audio samples with a CNN-LSTM, written in Python and PyTorch. Also exports to Drumkv1 (LV2 plugin)

Stargazers: 0 · Issues: 0

examples

A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.

License: BSD-3-Clause · Stargazers: 0 · Issues: 0

HAR_Pytorch

Human Activity Recognition using a PyTorch CNN & LSTM

Stargazers: 0 · Issues: 0

image-captioning

Used deep learning to train a CNN + RNN/LSTM on the MS-COCO dataset to automatically generate captions.

Stargazers: 0 · Issues: 0

imgaug

Image augmentation for machine learning experiments.

License: MIT · Stargazers: 0 · Issues: 0

PhyCNN

Physics-guided Convolutional Neural Network

License: MIT · Stargazers: 0 · Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

License: NOASSERTION · Stargazers: 0 · Issues: 0

pytorch-cifar100

Practice on CIFAR-100 (ResNet, DenseNet, VGG, GoogLeNet, InceptionV3, InceptionV4, Inception-ResNetV2, Xception, ResNet in ResNet, ResNeXt, ShuffleNet, ShuffleNetV2, MobileNet, MobileNetV2, SqueezeNet, NASNet, Residual Attention Network, SENet)

Stargazers: 0 · Issues: 0

PyTorch-Networks

PyTorch implementations of CNN architectures

Stargazers: 0 · Issues: 0

PyTorch-Tutorial-1

Build your neural network easily and quickly

License: MIT · Stargazers: 0 · Issues: 0

pytorch_geometric

Geometric Deep Learning Extension Library for PyTorch

License: MIT · Stargazers: 0 · Issues: 0

pytorch_resnet_cifar10

A proper implementation of ResNets for CIFAR-10/100 in PyTorch that matches the description in the original paper.

License: BSD-2-Clause · Stargazers: 0 · Issues: 0

ResNeXt.pytorch

Reproduces ResNet-V3 (ResNeXt) with PyTorch

License: MIT · Stargazers: 0 · Issues: 0

skorch

A scikit-learn compatible neural network library that wraps PyTorch

License: BSD-3-Clause · Stargazers: 0 · Issues: 0

Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject.

The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system contribution (e.g., the vocal tract) and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, perceptual linear prediction coefficients (PLPs) can also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4].

The pattern recognition step will be based on Gaussian mixture model classifiers, k-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
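A minimal sketch of the frame-based feature extraction and GMM classification described above, using librosa and scikit-learn (function names and parameters are illustrative assumptions, not the repo's actual API):

```python
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(wav_path, n_mfcc=13):
    """Per-frame MFCC features; librosa applies the short-term windowing."""
    y, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_gmms(features_by_class, n_components=8):
    """One GMM per class; features_by_class maps label -> stacked frame features."""
    return {c: GaussianMixture(n_components, covariance_type='diag').fit(X)
            for c, X in features_by_class.items()}

def classify(gmms, utterance_features):
    # score_samples gives per-frame log-likelihoods; sum over the utterance
    # and pick the class whose GMM explains the frames best
    return max(gmms, key=lambda c: gmms[c].score_samples(utterance_features).sum())
```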

License: MIT · Stargazers: 0 · Issues: 0

ssqueezepy

Synchrosqueezing, wavelet transforms, and time-frequency analysis in Python

License: MIT · Stargazers: 0 · Issues: 0

SST

Understanding Synchrosqueezing Transform

Stargazers: 0 · Issues: 0

Statistical-Learning-Method_Code

Handwritten implementations of all the algorithms in Li Hang's book "Statistical Learning Methods"

Stargazers: 0 · Issues: 0

Two-stream-CNN-for-rolling-bear-fault-diagnosis

Based on a two-stream CNN, a new bearing fault diagnosis model is proposed. The model is composed of a 2D-CNN and a 1D-CNN: the 2D-CNN takes a wavelet time-frequency map as input, and the 1D-CNN takes the original vibration signal as input. After feature extraction by the convolutional and pooling layers, the pooled outputs of the two streams are concatenated and passed through fully connected layers to produce the fault classification.
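A hypothetical sketch of this two-stream idea in PyTorch (layer sizes are illustrative, not the repo's actual architecture):

```python
import torch
import torch.nn as nn

class TwoStreamCNN(nn.Module):
    def __init__(self, n_classes=10, signal_len=1024, img_size=64):
        super().__init__()
        # 1D stream: raw vibration signal, shape (batch, 1, signal_len)
        self.s1d = nn.Sequential(
            nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, 9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Flatten())
        # 2D stream: wavelet time-frequency map, shape (batch, 1, img_size, img_size)
        self.s2d = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten())
        # concatenated feature size after the pooling in each stream
        feat = 32 * (signal_len // 16) + 32 * (img_size // 4) ** 2
        self.head = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(),
                                  nn.Linear(128, n_classes))

    def forward(self, signal, tf_map):
        # splice the two streams' pooled features, then classify
        return self.head(torch.cat([self.s1d(signal), self.s2d(tf_map)], dim=1))
```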

Stargazers: 0 · Issues: 0

vision

Datasets, Transforms and Models specific to Computer Vision

License: BSD-3-Clause · Stargazers: 0 · Issues: 0