mrnapz / Captcha-Breaking

Home Page: https://bisariautkarsh.github.io/Captcha-Breaking/

Captcha Breaking using Deep Learning

Introduction

In this project, a CNN model is trained to identify the individual characters in a CAPTCHA image and thereby break the CAPTCHA. The training data consists of 4-character CAPTCHAs rendered in a random mix of four different fonts. The characters "O", "I", "0" and "1" are not used, which avoids confusion between the letters and the similar-looking digits. That leaves a total of 32 possible letters and digits that the model needs to recognize.

Toolset

  • Python3
  • OpenCV
  • Jupyter Notebook
  • Tensorflow
  • Keras

Dataset

We had access to 9955 CAPTCHA images, each containing 4 characters. Each character can be any alphanumeric character other than "O", "I", "0" and "1".
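The count of 32 classes follows directly from these exclusions. A minimal sketch, assuming the CAPTCHAs use uppercase letters only (the names EXCLUDED and CHARSET are illustrative and not taken from the project):

```python
import string

# Assumption: uppercase letters and digits, minus the four excluded characters.
EXCLUDED = set("OI01")
CHARSET = [c for c in string.ascii_uppercase + string.digits if c not in EXCLUDED]

print(len(CHARSET))  # 32
```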

Creating Dataset

Each CAPTCHA image contains 4 characters, so we first separate the 4 characters in the image using OpenCV. We find the contours (collections of connected pixels) and mark a bounding box around each character. In some cases the letters overlap, so two characters may be detected as a single contour. To handle this, we compare the contour's width-to-height ratio: if it is unusually large, we divide the contour into two equal halves. Each CAPTCHA is thus broken down into 4 fragments, each of which is saved to the folder designated for its character.
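The segmentation step might look roughly like the sketch below. It is not the project's actual code: the function names, the Otsu thresholding step, and the 1.25 ratio cut-off are assumptions chosen for illustration. OpenCV is imported inside find_letter_boxes so that the box-splitting helper can be used on its own.

```python
def split_wide_boxes(boxes, max_ratio=1.25):
    """Split any (x, y, w, h) box whose width/height ratio exceeds max_ratio
    into two halves (equal up to one pixel), then sort the boxes left to right."""
    result = []
    for x, y, w, h in boxes:
        if h > 0 and w / h > max_ratio:
            # Probably two overlapping characters: cut the box down the middle.
            half = w // 2
            result.append((x, y, half, h))
            result.append((x + half, y, w - half, h))
        else:
            result.append((x, y, w, h))
    return sorted(result, key=lambda box: box[0])


def find_letter_boxes(image_path):
    """Return bounding boxes of the character regions in a CAPTCHA image."""
    import cv2  # imported here so split_wide_boxes works without OpenCV

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Assumption: dark characters on a light background, so invert while thresholding.
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # [-2] selects the contour list under both the OpenCV 3 and OpenCV 4 return conventions.
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [cv2.boundingRect(contour) for contour in contours]
    return split_wide_boxes(boxes)
```

Each returned box can then be cropped out of the image and written to the folder named after its character.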

Building and Training the Model

The extracted images are used to create lists of data and labels. We then create a sequential model with two convolutional layers with max pooling and two fully connected layers, and train it. After 10 epochs of training, the model reaches almost 99% accuracy.
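A hedged sketch of such a model in Keras is shown below. The input size (20x20 grayscale), the filter counts, the hidden layer width, and the optimizer are all assumptions, since the text only specifies the layer types; the helper names are illustrative as well.

```python
import string

# Assumption: 32 classes (uppercase letters and digits without "O", "I", "0", "1").
CLASSES = [c for c in string.ascii_uppercase + string.digits if c not in "OI01"]
CLASS_INDEX = {c: i for i, c in enumerate(CLASSES)}


def encode_labels(labels):
    """Map character labels such as "A" or "7" to integer class ids."""
    return [CLASS_INDEX[label] for label in labels]


def build_model(input_shape=(20, 20, 1), num_classes=len(CLASSES)):
    """Two convolution + max-pooling stages followed by two fully connected layers."""
    from tensorflow import keras  # imported lazily; requires TensorFlow
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(20, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer labels from encode_labels pair with the sparse cross-entropy loss.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then be a call such as model.fit(x_train, y_train, epochs=10), where x_train holds the letter images scaled to [0, 1] and y_train holds the output of encode_labels.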

Prediction using Trained Model

  • Break up the CAPTCHA image into four separate letter images using the same approach used to create the training dataset.
  • Ask the neural network to make a separate prediction for each letter image.
  • Use the four predicted letters as the answer to the CAPTCHA.
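The last two steps can be sketched as follows, assuming the network returns one probability vector per letter image and the letter images are ordered left to right. The function name decode_captcha and the toy data are hypothetical.

```python
def decode_captcha(letter_probs, classes):
    """Pick the most likely class for each letter and join them into one string.

    letter_probs: one probability sequence per letter image, in left-to-right order.
    classes: class labels in the same order as the model's output units.
    """
    chars = []
    for probs in letter_probs:
        best = max(range(len(probs)), key=lambda i: probs[i])  # argmax
        chars.append(classes[best])
    return "".join(chars)


# Toy example with three classes instead of 32, and made-up probabilities.
classes = ["A", "B", "C"]
probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]]
print(decode_captcha(probs, classes))  # BACA
```

The same function accepts the rows of a model.predict result, since it only needs indexable sequences of numbers.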

Pipeline

Visualization of the strategy explained above

[Figure: Problem at hand]

[Figure: Simplifying the Problem]

[Figure: Captcha Image in Dataset]

[Figure: Marked Contours by making border around each character in Image]

[Figure: Overlapping Character Example]

[Figure: Recognition of two characters as one due to overlapping]

[Figure: Resolved Overlapping Problem]

[Figure: Character saved in designated folder with folder name same as recognized character]

[Figure: Model]

[Figure: Predictions Using Trained Model]

Languages

Jupyter Notebook 99.9%, Python 0.1%