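This project recognizes CAPTCHAs with simple image recognition algorithms. The plan is to try each of the classification algorithms from machine learning in turn.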
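from PIL import ImageEnhance

# img is assumed to be a PIL Image loaded earlier (for example with Image.open)
enhancer = ImageEnhance.Contrast(img)  # increase contrast
img = enhancer.enhance(2)
enhancer = ImageEnhance.Sharpness(img)  # sharpen
img = enhancer.enhance(2)
enhancer = ImageEnhance.Brightness(img)  # increase brightness
img = enhancer.enhance(2)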
# kNN algorithm
import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the training samples, nearest first
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Count the labels of the k nearest samples
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort labels by vote count, highest first
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
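To make the voting step easier to follow, here is a dependency-free sketch of the same idea: compute the Euclidean distance to every sample, take the k nearest, and return the majority label. The function name `knn_classify` and the toy data are assumptions made for illustration only; they are not part of the original code.

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k):
    """Return the majority label among the k samples closest to query."""
    # Euclidean distance from the query to every training sample
    distances = [math.dist(query, s) for s in samples]
    # Indices of the k nearest samples, nearest first
    nearest = sorted(range(len(samples)), key=distances.__getitem__)[:k]
    # Majority vote over the labels of those neighbours
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters with labels "A" and "B"
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
print(knn_classify((0.1, 0.2), samples, labels, k=3))
```

For real CAPTCHA characters, each sample would be a flattened pixel vector rather than a 2-D point, but the voting logic stays the same.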
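Given the nature of the algorithm, we frame the task as a binary classification problem: telling the digit 1 apart from the digit 9 (any other pair of digits would work just as well).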
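Reducing the labeled data to two classes could look roughly like the sketch below. It keeps only the samples labeled with the two chosen digits and relabels them as +1 and -1. The helper name `make_binary_subset` and the +1/-1 encoding are assumptions (a common convention for binary classifiers); the original text does not say how the labels are encoded.

```python
def make_binary_subset(samples, labels, positive="1", negative="9"):
    """Keep only two classes and relabel them as +1 / -1 (assumed encoding)."""
    mapping = {positive: 1, negative: -1}
    # Drop every sample whose label is not one of the two chosen digits
    kept = [(x, mapping[y]) for x, y in zip(samples, labels) if y in mapping]
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return xs, ys

# Hypothetical feature vectors with their manually assigned digit labels
features = [[0, 1], [1, 1], [1, 0], [0, 0]]
digits = ["1", "9", "3", "1"]
xs, ys = make_binary_subset(features, digits)
print(xs, ys)
```

Swapping in another pair of digits only requires changing the `positive` and `negative` arguments.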
- http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html?js=1#svm-toy-js
- http://www.pami.sjtu.edu.cn/people/gpliu/document/libsvm_src.pdf
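- Crawl the CAPTCHA images
- Preprocess the images and segment them into characters
- Label the data by hand
- Load the training set
- Evaluate on the test set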
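The last three steps above can be sketched as follows: split the hand-labeled samples into a training set and a test set, then measure how often a classifier agrees with the manual labels. Everything here is an illustrative assumption, including the names `split_dataset`, `accuracy`, and `nearest_neighbor`, the 80/20 split, and the synthetic data; the original does not specify how the split or evaluation is done.

```python
import random

def split_dataset(samples, labels, test_ratio=0.2, seed=0):
    """Shuffle labeled samples and split them into training and test sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    n_test = max(1, int(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test

def nearest_neighbor(x, train):
    """Placeholder classifier: label of the single closest training sample."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda item: sq_dist(x, item[0]))[1]

def accuracy(classify, train, test):
    """Fraction of test samples whose prediction matches the manual label."""
    correct = sum(1 for x, y in test if classify(x, train) == y)
    return correct / len(test)

# Synthetic stand-in for labeled character images: two well-separated classes
samples = [(i * 0.1, i * 0.1) for i in range(10)] + \
          [(10 + i * 0.1, 10 + i * 0.1) for i in range(10)]
labels = ["1"] * 10 + ["9"] * 10

train, test = split_dataset(samples, labels)
print(len(train), len(test), accuracy(nearest_neighbor, train, test))
```

Any classifier with the same `(sample, training_set)` call shape can be passed to `accuracy` in place of `nearest_neighbor`.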