shadimsaleh / udemy_PythonOpenCVDL

Course Study Repo for Python for Computer Vision with OpenCV and Deep Learning Course

Udemy Course: Python for Computer Vision with OpenCV and Deep Learning

Section 1 - Course Overview and Introduction

Lecture 3 - Course Curriculum Overview

  • Goals of this Course
    • Understand Computer Vision Apps
    • Understand how to use OpenCV and Python to work with images and vectors
    • Be able to apply these skills in our projects
  • Numpy and Image basics
    • Quick section on NumPy basics and how to manipulate images with it
  • Image Basics with OpenCV
    • Begin to work with the OpenCV library with images
    • Basic commands and drawings on Images
  • Image Processing with OpenCV
    • understand more advanced OpenCV operations that are useful in real world apps
  • Video Processing with OpenCV
    • understand the basics of working with video files and streaming webcam video with OpenCV library
  • Object Detection
    • Learn the various different methods of detecting objects in images and videos
    • Start with basic template matching and work our way up to face detection
  • Object Tracking
    • expand from our knowledge of object detection to tracking objects in videos
  • Deep Learning with Computer Vision
    • begin to combine knowledge from prev section with latest tools in Keras and Tensorflow for state of the art deep learning apps

Lecture 4 - Getting Set-Up for the Course Content

  • We need to download and install Anaconda
  • Create a Virtual Environment (different files depending on OS)
  • open JupyterLab
  • Work with both notebooks and .py scripts in jupyterlab
  • we already have a plotly conda env that might serve us (it has tensorflow); we activate it with source activate plotly
  • we launch the navigator with anaconda-navigator; we also update everything with conda update --all
  • We will walk through the installation workflow to see if there is something new
  • get Anaconda from 'https://www.anaconda.com/download/' (the Python 3.7 x64 build)
  • during installation, if conda is already installed, choose to add it to the PATH
  • the tutor provides a .yml file that will create a virtual env for us with all the necessary libs, so we should use it and maybe delete our own
  • i will delete my plotly env as the one from the course is huge and overlaps it
  • i update conda with conda update -n root conda
  • i remove plotly with conda env remove -n plotly; probably i can remove ztdl also...
  • we cd to our course folder and create the env from the yaml: conda env create -f cvcourse_linux_new.yml
  • to activate the env: source activate python-cvcourse; to deactivate: source deactivate
  • to start JupyterLab we run jupyter-lab and it runs as a web app at 'http://localhost:8888/lab'
  • we create a notebook and a text file saved as .py. we run the python script in a terminal with python3 myCode/test.py

Section 2- NumPy and Image Basics

Lecture 5 - Introduction to Numpy and Image Section

  • Section Goals
    • Understand how to work with basics in NumPy
    • understand how to create arrays
    • slice and index elements from arrays
    • open and display images with numpy

Lecture 6 - NumPy Arrays

  • we import numpy as np
  • we define a list mylist = [1,2,3]
  • we cast it to an array myarray = np.array(mylist)
  • for docs shift+tab
  • we can generate an evenly spaced array with np.arange(0,10) if we want to add a step size of 2 np.arange(0,10,2)
  • to create multidimensional array we have many options
  • to create a 2D 5x5 matrix of 0s: np.zeros(shape=(5,5)); the shape is rows x cols
  • for 1s we use np.ones(); we can omit the shape keyword, np.ones((2,4)) is the same
  • to create random numbers i first have to seed the rng. np.random.seed(101) to seed with 101
  • after i seed i can generate random ints and feed them into an array: arr = np.random.randint(0,100,10) makes an array of size 10 with random ints between 0 and 99
  • seed is of paramount importance as it leads to generating the same random nums
  • to find the max num in an array arr.max() to get the location (index) of the max arr.argmax() same holds for min
  • to get the average val of an array arr.mean()
  • to reshape arrays: i can get the shape of an array with arr.shape for arr its (10,). if i do arr.reshape(2,5) i get the array in 2x5 shape... total number of elements must be equal or i get an error
  • i make a 10x10 ordered array mat=np.arange(0,100).reshape(10,10) to get an element by index
row=0
col=1
mat[row,col]
  • this is called indexing. getting multiple elements by index is called slicing: mat[0:row,0:col]; the pattern is start(incl):end(excl):step, and [:] means everything
  • to slice a column mat[:,1].reshape(10,1) to slice a row mat[0,:]
  • to grab a submatrix mat[:3,:3]
  • to copy an array mynewmat = mat.copy()
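  • a quick sketch of why the seed matters (the numbers are just examples): re-seeding with the same value reproduces the exact same "random" array, so max/argmax/mean results are repeatable
import numpy as np

np.random.seed(101)
arr_a = np.random.randint(0,100,10)   # 10 random ints between 0 and 99
np.random.seed(101)
arr_b = np.random.randint(0,100,10)   # same seed, so the exact same ints again
print(np.array_equal(arr_a,arr_b))    # True
print(arr_a.max(), arr_a.argmax(), arr_a.mean())
print(arr_a.reshape(2,5))             # reshape keeps the total element count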

Lecture 7 - What is an Image?

  • each image can be represented as an array
  • in grayscale images the color is represented as a float between 0 and 1 (black = 0, white = 1)
  • often default images have vals between 0 and 255 (8-bit resolution)
  • we can always divide the integer by max val to normalize between 0-1
  • what about color images? color images can be represented as a combination of Red, Green, Blue (additive color mixing)
  • RGB allows to produce a range of colours (color triangle)
  • later in course we will learn about alternative colour mappings that can be applied to images
  • each color channel has intensity val 0-255
  • when we read a color image with a computer or python, the image has 3 dimensions and is a 3D matrix of shape (W,H,3), e.g. (1280,720,3): 1280 pixels width, 720 pixels height, 3 color channels
  • computer does not know about colours. only intensity vals. much like greyscale
  • the user has to dictate which channel is for which color.
  • each channel alone is essentially a grayscale image
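  • a small NumPy-only sketch of the ideas above (the pixel values are made up): an 8-bit image holds intensities 0-255 per channel, and dividing by 255 normalizes it to the 0-1 range
import numpy as np

# hypothetical 8-bit color image: 720 rows, 1280 columns, 3 color channels
img = np.random.randint(0,256,size=(720,1280,3),dtype=np.uint8)

first_channel = img[:,:,0]   # each channel alone is a grayscale intensity map
img_norm = img/255           # normalize to floats between 0.0 and 1.0
print(img.shape, img_norm.min(), img_norm.max())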

Lecture 8 - Images and NumPy

  • we import numpy
  • we install matplotlib conda install matplotlib
  • we import pyplot and make it inline
import matplotlib.pyplot as plt
%matplotlib inline
  • we install pillow conda install -c anaconda pillow
  • we import Image pillow lib from PIL import Image
  • Image function allows us to open up images and transform them in an array
  • we use it to open an image pic = Image.open('../DATA/00-puppy.jpg')
  • if i run pic in jupyter i see the pic
  • type(pic) gives 'PIL.JpegImagePlugin.JpegImageFile'. numpy can't process it directly; to convert it to an array i use pic_arr = np.asarray(pic)
  • pic_arr.shape gives (1300, 1950, 3). i can show the image from the array plt.imshow(pic_arr)
  • i can show first channel as grayscale plt.imshow(pic_arr[:,:,0],cmap='gray')
pic_red = pic_arr.copy()
plt.imshow(pic_red[:,:,0])  # show just the red channel
  • the red channel by default displays with the viridis colormap because it is a single channel with vals 0-255. we can normalize by dividing by 255 to get a 0-1 scale, or pass a grayscale cmap
  • lighter color in grayscale is closer to 255 (or 1.) so higher color contribution in pixel
  • i will zero out the green and blue channels with pic_red[:,:,1:] = 0 and show it: plt.imshow(pic_red)
  • pic_red still has 3 channels; green and blue are just zeroed out

Lecture 9 - Numpy and Image Assessment Test

  • we do the test
  • fill an empty array with vals
arr=np.empty(shape=(5,5))
arr.fill(10)
  • or
arr = np.ones((5,5))
arr*10

Section 3 - Image basics with OpenCV

Lecture 11 - Introduction to images and OpenCV Basics

  • we will learn how to use OpenCV lib
  • how to open images and draw on them
  • OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision
  • Created by Intel at 1999, is written in C++. Here we will use its Python bindings
  • It contains many popular algorithms for computer vision, including object detection and tracking algorithms
  • Section Goals
    • open image files with OpenCV in a notebook and in a .py script
    • Draw simple geometries on images
    • Directly interact with an image through callbacks

Lecture 12 - Opening Image Files in a notebook

  • we saw how to use PIL (Python Imaging Library) to open images, transform them to arrays with numpy, and use matplotlib to display the array as an image
  • we will use OpenCV + Matplotlib to open and display an image as array
  • we do the usual imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
  • we import openCV lib import cv2
  • we open the image with cv img = cv2.imread('../DATA/00-puppy.jpg') the type is type(img) is numpy.ndarray
  • if i open a wrong path i get no error but the type is NoneType
  • the img.shape is (1300,1950,3) so 3 channels
  • if i plt.imshow(img) the image has different colors because OpenCV orders the color channels differently
  • Matplotlib expects RED, GREEN, BLUE but openCV encodes them BLUE,GREEN,RED
  • we need to fix the order before displaying with matplotlib. cv can do that using the cvtColor function: cv2.cvtColor(img,cv2.COLOR_BGR2RGB); cv2 has a lot of color space transformations available
  • i can avoid the post transformation: i can apply it when i read the image with opencv, e.g. to read it as grayscale img_gray = cv2.imread('../DATA/00-puppy.jpg',cv2.IMREAD_GRAYSCALE). if we plot it we see the viridis cmap (even if i normalize the vals it's still viridis); to solve it i change the cmap: plt.imshow(img_gray,cmap='gray')
  • to resize images we can use openCV: new_img = cv2.resize(fixed_img,(1000,400)); the numbers i enter are (COL,ROW) or (WIDTH,HEIGHT); if i don't keep the aspect ratio the image is distorted. the arguments are swapped in comparison with the numpy order
  • cv allows resizing keeping the aspect ratio
w_ratio = 0.1
h_ratio = 0.1
new2_img = cv2.resize(fixed_img,(0,0),fixed_img,w_ratio,h_ratio)
  • to flip images: fl_img = cv2.flip(fixed_img,0) flips along the horizontal axis, use 1 to flip along the vertical axis, use -1 to combine both flips
  • to write an image (numpy array), maybe a generated one, to a new file i use cv2.imwrite('NEW FILEPATH',fl_img); the file extension determines the filetype. beware that since OpenCV does the saving, it expects the channels in BGR order
  • to play with canvas space in the notebook and display larger images we use matplotlib scripting
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.imshow(fix_img)

Lecture 13 - Opening Image files with OpenCV

  • we will open images with OpenCV using python scripts.
  • in this lecture we will use OpenCV to display images in their own separate window outside of jupyter
  • for more complex video and image analysis, we will need to display outside of jupyter
  • while we often will just use plt.imshow() to display images inside of a notebook. sometimes we want to use OpenCV on its own to display images in their own window
  • Often Jupyter (being browser based) interferes with closing the window
  • Many times JupyterLab can display a new window with no issues, but the kernel crashes when the OpenCV window is closed
  • To fix this issue when running OpenCV from a notebook, restart the kernel
  • this is an issue on macOS and Linux
  • it is better to run the code in a .py script if the issue makes our work difficult
  • we'll see how to open and display images directly with OpenCV (no matplotlib) in the notebook and in a script
  • we start with a notebook
  • we do some fixes first to run the opencv window in linux
Being in the course's conda env source activate python-cvcourse

conda remove opencv
conda remove py-opencv
conda update conda
conda upgrade pip
conda install jupyter # dont think it matters but followed instructors advice
then use pip to install opencv pip install opencv-contrib-python (i tried this version as it contains additional libs, I suppose pip install opencv-python will also do the trick)
  • then we use imshow from opencv to invoke the window
import cv2
img = cv2.imread('../DATA/00-puppy.jpg')
cv2.imshow('Puppy',img)
cv2.waitKey()
  • the image is large and we cannot resize it as opencv displays it at its actual pixel dimensions. so depending on the screen resolution this might cause issues
  • we write a python script to do the same job
import cv2
img = cv2.imread('../DATA/00-puppy.jpg')
while True:
    cv2.imshow('Puppy',img)
    if cv2.waitKey(1) & 0xFF == 27:
        break
cv2.destroyAllWindows()

we put the imshow in a while loop to be able to break on a keystroke. we use the cryptic cv2.waitKey(1) & 0xFF == 27: which means IF we've waited at least 1ms AND we've pressed the ESC key

  • instead of 27 (ESC) we can use ord('q') to quit with 'q'

Lecture 14 - Drawing on Images - Part One - Basic Shapes

  • we make a new notebook and do the basic imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
  • we create a blank image (all zeroes) in numpy: blank_img = np.zeros(shape=(512,512,3),dtype=np.int16); we spec the datatype to be int16
  • we imshow it and it's pure black
  • we will use opencv to draw a rectangle on the image: cv2.rectangle(blank_img,pt1=(384,0),pt2=(510,150),color=(0,255,0),thickness=10); the code draws a rectangle by specifying two opposite corner points. also we spec the color and the thickness. the points are given in OpenCV style (W,H). the method alters the passed image in place, so if we replot it with imshow we see the overlaid rect
  • the outline starts at the specified points, so in our example it goes out of bounds
  • if i run multiple times the method it overlays multiple rects
  • the same holds for squares
  • for circles cv2.circle(img=blank_img,center=(100,100),radius=50,color=(255,0,0),thickness=8)
  • to fill a shape with the color we set thickness to -1
  • to draw a line cv2.line(blank_img,pt1=(0,0),pt2=(512,512),color=(0,255,255),thickness=5)
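  • the drawing calls above, consolidated into one runnable cell (coordinates and colors are the lecture's example values)
import cv2
import numpy as np
import matplotlib.pyplot as plt

blank_img = np.zeros(shape=(512,512,3),dtype=np.int16)
cv2.rectangle(blank_img,pt1=(384,0),pt2=(510,150),color=(0,255,0),thickness=10)
cv2.circle(img=blank_img,center=(100,100),radius=50,color=(255,0,0),thickness=8)
cv2.circle(img=blank_img,center=(400,400),radius=50,color=(255,0,0),thickness=-1)  # -1 fills the shape
cv2.line(blank_img,pt1=(0,0),pt2=(512,512),color=(0,255,255),thickness=5)
plt.imshow(blank_img)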

Lecture 15 - Drawing on Images - Part 2 - Texts and Polygons

  • to write text
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(blank_img,text="Hello",org=(10,500),fontFace=font,fontScale=4,color=(255,255,255),thickness=3,lineType=cv2.LINE_AA)
  • we select a cv2 font, the bottom-left corner (org), the size (fontScale), color, thickness and line type
  • to draw a polygon with opencv we first have to decide on the vertices: a list of pairs as nested arrays with x,y coordinates. the dtype has to be an integer type: vertices = np.array([[100,300], [200,200], [400,300], [200,400]],dtype=np.int32)
  • the shape of the vertices is (4,2) so 2D. opencv wants it 3D
  • the conversion we do is pts = vertices.reshape((-1,1,2)) and the pts.shape is (4,1,2)
  • we do this for the color channels
  • to draw the polyline cv2.polylines(blank_img,[pts],isClosed=True,color=(255,0,0),thickness=5) we pass the points as array, also we spec if we want to close the polyline
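  • a consolidated sketch of the polygon steps above (the vertex coordinates are the lecture's example values; a uint8 blank image is used here so matplotlib displays it cleanly)
import cv2
import numpy as np
import matplotlib.pyplot as plt

blank_img = np.zeros(shape=(512,512,3),dtype=np.uint8)
vertices = np.array([[100,300],[200,200],[400,300],[200,400]],dtype=np.int32)
pts = vertices.reshape((-1,1,2))   # cv2.polylines expects points shaped (num_points,1,2)
cv2.polylines(blank_img,[pts],isClosed=True,color=(255,0,0),thickness=5)
plt.imshow(blank_img)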

Lecture 16 - Direct Drawing on an Image with a Mouse - Part One

  • we can use CallBacks to connect Images to event functions with OpenCV
  • this allows us to directly interact with images (and later on videos)
  • In this 2 part lecture we will cover
    • Connecting Callback Functions
    • Adding Functionality through Event Choices
    • Dragging the Mouse for Functionality
  • we will run them as python script
  • we import libs cv2 and numpy
  • we create a blank image img = np.zeros((512,512,3),np.int8)
  • int8 results in grayish color
  • we add the while loop
while True:
    cv2.imshow('Blank',img)
    if cv2.waitKey(20) & 0xFF == 27:
        break
cv2.destroyAllWindows()
  • we define a callback function
def draw_circle(event,x,y,flags,param):
    pass
  • and we connect it with a mouse event to the image window
cv2.namedWindow(winname='Blank')
cv2.setMouseCallback('Blank',draw_circle)

  • the connection is done through the imshow name (window name)
  • the params passed in the callback have to do with the event
    • x,y is the position
    • event contains the type of event
  • i mod the callback to draw a circle of specific size and color centered at the position i click using the event 'cv2.EVENT_LBUTTONDOWN'
def draw_circle(event,x,y,flags,param):
    if event == cv2.EVENT_LBUTTONDOWN:
        cv2.circle(img,(x,y),100,(0,255,0),-1)
  • we remove np.int8 to solve the grayish look
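  • the pieces above, assembled in the order a standalone .py script needs (callback definition, window/callback connection, then the display loop); a sketch, not the instructor's exact file
import cv2
import numpy as np

def draw_circle(event,x,y,flags,param):
    # draw a filled green circle wherever the left mouse button is pressed
    if event == cv2.EVENT_LBUTTONDOWN:
        cv2.circle(img,(x,y),100,(0,255,0),-1)

img = np.zeros((512,512,3))   # default float zeros avoid the grayish int8 look

cv2.namedWindow(winname='Blank')
cv2.setMouseCallback('Blank',draw_circle)

while True:
    cv2.imshow('Blank',img)
    if cv2.waitKey(20) & 0xFF == 27:   # ESC to quit
        break
cv2.destroyAllWindows()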

Lecture 17 - Direct Drawing on an Image with a Mouse - Part Two

  • we mod the callback adding an elif to listen for a right-button-down event, drawing a circle of another color (red); beware that the channel order is BLUE,GREEN,RED
    elif event == cv2.EVENT_RBUTTONDOWN:
        cv2.circle(img,(x,y),100,(0,0,255),-1)

Lecture 18 - Direct Drawing on an Image with a Mouse - Part Three

  • we will draw the rectangle as we drag the mouse with the button down on a blank img and finalize it on button release
  • we copy the prev script (show image + callback boilerplate)
  • the callback becomes
################
## VARIABLES ###
################

# true while mouse button down false while up
drawing = False
# starting point of rect temp vals
ix,iy = -1,-1

###############
## FUNCTION ###
###############

def draw_rectangle(event,x,y,flags,param):
    global ix,iy,drawing
    if event == cv2.EVENT_LBUTTONDOWN:
        drawing=True
        ix,iy = x,y
    elif event == cv2.EVENT_MOUSEMOVE:
        if drawing == True:
            cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
    elif event == cv2.EVENT_LBUTTONUP:
        drawing=False
        cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
  • we make use of global vars in the callback so we define them as such to be able to alter them

Lecture 19 - Image basics Assessment

  • to fill polyline we can use a beta function cv2.fillPoly(fix_img,[pts],(0,0,250))

Section 4 - Image Processing

Lecture 21 - Introduction to Image Processing

  • Section Goals
    • Learn various image processing operations
    • Perform image operations such as Smoothing,Blurring,Morphological Operations
    • Grab properties such as color spaces and histograms

Lecture 22 - Color Mappings

  • so far we've only worked with the RGB color space; in RGB coding, colors are modeled as a combination of Red, Green and Blue
  • in the 1970s HSL (hue,saturation,lightness) and HSV (hue,saturation,value) were developed as alternative color models
  • HSV and HSL are more closely aligned with the way human vision actually perceives color
  • while in the course we will deal mostly with RGB images, it's good to know how to convert to the HSL and HSV colorspaces
  • RGB colorspace represents a color as a combo of R G and B (color cube)
  • HSL is perceived as cylinder (hue is the angle, saturation is the distance from center, lightness the height)
    • H= actual color, Saturation=Intensity of color, Lightness=How dark it is
    • bottom pure black, top pure white, center line = grayscale
  • HSV is represented as cylinder. instead of lightness we have value (black->full color)
    • top center = white
  • this lecture will be a quick review on using the cvtColor func to change colorspaces
  • we wont have to deal with HSL or HSV based color images for the rest of the course
  • we use a notebook to display an image at default cv2 BGR colorspace
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
img = cv2.imread('../DATA/00-puppy.jpg')
plt.imshow(img)
  • we fix it converting to RGB
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
plt.imshow(img)
  • to convert it to HSV: img = cv2.cvtColor(img,cv2.COLOR_RGB2HSV); the result looks strange because matplotlib still interprets the channels as RGB while they now hold HSV vals. we can use cv2.COLOR_RGB2HLS for HLS

Lecture 23 - Blending and pasting Images

  • often we will work with multiple images
  • OpenCV has many programmatic methods of blending images together and pasting images on top of each other
  • Blending Images is done through the addWeighted function that uses both images and combines them
  • To blend images we use a simple formula:
    • new_pixel = α*pixel_1 + β*pixel_2 + γ
  • so it adds weights to each contributing image pixel and adds a bias
  • when images are not same size we have to do masking
  • we read 2 images and fix the color order for display
img1 = cv2.imread('../DATA/dog_backpack.jpg')
img1 = cv2.cvtColor(img1,cv2.COLOR_BGR2RGB)
img2 = cv2.imread('../DATA/watermark_no_copy.png')
img2 = cv2.cvtColor(img2,cv2.COLOR_BGR2RGB)
  • we import matplotlib and show them. they are not the same shape
  • well resize them both to make them same size
img1 = cv2.resize(img1,(1200,1200))
img2 = cv2.resize(img2,(1200,1200))
  • we use the addWeighted function to blend them blended = cv2.addWeighted(src1=img1,alpha=1,src2=img2,beta=0.3,gamma=0.5) its src1, alpha,src2,beta,gamma
  • if we blend different size images we get an error
  • we will overlay a small image on top of a larger image without blending
  • it's a simple numpy reassignment where the vals of the larger image are reassigned to equal the vals of the smaller image in the overlaid region
  • we resize img2 to be smaller img2 = cv2.resize(img2,(600,600))
  • we rename images large_img = img1 and small_img = img2
  • overlay is pure numpyarray math
x_offset = 0
y_offset = 0
x_end = x_offset + small_img.shape[1]
y_end = y_offset + small_img.shape[0]
large_img[y_offset:y_end,x_offset:x_end] = small_img

Lecture 24 - Blending and Pasting Images Part Two - Masks

  • we've seen how to overlay images on top of each other by simply replacing the values of the larger image with the vals of the smaller image for the desired Region Of Interest (ROI)
  • what if we only want to blend or replace part of the image?
  • what if we want to mask part of the smaller image. say replace only the area in the outline of a logo
  • this needs 3 steps. start with img1 => build a mask(the mask will let only certain pixels of img1 filter through) => paste the masked pixels on img2
  • let's explore the syntax of these steps (check the links in the lecture notebook for other use cases)
  • we start with the same 2 images (read, fix colors and resize the logo img)
  • we decide where on the base img(img1) we want to blend in the img2 shape (create a ROI). we ll place it in bottom right (numpy array math ahead)
x_offset = img1.shape[1]-img2.shape[1]
y_offset = img1.shape[0]-img2.shape[0]
  • these vals represent the coordinates of the top-left corner of the ROI
  • i use tuple unpacking to get img2 dimensions rows,cols,channels = img2.shape
  • i grab the ROI roi = img1[y_offset:img1.shape[0],x_offset:img1.shape[1]]
  • i now want to create the mask
    • i get a grayscale version of the image: img2gray = cv2.cvtColor(img2,cv2.COLOR_RGB2GRAY); it displays with the viridis cmap but it is 1 channel
    • i need to invert the image because i want the part to be excluded to be black (0 val). we use cv2.bitwise_not (bitwise inversion) for this: mask_inv = cv2.bitwise_not(img2gray)
  • mask_inv.shape shows it is 1 channel; i need to add the other channels (with numpy)
  • we create a white 3-channel background the size of img2 (mask): white_background = np.full(img2.shape,255,dtype=np.uint8); the numpy full method fills an array of the specified shape with the number we spec (255). as it fills 255 in all channels it's white. also we spec dtype=np.uint8 to match the mask dtype
  • to create the actual mask we use cv2.bitwise_or (bitwise disjunction per element): bk = cv2.bitwise_or(white_background,white_background,mask=mask_inv); the result has 3 channels but is essentially the mask. it applies the mask in all channels. we could just copy the 1 channel into the others with numpy
  • we now want to apply the original im2 (red) on the mask to cut the logo out and create the foreground (we use again bitwise_or) fg = cv2.bitwise_or(img2,img2,mask=mask_inv)
  • we get the mask overlayed on the roi with biwise_or (not masked) final_roi = cv2.bitwise_or(roi,fg)
  • we overlay the final roi on the original large image (like we did before with numpy math)
large_img = img1
small_img = final_roi
x_end = x_offset + cols
y_end = y_offset + rows
large_img[y_offset:y_end,x_offset:x_end] = small_img

Lecture 25 - Image Thresholding

  • In some CV applications it is often necessary to convert color images to grayscale, since only the edges end up being important
  • Similarly, some apps only require a binary image showing only general shapes
  • Thresholding is fundamentally a very simple method of segmenting an image into different parts
  • Thresholding will convert an image to consist of only two values, white or black
  • what we actually do is convert a color image to binary (3 channels -> 1 channel, uint8 -> binary)
  • We'll dive into the syntax and options for thresholding with OpenCV
  • we do usual imports and read an image of a rainbow img = cv2.imread('../DATA/rainbow.jpg')
  • we ll see some thresholding options
  • read in a color image as grayscale img = cv2.imread('../DATA/rainbow.jpg',0) simply pass a 0
  • use cv2.threshold passing the options thresh and maxval and the type of threshold: any val < thresh is converted to 0, any val > thresh to maxval. usually we use the halfway point. for a grayscale image the typical call is ret,thresh1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY); ret is the cutoff value and thresh1 is the actual thresholded image
  • we can play with threshold types like THRESH_BINARY_INV (inverse) THRESH_TRUNC (if val is over threshold it replaces it with threshold, if its lower it keeps the original val) THRESH_TOZERO (keep original if >thresh otherwise 0) (see OpenCV docs for more)
  • we will do a real world example reading in a crossword page image img = cv2.imread('../DATA/crossword.jpg',0)
  • we set up a function to display the plot larger and use it instead of plt.imshow
def show_pic(img):
    fig = plt.figure(figsize=(15,15))
    ax = fig.add_subplot(111)
    ax.imshow(img,cmap='gray')
  • we see in the image that apart from the black letters there is gray noise. we would like to say: if there is ink it's black, if not, white. we'll play with the binary threshold
  • we do a simple binary threshold in the middle: ret,th1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY). the result is not perfect as we lose quality. we can play with the types or the level. the level is not very helpful. THRESH_OTSU and THRESH_TRIANGLE do a good job
  • a better approach is the adaptive threshold as it auto-adapts the threshold based on each pixel and its neighboring pixels: th2 = cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,11,8); its params:
    • srcimage
    • maxval
    • adaptive threshold type (algorithm) GAUSSIAN or MEAN
    • threshold type (the actual threshold)
    • neighborhood (block) size for the adaptive threshold algo; only odd nums
    • the C val to be subtracted from the computed mean/weighted sum (see docs)
  • we usually play with block size and c val (2 last params)
  • we can now start applying multiple methods, like blending the adaptive thresholded image with the binary thresholded one to see the result: blended = cv2.addWeighted(th1,0.5,th2,0.5,0)
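  • a sketch pulling the lecture's thresholding calls together on the crossword image (the threshold values are the ones used above; they usually need tuning per image)
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('../DATA/crossword.jpg',0)   # read directly as grayscale

# simple binary threshold at the halfway point
ret,th1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY)

# adaptive threshold: the cutoff is computed per pixel from its neighborhood mean
th2 = cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,11,8)

# blend the two results to compare them
blended = cv2.addWeighted(th1,0.5,th2,0.5,0)
plt.imshow(blended,cmap='gray')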

Lecture 26 - Blurring and Smoothing

  • a common operation for image proc is blurring and smoothing an image
  • smoothing an image can help get rid of noise, and help the app focus on general details
  • there are many methods for blurring and smoothing
  • often blurring and smoothing is combined with edge detection
  • edge detection algos show many edges when shown a high res image with no blurring
  • edge detection after blurring gives better results
  • Blurring Methods we ll explore:
    • Gamma Correction: gamma correction can be applied to an image to make it appear brighter or darker depending on the Gamma value chosen
    • Kernel Based Filters: kernels can be applied over an image to produce a variety of effects. To understand how this works, check an interactive convolution visualization: the examples apply a 3x3 kernel of predefined vals (depending on the effect we want) rolling over the image. each output pixel comes from the original pixels multiplied elementwise with the kernel vals and summed; also, at the borders some pixels are unknown

Lecture 27 - Blurring and Smoothing - Part Two

  • Tutorial
  • we open a notebook and do the normal imports
  • we add convenience read img func
def load_img(name):
    img = cv2.imread(name).astype(np.float32)/255
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    return img
  • we read an img img = load_img('../DATA/bricks.jpg')
  • the image is in float format on a 0-1 scale
  • we add a display image helper func
def display_img(img):
    fig = plt.figure(figsize=(12,10))
    ax = fig.add_subplot(111)
    ax.imshow(img)
  • we start with gamma correction. we define a gamma < 1 and raise the image numpy array to the power of gamma, effectively making the image brighter (faded). if gamma > 1 the image gets darker
gamma = 1/4
result = np.power(img,gamma)
  • we do our first blurring: a low-pass filter with a 2D convolution
  • we will write on the image to show the effect
img = load_img('../DATA/bricks.jpg')
font = cv2.FONT_HERSHEY_COMPLEX
cv2.putText(img,text='bricks',org=(10,600),fontFace=font,fontScale=10,color=(255,0,0),thickness=4)
display_img(img)
  • this font is convenient because the contours of the letters are clear lines. the space between the lines in the letters will be affected by blurring or smoothing
  • we set up the kernel for the filter; our kernel is 5x5 with 1/25 (0.04) vals: kernel = np.ones(shape=(5,5),dtype=np.float32)/25
  • we apply a 2d filter on it with cv2.filter2D. dst = cv2.filter2D(img,-1,kernel) we pass in
    • input image
    • desired depth (ddepth) of the destination image. if using -1 its the same as input image
    • the kernel
  • the result is a blurred image. the lines in the letters are thicker and detail is lost from the wall. lines bleed together
  • we reset the image (with letters) to try a new method (smoothing the image with averaging). it's the same as before with our custom kernel, we just use the built-in cv2.blur() method passing the kernel dimensions. each kernel val is 1/(number of elements in the kernel), so it does averaging, just what we did manually before: blurred = cv2.blur(img, ksize=(5,5))
  • increasing kernel size makes the effect more intense
  • we will apply Gaussian and Median Blurring (not Averaging)
  • Gaussian Blur: blurred_img = cv2.GaussianBlur(img,(5,5),10) (src,ksize,sigmavalue)
  • Median Blur: blurred_img = cv2.medianBlur(img,5) (src,ksizedim) it takes an int as ksize as the kernel is square. this blur is different as lines dont flood as before (it does remove noise keeping details)
  • we do a real example. we imread sammy.jpg (tutors dog) and display it after color correcting
  • we imread a noisy version of the picture, sammy_noise.jpg, which is color corrected
  • we'll try to fix the noise with a median blur: fixed_img = cv2.medianBlur(noise_img,5); it works wonders
  • we will do bilateral filtering to the brick image 'cv2.bilateralFilter' params: src,d,sigmaColor,sigmaSpace blur = cv2.bilateralFilter(img,9,75,75) it blurs keeping edges

Lecture 28 - Morphological Operators

  • Math
  • Morph Operators
  • Morphological Operators are sets of Kernels that can achieve a variety of effects such as reducing noise
  • Certain operators are very good at reducing black points on a white background (or vice versa)
  • Certain operators can also achieve an erosion and dilation effect that can add or erode from an existing image
  • this effect is most easily seen on text data, so we will practice various morphological operators on some simple white text on a black background
  • we create a notebook and do the normal imports
  • we add a helper func to add white text on a black background
def load_img():
    blank_img=np.zeros((600,600))
    font = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(blank_img,text='ABCDE',org=(50,300),fontFace=font,fontScale=5,color=(255,255,255),thickness=25)
    return blank_img
  • we add a func to display the img
def display_img(img):
    fig = plt.figure(figsize=(12,10))
    ax = fig.add_subplot(111)
    ax.imshow(img,cmap='gray')
  • we start with erosion (it erodes the boundaries of foreground objects)

    • we define a 5,5 ones kernel kernel=np.ones((5,5),dtype=np.uint8)
    • we apply 'cv2.erode' specifying src, kernel, iterations: result = cv2.erode(img,kernel,iterations=1); the more iterations, the more eroded (thinner) the boundary of the letters becomes; above about 5 we lose the letters
  • we follow with opening (erosion followed with dilation). opening removes background noise

    • we add some binary white noise as an overlay on the original image
     img = load_img()
     white_noise = np.random.randint(low=0,high=2,size=(600,600))
     white_noise = white_noise*255
     noise_img = white_noise + img
     display_img(noise_img)
    
    • we do opening using cv2.morphologyEx: opening = cv2.morphologyEx(noise_img,cv2.MORPH_OPEN,kernel), using the same 5x5 kernel we used for erode. the result is a denoised image; the boundary is not 100% perfect but is very good
  • sometimes we have foreground noise. we use then Closing to clean

    • we create black noise on the image (it is like white noise but reversed, as we multiply by -255). it won't affect the black background but will affect the white foreground
    • black noise subtracts 255 from random pixels making them darker, white noise adds 255 to random pixels
     img = load_img()
     black_noise = np.random.randint(low=0,high=2,size=(600,600))
     black_noise = black_noise * -255
     black_noise_img = img + black_noise
     black_noise_img[black_noise_img == -255] = 0
     display_img(black_noise_img)
    
    • we apply closing (dilation followed by erosion): closing = cv2.morphologyEx(black_noise_img,cv2.MORPH_CLOSE,kernel); the result is OK
  • Morphological gradient takes the difference between dilation and erosion of an image

    • (erosion will eat the foreground making it thinner, dilation will thicken the foreground)
    • the morph gradient takes the difference of the two. what we get is the edge or contour of the foreground shape; this is a form of edge detection: gradient = cv2.morphologyEx(img,cv2.MORPH_GRADIENT,kernel); it does a pretty good job

Lecture 29 - Gradients

  • Image Gradients
  • Sobel Operator
  • understanding gradients leads to understanding edge detection, which applies to object detection, tracking and image classification
  • an image gradient is a directional change in the intensity of color in an image. there are algos that can track this direction
  • in this lecture we will mainly explore basic Sobel-Feldman operators
  • later in course we will expand on this operator for general edge detection
  • gradients can be calculated in a specific direction
  • if we use a normalized-x gradient from Sobel operator we see edges mainly on the vertical axes
  • if we use a normalized-y gradient from Sobel operator we see edges mainly on the horizontal axes
  • a normalized gradient magnitude from the Sobel operator detects edges on both axes
  • the operator uses two 3x3 kernels which are convolved with the original image to calculate approximations of the derivatives, one for horizontal changes and one for vertical changes (a small filter2D sanity check of these kernels is sketched at the end of this lecture's notes)
    • Gx = [[+1,0,-1],[+2,0,-2],[+1,0,-1]] * A
    • Gy = [[+1,+2,+1],[0,0,0],[-1,-2,-1]] * A
  • We'll explore various gradient operators with OpenCV
  • We ll also combine these concepts with a few other image processing techniques we ve learned
  • we open a notebook, do the imports and add the display image helper
  • we read in a sudoku img in grayscale: img = cv2.imread('../DATA/sudoku.jpg',0); it has vertical and horizontal lines and numbers
  • we apply sobel on the x direction sobelx = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=5). we use cv2.Sobel func with params:
    • source image,
    • ddepth (desired depth), selecting from OpenCV's available depths; it sets the precision/data type of the output
    • x derivative (1 as we want to apply in that direction)
    • y derivative (0 as we ignore that direction)
    • ksize=5 (square kernel, only one dimension given; must be odd)
  • the result is as expected: it detects vertical edges
  • we apply sobel on y direction sobely = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=5) it detects horizontal lines
  • another gradient uses laplacian derivatives. we can calculate these using sobel operators: laplacian = cv2.Laplacian(img,cv2.CV_64F); we use cv2.Laplacian passing the src image and ddepth. it does a good job in both directions
  • a use case for this image could be edge detection to detect the numbers in the image
  • we might want the combined result of sobelx and sobely. we can use addWeighted blended = cv2.addWeighted(src1=sobelx,alpha=0.5,src2=sobely,beta=0.5,gamma=0),
  • a second step in the pipeline could be thresholding or applying morphological operators: ret, th1 = cv2.threshold(blended,100,255,cv2.THRESH_BINARY_INV), then later do opening to remove noise and so on, or apply a morphological gradient: gradient = cv2.morphologyEx(blended,cv2.MORPH_GRADIENT,kernel)
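  • as a sanity check on the Gx/Gy kernels above, a sketch that applies them manually with cv2.filter2D; up to a sign convention the result should look very similar to the ksize=3 cv2.Sobel output
import cv2
import numpy as np

img = cv2.imread('../DATA/sudoku.jpg',0)

gx = np.array([[+1,0,-1],
               [+2,0,-2],
               [+1,0,-1]],dtype=np.float32)
gy = gx.T   # the y kernel is the transpose of the x kernel

manual_sobelx = cv2.filter2D(img.astype(np.float32),-1,gx)
manual_sobely = cv2.filter2D(img.astype(np.float32),-1,gy)

sobelx = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=3)   # OpenCV's own version for comparison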

Lecture 30 - Histograms - Part One

  • We ll understand what a regular histogram is, then we ll explain what an image histogram means
  • A histogram is a visual representation of the distribution of a continuous feature
  • it's a typical plot in data analysis (pyplot offers it, seaborn as well); usually we specify a set of bins and display the frequency of a number falling in each bin as a bar chart
  • we can display the general trend of the frequency by drawing a line (KDE plot)
  • for images we can display the frequency of values for colors
  • each of the three RGB channels has vals between 0-255
  • we can plot these as 3 histograms on top of each other to see how much of each channel there is in the picture
  • we'll see how to create picture histograms with matplotlib and OpenCV
  • we create a notebook and do the usual imports
  • we imread 3 images, fixing the color for matplotlib (horse.jpg, rainbow.jpg, bricks.jpg). we keep 2 copies: one for display in RGB and one for processing in BGR (openCV)
  • horse image has a lot of black so we expect peak near 0 for all channels
  • rainbow has even distribution
  • in bricks we expect a peak for blue
  • to calculate the histogram values we use the cv2.calcHist() method: hist_values = cv2.calcHist([blue_bricks],channels=[0],mask=None,histSize=[256],ranges=[0,256]); it takes as arguments:
    • [source_image in BGR openCV format]
    • channel to show 0=b,1=g,2=r
    • mask: if we want to show histogram for a masked part of image (None=show all)
    • histsize is the num of vals (like buckets) of histogram
    • ranges : the range of vals
  • we use plt.plot(hist_values) to plot the histogram in matplotlib
  • for blue_bricks the B hist has a peak in the middle. for dark_horse there is a peak at 0 as the image has no blue
  • to plot the 3 color histograms all at once in matplotlib we use a for loop and vanilla python
img = blue_bricks
color = ('b','g','r')
for i,col in enumerate(color):
    histr=cv2.calcHist([img],[i],None,[256],[0,256])
    plt.plot(histr,color=col)
    plt.xlim([0,256])
plt.title('Histogram for Image')
  • for dark_horse the histogram is skewed as it is a very large picture of mostly pure black, so we need to play with the plot limits to see what happens with the colors

Lecture 31 - Histograms - Part Two - Histogram on Masked Portion

  • We continue our discussion on histograms with 2 more topics
    • Histograms on a masked portion of the image
    • Histogram Equalization
  • As mentioned in the previous lecture we can select a ROI and only calculate the color histogram of that masked section
  • we'll see how to create a mask to achieve this effect
  • histogram equalization is a method of contrast adjustment based on the image's histogram. we saw how we can use gamma correction to increase or decrease the brightness of an image; we will see how to increase or decrease the contrast of an image with histogram equalization
    • we take an image segment (ROI) of a grayscale image and plot its color histogram. the histogram has no vals close to 0 and 255
    • applying histogram equalization will reduce the color depth (shades of gray or inbetween colors)
    • min and max vals in this ROI are 52 and 154. after applying the histogram equalization min is 0 and max is 255 so in essence we increase the contrast
    • the histogram is now more evenly distributed or flattened out, the high peaks are gone
    • we also see fewer shades of gray
    • histogram equalization uses the cumulative histogram. after histogram equalization the cumulative histogram is a straight line from min to max
    • the histogram itself maintains its contour but is spread or flattened out
  • we ll do both techniques in opencv
  • we start by building a mask to cut a ROI out of the rainbow image. the mask will be a white rectangle on a black background. we will use a bitwise operation (and) on the original
rainbow = cv2.imread('../DATA/rainbow.jpg')
show_rainbow = cv2.cvtColor(rainbow,cv2.COLOR_BGR2RGB)
mask = np.zeros(rainbow.shape[:2],np.uint8)
mask[300:400,100:400] = 255
masked_img = cv2.bitwise_and(rainbow,rainbow,mask=mask)
  • we also keep a show version of the masked img to visually confirm the histogram results
  • getting the masked histogram is easy: we pass the mask to the calcHist function. with hist_mask_values_red = cv2.calcHist([rainbow],channels=[2],mask=mask,histSize=[256],ranges=[0,256]) we get the red hist of the ROI
  • to compare, we get a red hist for the complete image and plot both

Lecture 32 - Histograms - Part Three - Histogram Equalization

  • we load a gorilla image in grayscale and display it using a helper method
  • its a large image
  • we will visualize the histogram then equalize it and see the difference, then convert it back to color image
  • as we work in grayscale we have only one colorchannel to hist hist_values = cv2.calcHist([gorilla],channels=[0],mask=None,histSize=[256],ranges=[0,256]) we have no pure black colors, and white comes from background
  • to equalize the histogram we use the cv2.equalizeHist() method passing the image: eq_gorilla = cv2.equalizeHist(gorilla); we display it and it has high contrast. also we get the histogram and plot it
  • the histogram is flattened out and there are a lot of 0s in order to get the linear cumulative hist
  • we can apply the equalizeHist in grayscale and color images. for color images we need to convert them to HSV colorspace and use only value channel in equalization
hsv_gorilla = cv2.cvtColor(color_gorilla,cv2.COLOR_BGR2HSV)
value_channel = hsv_gorilla[:,:,2]
eq_value_channel = cv2.equalizeHist(value_channel)
hsv_gorilla[:,:,2] = eq_value_channel
show_eq_color_gorilla = cv2.cvtColor(hsv_gorilla,cv2.COLOR_HSV2RGB)
display_img(show_eq_color_gorilla)
  • the histogram plot of the value channel is the same as of the grayscale version of the image

Lecture 33 - Image Processing Assessment

Section 5 - Video Basics with Python and OpenCV

Lecture 35 - Introduction to Video basics

  • Goals of this Section
    • Connect OpenCV to a WebCam
    • Use OpenCV to open a video file
    • Draw Shapes on video
    • interact with video

Lecture 36 - Connecting to Camera

  • we ll see how to connect with openCV to a usb camera on the laptop or the built-in camera of the laptop

  • also we will see how to video stream from the camera to a file using openCV

  • when we read video data it is important not to have multiple notebooks or files running

  • this would create conflicts in openCV

  • we should have only one file reading from camera. the others should have their kernels shutdown

  • a running notebook has a green dot in file tree of jupyter

  • to connect to a camera

    • import cv2
    • create a capture object with cv2.VideoCapture passing the index of the input device we will use cap = cv2.VideoCapture(0), for us 0 is the built in camera and 1 is a usb webcam of better quality
    • we grab the height and width of the frame to use it in processing (they are floats so we cast them to int)
     width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
     height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    
    • display the image
     while True:
     	ret,frame = cap.read()
    
  • what 'cap' actually gives us is a series of images, a stream of images; a frame is a single image

  • a video is a continuously updated frame

  • we apply the methods we learned for images on frames. to get the current frame we use cap.read() continuously

  • our processing happens in the while loop (we can add escape logic like before)

  • to convert the frame to gray gray = cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)

  • then we use cv2.imshow() to show the frame cv2.imshow('frame',gray)

  • we add escape logic

if cv2.waitKey(1) & 0xFF == ord('q'):
	break;
  • we then have to stop capturing cap.release()
  • and then destroy the window cv2.destroyAllWindows()
  • the whole code looks like
import cv2

cap = cv2.VideoCapture(1)

width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

while True:
    
    ret,frame = cap.read()
    gray = cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame',gray)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
  • we run it and it works!!!!!!!!!!
  • we will play around a bit... processing the frame
  • we want to be able to save the stream to a file
  • we need a writer object writer = cv2.VideoWriter('./myVideo1.mp4',cv2.VideoWriter_fourcc(*'XVID'),30,(width,height)) we use the cv2.VideoWriter() method to save to a file. it takes 4 arguments
    • the filepath of the file to write
    • the four-character code (fourcc) specifying the codec to be used (different per operating system)
    • the fps to use (frames per second); we can query the camera's FPS with cap.get(cv2.CAP_PROP_FPS)
    • the size of the frame (width,height)
  • in the while loop after reading we write the frame to the file writer.write(frame)
  • after exiting we release the writer writer.release()
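  • the capture and writer pieces above, consolidated into one sketch (device index, codec and fps are the lecture's example values; adjust them for your own camera)
import cv2

cap = cv2.VideoCapture(0)   # 0 = built-in camera

width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# writer: filepath, fourcc codec, fps, frame size
writer = cv2.VideoWriter('./myVideo1.mp4',cv2.VideoWriter_fourcc(*'XVID'),30,(width,height))

while True:
    ret,frame = cap.read()
    writer.write(frame)          # save the raw color frame to the file
    cv2.imshow('frame',frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
writer.release()
cv2.destroyAllWindows()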

Lecture 37 - Using Video Files

  • in the previous lecture we saw how to stream and use the video captured by a camera
  • we will now see how to use existing video files
  • we work in a single notebook cell
  • we import cv2
  • we will read mp4 files from disk. it's the same as capturing from a camera; we just provide a filepath instead of a device index: cap = cv2.VideoCapture('../DATA/hand_move.mp4')
  • if we pass a wrong filename or the codec is not supported, opencv does not raise an error; it just streams nothing. we add a helpful check
if cap.isOpened() == False:
    print('ERROR FILE NOT FOUND OR WRONG CODEC USED')
  • our while loop is based on cap.isOpened()
  • we read a frame. if we have read something then we show it and listen for the exit key
  • if we get no frame we break the while loop
  • we cleanup capture and window
cap = cv2.VideoCapture('../DATA/hand_move.mp4')

if cap.isOpened() == False:
    print('ERROR FILE NOT FOUND OR WRONG CODEC USED')
    
while cap.isOpened():
    
    ret, frame = cap.read()
    
    if ret == True:
        
        cv2.imshow('frame',frame)
        
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    else:
        break
cap.release()
cv2.destroyAllWindows()
  • the video plays very fast. openCV is not built for presenting videos but for processing them, so it's fast
  • to present the video at normal human speed we import the time module: import time
  • we add a sleep roughly equal to the frame period in the while loop: for 20fps => 50ms, time.sleep(1/20)

Lecture 38 - Drawing on Live Camera

  • drawing on video is similar to drawing on an image (frame == image)
  • we import cv2
  • we start capture from camera cap = cv2.VideoCapture(0)
  • we get the width and height of the captured frame
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
  • we ll draw a rectangle on the stream
  • we first get half of the frame dimensions: x = width // 2 (// gives the integer part of the division), and similarly y = height // 2
  • we set the width of the rect: w = width // 4 (and its height: h = height // 4)
  • we draw the rect on frame cv2.rectangle(frame,(x,y),(x+w,y+h),color=(0,0,255),thickness=4)
  • we show the frame cv2.imshow('frame',frame)
  • we add escape logic and cleanup
  • to interactively draw on the video we use the capture-show frame boilerplate.
  • we add a callback to modify the global vars
def draw_rect(event,x,y,flags,param):
    global pt1,pt2,topLeft_clicked,botRight_clicked
    
    if event == cv2.EVENT_LBUTTONDOWN:
        
        # RESET THE RECTANGLE (IT CHECKS IF THE RECT THERE)
        if topLeft_clicked == True and botRight_clicked == True:
            pt1=(0,0)
            pt2=(0,0)
            topLeft_clicked = False
            botRight_clicked = False
            
        if topLeft_clicked == False:
            pt1 = (x,y)
            topLeft_clicked = True
        
        elif botRight_clicked == False:
            pt2 = (x,y)
            botRight_clicked = True
  • we set some global vars
pt1=(0,0)
pt2=(0,0)
topLeft_clicked = False
botRight_clicked = False
  • we do the connection to the callback
cv2.namedWindow('Test')
cv2.setMouseCallback('Test',draw_rect)
  • we draw the rectangle in the while loop using the global values
    if topLeft_clicked:
        cv2.circle(frame,center=pt1,radius=2,color=(0,0,255),thickness=-1)
    if topLeft_clicked and botRight_clicked:
        cv2.rectangle(frame,pt1,pt2,(0,0,255),3)
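  • for completeness, a sketch of the whole interactive script (globals, callback and the capture loop that redraws the rectangle on every new frame); the values mirror the fragments above
import cv2

# state shared between the callback and the loop
pt1 = (0,0)
pt2 = (0,0)
topLeft_clicked = False
botRight_clicked = False

def draw_rect(event,x,y,flags,param):
    global pt1,pt2,topLeft_clicked,botRight_clicked
    if event == cv2.EVENT_LBUTTONDOWN:
        # a third click resets the rectangle
        if topLeft_clicked and botRight_clicked:
            pt1,pt2 = (0,0),(0,0)
            topLeft_clicked,botRight_clicked = False,False
        if not topLeft_clicked:
            pt1 = (x,y)
            topLeft_clicked = True
        elif not botRight_clicked:
            pt2 = (x,y)
            botRight_clicked = True

cap = cv2.VideoCapture(0)
cv2.namedWindow('Test')
cv2.setMouseCallback('Test',draw_rect)

while True:
    ret,frame = cap.read()
    if topLeft_clicked:
        cv2.circle(frame,center=pt1,radius=2,color=(0,0,255),thickness=-1)
    if topLeft_clicked and botRight_clicked:
        cv2.rectangle(frame,pt1,pt2,(0,0,255),3)
    cv2.imshow('Test',frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()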

Lecture 39 - Video Basics Assessment

Section 6 - Object Detection with OpenCV and Python

Lecture 41 - Introduction to Object Detection

  • The Section Goals
    • Understand a variety of object detection methods
    • we ll build up on more complex methods as we go along
  • Template Matching
    • simply looking for an exact copy of an image within another image
  • Corner Detection (General Detection)
    • looking for corners in images
  • Edge Detection (General Detection)
    • expanding to find general edges of objects
  • Grid Detection (General Detection)
    • combining both concepts to find grids in images (useful for applications)
  • Contour Detection
    • Allows us to detect foreground vs Background images
    • Also allows for detection of external vs internal contours (e.g. grabbing the eyes and smile from a cartoon smiley face)
  • Feature Matching
    • more advanced methods of detecting matching objects in another image, even if the target image is not shown exactly the same in the image we are searching
  • Watershed Algorithm
    • Advanced algorithm that allows us to segment images into foreground and background
    • also allows us to manually set seeds to choose segments of an image
  • Facial and Eye Detection
    • We will use Haar Cascades to detect faces in images
    • Note that this is not yet facial recognition that requires deep learning which we will learn in future section
  • Project Assessment
    • A computer vision app that can blur licence plates automatically

Lecture 42 - Template Matching

  • Template matching is the simplest form of object detection
  • it simply scans a larger image for a provided template by sliding the template target image across the larger image
  • we are talking about an almost exact match
  • the main option that can be adjusted is the comparison method used as the target template is slid across the larger image
  • the methods are some sort of correlation based metric
  • cv2 offers various methods: TM_SQDIFF (square difference), TM_SQDIFF_NORMED (normalized square difference), etc
  • we start a notebook with usual imports
  • we imread the full image we will search in (/sammy.jpg) and color correct it
  • we imread a subset of the full image, 'sammy_face.jpg', and color correct it
  • the subset is a crop of the full image
  • template matching here is partly pointless because we already know where the face is beforehand
  • we will use the eval function. eval, like sum, is a built-in python function; it evaluates a string, which lets us turn a method name string into the function itself
sum([1,2,3])
>> 6
mystring = 'sum'
myfunc = eval(mystring)
myfunc([1,2,3])
>> 6
  • for ease of evaluation we put all the available TM methods of cv2 in an array and loop over it: methods = ['cv2.TM_CCOEFF', 'cv2.TM_CCOEFF_NORMED', 'cv2.TM_CCORR','cv2.TM_CCORR_NORMED', 'cv2.TM_SQDIFF', 'cv2.TM_SQDIFF_NORMED']
  • we loop over the array for m in methods:
  • first thing we make a copy of the full image full_copy = full.copy()
  • we turn the string into a method using eval: method = eval(m)
  • we do the actual template matching using cv2.matchTemplate(): res = cv2.matchTemplate(full_copy,face,method), passing the full image, the template and the method
  • result is a heatmap (we see it if we plt.imshow(res)). it gives higher values on where it thinks it found the best match. the max value is the best fit (correlation)
  • we will use the min and max val of the heatmap and their locations to draw a rect around the match. we use cv2.minMaxLoc(). min_val,max_val,min_loc,max_loc = cv2.minMaxLoc(res)
  • as SQDIFF works the opposite minval == best corr
    if method in [cv2.TM_SQDIFF,cv2.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
  • we get bottom right of rect to draw the match area from template shape
    height, width,channels = face.shape
    bottom_right = (top_left[0]+width,top_left[1]+height)
  • we draw the rect cv2.rectangle(full_copy,top_left,bottom_right,(255,0,0),10)
  • we plot and show the image
    plt.subplot(121)
    plt.imshow(res)
    plt.title('HEATMAP OF TEMPLATE MATCHING')
    plt.subplot(122)
    plt.imshow(full_copy)
    plt.title('DETECTION OF TEMPLATE')
    plt.suptitle(m)
    
    plt.show()
  • 'plt.show()' inside the loop renders each iteration's figures separately so we don't overwrite the images
  • we see the results. only TM_CCORR performs badly

Lecture 43 - Corner Detection - Part One - Harris Corner Detection

  • when thinking about corner detection in computer vision, we should define what is a corner
  • a corner is a point whose local neighborhood stands in two dominant and different edge directions. it can be interpreted as the junction of two edges, where an edge is a sudden change in image brightness
  • we will look at 2 of the most popular algorithms for corner detection.
    • Harris Corner Detection
    • Shi-Tomasi Corner Detection
  • Harris Corner Detection
    • 1988 by Chris Harris and Mike Stephens
    • the basic intuition is that corners can be detected by looking for significant change in all directions
    • shifting a window in any direction on a corner region will result in a large change in appearance
    • doing the same on a flat region will produce no change at all
    • shifting on an edge won't show a major change if we shift along the direction of the edge
    • In a nutshell Harris Corner Detection math says: if we scan the image with a window (like we did with kernels) and we notice an area where there is major change no matter in which direction we scan, we expect a corner to be there. the window does shifting
  • Shi-Tomasi Corner Detection
    • 1994 by J. Shi and C. Tomasi in the paper Good Features to Track
    • It made a small modification to Harris Corner Detection that gave better results
    • the mod is a change to the scoring function selection criteria that Harris uses for corner detection: Harris uses R = λ1λ2 - κ(λ1+λ2)², Shi-Tomasi uses R = min(λ1,λ2)
  • We ll explore how to use both with the OpenCV lib.
  • we do normal imports in notebook
  • we imread a chessboard image 'flat_chessboard.png' and color correct it.
  • image is a perfect grid with clear corners and clear edges
  • we convert it to grayscale
  • we also read a real chess image, 'real_chessboard.jpg'; we expect the algo to find corners related to the pieces as well. we color correct it and turn it to grayscale
  • we apply harris corner detection to the flat and the real chess images
  • first we convert the grayscale image (0-255) to float vals (0. to 1.) we do it with plain casting gray = np.float32(gray_flat_chess)
  • we then apply harris cd dst = cv2.cornerHarris(src=gray,blockSize=2,ksize=3,k=0.04) passing in:
    • src image
    • blocksize of the window
    • ksize of the sobel operator used for edge detection
    • k param (harris detector free param) (typically 0.04)
  • we then dilate the result for plotting it: dst = cv2.dilate(dst,None)
  • there is a threshold for the optimal value that varies with the image. we choose it to be 0.01 * the max() value of the result and use it to turn red the pixels that are over the threshold in terms of the corner detection result. we apply it to the original image with numpy array indexing: flat_chess[dst > 0.01*dst.max()] = [255,0,0]
  • we plot the image. detection is perfect
  • outer edges are not detected. it is seen as flat space.
  • we will apply haris to the grayscale version of the real chess
gray = np.float32(gray_real_chess)
dst = cv2.cornerHarris(src=gray,blockSize=2,ksize=3,k=0.04)
real_chess[dst> 0.01*dst.max()] = [255,0,0] #RGB
plt.imshow(real_chess)
  • it detects a lot of corners on the pieces as well

Lecture 44 - Corner Detection - Part Two - Shi-Tomasi Detection

  • we will use the same images for testing (real,flat + gray versions)
  • we use 'cv2.goodFeaturesToTrack' method with params:
    • src image
    • max corners we want returned (0 to return all)
    • a quality level param (minimum eigenvalue)
    • minimum distance between returned corners
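  • for the flat board the call might look like this (64 is a hypothetical max-corner count, 0.01 the quality level, 10 the minimum distance)
corners = cv2.goodFeaturesToTrack(gray_flat_chess,64,0.01,10)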
  • we will draw little circles at the positions where it thinks it found corners. unlike harris, which marks a response map, it returns the corner points themselves, so we need to flatten the array and draw circles at those points
  • the corners are floats so we turn them to ints: corners = np.int0(corners)
  • then we iterate through the corners, getting the coords from the flattened array and drawing a circle
for i in corners:
    x,y = i.ravel()
    cv2.circle(flat_chess,(x,y),3,(255,0,0),-1)
  • we then plot the drawn image
  • we do the same for real_chess trying to detect 100 corners
corners = cv2.goodFeaturesToTrack(gray_real_chess,100,0.01,10)
corners = np.int0(corners)
for i in corners:
    x,y = i.ravel()
    cv2.circle(real_chess,(x,y),3,(255,0,0),-1)
plt.imshow(real_chess)
  • we see better results than harris

Lecture 45 - Edge Detection

  • In this lecture we will learn how to use the Canny Edge Detector, one of the most popular edge detection algorithms
  • It was developed in 1986 by John Canny and is a multi-stage algorithm
  • Canny Edge Detection Pipeline:
    • Apply Gaussian filter to smooth the image in order to remove the noise
    • Find the intensity gradients of the image
    • Apply non-maximum suppression to get rid of spurious response to edge detection
    • apply double threshold to determine potential edges
    • track edge by hysteresis: Finalize the detection of edges by suppressing all the other edges that are weak and not connected to strong edges
  • For high res images where we only want general edges, it is usually good idea to apply a custom blur before applying canny algorithm
  • Canny algorithm requires the user to decide on low and high threshold values
  • In our notebook we provide an equation for picking a good starting point for threshold vals, but often we will need to adjust to our particular image
  • we add the normal imports
  • we will work on sammy_face.jpg. we don't care about color correction
  • we apply the Canny edge detector straight away (no blurring) edges = cv2.Canny(image=img,threshold1=127,threshold2=127) setting both low and high thresholds to the midpoint of the 0-255 range. we plot and see there is a lot of noise in the result
  • we can solve it with:
    • blurring the image beforehand
    • playing with the thresholds
  • we play with the thresholds and get some good results with edges = cv2.Canny(image=img,threshold1=220,threshold2=240)
  • we will use a formula that helps select good thresholds
  • we calculate the median pixel value med_val = np.median(img), which is 64
  • we select the thresholds based on
# LOWER THRESHOLD TO EITHER 0 OR 70% OF THE MEDIAN VAL, WHICHEVER IS GREATER
lower = int(max(0,0.7*med_val))
# UPPER THRESHOLD TO EITHER 130% OF THE MEDIAN VAL OR 255, WHICHEVER IS SMALLER
upper = int(min(255,1.3*med_val))
  • we apply the thresholds edges = cv2.Canny(image=img,threshold1=lower,threshold2=upper) and the results are actually worse
  • we blur blurred_img = cv2.blur(img,ksize=(5,5)) and apply Canny again; the result is considerably better
  • to improve further we increase the blur kernel size, as sketched below
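  • a minimal sketch of that last step, assuming img, lower and upper are defined as above (the (7,7) kernel size is just an illustrative choice):
# larger blur kernel before Canny (sketch; values are illustrative)
blurred_img = cv2.blur(img, ksize=(7,7))
edges = cv2.Canny(image=blurred_img, threshold1=lower, threshold2=upper)
plt.imshow(edges, cmap='gray')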

Lecture 46 - Grid Detection

  • often cameras can create a distortion in an image, such as radial distortion and tangential distortion
  • a good way to account for these distortions when performing operations like object tracking is to have a recognizable pattern attached to the object being tracked
  • grid patterns are often used to calibrate cameras and track motion (eg attach/draw a cube on a grid that will move as grid moves)
  • openCV has built in methods for tracking grids and chessboard like patterns
  • we do normal imports
  • we read the 'flat_chessboard.png' image
  • for grid detection to work the grid has to have a chessboard-like appearance. then we have to place it in front of the camera we want to calibrate
  • to find the chessboard corners we use cv2.findChessboardCorners found,corners = cv2.findChessboardCorners(flat_chess,(7,7)) we pass in the src image and a tuple with the number of inner corners to find in each direction. it returns a tuple with found (a bool telling if it found the pattern) and the positions of the grid corners
  • corners is a list of coordinates
  • we use them with another build in method 'cv2.drawChessboardCorners' cv2.drawChessboardCorners(flat_chess,(7,7),corners,found) which draws on the image we pass the corners found
  • another grid like pattern is circle based grids (dot grids)
  • we read in 'dot_grid.png' with perfectly clean circles
  • we will use the equivalent cv2 method for circle grids (same concept, similar params) found,corners = cv2.findCirclesGrid(dots,(10,10), cv2.CALIB_CB_SYMMETRIC_GRID) where we also pass a grid-type flag
  • corners are in the same format. we reuse drawChessboardCorners passing the corners cv2.drawChessboardCorners(dots,(10,10),corners,found)
  • grid detection is used for camera calibration
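  • a minimal end-to-end sketch of the chessboard case described above (assumes flat_chess has been read in as before):
# detect and draw chessboard corners (sketch)
found, corners = cv2.findChessboardCorners(flat_chess, (7,7))
if found:
    cv2.drawChessboardCorners(flat_chess, (7,7), corners, found)
    plt.imshow(flat_chess)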

Lecture 47 - Contour Detection

  • Contours are defined as simply a curve joining all the continuous points (along the boundary), having same color or intensity
  • Contours are a useful tool for shape analysis and object detection and recognition
  • OpenCV has a built-in contour-finding function that can also help us differentiate between internal and external contours
  • we do the normal imports
  • we read in (in grayscale) an image 'internal_external.png' with simple contours (internal and external)
  • to extract the contours we use cv2.findContours image, contours, hierarchy = cv2.findContours(img, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE) which takes as arguments
    • the src image,
    • the type of detection (internal,external contours, both)
    • the algorithm to be used
  • the function returns the image, a list (contours) and a 3D array with the hierarchy (meaningful only if we retrieve both internal and external contours)
  • there are 22 contours, which matches what we expect for this simple image
  • to show the contours we initialize a black image as zeros external_contours = np.zeros(image.shape)
  • then we iterate in the list of contours and:
  • if the element at index 3 of the hierarchy entry for the indexed contour is -1 (no parent) it means it is an external contour, so we draw it using the drawContours method, which takes the black image (where to draw on), the contours object, the index, the colour to use and the thickness (-1 for fill)
for i in range(len(contours)):
    
    # EXTERNAL CONTOUR
    if hierarchy[0][i][3] == -1:
        cv2.drawContours(external_contours,contours,i,255,-1)
  • internal contours are the ones nested inside external ones; to get them and draw them we use the same loop but with the condition that the hierarchy value is != -1 (non-external), as sketched below
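  • a minimal sketch of that internal-contour loop, assuming image, contours and hierarchy from above:
# internal contours: hierarchy entry != -1 (the contour has a parent)
internal_contours = np.zeros(image.shape)
for i in range(len(contours)):
    if hierarchy[0][i][3] != -1:
        cv2.drawContours(internal_contours, contours, i, 255, -1)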

Lecture 48 - Feature Matching - Part One

  • This is the halfway of the course.
  • So far we've learned a lot of technical syntax, but haven't really been able to apply it to more complex computer vision applications
  • This is the point where we begin to build useful computer vision apps. we will use all our technical knowledge and Python syntax skills with OpenCV to create programs that are directly applicable to realistic situations
  • We will begin with Feature Matching
  • We've already seen template matching to find objects (template images) within a larger image. it required an exact copy of the template in the image.
  • usually this is not useful in real-world situations as we'll have an indicative image of what we are looking for, not an exact copy.
  • what we do in such situations is feature matching
  • Feature matching extracts defining key feats from an input image (using ideas from corner,edge, and contour detection)
  • then using a distance calculation finds all the matches in a secondary image
  • this means we are no longer required to have an exact copy of the target image in the secondary image
  • We will use 3 methods:
    • Brute-Force matching with ORB Descriptors
    • Brute-Force Matching with SIFT Descriptors and Ratio Test
    • FLANN based Matcher
  • we will test with a generic cereal box image and see if we can find its matching box in the cereal isle
  • we do normal imports and use our display helper function
  • we imread a cereal image 'reeses_puffs.png' as grayscale. this is the query image
  • the target image is 'many_cereals.jpg' we imread it as a grayscale. in target image the query image exists but not as exact copy
  • First we apply Brute Force Detection with ORB descriptors
  • we first create the detector orb = cv2.ORB_create() we then apply the detector on both images (query and target) to extract keypoints and descriptors (None is the optional mask argument)
kp1, des1 = orb.detectAndCompute(reeses,None)
kp2, des2 = orb.detectAndCompute(cereals,None)
  • then we create a brute-force matcher using default params bf = cv2.BFMatcher(cv2.NORM_HAMMING,crossCheck=True)
  • we use the bruteforce result to get the matches between descriptors from 2 images matches = bf.match(des1,des2)
  • we sort the matches based on their distance attribute (lower distance means a better match) matches = sorted(matches, key=lambda x:x.distance)
  • we use a convenience method to draw matches for showing using images, keypoints and matches array. reeses_matches = cv2.drawMatches(reeses,kp1, cereals,kp2, matches[:25], None,flags=2)
  • we then display the image. we see that we have no successful match

Lecture 49 - Feature Matching - Part Two

  • we will now use SIFT (scale invariant feature transform) descriptors for brute-force feature matching
  • it performs better in cases where the query image appears scaled in the target image
  • we start with creating a sift object sift = cv2.xfeatures2d.SIFT_create()
  • in the same way as before we extract keypoints and descriptors from both query and target image.
kp1, des1 = sift.detectAndCompute(reeses,None)
kp2, des2 = sift.detectAndCompute(cereals,None)
  • we have calculated the descriptors we will compare them using brute force bf = cv2.BFMatcher()
  • we will calc the matches from the bf object in a different manner matches = bf.knnMatch(des1,des2,k=2) what this does is take 2 sets of descriptors and a value k (number of best matches it will find per descriptor of the query set)
  • descriptors are coordinates of where feats were found
  • as we set k to 2, the matches obj is an array of 2 match objects per descriptor; the first match is better (smaller distance) than the second
  • we will now apply a ratio test. the intuition is that if the distance of match1 is clearly smaller than the distance of match2 (the best match is much better than the runner-up) then this descriptor's feature is probably a good match between the query set and the target set
good = []
for match1,match2 in matches:
    # IF MATCH 1 DISTANCE IS <75% OF MATCH2 DISTANCE
    # THEN DESCRIPTOR WAS A GOOD MATCH, KEEP IT
    if match1.distance < 0.75*match2.distance:
        good.append([match1])
  • this filtering does a pretty good job (keeps ~5% of the matches)
  • we draw the matches using conv method sift_matches = cv2.drawMatchesKnn(reeses,kp1,cereals,kp2,good,None,flags=2)
  • we display. results are actually very good
  • we'll now work with the FLANN (Fast Library for Approximate Nearest Neighbours) based matcher. it's much faster than brute force but it finds generally good matches rather than the best possible ones
  • we can play with the FLANN params to improve results but that slows down the algorithm
  • we start like before creating a sift object sift = cv2.xfeatures2d.SIFT_create() and getting keypoints and descriptors from images
  • we set Flann params (to defaults)
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm=FLANN_INDEX_KDTREE,trees=5)
search_params = dict(checks=50)

  • we compare descriptors with FLANN flann = cv2.FlannBasedMatcher(index_params,search_params)

  • we grab the k nearest neighbour matches with matches = flann.knnMatch(des1,des2,k=2)
  • we do a ratio test like before
good = []
# LESS DISTANCE == BETTER MATCH
for match1,match2 in matches:
    # IF MATCH 1 DISTANCE IS <75% OF MATCH2 DISTANCE
    # THEN DESCRIPTOR WAS A GOOD MATCH, KEEP IT
    if match1.distance < 0.75*match2.distance:
        good.append([match1])

  • we use draw helper to draw them sift_matches = cv2.drawMatchesKnn(reeses,kp1,cereals, kp2,good, None, flags=2)
  • we display and see the result. we get good results with increase in speed
  • if we use flags=0 we also see the dots of the matches (potential feats to match on).
  • to play with the coloring of the presentation and make it easier to understand, we will mask the matches.
  • after getting the matches and before the ratio test we do matchesMask = [[0,0] for i in range(len(matches))] so we get an array of nested size-2 zero lists, equal in number to the matches
  • we will turn them on (0 to 1) when we have a good match (during the ratio test)
# LESS DISTANCE == BETTER MATCH
for i,(match1,match2) in enumerate(matches):
    # IF MATCH 1 DISTANCE IS <75% OF MATCH2 DISTANCE
    # THEN DESCRIPTOR WAS A GOOD MATCH, KEEP IT
    if match1.distance < 0.75*match2.distance:
        matchesMask[i]=[1,0]
  • we no longer need to copy good matches into a good list; we use indexing into the mask instead
  • we create a draw params dict draw_params = dict(matchColor=(0,255,0),singlePointColor=(255,0,0),matchesMask=matchesMask,flags=0)
  • our draw method becomes flann_matches = cv2.drawMatchesKnn(reeses,kp1,cereals,kp2,matches,None,**draw_params)

Lecture 50 - Watershed Algorithm - Part One

  • In geography, a watershed is a land area that channels rainfall and snowmelt to creeks, streams and rivers and eventually to outflow points such as reservoirs, bays, and the ocean
  • These watersheds can then be segmented on topographical maps with boundaries (topographical lines of altitude)
  • Metaphorically the watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges (like water aggregates in geography, brightness aggregates in images)
  • Any grayscale image can be viewed as a topographic surface where high intensity denotes peaks and hills while low intensity denotes valleys
  • The algorithm can then fill every isolated valley (local minimum) with different colored water (labels)
  • as "water" (intensity) rises, depending on the peaks (gradients) nearby, "water" from different valleys (different segments of the image) with different colors could start to merge
  • To avoid this merging, the algorithm creates barriers (segment edge boundaries) in locations where the "water" would merge
  • this algorithm is especially useful for segmenting images into background and foreground in situations that are difficult for other algorithms
  • a common example is coins sitting next to each other on a table. most CV algorithms, when they see the image, treat all the touching coins as one large blob
  • it may be unclear to the algorithm whether it should be treated as one large object or many small objects.
  • watershed algo can be very effective for these sort of problems
  • later on we will also learn how to provide our own custom 'seeds' that allow us to manually set where the valleys of the watersheds go
  • we'll begin exploring the syntax of the watershed algorithm with OpenCV and then expand this idea to set our own seeds
  • We start our notebook with the normal imports and the helper method
  • we imread 'pennies.jpg' a high res image of 6 coins attached to each other
  • our goal is to be able to produce 7 segments in the image (6 for coins and 1 for background)
  • we will test the algorithms we know so far to show their weakness in distinguishing the coins
  • We first apply a median blur to get rid of features we don't need sep_blur = cv2.medianBlur(sep_coins,25)
  • we then turn it to grayscale gray_sep_coins = cv2.cvtColor(sep_blur,cv2.COLOR_BGR2GRAY)
  • We apply a binary threshold ret,sep_coins_thresh = cv2.threshold(gray_sep_coins,160,255,cv2.THRESH_BINARY_INV) and see that no matter how we play with the values the coins always stay attached in one shape (we could erode)
  • We will find the contours image,contours,hierarchy = cv2.findContours(sep_coins_thresh.copy(), cv2.RETR_CCOMP,cv2.CHAIN_APPROX_SIMPLE)
  • we draw the external contours
for i in range(len(contours)):
    if hierarchy[0][i][3] == -1:
        cv2.drawContours(sep_coins,contours,i,(255,0,0),10)
  • the result is one giant contour
  • we need a more advanced method

Lecture 51 - Watershed Algorithm - Part Two

  • we read the same image
  • we apply median blur (huge kernel for huge image) img = cv2.medianBlur(img,35)
  • we turn to grayscale gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
  • we try thresholding ret, thresh = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV) and see that despite the heavy blur we still pick up features in the binary threshold
  • We will apply Otsu's method for thresholding, which gives better results and is a very good match for the watershed algorithm. we apply thresholding again adding the OTSU flag (with it the threshold value we pass is ignored and Otsu picks the optimal one over the full range) ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
  • our circles are still connected
  • we do noise removal (no effect on such a simple image; it makes sense for real, complex images) using the morphological opening operator opening = cv2.morphologyEx(thresh,cv2.MORPH_OPEN,kernel,iterations=2)
  • we still face the problem. we have one blob as object
  • we get the sure background area by dilating on the opened image sure_bg = cv2.dilate(opening,kernel,iterations=3)
  • what we need to do for the watershed is to set seeds that we are sure that are in the foreground (6 seeds 1 per coin)
  • how do we grab things we are sure are in the foreground vs things in the background? we use the distance transform. what this does is: as we move away from boundaries with the background (0), pixels get higher values (become brighter). we can apply this to our thresholded image and expect the coin centers to be the brightest points. then we can re-threshold and get the 6 seed points dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
  • we reapply thresholding to get the surely-foreground points ret,sure_fg = cv2.threshold(dist_transform,0.7*dist_transform.max(),255,0) where 0.7*dist_transform.max() is a typical value. we get 6 points we are sure to be in the foreground. we will use them as seeds
  • the region outside the dots is the unknown region. we need the watershed algorithm to figure out what it is
  • we get the unknown region by subtracting the sure foreground from the sure background region. that is where we need to use the watershed algorithm
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg,sure_fg)
  • we need to add label markers to the 6 points and let them be the seeds of watershed algo
  • we get the markers from the foreground ret, markers = cv2.connectedComponents(sure_fg) and add 1 so the sure background label is not 0. the markers are the same points but each with a different label
  • we explicitly set the unknown area to 0 markers[unknown==255] = 0. that's why we added 1 before: so the background label stays clearly separated from the unknown area
  • we are now ready to fill/flood the unknown area with the watershed algorithm markers = cv2.watershed(img,markers) and we get a clear separation.
  • we can now confidently get the contours from the watershed markers
image,contours,hierarchy = cv2.findContours(markers.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(contours)):
    if hierarchy[0][i][3] == -1:
        cv2.drawContours(sep_coins,contours,i,(255,0,0),10)
  • we did the work manually. in the next lecture we will do it automatically.

Lecture 52 - Custom Seeds with Watershed Algorithm

  • we want to be able to click on the image setting the seeds manually. and let the algo run all the steps to do the segmentation automatically
  • in a new notebook we do the normal imports
  • we read in an image 'road_image.jpg' and do a copy out of it
  • we wont do color correction as we will use opencv to view and interact with the image
  • we create empty spaces to draw the results of the algorithm on, using the shape of the road image: one for the watershed markers marker_image = np.zeros(road.shape[:2],dtype=np.int32) and one to draw the segments segments = np.zeros(road.shape,dtype=np.uint8)
  • we then have to choose how to create the colors for the markers. we will use colormaps
  • matplotlib colormaps include qualitative colormaps that are indexable
from matplotlib import cm
cm.tab10(0)
  • what we get is a color in tuple form with RGB vals in float format plus an alpha param
  • to use the colors we cast them to a tuple tuple(np.array(cm.tab10(0)[:3])*255) (keep in mind OpenCV expects BGR ordering)
  • we make it a func
def create_rgb(i):
    return tuple(np.array(cm.tab10(i)[:3])*255)
  • and use it to create 10 distinct colors for markers
colors = []
for i in range(10):
    colors.append(create_rgb(i))
  • we start implementing our application
  • we define the globals
n_markers = 10 # number of marker colors (matches the colors list above)
current_marker = 1 # color choice
marks_updated = False # markers updated by watershed algorithm
  • we write our callback function. it modifies the global marks_updated and listens for the LBUTTONDOWN event. when this happens it draws a circle on the road_copy the user sees and draws a marker on the marker_image to be fed to the algorithm
def mouse_callback(event,x,y,flags,param):
    global marks_updated
    if event == cv2.EVENT_LBUTTONDOWN:
        # MARKERS PASSED TO THE WATERSHED ALGO
        cv2.circle(marker_image,(x,y),10,(current_marker),-1)
        # USER SEES ON THE ROAD IMAGE
        cv2.circle(road_copy,(x,y),10,colors[current_marker],-1)
        marks_updated = True
  • we add a window and bind the callback for the 'Road Image' window
cv2.namedWindow('Road Image')
cv2.setMouseCallback('Road Image', mouse_callback)
  • we start the while loop
    • we show 2 windows: one for placing the markings on the road image and one for the segments
    • we add escape logic
    • we add reset logic (resetting the marker and segment matrices and the current marker selection)
    • we add logic to change the marker group with the digit keys
    • when the user clicks (marker update)
      • we make a copy of the marker image
      • we run watershed on the original image based on the marker image copy (current selection)
      • we reset segments
      • we redraw them based on the marker_image copy state (watershed output)
while True:
    
    cv2.imshow('Watershed Segments',segments)
    cv2.imshow('Road Image',road_copy)
    
    # CLOSE ALL WINDOWS
    k = cv2.waitKey(1)
    
    if k == 27:
        break
    
    # CLEARING ALL COLORS IF USER PRESSES C KEY
    elif k == ord('c'):
        road_copy = road.copy()
        marker_image = np.zeros(road.shape[:2],dtype=np.int32)
        segments = np.zeros(road.shape,dtype=np.uint8)
    
    # UPDATE COLOR CHOICE
    elif k > 0 and chr(k).isdigit():
        current_marker = int(chr(k))
        
    # UPDATE THE MARKINGS
    if marks_updated:
        
        marker_image_copy = marker_image.copy()
        cv2.watershed(road,marker_image_copy)
        
        segments = np.zeros(road.shape,dtype=np.uint8)
        
        for color_ind in range(n_markers):
            # COLORING THE SEGMENTS, NUMPY CALL
            segments[marker_image_copy==(color_ind)] = colors[color_ind]
    
cv2.destroyAllWindows()

Lecture 53 - Introduction to Face Detection

  • In this lecture we will explore face detection using Haar Cascades, which is key component of the Viola-Jones object detection framework
  • Keep in mind we are talking about face detection NOT face recognition
  • we will be able to very quickly detect if a face is in an image and locate it
  • however we won't know whose face it is.
  • we would need a really large dataset and deep learning for facial recognition
  • In 2001 Paul Viola and Michael Jones published their method of face detection based on the simple concept of a few key features
  • They also came up with the idea of precomputing an integral image to save time on calculations
  • Let's understand the main feature types Viola and Jones proposed
  • the main feature types are:
    • edge features (horizontal or vertical) e.g [[0,0,0],[1,1,1]]
    • line features (horizontal or vertical) e.g [[1,1,1],[0,0,0],[1,1,1]]
    • four-rectangle features e.g [[0,1],[1,0]]
  • each feature is a single value obtained by subtracting sum of pixels under white rectangle from sum of pixels under black rectangle
  • realistically, our images won't be perfect edges or lines
  • these features are calculated by subtracting the mean of the dark region from the mean of the light region
  • a perfect edge or line (0 vs 1) would result in a value of 1. the closer our result (delta) is to 1 the better the feature
  • we then set a threshold above which we consider the feature to be present
  • calculating these sums over the whole image can be computationally expensive
  • the Viola-Jones algorithm solves this by using a precalculated integral image
  • this makes each rectangle-sum lookup an O(1) operation
  • an integral image is also known as a summed-area table; it comes from the original image by summing, for each pixel (x,y), the pixel values in the rectangle with top-left corner (0,0) and bottom-right corner (x,y)
  • this allows us to calculate the means and deltas very fast (see the small sketch below)
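  • a tiny numpy sketch of what a summed-area (integral) image is (the 4x4 array is made-up data, not part of the lecture):
# integral image via cumulative sums
import numpy as np
small_img = np.arange(16, dtype=np.float32).reshape(4, 4)       # made-up 4x4 "image"
integral = small_img.cumsum(axis=0).cumsum(axis=1)              # integral[y, x] == small_img[:y+1, :x+1].sum()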
  • this algorithm also saves time by going through a cascade of classifiers
  • this means we pass the image through a series (a cascade) of classifiers based on the simple features we saw earlier
  • once an image region fails a classifier we can stop attempting to detect a face there
  • a common misconception about face detection with this algorithm is that it slowly scans the entire image looking for a face
  • that would be very inefficient. instead we pass the image through a cascade of classifiers
  • first we need a front face image of a persons face
  • then we turn it to grayscale
  • then we will begin to search for the Haar Cascade features
  • one of the very first features searched for is an edge feature indicating eyes and cheeks
  • if it passes we go to next feature such as the bridge line of the nose
  • if it passes we continue with other features (eyebrows edges, mouth line etc)
  • until the algorithm decides it has detected a face based on the features
  • theoretically this approach can be used for a variety of objects or detections (like pretrained eye detector)
  • the downside of this algorithm is that very large datasets are needed to create our own feature sets
  • luckily many pre-trained sets of features exist
  • OpenCV comes with pre-trained .xml files for various Haar Cascades
  • Later on in the deep learning section of the course we will see how to create our own classification algorithm for any distinct group of images (e.g cats vs dogs)
  • we have placed the pre-trained .xml files in the DATA folder
  • we will also be using a pre-trained file for our upcoming project assessment
  • first we'll explore how to use face detection with OpenCV

Lecture 54 - Face Detection with OpenCV

  • we do the normal imports
  • we will use two portrait images. one is professionally edited with gradients causing issues down the line
  • also will use a group photo in grayscale
  • we imread the images
  • we need to create a classifier and pass in the XML classifier file face_cascade = cv2.CascadeClassifier('../DATA/haarcascades/haarcascade_frontalface_default.xml')
  • we will functionalize the way cascades work
def detect_face(img):
    face_img = img.copy()
    face_rects = face_cascade.detectMultiScale(face_img)
    for (x,y,w,h) in face_rects:
        cv2.rectangle(face_img,(x,y),(x+w,y+h),(255,255,255),10)
    return face_img
  • in this func we make a copy of the image and call detectMultiScale on the cascade object, passing in the image
  • what it returns is an array of rectangles (top-left position, width and height)
  • we iterate over the array drawing the rectangles on the image and return it
  • we test it on our test images. it works, but for the multi-face image it throws false positives
  • we will adjust some params to improve performance (scale factor and minimum neighbors) face_rects = face_cascade.detectMultiScale(face_img,scaleFactor=1.2,minNeighbors=5)
  • we test and we get a false negative for a face not looking at the camera
  • we will also look for eyes using an eye-cascade file. on nadia it works but not on denis
  • finally we will do it with live video capture, as sketched below
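  • a minimal sketch of that video step, assuming the detect_face helper above (camera index, window name and the ESC escape key are illustrative choices):
# run the cascade on live webcam frames (sketch)
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    frame = detect_face(frame)
    cv2.imshow('Video Face Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:   # ESC to quit
        break
cap.release()
cv2.destroyAllWindows()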

Lecture 55 - Detection Assessment

Section 7 - Object Tracking

Lecture 57 - Introduction to Object Tracking

  • Object Tracking Section Goals
    • Learn basic object tracking techniques: Optical Flow, MeanShift and CamShift
    • Understand more advanced tracking: Review Built-in Tracking APIs

Lecture 58 - Optical Flow

  • Optical Flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movements of the object or camera
  • Optical flow analysis has a few assumptions
    • the pixel intensities of an object do not change between consecutive frames
    • neighbouring pixels have similar motion
  • The optical methods in OpenCV will first take in a given set of points and a frame
  • Then it will attempt to find those points in the next frame
  • It is up to the user to supply the points to track
  • we consider a five frame clip of a ball moving up and towards the right
  • given the clip we cannot determine if the ball is moving or if the camera moved down and to the left
  • using OpenCV we pass in the previous frame, previous points and the current frame to the Lucas-Kanade function
  • the function attempts to locate the (tracked) points in the current frame
  • The Lucas-Kanade method computes optical flow for a sparse feature set (meaning only the points it was told to track)
  • But what if we want to track all the points in the video?
  • In that case we can use Gunnar Farneback's algorithm (also built into OpenCV) to calculate dense optical flow
  • This dense optical flow will calculate flow for all points in an image
  • It will color them black if no flow (no movement) is detected

Lecture 59 - Optical Flow Coding with OpenCV - Part One

  • We start with the Lucas-Kanade method for sparse flow
  • we will use Shi-Tomasi corner detection to find points to track
    • we set the params as dict corner_track_params = dict(maxCorners= 10, qualityLevel=0.3, minDistance=7, blockSize=7)
    • we will find 10 corners in first frame and track them
  • we set the LK params as a dict passing some default vals (see the sketch after this list):
    • winSize = window size (a smaller window is more sensitive to noise and might lose larger motions, a larger window might miss small motions)
    • maxLevel = number of pyramid levels used (LK uses an image pyramid for processing); what we gain is that we can track motions at various resolutions of the image
    • criteria = 2 criteria (maximum num of iterations, epsilon/accuracy); we should adjust them depending on the video
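  • a sketch of such a dict (the specific values are typical course-style defaults used here as an assumption):
# illustrative lk_params matching the description above
lk_params = dict(winSize=(200, 200),
                 maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))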
  • we grab a frame from the camera to find the points to track and turn it to grayscale
cap = cv2.VideoCapture(1)
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame,cv2.COLOR_BGR2GRAY)
  • we get the points to track prevPoints = cv2.goodFeaturesToTrack(prev_gray,mask=None,**corner_track_params)
  • we create a mask to draw the points and create lines on the video (initialize with 0) mask = np.zeros_like(prev_frame)
  • we add the while loop where
    • we capture a frame and turn it to grayscale
    • we calculate the optical flow with LK nextPts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray,frame_gray,prevPoints,None,**lk_params) we pass in:
      • prev frame
      • curr frame
      • prevPoints
      • None nextPoints (we will get them as return vals)
      • the config params
    • we will use the returned status array, a vector where each element is set to 1 if the flow for the corresponding feature has been found
    • we use it to get the good new points good_new = nextPts[status==1]
    • and also to get the good previous points (for drawing the line) good_prev = prevPoints[status==1]
    • we zip both and iterate through them, drawing the lines and the dots of the points on the frame
    for i, (new,prev) in enumerate(zip(good_new,good_prev)):
        x_new, y_new = new.ravel()
        x_prev, y_prev = prev.ravel()
        
        mask = cv2.line(mask,(x_new,y_new),(x_prev,y_prev),(0,255,0),3)
        
        frame = cv2.circle(frame,(x_new,y_new),8,(0,0,255),-1)
  • we mask the frame and show it
    img = cv2.add(frame,mask)
    cv2.imshow('tracking',img)
  • we set the current frame as prev and the good new points as prev points for the next iteration (frame)
  • we need to reshape the good_new points so they are accepted by LK, as in the sketch below
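  • a sketch of that end-of-loop bookkeeping (inside the while loop, variable names as above):
    # current frame/points become "previous" for the next iteration
    prev_gray = frame_gray.copy()
    prevPoints = good_new.reshape(-1, 1, 2)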
  • we release frame and destroy window

Lecture 60 - Optical Flow Coding with OpenCV - Part Two

  • We will take the entire image to detect points
  • to see all the params we can see the course notebook
  • we do usual imports
  • we start capture object
  • we read a frame a frame (initial frame) and turn it to grayscale
  • we setup an HSV based mask hsv_mask = np.zeros_like(frame1)
  • we set saturation to max hsv_mask[:,:,1] = 255
  • we enter the while loop
  • we grab next frame in the loop and turn it to grayscale
  • we calculate the optical flow with Farneback flow = cv2.calcOpticalFlowFarneback(prevImg,nextImg,None,0.5,3,15,3,5,1.2,0) passing default params
  • the flow object contains vector flow cartesian info (x,y)
  • we want to convert this into polar coordinates (magnitude and angle); once we have this info we will map it onto the HSV color space: the angle maps to the hue and the magnitude to the value channel (saturation was already set to max)
  • if all moves in the same direction it will be colored the same way
  • we convert to polar coordinates mag, ang = cv2.cartToPolar(flow[:,:,0],flow[:,:,1],angleInDegrees=True) with angles in degrees
  • we set the hue in the hsv mask to angle/2 so it fits OpenCV's 0-179 hue range hsv_mask[:,:,0] = ang/2
  • we set the value channel in the mask to the magnitude normalized to the 0-255 range hsv_mask[:,:,2] = cv2.normalize(mag,None,0,255,cv2.NORM_MINMAX)
  • we convert the mask to BGR to be presentable bgr = cv2.cvtColor(hsv_mask,cv2.COLOR_HSV2BGR)
  • we imshow the result cv2.imshow('frame',bgr)
  • we add escape logic and renew the frame prevImg = nextImg
  • and cleanup out of the loop

Lecture 61 - MeanShift and CAMShift Tracking Theory

  • Some of the most basic tracking methods are MeanShift and CAMShift
  • We'll first describe the general MeanShift algorithm, then learn how to apply it to image tracking
  • Afterwards we will learn how to extend MeanShift into CAMShift (Continuously Adaptive MeanShift)
  • Imagine we have a set of x,y points and we want to assign them into clusters
  • we will take all our data points and stack red and blue points on them (blue on top of red)
  • the direction to the closest cluster centroid is determined by where most of the points nearby are at (weighted mean)
  • so in each iteration each blue point will move closer to where most of the points are, which is, or will lead to, the cluster center
  • the blue and red datapoints overlap completely in the first iteration before the MeanShift algorithm starts
  • at the end of iteration one, all the blue points have moved towards the clusters.
  • in our example, by the 3rd iteration all clusters reach convergence. there is no reason for more iterations as the cluster means stop moving
  • the MeanShift algo won't always detect what may appear to us as the more reasonable clustering
  • In K-means algorithm (Machine Learning) we choose how many clusters we have beforehand
  • How MeanShift applies to object tracking:
  • meanshift can be given a target to track, calculate the color histogram of the target area, and then keep sliding the tracking window to the closest match (the cluster center)
  • Just using Meanshift won't change the window size if the target moves away or towards the camera.
  • We can use CAMShift to update the size of the Window

Lecture 62 - MeanShift and CAMShift Tracking with OpenCV

  • we start notebook and do the imports
  • we start video capture
  • we grab a frame
  • we will get our ROI by doing face detection once, on the first frame only
  • we create the haar cascade object face_cascade = cv2.CascadeClassifier('../DATA/haarcascades/haarcascade_frontalface_default.xml')
  • we get the rects where a face is detected using the cascade face_rects = face_cascade.detectMultiScale(frame)
  • we grab the first face as we want to track only one (face_x,face_y,w,h) = tuple(face_rects[0]) we convert it to a tuple as the algo needs it in that form
  • we name our tuple tracking window track_window = (face_x,face_y,w,h)
  • we set an ROI for tracking roi = frame[face_y:face_y+h,face_x:face_x+w]
  • we use hsv colormapping hsv_roi = cv2.cvtColor(roi,cv2.COLOR_BGR2HSV)
  • we will get the HSV histogram to backproject each frame in order to calculate the meanshift roi_hist = cv2.calcHist([hsv_roi],[0],None,[180],[0,180]) we get the histogram for the hue channel for vals 0-180
  • the algo works with 0-255 so we normalize cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)
  • we set the termination criteria term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,10,1) for 10 iterations or eps=1
  • we start our while loop
  • if we have a frame we convert it to hsv
  • we calculate the back projection based on the roi hist we have dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1) note we work on 0-180
  • using meanShift we get a new track window, passing the previous one, the back projection and the term criteria ret, track_window = cv2.meanShift(dst,track_window,term_crit)
  • we will now draw a new rect on the image based on the new track window (we do tuple unpacking to get coords x,y,w,h = track_window)
  • we add the rectangle img2 = cv2.rectangle(frame,(x,y),(x+w,y+h),(0,0,255),5) and show the frame
  • we add escape logic and cleanup
  • we fix the rect not resizing when the head size changes by using CAMShift instead
        ret,track_window = cv2.CamShift(dst,track_window,term_crit)
        pts = cv2.boxPoints(ret)
        pts = np.int0(pts)
        img2 = cv2.polylines(frame,[pts],True,(0,0,255),5)
  • essentially we use CAMShift and draw a polyline from the rotated box it returns

Lecture 63 - Overview of various Tracking API Methods

  • Boosting Tracker
  • MIL Tracker
  • KCF Tracker
  • TLD Tracker
  • Median Flow Tracker
  • There are many Object Tracking methods
  • Fortunately, many have been designed as simple API calls within OpenCV
  • We'll explore a few of these easy-to-use Object Tracking APIs and in the next lecture we'll use them with OpenCV
  • BOOSTING TRACKER:
    • based off AdaBoost algorithm (the same underlying algorithm that the HAAR Cascade based Face Detection used)
    • Evaluation occurs across multiple frames
    • Pros: very well known and studied algorithm
    • Cons: Does not know when tracking has failed, there are many better techniques
  • MIL TRACKER:
    • Multiple Instance Learning
    • Similar to BOOSTING, but considers a neighborhood of points around the current location to create multiple instances
    • Check the project page for details
    • Pros: good performance and does not drift as much as BOOSTING
    • Cons: failure to track an object may not be reported back, cannot recover from full obstruction
  • KCF TRACKER:
    • Kernelized Correlation Filters
    • Exploits some properties of the MIL Tracker and the fact that many data points will overlap, leading to more accurate and faster tracking
    • Pros: better than MIL and BOOSTING, Great First Choice
    • Cons: Cannot recover from full obstruction of the object
  • TLD TRACKER:
    • Tracking, Learning and Detection
    • The tracker follows the object from frame to frame
    • The detector localizes all appearances that have been observed so far and corrects the tracker if necessary
    • The learning component estimates the detector's errors and updates it to avoid those errors in the future
    • Pros: Good at tracking even with obstruction in frames, tracks well under large changes in scale
    • Cons: Can provide many false positives
  • Median Flow Tracker:
    • Internally, this tracker tracks the object in both forward and backward directions in time and measures the discrepancies between these two trajectories
    • Pros: very good at reporting failed tracking, works well with predictable motion
    • Cons: Fails under large motion (fast moving objects)

Lecture 64 - Tracking APIs with OpenCV

  • we will use the course notebook
  • we can select the tracker at runtime
  • we draw our ROI manually using cv2.selectROI(frame,False)
  • we use the roi to initialize the tracker ret = tracker.init(frame,roi)
  • in the loop we update the tracker with new frames success, roi = tracker.update(frame), as in the sketch below
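  • a minimal sketch of that workflow, assuming we pick the KCF tracker (the tracker choice, camera index and window name are illustrative; OpenCV 3.x contrib-style constructor assumed):
# tracker-API workflow sketch
tracker = cv2.TrackerKCF_create()
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
roi = cv2.selectROI(frame, False)          # draw the ROI by hand
ret = tracker.init(frame, roi)
while True:
    ret, frame = cap.read()
    success, roi = tracker.update(frame)
    (x, y, w, h) = tuple(map(int, roi))
    if success:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 3)
    cv2.imshow('Tracker', frame)
    if cv2.waitKey(1) & 0xFF == 27:        # ESC to quit
        break
cap.release()
cv2.destroyAllWindows()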

Section 8 - Deep Learning for Computer Vision

Lecture 65 - Introduction to Deep Learning for Computer Vision

  • Section Topics and Goals
    • High level overview of Machine Learning
    • Overview of understanding classification metrics
    • Cover Deep Learning Basics
    • Keras Basics
    • MNIST Data Overview
    • CNN Theory
    • Keras CNN
    • Deep Learning on Custom Image Files
    • Understanding YOLO v3
    • YOLO v3 with Python

Lecture 66 - Machine Learning Basics

  • Before diving into Deep Learning, lets work on understanding the general machine learning process we will be using
  • The specific case of machine learning we will be conducting is known as supervised learning
  • Machine Learning is a method of data analysis that automates analytical model building
  • using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look
  • Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known
  • E.g. a picture has a category label such as either Dog or Cat
  • The Learning algorithm receives a set of inputs along with the corresponding correct outputs and the algorithm learns by comparing its actual output with correct outputs to find errors
  • It then modifies the model accordingly
  • Supervised learning is used in apps where historical data predicts future events
  • Data Acquisition -> Data Cleaning -> repeat[Model Training and Building -> Model Testing w/ test data] -> Model Deployment
  • Image classification and recognition is a very common and widely applicable use of deep learning and machine learning with OpenCV and Keras
  • We continue by learning how to evaluate a classification task

Lecture 67 - Understanding Classification Metrics

  • We learned that after our machine learning process is complete, we will use performance metrics to evaluate how our model did
  • The key classification metrics we need to understand are:
    • accuracy
    • recall
    • precision
    • F1-score
  • Typically in any classification task our model can only achieve 2 results
  • either model was correct in its prediction
  • or our model was incorrect in its prediction
  • the notions of correct and incorrect extend to situations where we have multiple classes
  • for the purpose of explaining the metrics let's imagine a binary classification situation where we have 2 available classes
  • in our example we will try to predict if an image is a dog or cat
  • since we are dealing with supervised learning we fit/train a model on training data and then test the model on testing data
  • once we have the model predictions from the X_test data we compare them with the true y values (correct labels)
  • we repeat test process for all images in our test data
  • at the end we will have a count of correct matches and a count of incorrect matches.
  • the key point to take is that in real world not all incorrect or correct matches hold equal value
  • we could organize our predicted values vs the real values in a confusion matrix
  • Accuracy:
    • is the number of correct predictions made by the model / total number of predictions
    • is useful when target classes are well balanced
    • its not a good choice with unbalanced classes
    • = 1 - error rate (misclassification rate)
  • Recall:
    • ability of a model to find all relevant cases within a dataset.
    • number of true positives / (num of true positives + num of false negatives)
  • Precision:
    • ability of a classification model to identify only the relevant data points
    • num of true positives / (num of true positives + num of false positives)
  • Recall vs Precision:
    • often there is a trade-off between them
    • while recall expresses the ability to find all relevant instances in a dataset, precision expresses the proportion of the data points our model says were relevant that actually were relevant
  • F1-score:
    • in cases where we want to find an optimal blend of precision and recall we can combine the two metrics using the F1 score
    • it is the harmonic mean of precision and recall (F1 = 2 * (precision * recall)/(precision + recall))
    • we use the harmonic mean instead of the plain average because it punishes extreme values
    • a classifier with precision 1 and recall 0 has an average of 0.5 but an F1 of 0
  • Confusion Matrix
    • FP (False Positive) Type I error (prediction positive VS condition negative)
    • FN (False Negative) Type II error (prediction negative VS condition positive)
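  • a quick sketch of computing these metrics with scikit-learn on made-up binary labels (the label values are illustrative only):
# toy example: accuracy, recall, precision and F1 on invented labels
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      f1_score(y_true, y_pred))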

Lecture 68 - introduction to Deep Learning Topics

  • We will cover
    • neurons
    • Neural networks
    • Cost Function
    • Gradient Descent and BackPropagation

Lecture 69 - Understanding a Neuron

  • We skip Lectures 69-72 (see the PythonDSML and TensorFlow course notes)

Lecture 73 - Keras Basics

  • We ll learn how to create a machine learning model with keras
  • we ll start with some data on currency bank notes
  • some of these bank notes were forgeries and others were legit
  • researchers created a dataset from these banknotes by taking images of the notes and then extracting various numerical features based on the wavelets of the images
  • the dataset is not images.
  • we are doing general machine learning using Keras
  • when we learn about CNN then we can expand on Keras to feed in image data (pixel images) into a network
  • we open a notebook and import
import numpy as np
from numpy import genfromtxt
  • we import the data from csv using the genfromtxt numpy method setting the delimiter to comma data = genfromtxt('../DATA/bank_note_data.txt',delimiter=',')
  • our data is a (1372,5) array where the last column contains the classes 0. = forgery 1. = legit
  • first we need to separate the label from the actual features
y = labels = data[:,4] 
X = features = data[:,:4]
  • we now have to split our data into train and test sets. we use the sklearn lib to do it
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
  • when we work with NNs it's a good idea to standardize or scale the data. we use sklearn for that
from sklearn.preprocessing import MinMaxScaler
scaler_object = MinMaxScaler()
scaler_object.fit(X_train)
scaled_X_train = scaler_object.transform(X_train)
  • when we scale we always fit only on the train data (otherwise we leak information from the test set into the model)
  • all scaled feature vals are now between 0 and 1
  • we start building our keras model with importing our DNN model and our layers type
from keras.models import Sequential
from keras.layers import Dense
  • we create our model and add layers to it
model = Sequential()
model.add(Dense(4,input_dim=4,activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
  • we compile our model adding the loss method, optimizer, and metrics model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  • it's time to fit or train the model model.fit(scaled_X_train,y_train,epochs=50,verbose=2) where we set the num of epochs
  • to get the predictions we first scale the test set scaled_X_test = scaler_object.transform(X_test) and then call model.predict_classes(scaled_X_test)
  • to see the model metric names model.metrics_names
  • we import confusion matrix to get the report of metrics and print the metrics
from sklearn.metrics import confusion_matrix, classification_report
predictions = model.predict_classes(scaled_X_test)
confusion_matrix(y_test,predictions)
print(classification_report(y_test,predictions))
  • to save the model for production model.save('my_banknote_classification_model.h5')
  • to reuse the model we load it with keras
from keras.models import load_model
newmodel = load_model('my_banknote_classification_model.h5')

Lecture 74 - MNIST Data Overview

  • A classic data set in Deep Learning is the MNIST data set
  • We ll quickly cover some basics about it since w ll be using similar data concepts quite frequently during this section of the course
  • The data set is easy to access with Keras. it has 60k training images and 10k test images
  • it contains hand written single digits from 0 to 9
  • a single digit image can be represented as a numpy array
  • they are 28x28 with 1 color channel, normalized to 0-1
  • our input tensor is 4D (60000,28,28,1) or (samples,x,y,channels); for color images the last dimension would be 3
  • for the labels we'll use one-hot encoding. instead of having labels like 'one', 'two' etc we'll have a single array for each image. the original labels are given as a list of nums [5,4,5,...,7,2]; we will convert them to one-hot encoding (easy to do with Keras)
  • One-hot encoding:
    • the label is represented by its index position in the label array
    • the corresponding array is 1 at that index location and zero elsewhere
    • e.g. 4 will have the label array [0,0,0,0,1,0,0,0,0,0]
    • works well with sigmoid
  • As a result the labels for the training data end up being a 2D array (60000,10) (see the tiny sketch below)
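  • a tiny sketch of that conversion with the Keras utility we use later (the single label 4 is just an example value):
# one-hot encode an example label with 10 classes
from keras.utils.np_utils import to_categorical
import numpy as np
to_categorical(np.array([4]), 10)   # -> array([[0.,0.,0.,0.,1.,0.,0.,0.,0.,0.]])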

Lecture 75 - Convolutional Neural Networks Overview - Part One

  • we just created a NN for already defined features
  • what if we have the raw image data?
  • we need to learn about CNNs in order to effectively solve the problems that image data can present
  • just like the simple perceptron, CNNs also have their origins in biological research
  • Hubel and Wiesel studied the structure of the visual cortex in mammals, winning a Nobel prize in 1981
  • their research revealed that neurons in the visual cortex have a small local receptive field (they look at a subset of the image a person is viewing). these sections overlap and build up the larger image
  • these neurons in the visual cortex are only activated when they detect certain things (e.g a horizontal line, a black circle etc)
  • this idea then inspired an ANN architecture that would become CNN
  • This architecture was implemented by Yann LeCun in 1998
  • The LeNet-5 architecture was first used to classify the MNIST data set
  • When we learn about CNNs we often see a diagram with subsampling/pooling and convolution layers generating feature maps from an image
  • Topics:
    • Tensors
    • DNN vs CNN
    • Convolutions and Filters
    • Padding
    • pooling Layers
    • Review Dropout
  • Tensors are N-Dimensional Arrays that we build up to as we increase the level of dimension:
    • Scalar - 3
    • Vector - [3,4,5]
    • Matrix - [[3,4],[5,6],[7,8]]
    • Tensor - [[[1,2],[3,4]],[[5,6],[7,8]]]
  • Tensors make it very convenient to feed sets of images into our model - (I,H,W,C)
    • I: Images
    • H: Height of Image in Pixels
    • W: Width of Image in Pixels
    • C: Color Channels: 1-Grayscale, 3-RGB
  • Lets explore the difference between a Densely Connected Neural Network and a Convolutional Neural Network
  • Recall that we've already been able to create DNNs with the tf.estimator API
  • In a DNN every neuron in one layer is connected to every neuron in the next layer
  • In a CNN each unit is connected to a smaller number of nearby units in the next layer, inspired by the biology of the visual cortex where neurons only look at local receptive fields
  • Why CNN? an MNIST image is 28x28 = 784 pixels, but most images are at least 256x256 = 65,536 pixels; a fully connected approach leads to far too many params and does not scale to new image sizes
  • Convolutions also have a major advantage for image processing, where pixels nearby to each other are much more correlated to each other for image detection
  • Each CNN layer looks at an increasingly larger part of the image
  • Having units only connected to nearby units also aids in invariance
  • CNN also helps with regularization by limiting the search of weights to the size of the convolution
  • Lets explore how the convolutional neural network relates to image recognition
  • We start with the input layer, the image itself
  • Convolutional layers are only connected to pixels in their respective fields
  • we run into a possible issue for edge neurons: there may not be an input for them. we can fix this by adding a padding of zeros around the image
  • Converting a DNN to a CNN with a 1-D convolution: we have only local connections to the next layer. the weights of the connections work as filters (e.g for edge detection)
  • our filters have a size (how many neurons take part) and a stride (how many neurons we move over to reach the next group)
  • we can stack multiple filters (conceptually vertically), adding a dimension to our tensors
  • Each filter detects a different feature
  • we describe and visualize these sets of neurons as sets of blocks
  • In 2D convolutions (images) our layers (tensors) are 3D (F x W x H); if we have a color image we add a dimension
  • subsections of the image translate to sections of the tensor (layer)
  • Convolutional filters are commonly visualized as a grid system (directly analogous to image processing with kernels)

Lecture 76 - Convolutional Neural Networks Overview - Part Two

  • we saw what convolutions are
  • we'll now see what subsampling (pooling) layers are
  • Pooling layers will subsample the input image, which reduces the memory use and computational load as well as reducing the number of params
  • say we have a layer of pixels in our input image
  • for our MNIST digit set, each pixel has a value representing darkness
  • we create a 2x2 pool (or 3x3 or NxN) of pixels and evaluate the max val. only the max val makes it to the next layer
  • the pooling layer removes a lot of info. even a 2x2 pooling kernel with a stride of 2 removes 75% of the input data (see the toy example below)
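  • a toy numpy illustration of 2x2 max pooling with stride 2 (the values are made up):
# 2x2 max pooling with stride 2 on a made-up 4x4 "layer"
import numpy as np
layer = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 2],
                  [7, 2, 9, 0],
                  [4, 1, 3, 8]])
pooled = layer.reshape(2, 2, 2, 2).max(axis=(1, 3))   # -> [[6, 4], [7, 9]]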
  • Another technique deployed by CNN is Dropout
  • Dropout can be thought as a form of regularization to help prevent overfitting
  • During training, units are randomly dropped, along with their connections
  • This helps prevent units from co-adapting too much
  • We'll look at some famous CNN architectures
    • LeNet-5 by Yann LeCun
    • AlexNet by Alex Krizhevsky et al.
    • GoogLeNet by Szegedy et al. at Google Research
    • ResNet by Kaiming He et al.

Lecture 77 - Keras Convolutional Neural Networks with MNIST

  • we open a new notebook
  • we import mnist dataset from keras from keras.datasets import mnist
  • we load train and test data (x_train,y_train),(x_test,y_test) = mnist.load_data()
  • we check the shape of x_train x_train.shape which is (60000,28,28); there is no color channel
  • we import matplotlib and plot the first image plt.imshow(x_train[0,:,:],cmap='gray')
  • y_train is (60000,), so essentially a 1D array of nums 0-9
  • we want to one-hot encode them; if we feed them as-is the network will get confused and treat it as a regression problem
  • to one-hot encode we import from keras.utils.np_utils import to_categorical
  • we do the one-hot encoding
y_cat_test = to_categorical(y_test,10)
y_cat_train = to_categorical(y_train,10)
  • y_cat_test.shape is (10000,10)
  • our train data are not normalized. x_train[0].max() is 255. we normalize to 0-1 in a way that doesn't need sklearn
x_train = x_train / x_train.max()
x_test = x_test / x_test.max()
  • we need to reshape the data so it can be fed to the network: we add the color channel x_train = x_train.reshape(60000,28,28,1) and do the same for x_test
  • we start building our model
from keras.models import Sequential
from keras.layers import Dense,Conv2D,MaxPool2D,Flatten
  • we create the model
model = Sequential()
# Convolutional Layer
model.add(Conv2D(filters=32,kernel_size=(4,4),input_shape=(28,28,1), activation='relu'))
# Pooling Layer
model.add(MaxPool2D(pool_size=(2,2)))
# Flatten out 2D --> 1D - Prepare for DNN feed
model.add(Flatten())
# Dense Layer
model.add(Dense(128,activation='relu'))
# Output Layer - Classifier
model.add(Dense(10,activation='softmax'))
# Compile
model.compile(loss='categorical_crossentropy',
             optimizer='rmsprop',
             metrics=['accuracy'])
  • we get the summary of model model.summary()
  • we train our model model.fit(x_train,y_cat_train,epochs=2)
  • we evaluate model.evaluate(x_test,y_cat_test)
  • we build the reports
from sklearn.metrics import classification_report,confusion_matrix
predictions = model.predict_classes(x_test)
print(classification_report(predictions,y_test))

Lecture 78 - Keras Convolution Neural Networks with CIFAR-10

  • we import the dataset from keras.datasets import cifar10
  • we load the dataset (x_train,y_train),(x_test,y_test) = cifar10.load_data()
  • we check the shape x_train.shape (50000, 32, 32, 3) and x_train[0].max() is 255 so it is unscaled
  • we normalize
x_train = x_train / x_train.max()
x_test = x_test / x_test.max()
  • we check the labels; the y values are plain integer categories. we'll one-hot encode them
from keras.utils.np_utils import to_categorical
y_cat_test = to_categorical(y_test,10)
y_cat_train = to_categorical(y_train,10)

  • we build our model

from keras.models import Sequential
from keras.layers import Dense,Conv2D,MaxPool2D,Flatten
model = Sequential()
model.add(Conv2D(32,kernel_size=(4,4),input_shape=(32,32,3),activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(32,kernel_size=(4,4),input_shape=(32,32,3),activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(256,activation='relu'))
model.add(Dense(10,activation='softmax'))
model.compile(loss='categorical_crossentropy',
             optimizer='rmsprop',
             metrics=['accuracy'])
  • we train for 2 epochs model.fit(x_train,y_cat_train,verbose=1,epochs=2)
  • we load a pretrained model
from keras.models import load_model
new_model = load_model('../../Computer-Vision-with-Python/06-Deep-Learning-Computer-Vision/cifar_10epochs.h5')
  • we evaluate both: the 2-epoch model gets about 0.59 accuracy, the pretrained 10-epoch one about 0.64
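  • a sketch of the two evaluation calls behind those numbers (assuming the same test data as above)
model.evaluate(x_test,y_cat_test)     # roughly 0.59 accuracy after 2 epochs
new_model.evaluate(x_test,y_cat_test) # roughly 0.64 accuracy for the pretrained 10-epoch model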

Lecture 80 - Deep Learning on Custom Images - Part One

  • in real world apps we will have to work with real images, i.e. raw jpeg files
  • we will use real images
  • in the CATS_DOGS folder with data there are two folders 'train' and 'test' each with 'CAT' and 'DOG' subfolders
  • this is the default way of organizing classification data for keras. we also have to put it in the jupyter notebook folder
  • we open a notebook and import cv2 and matplotlib
  • we load a cat image from the train folder cat4 = cv2.imread('CATS_DOGS/train/CAT/4.jpg'), color correct it and show it
  • we do the same for a dog image
  • we note that images have different shapes
  • we need to prepare the data for the model
  • keras has a class that reads the data and prepares a flow of batches to pass to the model: from keras.preprocessing.image import ImageDataGenerator
  • ImageDataGenerator also creates variations of the images for a stronger model
  • we create an image generator, passing in a lot of image alterations as params
image_gen = ImageDataGenerator(rotation_range=30,
                              width_shift_range=0.1,
                              height_shift_range=0.1,
                              rescale=1/255,
                              shear_range=0.2,
                              zoom_range=0.2,
                              horizontal_flip=True,
                              fill_mode='nearest')
  • we use it on the dog image image_gen.random_transform(dog2) and plot it. every time we run it we get a different modified version of dog2
  • next we will create a lot of modified images from our train directory. we use image_gen.flow_from_directory('CATS_DOGS/train') which creates a constant feed of images (randomized) to the model
  • it returns a DirectoryIterator object. it also reports how many classes it found (the number of subfolders)
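  • a minimal sketch of pulling one batch out of the DirectoryIterator to inspect it (no target_size or batch_size passed yet, so the Keras defaults of 256x256 and 32 apply)
train_flow = image_gen.flow_from_directory('CATS_DOGS/train')
batch_images, batch_labels = next(train_flow)
print(batch_images.shape)  # (32, 256, 256, 3) with the default settings
print(batch_labels.shape)  # (32, 2) one-hot labels with the default class_mode='categorical'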

Lecture 81 - Deep Learning on Custom Images - Part Two

  • We will introduce some slightly different imports to reflect the most recent changes to the Keras lib
  • These are just a few different imports: MaxPool2D => MaxPooling2D, and adding activation functions separately
  • we import model and layers (keras v2.2 style)
from keras.models import Sequential
from keras.layers import Dense,Activation,Dropout,Flatten,Conv2D,MaxPooling2D
  • we create the model and add layers
model = Sequential()
model.add(Conv2D(filters=32,kernel_size=(3,3),input_shape=(150,150,3),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=(150,150,3),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=(150,150,3),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])
  • we set input_shape=(150,150,3); we will also pass this to the image generator's flow_from_directory params, so we get fed images resized to a uniform size
  • we add a dropout layer to avoid overfitting
  • we select the batch size and the input shape (same as in the model)
input_shape = (150,150,3)
batch_size = 16
  • we create our generator for train and test
train_image_gen = image_gen.flow_from_directory('CATS_DOGS/train',
                                               target_size=input_shape[:2],
                                               batch_size = batch_size,
                                               class_mode='binary')
test_image_gen = image_gen.flow_from_directory('CATS_DOGS/test',
                                               target_size=input_shape[:2],
                                               batch_size = batch_size,
                                               class_mode='binary')
  • generator objects come loaded with useful attributes; train_image_gen.class_indices shows us which number belongs to which class
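  • for example, since folders are read alphabetically the mapping is likely
train_image_gen.class_indices  # e.g. {'CAT': 0, 'DOG': 1}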
  • we train our model using the generator
results = model.fit_generator(train_image_gen,epochs=1,steps_per_epoch=150,
                             validation_data=test_image_gen,validation_steps=12)
  • we set steps_per_epoch to limit the size of the epoch; in our case 150 batches of 16
  • we will also run our validation in the same run, with 12 steps of 16
  • to ignore warnings
import warnings
warnings.filterwarnings('ignore')
  • we have the results so we can evaluate the model
  • we can see its accuracy in 1st epoch validation with results.history['acc']
  • we load a model pretrained for 100 epochs
from keras.models import load_model
new_model = load_model('../../Computer-Vision-with-Python/06-Deep-Learning-Computer-Vision/cat_dog_100epochs.h5')
  • we do prediction using the pretrained model
    • we get an image path dog_file = 'CATS_DOGS/test/DOG/10005.jpg'
    • we import image preproc from keras from keras.preprocessing import image
    • we load image resizing it dog_img = image.load_img(dog_file,target_size=(150,150))
    • we convert it to an array dog_img = image.img_to_array(dog_img)
    • we reshape the array so that keras thinks its a batch of 1 image dog_img = np.expand_dims(dog_img,axis=0)
    • shape now is (1,150,150,3)
    • we normalize it dog_img = dog_img /255
    • we do the prediction new_model.predict_classes(dog_img); it is correct, it gives class 1 (dog)
    • how sure was it? new_model.predict(dog_img)
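    • since the output layer is a single sigmoid unit, predict returns the probability of class 1 (DOG); a hypothetical way to print it
prob = new_model.predict(dog_img)[0][0]
print('Probability of DOG: {:.2f}'.format(prob))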

Lecture 82 - Deep Learning and Convolutional Neural Networks Assessment

Lecture 84 - Introduction to YOLO v3

  • Let's learn about the state of the art image detection algorithm known as YOLO (You Only Look Once)
  • YOLO can view an image and draw bounding boxes over what it perceives as identified classes
  • In this lecture we will use version 3 of the YOLO Object Detection Algo, which is improved in terms of accuracy and speed
  • What makes YOLO different?
    • prior detection systems repurpose classifiers or localizers to perform detection
    • they apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections
    • YOLO uses a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region; these bounding boxes are weighted by the predicted probabilities
  • YOLO has several advantages over classifier-based systems
  • It looks at the whole image at test time so its predictions are informed by global context in the image
  • It also makes predictions with a single network evaluation, unlike systems like R-CNN which require thousands of evaluations for a single image. This makes it extremely fast: more than 1000x faster than R-CNN and 100x faster than Fast R-CNN
  • In the next lecture we will load an already trained YOLO model and see how we can use it with either image or video data
  • We've set up an easy to use notebook. we just have to download the model weights file

Lecture 86 - YOLO v3 with Python

  • Let's explore how to implement YOLO v3 with Python
  • we'll be using an implementation of YOLO v3 that has been trained on the COCO dataset
  • the COCO dataset has 1.5 million object instances across 80 object categories
  • we will use a YOLO v3 pretrained model to explore its capabilities
  • we would need many days and a high end computer to train such a model ourselves
  • this model is extremely complex (a ~200MB .h5 file)
  • we will place the yolo.h5 in the DATA dir of the YOLO folder
  • we will use a ready notebook with easy to call functions
  • COCO dataset
  • COCO paper
  • YOLO v3 paper
  • we do the imports
import os
import time
import cv2
import numpy as np
from model.yolo_model import YOLO
  • we do image processing to prepare the input image for the model (frame or image we provide)
def process_image(img):
    """Resize, reduce and expand image.

    # Argument:
        img: original image.

    # Returns
        image: ndarray(1, 416, 416, 3), processed image.
    """
    image = cv2.resize(img, (416, 416),
                       interpolation=cv2.INTER_CUBIC)
    image = np.array(image, dtype='float32')
    image /= 255.
    image = np.expand_dims(image, axis=0)

    return image
  • we get the classes from a text file (that we provide)
def get_classes(file):
    """Get classes name.

    # Argument:
        file: classes name for database.

    # Returns
        class_names: List, classes name.

    """
    with open(file) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]

    return class_names
  • we have the draw function that will draw on the picture based on the model outputs
def draw(image, boxes, scores, classes, all_classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box

        top = max(0, np.floor(x + 0.5).astype(int))
        left = max(0, np.floor(y + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))

        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(all_classes[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 1,
                    cv2.LINE_AA)

        print('class: {0}, score: {1:.2f}'.format(all_classes[cl], score))
        print('box coordinate x,y,w,h: {0}'.format(box))

    print()
  • we have two methods, one for images and one for video; they take the image, the model and the classes, and return or draw the results (a rough sketch of the video variant follows the detect_image code below)
def detect_image(image, yolo, all_classes):
    """Use yolo v3 to detect images.

    # Argument:
        image: original image.
        yolo: YOLO, yolo model.
        all_classes: all classes name.

    # Returns:
        image: processed image.
    """
    pimage = process_image(image)

    start = time.time()
    boxes, classes, scores = yolo.predict(pimage, image.shape)
    end = time.time()

    print('time: {0:.2f}s'.format(end - start))

    if boxes is not None:
        draw(image, boxes, scores, classes, all_classes)

    return image
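  • the video variant isn't reproduced above; a rough sketch, assuming it simply runs detect_image frame by frame with cv2.VideoCapture (the file name here is hypothetical)
def detect_video(video_path, yolo, all_classes):
    """Run YOLO v3 detection on every frame of a video and display the result."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = detect_image(frame, yolo, all_classes)
        cv2.imshow('YOLO v3 detection', frame)
        if cv2.waitKey(1) & 0xFF == 27:  # press ESC to stop early
            break
    cap.release()
    cv2.destroyAllWindows()

detect_video('videos/test.mp4', yolo, all_classes)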
  • we load the model and the classes
yolo = YOLO(0.6, 0.5)
file = 'data/coco_classes.txt'
all_classes = get_classes(file)
  • this uses 'model/yolo.py', which wraps and runs the model
  • the params: 0.6 is the object threshold and 0.5 is the NMS threshold
  • a lower threshold gives more detections, but is also more prone to errors
  • code for detecting in image
f = 'person.jpg'
path = 'images/'+f
image = cv2.imread(path)
image = detect_image(image, yolo, all_classes)
cv2.imwrite('images/res/' + f, image)

Section 9 - Capstone Project

Lecture 87 - Introduction to Capstone Project

  • We will be creating a program that can detect a hand, segment the hand and count the number of fingers being held up

Lecture 88 - Capstone Part One - Variables and Background function

  • first we will define some global variables
  • after that, we will set up a function that updates a running average of the background values in an ROI
  • This will later on allow us to detect new objects (hand) in the ROI
  • in an empty frame we draw an ROI. we wait for the first ~60 frames so the average of the background in the ROI can be calculated
  • then we bring our hand in. it can be detected by the change against the background
  • we apply thresholding
  • Strategy for counting fingers
    • grab the ROI
    • calculate a running average background val for 60 frames of video
    • once the average value is found, the hand can enter the ROI; once it does, we will use a convex hull to draw a polygon around the hand
    • we'll then calculate the center of the hand
    • then, using the center of the hand and the angles/distances to the outer points of the hull, we will infer the finger count
  • we start a new notebook
  • we do our imports
import cv2
import numpy as np
from sklearn.metrics import pairwise
  • we create our global variables
background = None
accumulated_weight = 0.5
roi_top = 20
roi_bottom = 300
roi_right = 300
roi_left = 600
  • our roi is preset
  • we add a function to calculate the background value in the roi
def calc_accum_avg(frame,accumulated_weight):
    global background
    if background is None:
        # first call: use the frame itself as the starting background
        # (cast to float so cv2.accumulateWeighted can write into it)
        background = frame.copy().astype('float')
        return None
    cv2.accumulateWeighted(frame,background,accumulated_weight)
  • this method maintains a running weighted average of the passed frames in the global background; the first time it is called it just sets the background equal to the frame
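  • a rough sketch of how this would be driven from the main capture loop for the first 60 frames (the loop itself is covered in the next lectures; names beyond the globals above are assumptions)
cam = cv2.VideoCapture(0)
num_frames = 0

while True:
    ret, frame = cam.read()
    if not ret:
        break
    # grab the ROI and clean it up a bit before averaging
    roi = frame[roi_top:roi_bottom, roi_right:roi_left]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (7,7), 0)

    if num_frames < 60:
        # keep refining the running average of the background
        calc_accum_avg(gray, accumulated_weight)

    num_frames += 1
    if cv2.waitKey(1) & 0xFF == 27:  # ESC to quit
        break

cam.release()
cv2.destroyAllWindows()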

Lecture 89 - Capstone Part Two - Segmentation
