1. Setup
The following libraries are needed for working with OpenCV:
import numpy as np
import cv2
from matplotlib import pyplot as plt
2. Operations on image
- Load an image: img = cv2.imread('coins.png')
- Get the pixel at (x,y): img[x,y] -> BGR value. NumPy indexes the row first, so x is the row and y is the column here.
- Access a single channel value with img.item(): BLUE is img.item(x,y,0), GREEN is img.item(x,y,1) and RED is img.item(x,y,2)
- Set the red channel of every pixel to zero using NumPy indexing: img[:,:,2] = 0 (for a BGR image)
- Split into BGR channels: b,g,r = cv2.split(img)
Note: cv2.split() is a costly operation (in terms of time), so use it only if you need it. Otherwise go for NumPy indexing.
- Merge the BGR channels back into an image: img = cv2.merge((b,g,r))
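The indexing operations above can be tried without an image file. The following is a minimal sketch that assumes a small synthetic array in place of the result of cv2.imread(); the array size and pixel values are made up for illustration.

```python
import numpy as np

# Synthetic 4x6 BGR image (an assumption; it stands in for cv2.imread output).
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[1, 2] = (10, 20, 30)            # row 1, column 2 -> B=10, G=20, R=30

pixel = img[1, 2]                   # all three channel values of one pixel
blue = img.item(1, 2, 0)            # scalar access to the blue channel
red_before = img.item(1, 2, 2)      # scalar access to the red channel

# Channel views through NumPy indexing (no copy is made, unlike cv2.split).
b, g, r = img[:, :, 0], img[:, :, 1], img[:, :, 2]

img[:, :, 2] = 0                    # zero the red channel in place
red_after = img.item(1, 2, 2)

print(pixel, blue, red_before, red_after)
```

Because b, g and r are views, they reflect later changes to img, which is one reason indexing is cheaper than cv2.split().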
3. Image properties
- img.shape => Shape of the image as a tuple (rows, columns, channels)
Note: If the image is grayscale, the returned tuple contains only the number of rows and columns, so this is a good way to check whether a loaded image is grayscale or color.
- img.size => Total number of array elements (rows × columns × channels)
- img.dtype => Image datatype (e.g. uint8)
Note: img.dtype is very important while debugging, because a large number of errors in OpenCV-Python code are caused by an invalid datatype.
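A short sketch of these properties, assuming two synthetic arrays (640x480 color and grayscale) rather than loaded files; the helper name is_grayscale is hypothetical.

```python
import numpy as np

# Synthetic stand-ins for a color and a grayscale image (assumed sizes).
color = np.zeros((480, 640, 3), dtype=np.uint8)   # rows, columns, channels
gray = np.zeros((480, 640), dtype=np.uint8)       # rows, columns only

def is_grayscale(image):
    """Return True when the shape tuple has no channel entry."""
    return len(image.shape) == 2

print(color.shape, color.size, color.dtype)
print(gray.shape, gray.size, gray.dtype)
print(is_grayscale(color), is_grayscale(gray))
```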
4. Image ROI
Get the ROI covering rows 280:340 and columns 330:390: img[280:340, 330:390]
5. Bitwise Operations
These are the bitwise AND, OR, NOT and XOR operations. They are highly useful for extracting any part of an image and for defining and working with non-rectangular ROIs.
Apply bitwise operations to extract background and foreground of image "test.png" below:
6. Changing Colorspaces
We focus on the two most widely used conversions: BGR ↔ Gray and BGR ↔ HSV.
- For BGR → Gray conversion, we use the flag cv2.COLOR_BGR2GRAY.
We already saw how to convert BGR → Gray in the previous example.
- For BGR → HSV, we use the flag cv2.COLOR_BGR2HSV.
In HSV, it is easier to represent a color than in the BGR color-space. We can use this to extract a colored object.
Note: All color conversion flags have names starting with COLOR_, so they can be listed from the attributes of the cv2 module.
Apply BGR → HSV to extract the red foreground of the picture "opencv_logo.png":
- Convert the image from BGR to HSV color-space.
- Threshold the HSV image for a range of red color.
- Extract the red object alone.
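Note: How to find HSV values to track?
Pass the BGR value of the color to cv2.cvtColor(). For pure red, BGR (0, 0, 255), the result is [[[ 0 255 255]]]
Now take [0, 100, 100] and [H+10, 255, 255] as the lower and upper bounds, where H is the hue value just found (0 for red).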
So the code is:
8. Smoothing Image
Images can also be filtered with low-pass filters (LPF), high-pass filters (HPF), etc.
- An LPF helps in removing noise and blurring images.
- An HPF helps in finding edges in images.
8.1 2D Convolution (Image Filtering)
In OpenCV, we use cv.filter2D() to convolve a kernel with an image.
Reusing the Otsu example, we replace cv.GaussianBlur(gray_img,(5,5),0) with cv.filter2D(). A 5x5 averaging filter kernel looks like this:
$K = \frac{1}{25}\begin{bmatrix}
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1
\end{bmatrix}$
The result should be similar to that of the Otsu example.
8.2 Image Blurring (Image Smoothing)
Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing high-frequency content (e.g. noise, edges) from the image, so edges are blurred a little in this operation.
OpenCV provides mainly four types of blurring techniques.
8.2.1 Averaging
This is done with the function cv.blur() or cv.boxFilter(). We specify the width and height of the kernel. A 5x5 normalized box filter looks like this:
$K = \frac{1}{25}\begin{bmatrix}
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1
\end{bmatrix}$
Reuse the example above, but with a 5x5 normalized box filter.
The result should be similar to that of the example above.
8.2.2 Gaussian Blurring
Instead of a box filter, a Gaussian kernel is used. Gaussian blurring is effective in removing Gaussian noise from the image.
OpenCV provides the function cv.GaussianBlur() with these main parameters:
- The width and height of the kernel, which should be positive and odd.
- The standard deviation in the X and Y directions, sigmaX and sigmaY respectively. If both are given as zeros, they are calculated from the kernel size.
A Gaussian kernel can also be created separately with cv.getGaussianKernel().
Reuse the example of the Otsu method.
8.2.3 Median Blurring
This is effective against salt-and-pepper noise in images. It takes the median of all the pixels under the kernel area, and the central element is replaced with this median value. Its kernel size should be a positive odd integer.
In the filters above, the newly calculated central element may be a pixel value in the image or a new value. In median blurring, however, the central element is always replaced by some pixel value in the image.
The result should be similar to that of the example above.
8.2.4 Bilateral Filtering
It is effective in noise removal while keeping edges sharp. The operation is slower compared to other filters.
The bilateral filter uses 2 Gaussian filters:
- A Gaussian function of space makes sure that only nearby pixels are considered for blurring.
- A Gaussian function of intensity difference makes sure that only pixels with an intensity similar to the central pixel are considered for blurring. So it preserves the edges, since pixels at edges have a large intensity variation.
After running this code, look at the edges of the black square to see the difference compared with the other filters.
7.3 Otsu's Binarization

This method applies only to bimodal images (images whose histogram has two peaks). It takes a value approximately in the middle of those peaks as the threshold value. For images which are not bimodal, the binarization won't be accurate. We use cv.threshold() with the extra flag cv.THRESH_OTSU, that is cv.THRESH_BINARY+cv.THRESH_OTSU. The threshold argument is then simply passed as 0, and the threshold found by the algorithm is returned as the first output.
9.5. Morphological Gradient

It is the difference between dilation and erosion of an image.
We manually created a rectangular kernel in the previous examples with the help of NumPy. If you need an elliptical or circular kernel, you can use the OpenCV function cv.getStructuringElement(): just pass the shape and size of the kernel to get the desired kernel.
10. Image Gradients
Gradient filters are used to find image gradients and edges.
OpenCV provides three types of gradient filters, or high-pass filters: Sobel, Scharr and Laplacian.
10.1. Sobel and Scharr Derivatives
The Sobel operator combines Gaussian smoothing with differentiation, so it is more resistant to noise. It is a derivative mask used for edge detection, and it detects two kinds of edges in an image:
+ Vertical direction
+ Horizontal direction
When calling cv.Sobel(), we need to specify:
- The direction of the derivative to be taken, horizontal or vertical (by the dx and dy arguments).
- The size of the kernel, by the argument ksize. If ksize = -1, a 3x3 Scharr filter is used, which gives better results than a 3x3 Sobel filter.
Let's take the Sobel derivative of the image:
Figure: Input for Sobel Derivative
```python
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test5.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# take Sobel derivative in X direction
sobelx = cv.Sobel(gray_img, cv.CV_8U, 1, 0, ksize=5)
# take Sobel derivative in Y direction
sobely = cv.Sobel(gray_img, cv.CV_8U, 0, 1, ksize=5)

# display result image
cv.imshow('sobelx', sobelx)
cv.imshow('sobely', sobely)
cv.waitKey(0)
cv.destroyAllWindows()
```
Figure: Sobel Derivative in X and Y directions
You can see that Sobel in X detects edges along the X direction and Sobel in Y detects edges along the Y direction, but the results miss some edges. The reason is that the output datatype is uint8 (check with print(sobelx.dtype)). A black-to-white transition is taken as a positive slope (it has a positive value), while a white-to-black transition is taken as a negative slope (it has a negative value). When the data is converted to np.uint8, all negative slopes are set to zero, so those edges are lost.
To fix this, keep the output datatype in a higher form (cv.CV_16S, cv.CV_64F), take its absolute value and then convert back to cv.CV_8U.
So the new code is:
```python
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test5.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# take Sobel derivative in X direction
sobelx64f = cv.Sobel(gray_img, cv.CV_64F, 1, 0, ksize=5)
# take Sobel derivative in Y direction
sobely64f = cv.Sobel(gray_img, cv.CV_64F, 0, 1, ksize=5)

# absolute value, then convert back to 8-bit
# (values are clipped to 0..255 first so large responses do not wrap around)
abs_sobelx64f = np.absolute(sobelx64f)
sobelx_8u = np.uint8(np.clip(abs_sobelx64f, 0, 255))
abs_sobely64f = np.absolute(sobely64f)
sobely_8u = np.uint8(np.clip(abs_sobely64f, 0, 255))

# display result image
cv.imshow('sobelx', sobelx_8u)
cv.imshow('sobely', sobely_8u)
cv.waitKey(0)
cv.destroyAllWindows()
```
Figure: edges were recovered
10.2. Laplacian Derivatives

The Laplacian operator is also a derivative operator used to find edges in an image.
The major difference between the Laplacian and other derivative operators like Prewitt, Sobel, Robinson and Kirsch is that those are all first-order derivative masks, while the Laplacian is a second-order derivative mask.
The Laplacian does not find edges in any particular direction (X or Y). Instead, it classifies edges as:
+ Inward Edge
+ Outward Edge
The Laplacian of the image is given by the formula:
$\Delta src=\frac{\partial ^{2}src}{\partial x^{2}}+\frac{\partial ^{2}src}{\partial y^{2}}$
where each derivative is found using Sobel derivatives.
Let's take the Laplacian derivative of the image:
Figure: Input for Laplacian Derivative
```python
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test5.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# take Laplacian
laplacian = cv.Laplacian(gray_img, cv.CV_64F)

# display result image
cv.imshow('laplacian', laplacian)
cv.waitKey(0)
cv.destroyAllWindows()
```
Figure: Output of Laplacian Derivative
11. Canny Edge Detection

Canny Edge Detection is a popular edge detection algorithm. It consists of the following steps:
+ Noise Reduction: edge detection is susceptible to noise in the image, so the first step is to remove the noise with a 5x5 Gaussian filter.
+ Finding the Intensity Gradient of the Image: the smoothed image is then filtered with a Sobel kernel in both the horizontal and vertical directions to get the first derivative in the horizontal direction ($G_{x}$) and the vertical direction ($G_{y}$). From these two images, we can find the edge gradient and direction for each pixel as follows:
Edge_Gradient (G) = $\sqrt{G_{x}^{2}+G_{y}^{2}}$
Angle ($\theta$) = $\tan^{-1}\left(\frac{G_{y}}{G_{x}}\right)$
The gradient direction is always perpendicular to edges.
+ Non-maximum Suppression: after getting the gradient magnitude and direction, a full scan of the image is done to remove any unwanted pixels which may not constitute the edge. Every pixel is checked to see whether it is a local maximum in its neighborhood in the direction of the gradient. If so, it is considered for the next step; otherwise, it is set to zero (suppressed).
Point A is on the edge. The gradient direction is normal to the edge. Points B and C are in the gradient direction. So point A is checked against points B and C to see if it forms a local maximum.
+ Hysteresis Thresholding: this step decides which edges are really edges and which are not. For this, we define two threshold values (minVal and maxVal). Any edges with an intensity gradient greater than maxVal are sure to be edges, and those smaller than minVal are sure to be non-edges. Edges between these two thresholds are considered part of an edge if they are connected to "real-edge" pixels. Otherwise, they are also non-edges.
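The gradient formulas and the hysteresis step above can be sketched in plain NumPy. This is an illustration of the idea only, not how cv.Canny() is implemented; the function name hysteresis, the 8-connectivity and the sample values are assumptions.

```python
from collections import deque

import numpy as np

# Gradient magnitude and direction for one pixel with Gx = 3, Gy = 4.
gx, gy = 3.0, 4.0
magnitude = np.hypot(gx, gy)                 # sqrt(Gx^2 + Gy^2)
angle = np.degrees(np.arctan2(gy, gx))       # direction in degrees

def hysteresis(grad, min_val, max_val):
    """Keep strong pixels plus weak pixels 8-connected to a strong one."""
    strong = grad > max_val
    weak = (grad >= min_val) & ~strong
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    rows, cols = grad.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and weak[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True     # weak pixel joined to a real edge
                    queue.append((rr, cc))
    return edges

# One strong pixel (250), two weak pixels chained to it, one isolated weak pixel.
grad = np.array([
    [0,   0,   0,   0, 0],
    [250, 120, 110, 0, 130],
    [0,   0,   0,   0, 0],
])
edges = hysteresis(grad, min_val=100, max_val=200)
print(edges[1])
```

The isolated weak pixel (130) lies between the thresholds but is not connected to any strong pixel, so it is discarded.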
OpenCV provides cv.Canny() for Canny Edge Detection, with these parameters:
+ The first argument is the input image.
+ The second and third arguments are minVal and maxVal respectively.
+ The fourth argument is the size of the Sobel kernel used to find image gradients. By default it is 3.
+ The last argument is L2gradient, which specifies the equation for finding the gradient magnitude. If it is True, the equation above is used; otherwise the faster approximation Edge_Gradient (G) = $|G_{x}|+|G_{y}|$ is used. By default it is False.
Apply Canny Edge Detection for the image below:
Figure: input of Canny Edge Detection
```python
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# apply bilateral filter
blur = cv.bilateralFilter(gray_img, 9, 75, 75)

# apply Canny Edge Detection
edges = cv.Canny(blur, 7, 200)

# display result image
cv.imshow('edges', edges)
cv.waitKey(0)
cv.destroyAllWindows()
```
Figure: output of Canny Edge Detection
Another example

Write a small application to find the Canny edge detection whose threshold values can be varied using two trackbars.
```python
import numpy as np
import cv2 as cv

def nothing(x):
    pass

# load the image, convert it to grayscale and smooth it
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
blur = cv.bilateralFilter(gray_img, 9, 75, 75)
cv.namedWindow('image')

# create trackbars for the two Canny thresholds
cv.createTrackbar('minVal', 'image', 0, 255, nothing)
cv.createTrackbar('maxVal', 'image', 0, 255, nothing)

while True:
    cv.imshow('image', img)
    k = cv.waitKey(1) & 0xFF
    if k == 27:  # Esc key exits
        break
    # get current positions of the two trackbars
    minVal = cv.getTrackbarPos('minVal', 'image')
    maxVal = cv.getTrackbarPos('maxVal', 'image')
    # the edge map is shown on the next iteration
    img = cv.Canny(blur, minVal, maxVal)

cv.destroyAllWindows()
```
Figure: Solution for example
12. Image Pyramids

Consider searching for a face in an image when we are not sure what size the face will be. We create a set of images with different resolutions and search for the face in all of them. This set of images with different resolutions is called an image pyramid (a stack with the biggest image at the bottom and the smallest image at the top).
There are 2 kinds:
+ Gaussian Pyramid: a higher level (lower resolution) is formed by removing consecutive rows and columns from the lower level (higher resolution) image. An M×N image becomes an M/2×N/2 image, so the area reduces to one-fourth of the original area. This is called an octave. OpenCV provides the cv2.pyrDown() and cv2.pyrUp() functions.
+ Laplacian Pyramid: a level in the Laplacian pyramid is formed by the difference between that level in the Gaussian pyramid and the expanded version of its upper level in the Gaussian pyramid. Laplacian pyramid images look like edge images.
Figure: example input
Example of Gaussian Pyramid:
```python
import numpy as np
import cv2

reso = cv2.imread('toy-story.jpg')
for i in range(0, 4):
    reso = cv2.pyrDown(reso)
    cv2.imshow(str(i+1), reso)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
Figure: Image Pyramid pyrDown
Example of Laplacian Pyramid:
Figure: The image above was resized to 512x512
```python
import numpy as np
import cv2

A = cv2.imread('images/toy_story.jpg')

# generate Gaussian pyramid for A
G = A.copy()
gpA = [G]
for i in range(6):
    G = cv2.pyrDown(G)
    gpA.append(G)

# generate Laplacian Pyramid for A
# (sizes match at every level because the input was resized to 512x512)
lpA = [gpA[5]]
for i in range(5, 0, -1):
    GE = cv2.pyrUp(gpA[i])
    L = cv2.subtract(gpA[i-1], GE)
    lpA.append(L)

for i in range(0, len(lpA)):
    cv2.imshow(str(i+1), lpA[i])
cv2.waitKey(0)
cv2.destroyAllWindows()
```
Figure: Laplacian Pyramid