5-4 Adjusting Images

Image Modifications: Let Your Code Become an Image Designer!

Ever wished your code could do more than just "see" pictures? What if it could also act like a designer and give them a makeover? Well, that's where NumPy comes in!



NumPy: The "Bricks" and "Toolbox" of Images!

We've talked about NumPy before—it's a super helper in Python for handling numbers and data. In the eyes of a computer, an image is just a dense collection of numbers. These numbers are arranged into neat "bricks," which is exactly what NumPy's powerful "multi-dimensional arrays" are (think of them as a giant grid of numbers).

Even though OpenCV has many functions for image processing, it often relies on NumPy behind the scenes. When you use imread() to read an image, the data you get back is actually a NumPy array! This means that by learning to use NumPy, you can directly manipulate these image "bricks" and modify pictures however you want!


Preparation: A Little Setup

To let your code "edit" images, you need to invite both OpenCV and NumPy to the party:

import cv2 as cv   # Import the OpenCV module and use 'cv' as its short name
import numpy as np # Import the NumPy module and use 'np' as its short name

These two lines are like saying, "Hey OpenCV and NumPy, come on in! We're ready to start editing images!"


Modification: Directly "Commanding" the Image!

Since the image has become a NumPy array, modifying it is like editing a giant spreadsheet! You can use coordinates to find a specific pixel and change its color:

# Assuming your image is in color (OpenCV's default is BGR: Blue, Green, Red)
# We're going to turn the pixel at position (50, 50) into a bright blue!
# [255, 0, 0] means: Blue is brightest (255), Green is off (0), Red is off (0)
img[50, 50] = [255, 0, 0] # Modify the pixel at row 50, column 50 to be blue

You can also modify a whole area at once, like painting a corner of the image black:

# Turn a 100x100 pixel area in the top-left corner completely black (0, 0, 0)
img[0:100, 0:100] = [0, 0, 0]

Here, 0:100 means from the 0th pixel (the very edge) to the 99th pixel, for a total of 100 pixels.
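
If you'd like to try these two assignments without loading a file first, here is a minimal sketch. It assumes a plain white canvas built with np.full as a stand-in for a real picture; the canvas size and the pixel position (150, 200) are made up for illustration.

```python
import numpy as np

# A stand-in for a loaded image: 200 rows x 300 columns, 3 channels (B, G, R).
# Filled with white (255) so the changes are easy to spot.
img = np.full((200, 300, 3), 255, dtype=np.uint8)

# Change the single pixel at row 150, column 200 to pure blue (BGR order).
img[150, 200] = [255, 0, 0]

# Paint the top-left 100x100 block black in one assignment.
img[0:100, 0:100] = [0, 0, 0]

print(img.shape)       # (200, 300, 3): rows, columns, channels
print(img[150, 200])   # the blue pixel
print(img[100, 100])   # just outside the black block, still white
```

Notice that the first index is always the row (y) and the second is the column (x).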


Hands-On Example: "Invert" Your Image for a Film Negative Effect!

This example will show you a cool image modification: color inversion! Just like old film negatives, white becomes black, black becomes white, and colors turn into their opposites.

import cv2 as cv
import numpy as np
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
# The image data is stored as number bricks in the 'source_picture' variable.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded. If not, print a warning!
if source_picture is None:
    print(f"Warning: Oops! Could not load the image.")
    print(f"Please check if the file path is correct: {file_path}")
else:
    # We'll create a "copy" of the original image called 'modification_img'.
    # This way, our modifications won't affect the original picture, 
    # and you can see a side-by-side comparison!
    modification_img = source_picture.copy()
    # Now, we'll start messing with every single small brick (pixel) of the image!
    # 'modification_img.shape[0]' is the height of the image (how many rows of pixels).
    for y in range(modification_img.shape[0]):
        # 'modification_img.shape[1]' is the width of the image 
        # (how many columns of pixels).
        for x in range(modification_img.shape[1]):
            # Get the color values for the pixel at the current (y, x) coordinates.
            # 'pixel' is a small NumPy array holding the blue, green, and red values.
            # It is a view into the image, so changing it changes 'modification_img'.
            pixel = modification_img[y, x]
            # For each color (blue, green, red), we subtract its original value from 255.
            # Why 255? Because the color values range from 0 to 255.
            # This turns it into its "complementary color"!
            pixel[0] = 255 - pixel[0] # Blue (B)
            pixel[1] = 255 - pixel[1] # Green (G)
            pixel[2] = 255 - pixel[2] # Red (R)
    # Start an infinite loop to display the modified image window.
    while True:
        # Use OpenCV's imshow function to display both the original and modified images.
        cv.imshow('Original Picture', source_picture)
        cv.imshow('Modification Picture (Inverted)', modification_img) 
        # The cv.waitKey(1) function "pauses" your program 
        # for 1 millisecond to check for a key press.
        key = cv.waitKey(1)
        # Check if the pressed key is the ESC key (the key code for ESC is 27).
        if key & 0xFF == 27:
            break # If you press the ESC key, exit the loop and close the display.
    # Finally, call cv.destroyAllWindows() to close all image windows created 
    # by OpenCV and free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows pop up: one with your original image and another with the color-inverted image! It looks just like the negative of an old photo, right?
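
The nested loops in the example visit the pixels one at a time, which can feel slow on large pictures. As a sketch of an alternative, NumPy can apply the same subtraction to the whole array in one step. The tiny hand-made array below is only a stand-in for an image loaded with imread.

```python
import numpy as np

# A tiny 2x2 BGR "image" standing in for one loaded with cv.imread.
img = np.array([[[0, 0, 0], [255, 255, 255]],
                [[255, 0, 0], [10, 20, 30]]], dtype=np.uint8)

# One expression inverts every channel of every pixel.
inverted = 255 - img

print(inverted[0, 0])  # black becomes white: [255 255 255]
print(inverted[1, 1])  # [245 235 225]
```

Inverting twice gives back the original values, which is a handy way to check the result.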


By learning to use NumPy to modify pixel values, you can try out more fun changes, like:

1. Selective Coloring: Change a specific area of the image to a color you like.

2. Black and White: Try turning a color photo into a black and white one.

3. Adjusting Brightness/Contrast: Change the lightness and darkness of the image by adding or subtracting from the pixel values.

NumPy has many other powerful features, and you'll find it useful in many more areas in the future!
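
One thing to watch out for when you try the brightness idea: uint8 values can only hold 0 to 255, so plain addition "wraps around" instead of stopping at white. Here is a small sketch of one way to handle that, using a single made-up pixel and an arbitrary brightness step of 20. Widening the type and then clipping is just one reasonable approach, not the only one.

```python
import numpy as np

img = np.array([[[250, 100, 5]]], dtype=np.uint8)  # one BGR pixel

# Plain uint8 arithmetic wraps around: 250 + 20 becomes 14, not 255.
wrapped = img + np.uint8(20)

# Widen to a larger integer type, adjust, clip to 0..255, then convert back.
brighter = np.clip(img.astype(np.int16) + 20, 0, 255).astype(np.uint8)
darker = np.clip(img.astype(np.int16) - 20, 0, 255).astype(np.uint8)

print(wrapped[0, 0])   # [ 14 120  25]
print(brighter[0, 0])  # [255 120  25]
print(darker[0, 0])    # [230  80   0]
```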

Image Resizing: Making Your Pictures Big or Small, with Ease!

Ever wished your pictures could transform like a Transformer, getting bigger or smaller on command? The resize function in OpenCV is your helper! It lets you easily adjust the size of an image, perfect for creating thumbnails, zooming in on details, and more. Keep in mind that enlarging can't invent detail that was never captured, so a heavily magnified image will look a little softer.


Function Prototype and Parameters: The "Resizing Command"!

Here's the "prototype" of the resize function:

def resize(src: cv2.typing.MatLike,
           # This is the "original image" you want to resize.
           dsize: cv2.typing.Size | None,
           # This is the "target size" after resizing (width, height).
           dst: cv2.typing.MatLike | None = ...,
           # The resized image will be stored here
           # (usually, you don't set this yourself; the function returns it).
           fx: float = ...,
           # The "scaling factor" for the horizontal direction
           # (e.g., 0.5 to shrink by half, 2.0 to double).
           fy: float = ...,
           # The "scaling factor" for the vertical direction.
           interpolation: int = ...)
           # The "interpolation method" that determines the quality of the resized image.
-> cv2.typing.MatLike: ...

Return Value: resize gives you the resized image!

When resize successfully resizes your image, it returns the new image data to you.


Hands-On Example: Instantly Magnify Your Image!

Now, let's write some code to double the size of an image!

import cv2 as cv # Import the OpenCV module
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print(f"Warning: Oops! Could not load the image.")
    print(f"Please check if the file path is correct: {file_path}")
else:
    # We want to double the image's width and height!
    # source_picture.shape[1] is the original image's width.
    # source_picture.shape[0] is the original image's height.
    # `<< 1` is a bitwise operation that's the same as multiplying by 2!
    # So (source_picture.shape[1] << 1, source_picture.shape[0] << 1)
    # means the new dimensions will be double the original width and height.
    resized_image = cv.resize(source_picture,
                             (source_picture.shape[1] << 1, 
                              source_picture.shape[0] << 1))
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original image in a window.
        cv.imshow('Original Picture', source_picture)
        # Display the resized image in another window.
        cv.imshow('Scaled Picture (2x)', resized_image) 
        # The cv.waitKey(1) function pauses the program 
        # for 1 millisecond to check for a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (ASCII value is 27), 
        # break the loop and close the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows pop up: one with your original-sized image and another with the image that's been doubled in size! Compare them to see if the enlarged image is still clear!


By learning the resize function, you can perform all sorts of size adjustments on your images, such as:

1. Creating Thumbnails: Shrink large images to make websites or apps load faster.

2. Magnifying Details: Only want to see a certain detail in an image? Just magnify it!

3. Standardizing Sizes: Make images of different sizes all the same, which is useful for processing.

resize has many advanced uses, like choosing different interpolation methods to optimize image quality. If you're interested, you can try changing the interpolation parameter to see the different effects!
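
To get a feel for what an "interpolation method" actually decides, here is a simplified NumPy sketch of the simplest one, nearest-neighbor scaling, where every output pixel just copies the closest source pixel. The helper name resize_nearest is hypothetical, and this only illustrates the idea; it is not how OpenCV implements resize internally.

```python
import numpy as np

def resize_nearest(img, new_width, new_height):
    """Scale img to (new_width, new_height) by copying the nearest source pixel."""
    height, width = img.shape[:2]
    # For every output row/column, work out which source row/column it maps to.
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    # Fancy indexing picks the chosen source pixel for each output position.
    return img[rows[:, None], cols]

small = np.arange(6, dtype=np.uint8).reshape(2, 3)   # 2 rows x 3 columns
big = resize_nearest(small, new_width=6, new_height=4)

print(big.shape)  # (4, 6)
print(big)
```

Each source value ends up repeated in a 2x2 block, which is why nearest-neighbor enlargements look "blocky." Other methods blend neighboring pixels to give smoother results. Also notice that the size is given as (width, height), while shape reports (height, width).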

Image Cropping: Only Keep Your Favorite Part!

Have you ever taken a photo with something you didn't like in it, or just wanted to highlight a specific part? That's where image cropping comes in! With code, we can use a "magic scissor" to "cut out" the part you want from the original image, leaving only the best parts.


The "Magic Scissor" for Cropping: NumPy Slicing!

Remember we said that an image is just a bunch of neatly arranged numbers (a NumPy array) in a computer? To crop an image, we can use NumPy's "slicing" feature, which is as simple as cutting a slice from a large cake!


Modification: Tell the Code Where to Cut!

Cropping an image means extracting a rectangular area from the original image. You just need to tell the code where to start and where to end:

# Imagine the image is a coordinate system, 
# [y_start:y_end, x_start:x_end] is your cropping area.
# 'y' represents the "row" (height), and 'x' represents the "column" (width).
# For example, we want to cut out the area from "row 50 to row 199"
# and "column 100 to column 299".
# (Note: NumPy slicing is "exclusive of the end value"!)
roi = img[50:200, 100:300]

A quick tip: The order for NumPy array slicing is [y_start : y_end, x_start : x_end], which corresponds to [height_start : height_end, width_start : width_end] for an image.
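
If you usually think of a crop as "start at (x, y), then take this width and height," the conversion to a slice can be sketched like this. The blank canvas and the rectangle values are assumptions chosen only for illustration.

```python
import numpy as np

img = np.zeros((400, 600, 3), dtype=np.uint8)  # 400 rows high, 600 columns wide

x, y, w, h = 100, 50, 200, 150   # hypothetical crop rectangle
roi = img[y:y + h, x:x + w]      # rows (y) first, then columns (x)

print(roi.shape)  # (150, 200, 3): height, width, channels
```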


Important Notes: Little Secrets Before You Crop!

When cropping images, there are a few important concepts and "tricks" you need to know:

1. Is the image data "shared" or "copied"?
When you use NumPy slicing to crop an image, it usually creates a "view," not a "copy." What's the difference?

(1) View:

  • It's like opening a "window" to a specific part of the original image.
  • If you modify the color (pixel value) through this "window," the corresponding part of the original image will also change! This is because they share the same data.

 

(2) Copy:

  • If you want the cropped image to be independent, so modifying it doesn't affect the original, you need to explicitly use the .copy() method to create a "completely separate duplicate."
  • For example: roi = img[50:200, 100:300].copy()

 

2. Don't crop outside the image!

(1) When defining your cropping area, make sure your start and end coordinates are within the image's actual size.

(2) If your cropping area goes beyond the image's boundaries, NumPy usually won't throw an error directly, but it might give you an "empty" image or only crop up to the edge.

(3) If you then try to perform operations on this "invalid" cropped area, your program might fail or give unexpected results.

(4) Good practice: Before cropping, check if your y_start, y_end, x_start, and x_end are within the image's height and width!

3. A little knowledge about memory management:

(1) In Python, memory is usually managed automatically (like an automatic vacuum cleaner). When an image (or its view) is no longer "referenced" by any variable, the memory it occupies is freed.

(2) Important! If you only created a "view," the view itself hasn't copied the image data. Even if the "view" variable is removed, as long as the original image still exists, that memory won't be freed. The actual pixel data memory is only released when the original image is also cleared.
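
The notes above can be checked with a small experiment. This sketch uses a blank canvas as a stand-in for a loaded image (the sizes and fill values are made up) and NumPy's shares_memory function to see whether two arrays use the same pixel data.

```python
import numpy as np

img = np.zeros((200, 300, 3), dtype=np.uint8)  # stand-in for a loaded image

view = img[50:100, 50:100]          # a view: shares pixels with img
copy = img[50:100, 50:100].copy()   # an independent duplicate

view[:] = 255                       # writing through the view changes img
copy[:] = 128                       # writing to the copy does not

print(np.shares_memory(img, view))  # True
print(np.shares_memory(img, copy))  # False
print(img[60, 60])                  # [255 255 255]

# Slices that run past the edge are clipped silently instead of raising.
print(img[150:400, 250:500].shape)  # (50, 50, 3)
print(img[300:400, 0:50].shape)     # (0, 50, 3), an "empty" image
```

Checking the shape of a crop before using it is a cheap way to catch an empty or smaller-than-expected result.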

 


Hands-On Example: Cut Out Your Desired Image Area!

Now, let's write some code to crop a part of an image and see the result!

import cv2 as cv # Import the OpenCV module
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print(f"Warning: Oops! Could not load the image.")
    print(f'Please check if the path is correct: {file_path}')
else:
    # Now, we'll "cut" a region from the original 'source_picture'.
    # We're cropping the area:
    # Height (y-axis): From row 100 to row 100 + 200 = 300 (not including row 300).
    # Width (x-axis): From column 100 to column 100 + 300 = 400 
    # (not including column 400).
    # This will give us a new image with 
    # a height of 200 pixels and a width of 300 pixels.
    crop_image = source_picture[100 : 100 + 200, 100 : 100 + 300]
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original image in a window named 'Original Picture'.
        cv.imshow('Original Picture', source_picture)
        # Display the cropped image in a window named 'Cropped Picture'.
        cv.imshow('Cropped Picture', crop_image)
        # The cv.waitKey(1) function pauses the program for 
        # 1 millisecond to check for a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (number 27), break the loop and end the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows pop up: one with your original image and another with the cropped region you "cut out"! Pretty cool, right?


By learning image cropping, you can:

1. Focus on the main subject: Crop out unrelated people or objects in a photo.

2. Create an avatar: Precisely crop your headshot from a larger photo.

3. Analyze a specific area: In image processing or machine learning, focus only on a certain region of an image for analysis.

Remember the difference between a "view" and a "copy"! If you want to modify the cropped image without affecting the original, you must use .copy()!

Image Translation: Making Your Pictures "Slide"!

Have you ever wanted to make an image "move" to a different position on the screen? It's like dragging a photo on your phone, but now we'll do it with code! In OpenCV, we can use a function called warpAffine to perform this "sliding" magic, which we call "image translation."


warpAffine: The "Movement Command" for Images!

The warpAffine function in OpenCV is used to perform an "affine transformation." It sounds complicated, but simply put, it's a powerful way to geometrically transform images.

Imagine your image is like a rubber band that can be stretched, squeezed, rotated, or moved. An affine transformation keeps parallel lines parallel, but the image's size, shape, or angle might change. Today, we'll mainly use it to achieve the simplest transformation: "translation," which means moving the image on the screen!


Function Prototype and Parameters: Tell warpAffine How to Move!

The warpAffine function needs some "commands" to know how to move the image:

def warpAffine(src: cv2.typing.MatLike,         
               # This is the "original image" you want to move.
               M: cv2.typing.MatLike,           
               # This is a crucial "transformation matrix,"
               # which contains the "secret formula" for how the image moves.
               dsize: cv2.typing.Size,          
               # This is the "new image's size" after the movement (width, height).
               dst: cv2.typing.MatLike | None = ...,   
               # The moved image will be stored here 
               # (usually you don't set this yourself; the function returns it).
               flags: int = ...,                
               # Some extra flags, like the interpolation method, 
               # which affects image quality.
               borderMode: int = ...,           
               # How to handle the image's edges when they go out of bounds 
               # (e.g., what color to fill them with).
               borderValue: cv2.typing.Scalar = ...) 
               # The color value to fill the border with.
-> cv2.typing.MatLike: ...

Return Value: warpAffine gives you the moved image!

When warpAffine successfully moves your image, it returns the new image data to you.


Important Notes: "Tips" for Moving Images!

To make sure your image moves smoothly and correctly, these "tricks" are important:

1. The M Matrix: The "Secret Formula" for Movement!

(1) The most important parameter for warpAffine is M, which is a 2x3 "transformation matrix." This matrix contains the mathematical formula for how the image should "move, rotate, scale," etc.

(2) Don't worry! We don't usually need to calculate this complex M matrix manually. OpenCV has a helpful function called cv2.getAffineTransform(src_points, dst_points) that can automatically calculate this M matrix for us!

2. Use "Three Points" to Decide How to Move!

(1) The cv2.getAffineTransform function is smart. You just need to tell it "three points" from the original image (src_points) and "where those three points should go" after the move (dst_points), and it can calculate the "secret formula" for the movement for you!

(2) Important: You must provide exactly three pairs of points, no more, no less. And these three points cannot be on the same straight line (otherwise it won't know how to transform).

(3) src_points will be a list containing the coordinates of the three "original points," and dst_points will be a list containing the "corresponding new coordinates" for those three points.

3. Don't mess up the order of the points!
The first point you give in src_points will correspond to the first point in dst_points, the second point in src_points will correspond to the second point in dst_points, and so on. Make sure your points are matched up correctly, or the image will move to a weird place!
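
For a pure "slide" with no rotation or scaling, the M matrix is simple enough to write by hand. Here is a minimal NumPy sketch of what such a matrix does to a point. The shift values and the helper name apply_affine are assumptions for illustration only.

```python
import numpy as np

tx, ty = 40, 25  # hypothetical shift: 40 px right, 25 px down

# A pure translation: the left 2x2 block is the identity (no rotation or
# scaling) and the last column holds the shift.
M = np.float32([[1, 0, tx],
                [0, 1, ty]])

def apply_affine(M, x, y):
    """Map one (x, y) point through a 2x3 affine matrix."""
    new_x, new_y = M @ np.array([x, y, 1.0])
    return float(new_x), float(new_y)

print(apply_affine(M, 0, 0))      # (40.0, 25.0)
print(apply_affine(M, 100, 50))   # (140.0, 75.0)
```

A 2x3 float matrix like this one can be passed to warpAffine as M, in place of a matrix calculated by getAffineTransform.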


Hands-On Example: Move Your Image to the Bottom Right!

Now, let's write some code to make an image "slide" a little bit!

import cv2 as cv
import numpy as np # We need NumPy to handle the point coordinates
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print(f"Warning: Oops! Could not load the image.")
    print(f'Please check if the path is correct: {file_path}')
else:
    # --- 1. Define the "three reference points" on the original image ---
    # We'll choose the top-left, top-right, and bottom-left corners as reference points.
    # The coordinates must be of type float32 (np.float32).
    src_points = np.float32([
        [0.0, 0.0],                          
        # Point 1: Top-left corner of the original image (x=0, y=0)
        [source_picture.shape[1] - 1.0, 0.0], 
        # Point 2: Top-right corner (x=far right, y=0)
        [0.0, source_picture.shape[0] - 1.0]  
        # Point 3: Bottom-left corner (x=0, y=far bottom)
    ])
    # --- 2. Define where these reference points should go "after the move" ---
    # Here, we'll make the whole image move slightly to the bottom right, 
    # with a bit of a squeeze and tilt.
    dst_points = np.float32([
        [0.0, source_picture.shape[0] * 0.1],  
        # Point 1 moves: y-axis moves down by 10% of the image's height
        [source_picture.shape[1] * 0.9, 0.0],  
        # Point 2 moves: x-axis moves left by 10% of the image's width
        [source_picture.shape[1] * 0.2, source_picture.shape[0] * 0.9]
        # Point 3 moves: x-axis moves right by 20%, y-axis moves up by about 10%
    ])
    # --- 3. Calculate the image's "movement formula" (Transformation Matrix M) ---
    # cv.getAffineTransform will automatically calculate a 2x3 transformation matrix 
    # (warp_matrix)
    # for us based on src_points and dst_points, 
    # which contains the commands for how to
    # translate, rotate, and scale the image.
    warp_matrix = cv.getAffineTransform(src_points, dst_points)
    # --- 4. Apply the "movement formula" to make the image really move! ---
    # The cv.warpAffine function will take the original image (source_picture)
    # and move it to a new position based on the warp_matrix we calculated.
    # (source_picture.shape[1], source_picture.shape[0]) 
    # specifies the size of the new image after the move.
    # Here, we'll keep it the same size as the original.
    modified_image = cv.warpAffine(source_picture, warp_matrix,
                                  (source_picture.shape[1], source_picture.shape[0]))
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original image.
        cv.imshow('Original Picture', source_picture)
        # Display the moved image.
        cv.imshow('Moved Picture', modified_image) 
        # The cv.waitKey(1) function pauses the program for 
        # 1 millisecond to check for a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (number 27), break the loop and end the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows: one with your original image, and another with the image that's slightly "slid" to the bottom right and a little bit distorted!


Besides simple translation, warpAffine can perform many more interesting image transformations, such as:

1. Rotating an image: Turn the image to face a different direction.

2. Tilting an image: Make the image look like it's been blown sideways by the wind.

3. Scaling an image: Of course, it can also make the image bigger or smaller.

By just adjusting the positions of the three points in src_points and dst_points, you can play around with all sorts of different affine transformation effects!
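
Curious how three pairs of points can pin down the whole matrix? Here is a NumPy-only sketch of the underlying math: each pair gives two equations, and three pairs give exactly enough to solve for the six numbers in M. The helper name affine_from_points and the sample points are hypothetical, and this is an illustration of the idea rather than OpenCV's own code.

```python
import numpy as np

def affine_from_points(src_points, dst_points):
    """Solve for the 2x3 matrix M where M @ [x, y, 1] gives the matching new point."""
    src = np.asarray(src_points, dtype=np.float64)
    dst = np.asarray(dst_points, dtype=np.float64)
    # Each row of A is [x, y, 1]. The system A @ M.T = dst has a unique
    # solution only when the three source points are not on one straight line.
    A = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(A, dst).T

src = [[0, 0], [99, 0], [0, 99]]
dst = [[10, 20], [109, 20], [10, 119]]   # every point shifted by (10, 20)

M = affine_from_points(src, dst)
print(np.round(M, 6))
```

Because every point here moves by the same amount, the result is a pure translation: an identity block on the left and the shift (10, 20) in the last column.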

Image Flipping: Making Your Pictures Look in a Mirror!

Have you ever looked in a mirror and seen your reflection flipped horizontally? Or after taking a photo, wanted to flip the image horizontally to see a different perspective? In image processing, this "mirroring" action is called "image flipping." OpenCV has a super simple function called flip that can do this for you!


flip: The "Mirroring Command" for Images!

The flip function in OpenCV is a tool specifically designed to “flip” an image. You can choose to flip the image horizontally (left–right), vertically (up–down), or even both horizontally and vertically at the same time.


Function Prototype and Parameters: Tell flip How to Flip!

The flip function needs some simple "commands" to know how to mirror the image:

def flip(src: cv2.typing.MatLike,       
         # This is the "original image" you want to flip.
         flipCode: int,                 
         # This number determines how the image will be flipped!
         dst: cv2.typing.MatLike | None = ...) 
         # The flipped image will be stored here (usually you don't set this yourself; 
         # the function returns it).
-> cv2.typing.MatLike: ...

flipCode: The "Secret Number" for Flipping!

flipCode is the key to telling the flip function how to flip the image:

1. flipCode = 0: Vertical flip (up-and-down). Imagine the image flipping around a horizontal center line.

2. flipCode = 1: Horizontal flip (left-to-right). Imagine the image flipping around a vertical center line.

3. flipCode = -1: Horizontal and vertical flip. It flips both left-to-right and up-and-down at the same time!
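
Since an image is a NumPy array, the three flip directions can also be pictured as reversed slices. This sketch uses a tiny made-up single-channel array so the effect is easy to read. Note that these slices are views of the original data, while flip returns a new image.

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.uint8)  # tiny single-channel "image"

vertical = img[::-1, :]       # rows reversed, like flipCode = 0
horizontal = img[:, ::-1]     # columns reversed, like flipCode = 1
both = img[::-1, ::-1]        # rows and columns reversed, like flipCode = -1

print(vertical)
print(horizontal)
print(both)
```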


Return Value: flip gives you the flipped image!

When flip successfully flips your image, it returns the new image data to you.


Hands-On Example: Flip Your Image Left to Right!

Now, let's write some code to "horizontally flip" (left-to-right) an image!

import cv2 as cv
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print(f"Warning: Oops! Could not load the image.") 
    print(f"Please check if the path is correct: {file_path}")
else:
    # Here, we'll directly call the cv.flip function to flip the image.
    # source_picture is the image we want to flip.
    # 1 stands for a horizontal flip (left-to-right).
    # modified_image will receive the flipped image data.
    # Note: cv.flip returns a new image by default, so you don't need to use .copy()!
    modified_image = cv.flip(source_picture, 1) # **Horizontal** flip (flipCode=1)
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original image.
        cv.imshow('Original Picture', source_picture)
        # Display the flipped image.
        cv.imshow("Flipped Picture (Horizontal)", modified_image) 
        # The cv.waitKey(1) function pauses the program for 
        # 1 millisecond to check for a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (number 27), break the loop and end the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows: one with your original image, and another with the image flipped horizontally!


By learning the flip function, you can try different flipCode parameters to see how the image changes:

1. cv.flip(source_picture, 0): The image will flip vertically (upside down).

2. cv.flip(source_picture, -1): The image will flip both horizontally and vertically, creating a very unique effect!

The flip function is not only for creating fun visual effects; it's also commonly used in image processing for data augmentation, which helps models learn from a wider variety of image variations!

Image Grayscaling: Turning Your Color Pictures into "Black and White Photos"!

Have you ever seen a classic black and white photo with so much character? Or sometimes, to focus on the shapes and shadows in an image, we turn it into grayscale. In OpenCV, there's a super useful function called cvtColor that can easily convert your image from color to grayscale, like helping the image "change into a black and white outfit"!


cvtColor : The "Color-Changing Magician" for Images!

The cvtColor function in OpenCV is a tool specifically for "converting an image from one color mode to another." Images have many "color modes" (or "color spaces"), such as the color images we're familiar with (RGB or BGR mode), and the "grayscale mode" we're learning today.


Function Prototype and Parameters: Tell cvtColor How to Change Colors!

The cvtColor function needs some simple "commands" to know how to change an image's colors:

def cvtColor(src: cv2.typing.MatLike,
             # This is the "original image" you want to change colors for.
             code: int,
             # This "conversion code" determines which
             # color mode to convert from and to!
             dst: cv2.typing.MatLike | None = ...,
             # The color-changed image will be stored here
             # (usually you don't set this yourself; the function returns it).
             dstCn: int = ...,
             # The number of channels for the output image
             # (usually you don't set this; the function figures it out).
             hint: AlgorithmHint = ...)
             # A hint for the conversion algorithm (usually the default is fine).
-> cv2.typing.MatLike: ...


Return Value: cvtColor gives you the color-changed image!

When cvtColor successfully changes the colors of your image, it returns the new image data to you.


Important Notes: "Tips" Before You Change Colors!

To ensure your image changes colors correctly and naturally, these "tricks" are important:

1. BGR or RGB? OpenCV's "Little Habit"!

(1) You might have heard of RGB (Red, Green, Blue), which is the color order used by many image software. But OpenCV has its own "little habit": its default color image order is BGR (Blue, Green, Red)!

(2) So, when you're doing color conversions, pay special attention to the order of your input image and your desired output to avoid a color "mix-up"!

(3) For example, to convert from BGR to grayscale, we use cv.COLOR_BGR2GRAY.

2. Pay attention to the "value range" of colors!

(1) Image color values usually range from 0 to 255. But in some special color conversions (for scientific calculations, for example), the function might need your color values to be decimals between 0 and 1.

(2) If the value range is wrong, the converted colors might "go astray" or look strange. However, for a common conversion like color to grayscale, you usually don't need to worry about this.

3. The image's "data type"!

(1) The most common image data type is uint8 (8-bit unsigned integer), which means each color value is an integer between 0 and 255. This format saves the most memory and is sufficient for most applications.

(2) Although some more precise calculations might use data types like floating-point numbers, for learning about grayscaling, uint8 is more than enough!
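
To see why the channel order matters, here is a NumPy sketch of the weighted sum behind a color-to-gray conversion. It uses the standard luma weights (0.299 for red, 0.587 for green, 0.114 for blue) listed in OpenCV's color conversion documentation. The three sample pixels are made up, and the rounding in this sketch may differ slightly from what cvtColor produces.

```python
import numpy as np

# Three BGR pixels: pure blue, pure green, pure red.
img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Weights listed in B, G, R order to match OpenCV's channel layout.
weights = np.array([0.114, 0.587, 0.299])

# Work in floating point, then round and return to uint8 (0 to 255).
gray = np.round(img.astype(np.float64) @ weights).astype(np.uint8)

print(img.shape)   # (1, 3, 3): three channels per pixel
print(gray.shape)  # (1, 3): one brightness value per pixel
print(gray)        # blue is darkest (29), green is brightest (150), red is 76
```

If the weights were applied in RGB order to BGR data, blue and red would swap brightness, which is exactly the kind of "mix-up" the first note warns about.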


Hands-On Example: Turn Your Image into a Classic Black and White Photo!

Now, let's write some code to turn a color image into a nostalgic grayscale photo!

import cv2 as cv
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print(f"Warning: Oops! Could not load the image.")
    print(f"Please check if the path is correct: {file_path}")
else:
    # --- Image Grayscaling! ---
    # Use the cv.cvtColor function to convert the original color image (source_picture) 
    # to a grayscale image.
    # cv.COLOR_BGR2GRAY is the conversion code, 
    # which means "convert from BGR color mode to grayscale mode."
    # The grayscale image will only have one color channel (representing brightness), 
    # no more red, green, and blue channels!
    modified_image = cv.cvtColor(source_picture, cv.COLOR_BGR2GRAY)
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original color image.
        cv.imshow('Original Picture (Color)', source_picture)
        # Display the converted grayscale image.
        cv.imshow("Modified Picture (Grayscale)", modified_image) 
        # The cv.waitKey(1) function pauses the program for 1 millisecond to check for 
        # a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (number 27), break the loop and end the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows: one with your original color image, and another with the grayscale image you just converted with code! Pretty cool, right?


Image grayscaling is one of the most basic and common operations in image processing. It not only creates an artistic feel; in many advanced image processing or machine learning tasks, converting an image to grayscale also simplifies calculations and makes it easier for the program to analyze the image's content!
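
Curious what the conversion does to each pixel? Here is a minimal plain-Python sketch. It assumes the standard brightness weights that OpenCV documents for its color-to-gray conversion (0.299 for red, 0.587 for green, 0.114 for blue). The tiny 2x2 image and the function name bgr_to_gray are invented for illustration, and OpenCV's own rounding may differ slightly.

```python
def bgr_to_gray(image):
    """Turn a BGR image (rows of [B, G, R] pixels) into one brightness value per pixel."""
    gray = []
    for row in image:
        gray_row = []
        for blue, green, red in row:
            # Weighted sum: green counts the most, blue the least.
            brightness = 0.299 * red + 0.587 * green + 0.114 * blue
            gray_row.append(round(brightness))
        gray.append(gray_row)
    return gray


color_image = [
    [[255, 0, 0], [0, 255, 0]],      # a blue pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # a red pixel, a white pixel
]
gray_image = bgr_to_gray(color_image)
print(gray_image)  # [[29, 150], [76, 255]]
```

Each pixel went from three numbers to one, which is why the grayscale image has a single channel.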

Image Binarization: Turning Your Pictures into "Black and White Silhouettes"!

Have you ever wondered how to turn a picture into a simple black and white image, like a silhouette? In the world of computer vision, this is called "image binarization." It's like setting a "light threshold" for the image: anything brighter than the threshold turns completely white, and anything darker turns completely black. This is very common in tasks like text recognition and object detection. The threshold function in OpenCV is our tool for performing this magic!


threshold: The "Light Gatekeeper" for Images!

The threshold function in OpenCV is specifically used to convert grayscale images into binary images (the black and white images we're talking about), or to classify pixels in an image based on a "threshold value" you set.


Function Prototype and Parameters: Tell threshold Your "Light Threshold"!

The threshold function needs some commands to know how to set the black and white boundary for the image:

def threshold(src: cv2.typing.MatLike,
              # This is the image you want to process (usually a grayscale image;
              # the automatic Otsu and Triangle modes require one).
              thresh: float,
              # This is your "threshold value." Each pixel's brightness is
              # compared against it.
              maxval: float,
              # This is the value given to pixels that exceed the threshold
              # in cv.THRESH_BINARY mode (usually 255 for white).
              type: int,
              # This is the "threshold type," which tells the function
              # how to classify pixels based on the threshold.
              dst: cv2.typing.MatLike | None = ...)
-> tuple[float, cv2.typing.MatLike]: ...
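
To make the roles of thresh and maxval concrete, here is a small plain-Python sketch of the rule behind cv.THRESH_BINARY. The function name threshold_binary and the sample numbers are hypothetical; this is an illustration of the rule, not OpenCV's implementation.

```python
def threshold_binary(gray, thresh, maxval):
    """Mimic the THRESH_BINARY rule: value > thresh becomes maxval, otherwise 0."""
    result = [[maxval if value > thresh else 0 for value in row] for row in gray]
    # Like cv.threshold, hand back the threshold used together with the new image.
    return thresh, result


gray_image = [
    [10, 199, 200],
    [201, 230, 255],
]
ret, binary_image = threshold_binary(gray_image, 200, 255)
print(ret)           # 200
print(binary_image)  # [[0, 0, 0], [255, 255, 255]]
```

Notice that the pixel with a value of exactly 200 turns black: only pixels strictly brighter than the threshold turn white.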

Return Value: threshold gives you the "threshold value" and the "black and white image"!

When threshold successfully binarizes your image, it returns two things to you:

1. float: The actual "threshold value" used.

(1) If you set the threshold value yourself (like 200 in the example), it will return the value you set.

(2) But if you choose to have OpenCV automatically find the best threshold value for you (e.g., using cv.THRESH_OTSU or cv.THRESH_TRIANGLE automatic modes), it will return the optimal threshold value it calculated.

2. cv2.typing.MatLike: The "black and white image" after binarization.

This is the processed image data, which will be a NumPy array. With cv.THRESH_BINARY and a maxval of 255, the pixel values inside will only be 0 (black) or 255 (white)!
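
How can a threshold be found "automatically"? The sketch below shows the idea behind Otsu's method in plain Python: try every possible threshold and keep the one that separates the dark group and the bright group best. The function name otsu_threshold and the sample image are hypothetical, and OpenCV's own implementation may return a different value inside the same gap between dark and bright pixels.

```python
def otsu_threshold(gray):
    """Pick the threshold that best separates dark and bright pixels (Otsu's idea)."""
    pixels = [value for row in gray for value in row]
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_thresh, best_score = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        # Pixels with value <= level form the "dark" group.
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        bright_count = total - dark_count
        if dark_count == 0 or bright_count == 0:
            continue
        dark_mean = dark_sum / dark_count
        bright_mean = (total_sum - dark_sum) / bright_count
        # A bigger score means the two groups are further apart.
        score = dark_count * bright_count * (dark_mean - bright_mean) ** 2
        if score > best_score:
            best_thresh, best_score = level, score
    return best_thresh


gray_image = [
    [20, 30, 25],
    [220, 210, 230],
]
best = otsu_threshold(gray_image)
print(best)  # 30
```

With OpenCV itself, you would typically request this mode by passing cv.THRESH_BINARY + cv.THRESH_OTSU as the type; the thresh value you pass is then ignored and the calculated one is returned.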


Hands-On Example: Turn Your Picture into a Cool Black and White Silhouette!

Now, let's write some code to first turn a color image into grayscale, and then binarize it, to see the effect!

import cv2 as cv
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print("Warning: Oops! Could not load the image.")
    print(f" Please check if the path is correct: {file_path}")
else:
    # --- 1. First, turn the color image into a grayscale image! ---
    # cvtColor is our "color-changing magician," converting BGR color to grayscale.
    convert_img = cv.cvtColor(source_picture, cv.COLOR_BGR2GRAY)
    # --- 2. Perform image binarization! ---
    # Use the cv.threshold function to binarize the grayscale image.
    # convert_img: The input grayscale image.
    # 200: This is the "threshold value" we set. Pixels brighter than 200 count as "bright."
    # 255: Pixels that exceed the threshold will become 255 (white).
    # cv.THRESH_BINARY: This is the threshold type. It means "if pixel brightness > 200,
    #                   it becomes 255; otherwise, it becomes 0."
    # ret: The returned actual threshold value (here, it's 200).
    # modified_image: This is the binarized black and white image. 
    # The pixels inside will only have values of 0 or 255.
    ret, modified_image = cv.threshold(convert_img, 200, 255, cv.THRESH_BINARY)
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original color image.
        cv.imshow('Original Picture (Color)', source_picture)
        # Display the binarized black and white image.
        cv.imshow("Modified Picture (Binary)", modified_image) 
        # The cv.waitKey(1) function pauses the program for 1 millisecond to check for 
        # a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (number 27), break the loop and end the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows: one with your original color image, and another with the black and white image you just converted with code! Doesn't it look like a silhouette or a woodcut?


Image binarization is very important in many applications, such as:

1. Optical Character Recognition (OCR): Binarizing document images makes text clearer, making it easier for the computer to recognize.

2. Object Detection: Separating objects from the background in an image.

3. Barcode Scanning: Simplifying a barcode image so the scanner can read it.

You can try adjusting the threshold value of 200 in the threshold function to see what different numbers produce what kind of black and white effects!
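
Before experimenting on a real photo, you can get a feel for the effect with a tiny plain-Python sketch. The helper count_white and the sample brightness values are made up for illustration.

```python
def count_white(gray, thresh):
    """Count the pixels that THRESH_BINARY would turn white (value > thresh)."""
    return sum(1 for row in gray for value in row if value > thresh)


gray_image = [
    [12, 60, 110, 160],
    [90, 140, 205, 250],
]
counts = {thresh: count_white(gray_image, thresh) for thresh in (50, 100, 150, 200)}
for thresh, white in counts.items():
    print(f"threshold {thresh}: {white} white pixels")
# threshold 50: 7 white pixels
# threshold 100: 5 white pixels
# threshold 150: 3 white pixels
# threshold 200: 2 white pixels
```

The higher the threshold, the fewer pixels are bright enough to turn white, so the picture gets darker overall.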

Image Edge Detection: "Outlining" Your Pictures!

Have you ever wondered how a computer "understands" what objects are in a picture? A very important clue is the objects' "edges"! Just like a cartoonist outlines a drawing first, image edge detection is a technique that lets the computer find all the line contours in an image. OpenCV has a super powerful function called Canny that can draw precise "edge lines" for your pictures!


Canny: The "Sketching Master" for Images!

The Canny function in OpenCV is specifically used to detect edges in images. It's not just a random drawing tool; Canny edge detection is a very classic and effective algorithm. It works hard to find the places in an image where the brightness changes most significantly, as those places are usually the edges of objects.
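
What does "the brightness changes significantly" look like in numbers? The plain-Python sketch below measures the change around one pixel with the 3x3 Sobel kernels, the tool Canny uses internally. The function name gradient_at and the tiny sample images are hypothetical, and a real Canny run does more steps than this.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left-right changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to up-down changes


def gradient_at(gray, row, col):
    """Return (gx, gy) for one interior pixel using the 3x3 Sobel kernels."""
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = gray[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    return gx, gy


# Every pixel has the same brightness: nothing changes, so no edge.
flat_image = [[50, 50, 50] for _ in range(3)]

# Dark on the left, bright on the right: a vertical edge in the middle.
edge_image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

flat = gradient_at(flat_image, 1, 1)
edge = gradient_at(edge_image, 1, 1)
print(flat)  # (0, 0)
print(edge)  # (760, 0)
print(abs(edge[0]) + abs(edge[1]))  # 760, the strength of the change
```

A large strength value marks a likely edge, and the pair (gx, gy) tells you in which direction the brightness is changing.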


Function Prototype and Parameters: Tell Canny How to "Outline"!

The Canny function needs some commands to know how to draw edge lines for the image:

def Canny(image: cv2.typing.MatLike,
          # This is the "grayscale image" you want to detect edges on
          # (Note: it must be a grayscale image!).
          threshold1: float,
          # This is the first "low threshold value."
          threshold2: float,
          # This is the second "high threshold value."
          edges: cv2.typing.MatLike | None = ...,
          # The detected edge image will be stored here
          # (you usually don't set this yourself).
          apertureSize: int = ...,
          # A parameter that affects the fineness of the edge detection
          # (usually the default is fine).
          L2gradient: bool = ...)
          # The method for calculating the gradient (usually the default is fine).
-> cv2.typing.MatLike: ...

Return Value: Canny gives you an "edge map"!

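When Canny successfully detects the edges of your image, it returns a new image: a single-channel, 8-bit NumPy array the same size as the input, in which edge pixels are 255 (white) and everything else is 0 (black).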


Important Notes: "Little Secrets" of Canny Edge Detection!

There are a few "tricks" you should know about Canny edge detection, as they affect the results:

1. "Hysteresis Thresholding": Smart Edge Linking!

(1) The most powerful part of Canny edge detection is that it uses "two thresholds (threshold1 and threshold2)" to process edges.

(2) High threshold (threshold2): Used to find "very obvious" edges (strong edges).

(3) Low threshold (threshold1): Used to find "less obvious" edges (weak edges).

(4) If a weak edge pixel can be "linked" to a strong edge pixel, it will also be considered a true edge! This clever mechanism helps us keep real edges while filtering out false edges caused by noise.

(5) It's usually recommended that threshold2 is 2 to 3 times threshold1.

2. Sobel Operator: Finding "Brightness Changes"!

(1) Internally, Canny uses a tool called the "Sobel operator" to calculate "how much the brightness changes and in what direction" for each part of the image. The greater the brightness change, the more likely it is to be an edge.

(2) The apertureSize parameter (3, 5, or 7; the default is 3) sets the size of the area the Sobel operator "looks at." A larger value looks at a wider neighborhood, which changes which edges are picked up, so you may need to readjust the thresholds when you change it.

3. 8-bit Grayscale Images: Canny's "Favorite"!

The Canny function typically only accepts 8-bit (0 to 255) grayscale images as input. So, if your image is in color, remember to convert it to grayscale first with cv.cvtColor!

4. Choosing the right "thresholds" is important!

(1) The choice of threshold1 and threshold2 is very important! They will directly affect which edges are detected in the final image.

(2) There is no "one-size-fits-all" threshold combination. The best way is to try different values on your images to find the best effect for your needs.
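
The "smart edge linking" from note 1 can be sketched in plain Python. To keep it short, this version works on a single row of edge-strength values instead of a full image, so it is a simplification: real Canny links pixels in two dimensions and does extra thinning first. The function name hysteresis_1d and the sample numbers are hypothetical.

```python
def hysteresis_1d(strengths, low, high):
    """Keep strong pixels, plus weak pixels that connect to a strong one."""
    strong = [s > high for s in strengths]
    candidate = [s > low for s in strengths]   # weak or strong
    keep = strong[:]
    # Start from every strong pixel and spread outward through its neighbors.
    stack = [i for i, is_strong in enumerate(strong) if is_strong]
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < len(strengths) and candidate[j] and not keep[j]:
                keep[j] = True
                stack.append(j)
    return [255 if k else 0 for k in keep]


#            weak strong weak       weak weak (no strong neighbor)
strengths = [20, 80, 160, 90, 30, 70, 60, 10]
edges = hysteresis_1d(strengths, low=50, high=150)
print(edges)  # [0, 255, 255, 255, 0, 0, 0, 0]
```

The weak values 80 and 90 survive because they touch the strong value 160, while 70 and 60 are dropped: they are above the low threshold but never connect to a strong edge.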


Hands-On Example: Make Your Picture an Outline!

Now, let's write some code to first turn a color image into grayscale, and then use the Canny function to find its edges!

import cv2 as cv
# Set the file path for the original image.
# Don't forget to replace 'Pandora.png' with your own image!
file_path = 'Pandora.png'
# Use OpenCV's imread function to read the image.
source_picture = cv.imread(file_path)
# Check if the image was successfully loaded.
if source_picture is None:
    print("Warning: Oops! Could not load the image.")
    print(f" Please check if the path is correct: {file_path}")
else:
    # --- 1. First, turn the color image into a grayscale image! ---
    # cvtColor is our "color-changing magician," converting BGR color to grayscale.
    convert_img = cv.cvtColor(source_picture, cv.COLOR_BGR2GRAY)
    # --- 2. Perform Canny edge detection! ---
    # Use the cv.Canny function to find the edges of the grayscale image.
    # convert_img: The input grayscale image.
    # 50: This is the "low threshold value."
    # 150: This is the "high threshold value."
    # modified_image: This is the detected edge image, 
    # where edges are displayed as white lines.
    modified_image = cv.Canny(convert_img, 50, 150)
    # --- Image Display Loop ---
    # Enter an infinite loop to keep the image windows open.
    while True:
        # Display the original color image.
        cv.imshow('Original Picture (Color)', source_picture)
        # Display the Canny edge-detected image.
        cv.imshow('Edge Detected Picture', modified_image) 
        # The cv.waitKey(1) function pauses the program for 
        # 1 millisecond to check for a key press.
        key = cv.waitKey(1)
        # If the ESC key is pressed (number 27), break the loop and end the display.
        if key & 0xFF == 27:
            break
    # --- Release Resources ---
    # Finally, close all image windows created by OpenCV to free up computer resources.
    cv.destroyAllWindows()

Program Result:

When you run the code, you'll see two windows: one with your original color image, and another with the image "outlined" by the Canny function! The contours of the objects in the image will appear as white lines. Doesn't it look like a sketch or a comic book draft?


Canny edge detection plays a crucial role in many advanced computer vision tasks, such as:

1. Object Recognition: Find object contours first, then use their shapes to determine what the objects are.

2. Image Stitching: Use edges to align multiple images when stitching them together.

3. Robot Navigation: Help robots identify the boundaries of obstacles.

You can try adjusting the two threshold values, 50 and 150, in the Canny function to see how different numbers make it pick up more or fewer edges and details!

 

Copyright © 2026 YUAN High-Tech Development Co., Ltd.
All rights reserved.