7-6 Object Detection
Learning Objectives
Object Detection is an important task in computer vision. Its goal is to enable a computer to identify what objects are present in an image or video and where they are located.
For a single image, an object detection system outputs:
1. Object Category (What)
For example: person, car, dog, cup
2. Object Location (Where)
Usually represented by a bounding box
→ a rectangular box that encloses the object
3. Confidence Score (Confidence)
Indicates how certain the model is that the detected object belongs to a specific category
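The three outputs above can be pictured as one record per detected object. The following is a minimal sketch in plain Python, not the format any particular library uses: the `Detection` class, its field names, the `keep_confident` helper, the 0.5 threshold, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record for one detected object (not a library type)."""
    category: str      # what: the class name, e.g. "person"
    box: tuple         # where: (x1, y1, x2, y2) corner coordinates in pixels
    confidence: float  # how certain the model is, from 0.0 to 1.0


def keep_confident(detections, threshold=0.5):
    """Return only the detections whose confidence is at least threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up sample values, for illustration only
detections = [
    Detection("person", (50, 40, 120, 300), 0.91),
    Detection("dog", (200, 180, 320, 290), 0.32),
]

for d in keep_confident(detections):
    print(d.category, d.box, d.confidence)
```

Discarding low-confidence detections like this is a common way to reduce false alarms, at the cost of possibly missing some real objects.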
In this course, we will use Python and YOLO, a pre-trained object detection model, to detect objects.
Detect objects in images and print results
from ultralytics import YOLO

# Load the pre-trained YOLO model
model = YOLO("yolo11n.pt")

# Run detection on a sample image
results = model.predict("https://ultralytics.com/images/bus.jpg")

for result in results:
    print(f'xywh: {result.boxes.xywh}')  # Center coordinates (x, y) and dimensions (w, h)
    print(f'xyxy: {result.boxes.xyxy}')  # Coordinates of the top-left and bottom-right corners (x1, y1, x2, y2)
    print(f'names: {[result.names[cls.item()] for cls in result.boxes.cls.int()]}')  # Class name of each detected object
    print(f'confs: {result.boxes.conf}')  # Confidence score of each detection
Save the code above as object_detect.py, then enter the following command in the terminal.
python object_detect.py
The script then prints the box coordinates, class names, and confidence scores of the detected objects.
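The xywh and xyxy values printed by the script describe the same boxes in two layouts. The relationship between them can be sketched in plain Python as below. The function names `xywh_to_xyxy` and `xyxy_to_xywh` and the sample numbers are hypothetical, written for this example rather than taken from the library.

```python
def xywh_to_xyxy(x, y, w, h):
    """Convert center (x, y) plus width and height to corner coordinates."""
    half_w, half_h = w / 2, h / 2
    return (x - half_w, y - half_h, x + half_w, y + half_h)


def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert top-left and bottom-right corners to center plus size."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


# A box centered at (100, 80) that is 40 wide and 60 tall
print(xywh_to_xyxy(100, 80, 40, 60))   # (80.0, 50.0, 120.0, 110.0)
print(xyxy_to_xywh(80, 50, 120, 110))  # (100.0, 80.0, 40, 60)
```

The corner layout is convenient for drawing rectangles, while the center layout is convenient for measuring how far an object is from a point in the image.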

Perform object detection using a webcam and render results
First, run v4l2-ctl --list-devices to identify the webcam, then note the number of the video device listed for it (for example, /dev/video0).
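The number at the end of a device path is the value that the webcam script expects. As a rough sketch, assuming a Linux system where cameras appear as /dev/videoN, the paths can also be listed from Python and turned into indices. The `device_index` helper is hypothetical and written for this example.

```python
import glob
import re


def device_index(path):
    """Extract the trailing number from a path such as /dev/video2."""
    match = re.search(r"video(\d+)$", path)
    if match is None:
        raise ValueError(f"not a video device path: {path}")
    return int(match.group(1))


# Keep only paths that end in a number, then sort them numerically
# so that video10 comes after video2
paths = [p for p in glob.glob("/dev/video*") if re.search(r"video\d+$", p)]
for path in sorted(paths, key=device_index):
    print(path, "->", device_index(path))
```

If nothing is printed, no video device was found under /dev, and the camera connection should be checked before continuing.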

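import cv2
from ultralytics import YOLO

# Load the pre-trained YOLO model
model = YOLO("yolo11n.pt")

# Set video_path to the number of the video device (example: /dev/video0 → 0).
video_path = 0
cap = cv2.VideoCapture(video_path)

while cap.isOpened():
    # Read one frame from the webcam
    success, frame = cap.read()
    if success:
        # Detect objects in the frame and draw the boxes and labels on it
        results = model.predict(frame)
        annotated_frame = results[0].plot()
        cv2.imshow("YOLO Inference", annotated_frame)
        # Stop when the q key is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Stop if no frame could be read
        break

# Release the webcam and close the display window
cap.release()
cv2.destroyAllWindows()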
A window then shows the live video with the detected objects boxed and labeled. Press q to quit.
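Besides drawing boxes, the class names found in a frame can be summarized as counts. The following is a minimal sketch, assuming the names have already been collected into a list of strings in the same way the first script builds its names list. The `summarize` helper and the sample labels are hypothetical.

```python
from collections import Counter


def summarize(names):
    """Format class-name counts, most common first."""
    counts = Counter(names)
    return ", ".join(f"{name}: {count}" for name, count in counts.most_common())


# Made-up labels standing in for the names detected in one frame
frame_names = ["person", "bus", "person"]
print(summarize(frame_names))  # person: 2, bus: 1
```

A summary line like this could be printed once per frame inside the loop to keep a text log alongside the video window.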
