How to Create an Object Detection App with Python and YOLOv3

In this tutorial, we will learn how to create an object detection app using Python and the YOLOv3 (You Only Look Once) algorithm. Object detection is the computer vision task of finding which objects appear in an image or video and where they are. Each object is usually reported as a class label, a confidence score, and a bounding box.

We will be using the Darknet framework, an open-source neural network framework written in C and CUDA. YOLO was originally developed in Darknet, and the framework includes implementations of the YOLO family of detectors.

To follow this tutorial, you should have some familiarity with Python and deep learning concepts. It is recommended to have a basic understanding of convolutional neural networks (CNNs) and how they are used for object detection.

Let’s get started!

Step 1: Installing Dependencies

Before we begin, we need to install the necessary dependencies. Ensure that you have Python 3.x installed on your system.

We will be using the following libraries:

OpenCV: to load and process images and videos.
NumPy: to handle multi-dimensional arrays and matrices.
Darknet: the framework that provides the implementation of YOLO.

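To install OpenCV and NumPy, run the following command:

pip install opencv-python numpy

Darknet is normally not installed with pip. The darknet module imported in this tutorial is darknet.py, a Python wrapper that ships with the Darknet source code. It calls Darknet's compiled C library (libdarknet.so on Linux) through ctypes. Clone the Darknet repository and build it with the shared library enabled. In the AlexeyAB fork, you do this by setting LIBSO=1 in the Makefile before running make. The simplest setup is then to copy darknet.py and the compiled library into the folder that holds your script, and run the script from there.

The code below follows the older API of the AlexeyAB fork's darknet.py, which provides helpers such as make_image() and detect_image(). Newer releases of that wrapper change some of these functions. For example, detect_image() now takes a list of class names instead of a metadata object. Compare the calls below against the darknet.py you built.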
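Darknet's Python module is a wrapper that loads a compiled C library when it is imported. As a result, a missing or misplaced library typically surfaces as an OSError rather than an ImportError. The hypothetical helper below is not part of any of these libraries. It tries each import and reports the ones that fail:

```python
import importlib


def missing_dependencies(modules=("cv2", "numpy", "darknet")):
    """Return the names of modules that cannot be imported.

    OSError is caught as well as ImportError, because a Python wrapper
    around a shared library can fail while loading that library.
    """
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except (ImportError, OSError):
            missing.append(name)
    return missing
```

After installing everything, print(missing_dependencies()) should show an empty list.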

Step 2: Downloading the YOLOv3 Pre-trained Weights and Configuration Files

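YOLOv3 needs three things to run: the trained weights, the network configuration, and the list of class names. The weights are not part of the Darknet source code and must be downloaded separately. The configuration file and the class names live in the Darknet GitHub repository. So does coco.data, the small metadata file that Darknet's Python wrapper reads to find the class names.

The weights are hosted on the Darknet website. Fetch the other files from GitHub's raw-file host, because a github.com/.../blob/... URL returns an HTML page rather than the file itself:

wget https://pjreddie.com/media/files/yolov3.weights
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/coco.data
wget -P data https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names

coco.names goes into a data/ subdirectory because coco.data refers to it as data/coco.names.

Alternatively, you can download these files manually and place them in a directory of your choice. If you do, update the paths in the code, and the names entry in coco.data, to match.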
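Darknet's C code tends to stop the whole Python process when it cannot open a file, rather than raising an exception you can catch. It can therefore help to check the downloads first. The Python wrapper finds the class names through a .data file such as coco.data, whose names entry must point to your copy of coco.names. The sketch below uses only the standard library. check_model_files() and read_data_file() are hypothetical helpers, and the parser handles only the simple key = value lines found in .data files:

```python
from pathlib import Path


def read_data_file(path):
    """Parse a Darknet-style .data file ("key = value" per line) into a dict."""
    options = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def check_model_files(cfg, weights, data_file):
    """Return a list of problems found with the YOLO files (empty if none)."""
    problems = [f"missing file: {p}" for p in (cfg, weights, data_file) if not Path(p).is_file()]
    if problems:
        return problems

    options = read_data_file(data_file)
    # A relative names path is resolved against the current working directory
    names_path = Path(options.get("names", ""))
    if not names_path.is_file():
        problems.append(f"names file not found: {names_path}")
        return problems

    # Count non-empty lines in the names file and compare with "classes"
    names = [n.strip() for n in names_path.read_text().splitlines() if n.strip()]
    classes = int(options.get("classes", len(names)))
    if classes != len(names):
        problems.append(f"classes = {classes}, but {names_path} lists {len(names)} names")
    return problems
```

Running print(check_model_files("yolov3.cfg", "yolov3.weights", "coco.data")) from your working directory should print an empty list.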

Step 3: Loading the YOLOv3 Model

Now that we have the necessary files, let’s load the YOLOv3 model into our Python script.

First, we need to import the necessary libraries:

import cv2
import numpy as np
import darknet

Next, we’ll define a function to load the YOLOv3 model:

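def load_yolo():
    # Build the network from the .cfg file and load the trained weights.
    # The last argument (clear) only resets a training counter; 0 is fine here.
    net = darknet.load_net(b"yolov3.cfg", b"yolov3.weights", 0)
    # load_meta() reads a .data file (not coco.names itself), which holds
    # the number of classes and the path to the class-name list
    meta = darknet.load_meta(b"coco.data")
    return net, meta

The darknet.load_net() function builds the YOLOv3 network from the configuration file and loads the weights into it. The darknet.load_meta() function reads a Darknet .data file rather than the names list itself. The cfg/coco.data file from the Darknet repository records the number of classes (80 for COCO) and the path to coco.names. That path must point to wherever you saved coco.names (data/coco.names by default). The paths are passed as bytes because the wrapper calls Darknet's C library through ctypes.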

We can now call this function to load the YOLOv3 model:

net, meta = load_yolo()

Step 4: Performing Object Detection

Now that we have loaded the YOLOv3 model, we can use it to perform object detection on images or videos.

Let’s start with image object detection. First, we’ll define a function to detect objects in an image:

def detect_image(image, net, meta):
    # OpenCV loads images in BGR order; Darknet expects RGB
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Resize the image to the network's input size
    width, height = darknet.network_width(net), darknet.network_height(net)
    resized_image = cv2.resize(rgb_image, (width, height), interpolation=cv2.INTER_LINEAR)

    # Copy the pixels into a Darknet image buffer of the same size
    darknet_image = darknet.make_image(width, height, 3)
    darknet.copy_image_from_bytes(darknet_image, resized_image.tobytes())

    # Run the network; each detection is (class name, confidence, (x, y, w, h))
    detections = darknet.detect_image(net, meta, darknet_image)

    # Free the C-side buffer allocated by make_image()
    darknet.free_image(darknet_image)

    return detections

This function takes an image as loaded by OpenCV, the YOLOv3 network, and the metadata as input. It performs these steps in order:

1. Converts the image from OpenCV's BGR channel order to the RGB order Darknet expects.
2. Resizes it to the network's input size.
3. Copies it into a Darknet image buffer.
4. Runs detection.
5. Frees the buffer and returns the detections.

Let’s test this function by loading an image and detecting objects in it:

image = cv2.imread("image.jpg")
detections = detect_image(image, net, meta)

for detection in detections:
    print(detection)

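This code prints one tuple per detected object: the class name, the confidence score, and the bounding box as (center x, center y, width, height). The box is measured in pixels of the resized, network-sized image rather than the original image. With this version of the wrapper, the class name is a bytes object (for example, b'dog').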
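Raw tuples are hard to read, especially with bytes labels. A small, hypothetical helper (not part of darknet.py) can decode labels, drop low-confidence results, and sort the rest. It assumes confidences are probabilities between 0 and 1, as returned by the wrapper version used here:

```python
def summarize_detections(detections, min_confidence=0.5):
    """Turn raw (label, confidence, (x, y, w, h)) tuples into readable rows.

    Labels may arrive as bytes from ctypes-based wrappers, so they are decoded.
    Rows below min_confidence (a probability in [0, 1]) are dropped, and the
    rest are sorted from most to least confident.
    """
    rows = []
    for label, confidence, bbox in detections:
        if isinstance(label, bytes):
            label = label.decode("utf-8")
        confidence = float(confidence)
        if confidence >= min_confidence:
            rows.append((label, confidence, tuple(bbox)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For example:

for label, conf, (x, y, w, h) in summarize_detections(detections):
    print(f"{label:<15} {conf:.2f}  center=({x:.0f}, {y:.0f})  size={w:.0f}x{h:.0f}")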

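To draw bounding boxes around the detected objects and display the result, we can modify the detect_image() function as follows:

def detect_image(image, net, meta):
    width, height = darknet.network_width(net), darknet.network_height(net)

    # Prepare the input and run detection (BGR -> RGB, resize, copy, detect, free)
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    resized_image = cv2.resize(rgb_image, (width, height), interpolation=cv2.INTER_LINEAR)
    darknet_image = darknet.make_image(width, height, 3)
    darknet.copy_image_from_bytes(darknet_image, resized_image.tobytes())
    detections = darknet.detect_image(net, meta, darknet_image)
    darknet.free_image(darknet_image)

    # Boxes are in network-input pixels; these factors map them to the original image
    scale_x = image.shape[1] / width
    scale_y = image.shape[0] / height

    for class_name, confidence, (x, y, w, h) in detections:
        if isinstance(class_name, bytes):
            class_name = class_name.decode()

        # Convert the (center, size) box to corner points in original-image pixels
        x1 = int((x - w / 2) * scale_x)
        y1 = int((y - h / 2) * scale_y)
        x2 = int((x + w / 2) * scale_x)
        y2 = int((y + h / 2) * scale_y)

        # Draw the box and a "label (confidence)" caption just above it
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(image, f"{class_name} ({confidence:.2f})", (x1, max(y1 - 10, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    return detections

This version uses OpenCV's drawing functions to draw bounding boxes and class labels directly onto the image passed in. It still returns the detections, so the caller decides when to display the result. Darknet reports boxes in the network's input resolution, so the corners are scaled back to the original image size before drawing. To show the annotated image:

image = cv2.imread("image.jpg")
detections = detect_image(image, net, meta)
cv2.imshow("Object Detection", image)
cv2.waitKey(0)
cv2.destroyAllWindows()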
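Darknet reports each box as a center point plus a width and height in the network's input resolution. Drawing it on the original image therefore takes two steps: convert it to corner points, then scale by the ratio of image size to network size. The hypothetical helper below isolates that arithmetic in plain Python and also clamps the corners to the image bounds. It is a sketch, not part of darknet.py:

```python
def box_to_corners(bbox, network_size, image_size):
    """Convert a Darknet (center_x, center_y, width, height) box to corner points.

    bbox is in network-input pixels; network_size and image_size are
    (width, height) pairs. Returns (x1, y1, x2, y2) in original-image
    pixels, clamped so every corner lies inside the image.
    """
    x, y, w, h = bbox
    net_w, net_h = network_size
    img_w, img_h = image_size
    scale_x, scale_y = img_w / net_w, img_h / net_h

    def clamp(value, upper):
        # Round to the nearest pixel and keep it within [0, upper - 1]
        return max(0, min(int(round(value)), upper - 1))

    x1 = clamp((x - w / 2) * scale_x, img_w)
    y1 = clamp((y - h / 2) * scale_y, img_h)
    x2 = clamp((x + w / 2) * scale_x, img_w)
    y2 = clamp((y + h / 2) * scale_y, img_h)
    return x1, y1, x2, y2
```

Inside detect_image(), it could be called with a detection's box tuple as box_to_corners(box, (darknet.network_width(net), darknet.network_height(net)), (image.shape[1], image.shape[0])). Note that a NumPy image's shape is (height, width, channels), so the width comes second.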

Step 5: Creating a Simple Object Detection App

Now that we can detect objects in images, let’s create a simple app that performs object detection on either an image file or a video file.

First, create a new Python file called object_detection_app.py, open it, and import the necessary libraries:

import cv2
import numpy as np
import darknet

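Next, add the load_yolo() function from Step 3 and the drawing version of detect_image() from Step 4. Copy their full bodies, which are abbreviated here:

def load_yolo():
    ...  # full body from Step 3


def detect_image(image, net, meta):
    ...  # full body of the drawing version from Step 4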

Finally, we’ll add the app logic:

def main():
    net, meta = load_yolo()

    file_path = input("Enter the path of the image or video file: ")
    file_type = file_path.split(".")[-1].lower()

    if file_type in ["jpg", "jpeg", "png"]:
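        image = cv2.imread(file_path)
        if image is None:
            print(f"Could not read image file: {file_path}")
            return

        detections = detect_image(image, net, meta)

        for detection in detections:
            print(detection)

        # Show the annotated image; press any key to close the window
        cv2.imshow("Object Detection", image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()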
    elif file_type in ["mp4", "avi", "mkv"]:
        video = cv2.VideoCapture(file_path)

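        while True:
            # read() returns (success_flag, frame); stop at the end of the video
            ok, frame = video.read()
            if not ok:
                break

            detections = detect_image(frame, net, meta)

            for detection in detections:
                print(detection)

            # Show every frame, with or without detections, so the window
            # keeps refreshing and "q" always stops playback
            cv2.imshow("Object Detection", frame)
            if cv2.waitKey(25) & 0xFF == ord("q"):
                break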

        video.release()
        cv2.destroyAllWindows()
    else:
        print("Unsupported file type. Please provide an image or video file.")

if __name__ == "__main__":
    main()

This code prompts the user for the path of an image or video file. It chooses the image or video branch based on the file extension and runs object detection. Each detection is printed to the console, and the annotated image or video frames are shown in a window. Press q to stop video playback.
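The extension check in main() takes everything after the last dot in the whole path. A slightly sturdier alternative, sketched below with the standard library's pathlib, looks only at the file name's suffix. It ignores case and surrounding whitespace or quotes, and returns None when there is no recognized extension. The helper name media_type() and the extension sets are illustrative, not part of the tutorial's app:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv"}


def media_type(file_path):
    """Classify a path as "image", "video", or None by its file extension."""
    # Strip whitespace and quotes that terminals often add to pasted paths
    cleaned = file_path.strip().strip("\"'")
    suffix = Path(cleaned).suffix.lower()  # only the last component's extension
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

In main(), kind = media_type(file_path) followed by if kind == "image": ... elif kind == "video": ... would replace the split-based check.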

To run the app, save the file and execute it using Python:

python object_detection_app.py

Follow the prompts to enter the file path and see the object detection results.

Conclusion

Congratulations! You have successfully created an object detection app using Python and the YOLOv3 algorithm. You learned how to load the YOLOv3 model, detect objects in images, and create a simple app that performs object detection on either an image or video file.

Feel free to experiment further with the app by adding additional functionalities, such as saving the detected objects’ information to a file or integrating it with a user interface.
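As a starting point for saving results, the sketch below writes detections to a JSON file using only the standard library. The helper names and the record layout (source, label, confidence, box) are illustrative choices, not a standard format. It assumes the (label, confidence, (x, y, w, h)) tuples returned by detect_image():

```python
import json


def detections_to_records(detections, source):
    """Convert (label, confidence, (x, y, w, h)) tuples into JSON-friendly dicts."""
    records = []
    for label, confidence, (x, y, w, h) in detections:
        if isinstance(label, bytes):
            label = label.decode("utf-8")
        records.append({
            "source": source,
            "label": label,
            "confidence": round(float(confidence), 4),
            "box": {"x": float(x), "y": float(y), "w": float(w), "h": float(h)},
        })
    return records


def save_detections(detections, source, out_path):
    """Write the detections for one image to out_path as a JSON array."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(detections_to_records(detections, source), f, indent=2)
```

For an image, save_detections(detections, file_path, "detections.json") could be called right after detection in main(). For a video, you would collect the records from every frame before writing them once.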

You can also explore other object detection algorithms, such as YOLOv4 or SSD, and customize the app to suit your requirements.

Happy coding!