AI AND VIDEO ANALYTICS BLOG
Video Surveillance & Physical Security Industry Viewpoints
May 16th, 2024
Author: Tristan Foro

Deep Learning and Neural Networks: The Technology Behind Video Analytics

Artificial Intelligence, the outermost wrapper of neural networks

While artificial intelligence (AI) has become the overused buzzword of our day, its prevalence has contributed little to a widespread understanding of what the technology is and how it works. Because AI has served, and continues to serve, as the basis for everything we do in the video analytics industry, we feel it’s important to help you navigate the inner workings of this technology so you can become an educated consumer, regardless of how you choose to leverage AI in your life.

What Is Artificial Intelligence?

AI is the most general term. It describes a broad field of computing technology that was developed to mimic the way humans think and learn.

Machine learning (ML) is a subset of AI that focuses on developing algorithms and statistical models that enable computers to perform tasks without explicit programming. It involves training systems on data to improve their performance on a specific task over time.
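To make "training on data" a little more concrete, here is a minimal sketch in plain Python. It is a toy illustration rather than anything used in a real video analytics product: the function name train, the sample data, and the learning-rate and epoch values are all assumptions chosen for the example. Instead of hard-coding the rule y = 2x + 1, the program starts with no knowledge and gradually adjusts two numbers until its predictions match the examples.

```python
def train(samples, epochs=1000, lr=0.05):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the rule
    n = len(samples)
    for _ in range(epochs):
        # Average gradient of the squared error over all examples.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # Nudge the parameters in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data generated from the hidden rule y = 2x + 1.
samples = [(x, 2 * x + 1) for x in range(5)]
w, b = train(samples)
print(f"learned w={w:.3f}, b={b:.3f}")
```

The learned values end up very close to 2 and 1 even though the rule itself never appears in the training function, which is the sense in which the system improves "without explicit programming."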

Deep learning is narrower still: it is a specific discipline of machine learning that involves neural networks with multiple layers. Deep learning teaches computers to process data in a way that was inspired by the human brain.

Lastly, we have convolutional neural networks (CNNs), a very specific type of deep learning. CNNs are feed-forward networks that are primarily used for image recognition and processing because of their ability to recognize patterns in images. While CNNs are currently the predominant neural network technology in today’s video analytics market, newer neural network architectures, like Vision Transformers (ViTs), are also gaining some momentum in this space.

Convolutional Neural Networks (CNNs)

Now that we better understand the nuances of AI, let’s dive deeper into convolutional neural networks and their role in advanced video analytics.

While mathematics plays a pivotal role in CNNs, we’re going to skip the number crunching and break down the technology into 3 progressive layers:

CNN layers

  • Convolutional Layer: Filters, or kernels, are applied to video frames to extract features. Each filter slides over the input video frame and produces one output value at each position, and together these values form a feature map. The filter’s weights determine which type of feature it responds to, such as an edge or a texture; capturing motion generally requires information from more than one frame. Convolutional layers learn to extract hierarchical features (feature extraction) like patterns and structures, as well as spatial hierarchies, like the relationship between objects and actions.
  • Pooling Layer: A downsampling mechanism that is used to retain the most relevant information produced by the convolutional layer. It’s crucial because it shrinks the feature maps, which reduces the number of parameters needed in later layers, improves computational efficiency, and reduces the risk of overfitting to noise.
  • Fully Connected Layer: The final stage in a convolutional neural network. The fully connected layer aggregates high-level features learned from the preceding layers and performs classification based on these features. This layer computes the output scores for different classes present in the video frames.
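
For readers who want to see the three layers in motion, the following is a minimal sketch in plain Python, not a production implementation. Everything in it is an assumption made for illustration: the 6x6 "frame," the hand-picked vertical-edge kernel, the fully connected weights, and the two class labels are hypothetical, and a real CNN would learn its filter and layer weights from training data rather than having them written by hand.

```python
import math


def convolve2d(frame, kernel):
    """Slide the kernel over the frame (stride 1, no padding).

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses (a common activation after convolution)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """Compute one score per class as a weighted sum of all features."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 6x6 grayscale frame: dark left half, bright right half.
frame = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = relu(convolve2d(frame, kernel))   # 4x4 feature map
pooled = max_pool(feature_map)                  # downsampled to 2x2
features = [v for row in pooled for v in row]   # flattened to 4 values

# Hypothetical weights for two made-up classes: "edge" and "no edge".
weights = [[0.5, 0.5, 0.5, 0.5], [-0.5, -0.5, -0.5, -0.5]]
biases = [0.0, 0.0]
scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
print(feature_map[0], pooled, scores, probabilities)
```

In this sketch the filter responds strongly only where the frame changes from dark to bright, pooling condenses the 4x4 feature map to 2x2 while keeping those strong responses, and the fully connected step turns the remaining values into one score per class.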

Real-World Technology Powered by CNNs

Even though AI is the term that steals the spotlight when advanced technologies take the stage, CNNs are the underlying technology used in video analytics today. Check out some of the well-known technologies that are CNN-driven and used for critical real-world applications.

  • Object Classification: The ability to distinguish between different types of objects, such as people and vehicles. Object classification is commonly used to automatically categorize objects detected within video footage into predefined classes that can be searched and alerted on.
  • Face Recognition (FR): Uses deep learning and CNNs to detect faces in both constrained and in-the-wild scenarios. These neural networks are trained on datasets that include a diverse set of ethnicities and representations of both genders.
  • License Plate Recognition (LPR): CNNs are used to detect the license plate and distinguish it from the vehicle, handling challenges such as edge detection and color transitions between the license plate and the car body.
  • Vehicle Make and Model Recognition (VMMR): State-of-the-art methods for identifying vehicles by their make and model.

Convolutional neural networks are fundamental to video analytics and other advanced technologies that impact the safety of people everywhere. AI is an integral part of our world, and it’s imperative that we understand the technology so we can implement and utilize it effectively, responsibly, and sustainably now and in the future. If you’d like to explore real-world examples of how CNNs are applied in video analytics applications specific to your industry, register for a demo today!