Edge Computer Vision – Video Analysis in Real-World Applications Using Deep Learning Libraries

Edge Computer Vision – Edge intelligence is about delivering real-time decision making at the edge. In this post, we will discuss how to apply computer vision at the edge for security and surveillance activities. For instance, you can use it to detect suspicious activity at a retail store or understand which products customers pick up in real time. Sounds like an Amazon Go store? Computer vision use cases range from simple ones, like detecting customer interactions in stores, to monitoring remote areas with drones, such as estimating crop height or tracking mining activity. In such cases, you need real-time decision making at the edge because it may not make sense to move the data to the cloud for processing. Streaming video data to the cloud causes latency, bandwidth, and cost issues that prevent scalability. Instead, you put the intelligence at the edge, on the devices themselves.

To develop such a solution, you need to run computer vision algorithms at the edge. You can build this using commercially available APIs or open-source deep learning frameworks like Theano, TensorFlow, MATLAB (Deep Learning Toolbox), and Caffe. Deep learning is a branch of machine learning that learns several levels of representation through neural networks. In image processing, neural networks automatically learn the rules for recognizing images, instead of requiring you to hand-engineer countless features to identify pictures.

Convolutional Neural Network (CNN)

Convolutional neural networks – sounds like a weird combination of biology and math with a little computer science blended in. Today, CNNs are among the most influential innovations in the field of computer vision. Neural nets first grew to prominence in 2012, when Alex Krizhevsky used a CNN to win that year’s ImageNet competition, dropping the image classification error record from 26% to 15%, an astounding improvement at the time.

Numerous deep learning architectures are available, such as the convolutional neural network (CNN) and the recurrent neural network (RNN). Each is suited to particular problems, like computer vision, natural language processing, and speech recognition, and achieves state-of-the-art results.

Using CNNs for deep learning has gained popularity due to three key factors:

  • CNNs remove the need for manual feature extraction. The features are learned directly by the CNN.
  • CNNs provide state-of-the-art recognition results.
  • CNNs can be reused for new recognition tasks, enabling you to build on pre-existing networks.

Convolutional neural networks (CNNs, or ConvNets) are powerful tools for deep learning and are especially effective in image classification, object detection, and recognition tasks. CNNs are implemented as a series of many interconnected layers, made up of repeated blocks of three fundamental layers: the convolutional layer, the ReLU (rectified linear units) layer, and the pooling layer. The purpose of convolutional layers is to convolve their input with a set of filters that are automatically learned during network training. The purpose of the ReLU layer is to add nonlinearity to the network, which lets it approximate the nonlinear mapping between the image pixels and the content of an image. The purpose of pooling layers is to downsample their inputs and help consolidate local image features.
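The three layers can be sketched in a few lines of plain Python. This is a toy illustration, not a real network: the 4×4 "image" and the edge filter are hand-picked values, whereas a real CNN learns its filter weights during training.

```python
# Toy sketch of the three fundamental CNN layers: convolution, ReLU, pooling.
# The image and filter values are illustrative; real filters are learned.

def convolve(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Element-wise nonlinearity: negative responses are clipped to zero."""
    return [[max(0.0, x) for x in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Downsample by taking the max over non-overlapping size x size windows."""
    return [[max(feature_map[i + u][j + v]
                 for u in range(size) for v in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_filter = [[-1, 1]]  # fires where brightness increases left-to-right

features = relu(convolve(image, edge_filter))  # conv + ReLU
pooled = max_pool(features)                    # pooling
print(pooled)  # [[1], [1]]
```

The filter produces its strongest response exactly at the vertical edge in the image, and pooling shrinks the feature map while keeping that response.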

MathWorks’ Deep Learning Toolbox

Convolutional neural networks require MathWorks’ Deep Learning Toolbox™. Training and prediction need a CUDA®-capable GPU with compute capability 3.0 or higher. Using a GPU is recommended and requires MathWorks’ Parallel Computing Toolbox.

You can build a CNN architecture, train a network for tasks such as semantic segmentation, and use the trained network to predict object class labels or detect objects. You can also extract features from a pre-trained network and use these features to train a classifier. Additionally, you can perform transfer learning, which retrains an existing CNN on new data.
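The feature-extraction workflow can be sketched in Python. Everything here is a labeled stand-in: `extract_features` plays the role of a pre-trained CNN layer (in a real pipeline you would use the network's activations for each image), and a nearest-centroid rule plays the role of the classifier trained on those features.

```python
# Sketch of "features from a network + a simple classifier".
# extract_features is a hand-written stand-in for pre-trained CNN activations.

def extract_features(image):
    """Toy feature vector: mean brightness and left/right contrast."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    left = sum(row[0] for row in image) / len(image)
    right = sum(row[-1] for row in image) / len(image)
    return (mean, right - left)

def train_centroids(labeled_images):
    """'Train' a nearest-centroid classifier on the extracted features."""
    sums, counts = {}, {}
    for image, label in labeled_images:
        f = extract_features(image)
        s = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in s)
            for label, s in sums.items()}

def classify(image, centroids):
    """Assign the label whose feature centroid is closest."""
    f = extract_features(image)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(f, centroids[label])))

dark = [[0, 0], [0, 1]]
bright = [[9, 9], [9, 8]]
centroids = train_centroids([(dark, "dark"), (bright, "bright")])
print(classify([[1, 0], [0, 0]], centroids))  # prints "dark"
```

The point of the pattern is that the expensive part (the feature extractor) is reused as-is, and only the small classifier on top is trained on your data.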

Google’s Inception-v3 model

Google’s Inception-v3 model is a 48-layer deep network that attains state-of-the-art results for image classification and detection. To a computer, an image is nothing but a vector or grid of numbers (pixels). Convolutional neural networks discover features automatically by sliding small filters over images of the same size; each filter sees the same input and extracts the interesting pieces. This extraction of interesting parts (which are again vectors of numbers) is the heart of how a CNN works. For example, in the case of a customer wearing a helmet inside a retail store, one filter might learn the helmet’s round edge while another learns the glass panel at the front. The key idea is that irrespective of where the object is in the frame, the CNN must be able to recognize it.
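That position-independence can be shown with a tiny example. The values below are toy numbers, and a single global max pool stands in for a network's pooling stages: because the filter slides over the whole input, it produces the same peak response wherever the pattern sits.

```python
# Translation invariance in miniature (1-D for brevity): a filter matched
# to a pattern gives the same pooled response no matter where the pattern is.

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation)."""
    n = len(kernel)
    return [sum(signal[i + k] * kernel[k] for k in range(n))
            for i in range(len(signal) - n + 1)]

def global_max_pool(feature_map):
    """Reduce the whole feature map to its peak response."""
    return max(feature_map)

pattern = [1, 2, 1]              # the "object" we want to detect
kernel = [1, 2, 1]               # a filter matched to that pattern

left = pattern + [0, 0, 0, 0, 0]   # object at the left of the frame
right = [0, 0, 0, 0, 0] + pattern  # same object at the right

print(global_max_pool(conv1d(left, kernel)))   # 6
print(global_max_pool(conv1d(right, kernel)))  # 6
```

Both signatures are identical, which is why a CNN trained on centered objects can still respond to the same object elsewhere in the frame.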

Before CNNs, image detection required you to crop the images and supply the location of interest. For instance, if you were identifying numerous categories of birds, you would typically crop the bird image, remove any surrounding context such as trees, bushes, sea, or sky, and supply only the bird. With a CNN, the idea is to train on the full images and let the network figure this out. A CNN can predict objects that still have surrounding context most of the time, though with lower accuracy. As discussed, the goal here is to recognize things irrespective of where they appear in the image. A great deal of research is still needed to improve CNN networks, and having the right training data (pictures and labels) is a must for training networks with such variations.

Here is guidance on how to apply a CNN for image recognition with TensorFlow:

  • Develop your own CNN or start with a pre-trained network like the Inception model.
  • Get the training and test data (images and labels).
  • Train or retrain the network.
  • Tune the learning rate and batch size for the desired accuracy, and save the trained model.
  • Use the trained model for classification.
  • Deploy the TensorFlow runtime and the trained model on an edge system as a Docker image.
  • Run the classification code on the edge gateway to detect objects.
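The training steps above can be sketched with TensorFlow's Keras API. This is a minimal outline under stated assumptions, not a production pipeline: random arrays stand in for your labeled images, `NUM_CLASSES` is a placeholder for your own label set, and `weights=None` keeps the sketch runnable offline (in practice you would pass `weights='imagenet'` to start from the pre-trained Inception weights).

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 3  # placeholder for your own label set

# 1. Start from an Inception backbone. weights=None keeps this sketch
#    offline; use weights='imagenet' to start from pre-trained features.
base = tf.keras.applications.InceptionV3(
    weights=None, include_top=False, input_shape=(96, 96, 3))
base.trainable = False  # freeze the backbone; retrain only the new head

# 2. Add a new classification head for your labels.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# 3./4. Retrain, tuning learning rate and batch size. Random arrays
#       stand in for real (image, label) training data here.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy")
images = np.random.rand(8, 96, 96, 3).astype("float32")
labels = np.random.randint(0, NUM_CLASSES, size=8)
model.fit(images, labels, batch_size=4, epochs=1, verbose=0)

# 5. Use the trained model for classification.
probs = model.predict(images[:1], verbose=0)
print(probs.shape)  # (1, 3)

# 6. Save the model so it can be packaged into a Docker image for the edge.
model.save("edge_model.h5")
```

On the edge device, the saved model is loaded back with `tf.keras.models.load_model` and served behind whatever inference entry point your gateway uses.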

Edge computer vision technology is becoming a popular use case across industry verticals like retail, automotive, healthcare, and IIoT. It needs deep implementation expertise to deliver business effectiveness and efficiency. Reach out to the Mobodexter team to learn more.


  • Mobodexter, Inc., based in Redmond, WA, builds Internet of Things solutions for enterprise applications with highly scalable Kubernetes edge clusters that work seamlessly with AWS IoT, Azure IoT, and Google Cloud IoT.
  • Want to build your Edge Computer Vision solution? Email us at [email protected]
  • Check out our Edge Marketplace for our edge innovations.
  • Join our newly launched marketing partner affiliate program to earn a commission here.
  • We publish weekly blogs on IoT & Edge Computing: read all our blogs or subscribe to get them in your email.