Running Machine Learning Models at the Edge

Designing solutions that combine machine learning with edge computing is a significant focus for future AI solution development.

As opportunities to apply AI expand, organizations are increasingly looking to run machine learning models at the edge. Edge computing refers to analyzing and processing data near where it is generated, which decreases data flow and thereby reduces network traffic and response time. With the recent rise in machine learning-based technologies and new interest in applications such as the Internet of Things (IoT), designing solutions that combine machine learning with edge computing has emerged as a significant and valuable part of AI research and development.

What you need to know

Research in new machine learning technologies has accelerated over the last few years and is the main force behind advances in a wide range of application domains such as object detection and tracking, image classification, language processing, and machine translation. However, designing a suitable deep learning model remains a challenging task. Many of the deep learning architectures used today are prohibitively expensive, requiring an abundance of computational resources and time to process data and make decisions. Edge computing research addresses this problem by designing lightweight deep learning models that can be placed closer to the data sources, reducing network latency, bandwidth usage, and run time while still providing performance on par with larger deep learning models that require more computational resources ([1]).

The emergence of devices such as NVIDIA graphics processing units (GPUs) and Google tensor processing units (TPUs) means that the power of server or supercomputer platforms is no longer necessary to deploy deep learning models. Further, a new research area has recently emerged in the AI field that focuses solely on finding sparse deep learning architectures with fewer neurons and connections (weights and biases). This is usually done by removing unnecessary neurons and weights from a larger network to produce smaller deep learning models that can run faster and perform better than older, heavier models such as YOLO, BERT, and R-CNN ([2]). Recent advances in this field point to the mathematical possibility of designing smaller and more efficient deep neural networks that process data and make decisions faster while achieving the expected performance with a lower computational load. Research also shows that AutoML, combined with cutting-edge engineering and mathematics, can help automate the search for these more efficient architectures across a range of computer vision and natural language processing tasks ([3]).
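The idea of removing unnecessary weights can be illustrated with magnitude pruning, one of the simplest pruning heuristics. The sketch below is an assumption-laden illustration, not the method of [2] or of any particular library: the function names `magnitude_prune` and `count_nonzero` are hypothetical, the layer is a plain list of rows filled with random values, and a real workflow would normally fine-tune the network after pruning to recover accuracy.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a 2-D weight matrix.

    `weights` is a list of rows (lists of floats). A new matrix is returned;
    the input is left unchanged.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(round(sparsity * len(magnitudes)))  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]  # k-th smallest magnitude
    # Keep weights strictly above the threshold; ties at the threshold are pruned too.
    return [[w if abs(w) > threshold else 0.0 for w in row] for row in weights]


def count_nonzero(weights):
    """Number of weights that survived pruning."""
    return sum(1 for row in weights for w in row if w != 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 64x32 dense layer with random weights, for illustration only.
    layer = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(64)]
    pruned = magnitude_prune(layer, sparsity=0.75)
    print(f"kept {count_nonzero(pruned)} of {64 * 32} weights")
```

Zeroed weights only save memory and compute when the pruned matrix is stored or executed in a sparse-aware form, which is one reason structured approaches that remove whole neurons are often preferred on edge hardware.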

Modzy's approach to running machine learning models at the edge

The design and production of lightweight AI models that can run at the edge is a major focus area at Modzy. There is a pressing need for algorithms that take large, high-accuracy networks as input and compress them while maintaining good performance. The resulting models should run efficiently close to the data, under limited computational resources, while still delivering the performance users expect. Our edge computing solutions reduce data transmission bandwidth and task response delay, and they enable customized learning specific to the edge device. They can also provide collaborative learning models for multi-agent scenarios. We believe that edge computing can provide further security for the sensitive data collected, while decentralizing intelligent edge devices reduces the negative consequences of DDoS attacks that would otherwise affect entire networks.

What this means for you

In many large-scale machine learning applications, data is acquired and processed at network edge nodes, such as mobile devices, users' devices, and Internet of Things sensors. Computing at the edge, as opposed to in traditional centralized systems such as data centers, will be the main force behind the next AI revolution. Oversized networks often have many redundant or unused parameters. Inefficient network architectures waste computational resources, and excessive model size can rule a network out of many applications. The design of more efficient machine learning models for edge computing will increase security, reduce the negative effects of communication bottlenecks on performance, and scale better to the future needs of our society. At Modzy, we believe that the future of AI depends on the design and development of efficient, reliable, and secure deep learning models, and we are actively developing AI models that can run at the edge to realize this vision.
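To make the cost of oversized networks concrete, the following sketch estimates how many parameters a fully connected network holds and how much memory they occupy. Everything in it is an assumption chosen for illustration: the helper names, the layer widths, and the 4 bytes per parameter (32-bit floats). It ignores activations and runtime overhead, so it gives only a lower bound on what a device would need.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def footprint_mib(param_count, bytes_per_param=4):
    """Memory needed to store the parameters, in MiB (4 bytes assumes float32)."""
    return param_count * bytes_per_param / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical architectures: a wide network and a narrower alternative.
    wide = [784, 512, 512, 10]
    narrow = [784, 128, 64, 10]
    for name, sizes in (("wide", wide), ("narrow", narrow)):
        params = dense_param_count(sizes)
        print(f"{name}: {params} parameters, {footprint_mib(params):.2f} MiB")
```

A calculation like this is a quick first check on whether a candidate model can fit on a memory-constrained device before any accuracy comparison is made.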


  [1] Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, Zichao Li, Zhiyu Liang, Juzheng Liu, Xin Liu, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Hong Hanh Nguyen, Eunbyung Park, Denis Repin, Liang Shen, Tao Sheng, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, Shaojie Zhuo, Low-Power Computer Vision: Status, Challenges, Opportunities.
  [2] Shengcao Cao, Xiaofang Wang, Kris M. Kitani, Learnable Embedding Space for Efficient Neural Architecture Compression.
  [3] Mingxing Tan, Quoc V. Le, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.