Modzy v1.6 offers improvements in inference speeds and throughput, and support for batching, real-time, and streaming inferences.
We are thrilled to announce several major additions to Modzy’s Edge capabilities with the release of Modzy v1.6. These additions, from accelerated inference times to expanded connectors and integrations, make it possible to execute machine learning workloads at speed and scale on nearly any device, from the cloud to on-prem to the edge.
Modzy v1.6 offers significant improvements in inference speeds and model throughput, along with support for batching, real-time, and streaming inferences at the edge. With these new capabilities, customers can run accelerated ML workloads anywhere.
We’ve heard from customers that faster processing is a top priority, enabling AI-driven insights to be generated anywhere in near-real time. Customers can now leverage our new inference API, which is 15 times faster than previous versions thanks to a new caching layer and the removal of expensive data writes. The accelerated inference API offers both HTTP and gRPC interfaces and supports bi-directional streaming.
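To make the HTTP side of this concrete, here is a minimal sketch of building an inference request against an edge device. The host, port, URL path, and JSON field names are illustrative assumptions for this example, not the documented Modzy API; consult the docs site for the real schema.

```python
import json
import urllib.request

# Hypothetical edge-device address; a real deployment would use the
# device's actual host and the port the edge runtime listens on.
EDGE_HOST = "http://edge-device.local:8080"

def build_inference_request(model_id, model_version, inputs):
    """Build an HTTP POST request for a single inference job.

    The payload shape below (model identifier/version plus an inputs
    object) is an assumption for illustration only.
    """
    payload = {
        "model": {"identifier": model_id, "version": model_version},
        "inputs": inputs,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{EDGE_HOST}/api/inferences",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_inference_request("tinybert", "1.0.0", {"text": "hello edge"})
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) would then return the prediction synchronously, which is what makes the API suitable for near-real-time use.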
Modzy customers now also have access to direct-mode, a new model serving mode on Modzy Edge that maximizes model throughput. With direct-mode enabled, a TinyBERT model can process 300 documents per second on an Intel Xeon Platinum 8259CL CPU. This mode is ideal for streaming applications on smaller devices, as it provides high-performance inference with limited computing overhead. It also reduces how much local storage is consumed: storing results for streaming video, for example, could quickly fill a small SD card on a Raspberry Pi or Jetson Nano, but direct-mode removes this obstacle.
Modzy Edge now supports bi-directional streaming for video, audio, and sensor data. Customers can now run batching, real-time, and streaming inferences on their choice of edge devices. Bi-directional streaming in Python is made easy thanks to the new inference API and a Python library that binds to the gRPC server. This offers flexibility to process ML workloads right where data is collected, ensuring faster responses and increased security due to a reduced need for unnecessary data transfer.
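In Python gRPC, a bi-directional streaming client passes an iterator of request messages to the stub and consumes responses as they arrive. The sketch below shows just the request-generator half of that pattern with plain Python data; the stub, service, and message names in a real client would come from the edge library's generated gRPC bindings, which are not reproduced here.

```python
# Sketch of the request-generator pattern used by Python gRPC clients
# for bi-directional streaming. With real bindings, this generator
# would yield request messages and be passed to the streaming stub,
# e.g. `for response in stub.Stream(frame_stream(frames)): ...`
# (stub and method names hypothetical).
def frame_stream(frames, chunk_size=3):
    """Yield fixed-size batches of readings, as a streaming client
    would yield request messages while responses arrive concurrently."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

readings = list(range(7))
batches = list(frame_stream(readings))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the generator yields lazily, frames can be streamed as they are captured from a camera or sensor rather than buffered in full, which is what keeps memory and storage use low on small devices.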
Thanks to new data connectors, customers can use Modzy Edge to run remote ML workloads on AWS, Azure, or NetApp StorageGRID, expanding the range of target systems available. Customers can also add an unlimited number of custom tags to inference requests, making it easier to trace predictions back to source data, related systems, user information, and anything else that needs to be associated with a prediction.
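As a rough illustration of how tagging supports traceability, the snippet below attaches arbitrary key/value tags to a request payload. The field names and helper are hypothetical, shown only to convey the idea of linking a prediction back to its source.

```python
import json

# Hypothetical shape of an inference request carrying custom tags;
# the "inputs"/"tags" field names are illustrative, not the
# documented Modzy schema.
def tag_request(inputs, **tags):
    """Attach arbitrary key/value tags to an inference request so the
    resulting prediction can be traced back to its source."""
    return {"inputs": inputs, "tags": dict(tags)}

req = tag_request(
    {"image": "frame_0042.jpg"},
    source_bucket="s3://plant-cameras/line-3",
    camera_id="cam-17",
    operator="night-shift",
)
print(json.dumps(req["tags"], indent=2))
```

Because the tags travel with the request, any downstream system that stores the prediction can filter or join on them, for example to pull every result produced by a single camera.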
Full release note details can be found on our docs site.