MLOps is the layer in your AI tech stack that allows you to deploy, connect, and run models at scale.
So you’ve invested in tools to clean your data and train your models – now what? The next step is investing in the right MLOps pipeline, one that will allow you to deploy, run, and monitor models in production with confidence and create value for your organization. Unfortunately, it is often difficult to understand what features different MLOps solutions provide, how they integrate or overlap, and what you really need for your organization to be successful.
To help navigate this emerging space, the AI Infrastructure Alliance, which is composed of leading AI/ML tech providers, developed a framework to cut through the noise and get to the heart of a scalable machine learning tech stack (see Figure 1). Rather than reinventing the wheel, you can use it to establish a vision for your own MLOps pipeline – and, more specifically, for how the different composable parts fit together to form the whole – while leaving yourself the flexibility to grow and evolve.
To achieve flexibility both now and in the future, think of the major functions or layers of the stack as discrete but connected components, and identify the mix of solutions that enables what your organization needs – especially when it comes to MLOps. As you evaluate those components, keep three things top of mind to maximize the return on your AI/ML investments:
• Production deployment is the north star
• Your ML stack should be integration first
• Keep your deployment options open
Even if you’re just getting started with your ML tech stack, production deployment should be the guiding north star. To ensure that your models don’t get stuck in what we at Modzy call “the AI valley of death,” expand your focus to productionizing your AI investments, which is where an MLOps (or ModelOps) solution can help. MLOps refers to how teams move AI and ML models from a development environment into production. Think of MLOps as DevOps for ML/AI models.
An MLOps solution ensures that the models you’ve built using your favorite training tools or frameworks can be easily integrated into the systems and applications your users already know and love. Whether you’re using open-source tools to build a pipeline or relying on a commercial solution, at the very least it should provide data scientists a way to quickly deploy models into production (and by quick we mean minutes, not months) and an easy handoff for development teams to integrate those models into front-end systems. From there, comprehensive MLOps doesn’t stop with deployment – data scientists and end users need to be able to monitor models for things like drift, understand and explain results to non-technical stakeholders, and retrain and quickly redeploy models as needed. There are lots of individual tools and solutions for different parts of the pipeline, but a comprehensive MLOps solution will provide features for all of these functions so that you’re not stuck trying to piece together vertical or disparate solutions to meet these needs.
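To make the drift-monitoring idea concrete, here is a minimal sketch of one common drift metric, the population stability index (PSI), which compares the distribution of live inputs against the training baseline. This is an illustration rather than any particular product’s implementation; the bin count, sample sizes, and the rule-of-thumb alert threshold are our own assumptions:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training) sample
    and a live (production) sample of one numeric feature."""
    # Bin edges come from the baseline distribution's percentiles
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    # Clip live values into the baseline range so every point lands in a bin
    actual = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    # Small floor avoids log(0) for empty bins
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)   # feature values seen at training time
no_drift = rng.normal(0, 1, 10_000)   # production traffic, same distribution
drifted = rng.normal(1.0, 1, 10_000)  # production traffic with a shifted mean

print(psi(baseline, no_drift))  # near zero: no drift
print(psi(baseline, drifted))   # large: a common rule of thumb flags PSI > 0.2
```

An MLOps layer would run a check like this continuously against incoming inference traffic and raise the “retrain and redeploy” signal when the metric crosses a threshold.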
That brings us to our next point: your ML tech stack should be designed with integration-first principles. And to take that a step further, you only need an inference layer to handle your MLOps functionality rather than a whole end-to-end platform.
In many cases, end-to-end platforms began as tools focused on one specific area – for example, AutoML platforms for model training – but fall short in other areas, like model serving or monitoring, or can’t deploy to multiple types of infrastructure. Additionally, the inflexibility of these solutions extends to the costs required to run them – rather than providing the flexibility to scale up or down as you go, some of them are incentivized to drive up your compute costs.
There are plenty of great model training tools and frameworks that your teams are already using to build models from your proprietary data. Similarly, development teams have their go-to DevOps tools, CI/CD pipelines, and other solutions they’re already working with. The key is to figure out the best way to connect these different solutions in a way that not only works with existing technologies and workflows, but also gives you the flexibility to incorporate new tech or solutions in the future. This space is rapidly evolving – so, by choosing a flexible, API-first inference layer, you’ll not only be able to quickly build up a robust ML tech stack, but you’ll also leave room for additional updates in the future. Composable APIs for inference win because end-to-end solutions lock you into a fixed architecture and a single provider’s existing feature set.
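The “integration-first inference layer” idea can be sketched in a few lines: models built with any training tool are registered behind one uniform predict contract, so downstream applications call a stable API instead of framework-specific code. Everything here – the class name, the toy models, the string-keyed registry – is a hypothetical illustration, not a real product’s API:

```python
from typing import Any, Callable, Dict

class InferenceLayer:
    """Minimal sketch of an API-first inference layer: any model, from any
    framework, is exposed through the same predict() contract."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, predict_fn: Callable[[Any], Any]) -> None:
        # predict_fn could wrap a scikit-learn model, a PyTorch module,
        # an ONNX session, or a remote endpoint -- callers never know.
        self._models[name] = predict_fn

    def predict(self, name: str, inputs: Any) -> Any:
        if name not in self._models:
            raise KeyError(f"model '{name}' is not deployed")
        return self._models[name](inputs)

# Two toy "models" from different hypothetical toolchains, one calling contract:
layer = InferenceLayer()
layer.register("sentiment", lambda text: "positive" if "good" in text else "negative")
layer.register("doubler", lambda xs: [2 * x for x in xs])

print(layer.predict("sentiment", "a good day"))  # positive
print(layer.predict("doubler", [1, 2, 3]))       # [2, 4, 6]
```

The point of the design is that swapping a model’s training framework, or adding a new model, never changes the calling code – which is exactly the flexibility an end-to-end platform’s fixed architecture takes away.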
Speaking of changes for the future, the same should be considered for your infrastructure, the foundation that underpins your entire tech stack. Time and time again, we’ve heard organizations make the mistake of designing for a particular cloud service or specific piece of hardware, without considering two main points: cost, and future flexibility.
For starters, we usually hear these refrains when speaking with customers who are today working with an individual cloud service provider’s (CSP) offerings. And while these solutions are great for teams that are deeply familiar with the tech, they are literally designed to drive as much compute as possible – it is how they make their money. Fortunately, alleviating this issue is as easy as outsourcing that MLOps layer to a tool or solution that allows for smart infrastructure autoscaling, regardless of where your compute is coming from.
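“Smart autoscaling” ultimately reduces to a policy that sizes the serving fleet to the inference workload, clamped between a floor (availability) and a ceiling (cost). The sketch below is an illustrative policy of our own, not any vendor’s algorithm; the capacity figure and the replica bounds are arbitrary assumptions a real MLOps layer would tune:

```python
def target_replicas(queue_depth: int,
                    per_replica_capacity: int = 20,
                    min_replicas: int = 1,
                    max_replicas: int = 10) -> int:
    """Illustrative autoscaling policy: enough model-serving replicas to
    drain the inference backlog, clamped between a floor and a ceiling."""
    # Ceiling division: replicas needed to cover the queue at capacity
    needed = -(-queue_depth // per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))

print(target_replicas(0))    # idle traffic still keeps the floor of 1 replica
print(target_replicas(95))   # 95 queued requests / 20 per replica -> 5
print(target_replicas(500))  # demand spike is capped at 10 to contain cost
```

Because the policy only looks at workload, not at which provider supplies the machines, it scales down just as readily as up – the opposite incentive of a platform paid by the compute-hour.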
And that brings us to the next point: allow yourself flexibility across multiple infrastructure types – really, the ability to run your AI models anywhere. CSP tools today work well if you stay within an individual provider’s end-to-end ecosystem, but they don’t play nicely together. You might be working with on-premises infrastructure today, but what about hybrid options, or even running models on edge devices? According to Gartner research, there are billions of installed assets with some form of embedded computing. Gartner forecasts that many of these installed assets will be retrofitted with AI code; additionally, many millions more will be manufactured with embedded technologies to create a computing environment for AI-enabled software. We have no idea what this will look like in practice, but you can make smart choices about your ML tech stack today that will make life easier in the future by ensuring that your MLOps pipeline supports deploying to multiple different types of infrastructure.
While this blog primarily focuses on the MLOps layer of the stack, the important thing to note is that MLOps connects to every other component involved – your infrastructure layer, your data management solutions, your model training tools, and your notebooks and dashboards. That’s why MLOps is so important: it ensures your models remain healthy and accounted for throughout their full lifecycle, it gives you the flexibility to build the right tech stack to meet your organization’s needs, and the right mix of solutions will stand the test of time and allow you to run your AI models anywhere. Hopefully this blueprint can help you as you build out your ML tech stack and provide a starting point for picking and choosing the right tools for your organization, especially for the all-important MLOps layer.