MLOps helps teams get value from AI, and should be the anchor point for any AI tech stack.
There are three worlds forming in the AI solar system: the data prep/engineering world, the model experimentation/training world, and the model operations world. The first two worlds are a bit more mature, and that’s okay: demand for and investment in data prep/engineering and model experimentation/training tools have exploded in the last decade. The third world is the final frontier in the AI/ML pipeline. It focuses on productionizing the results of the first two, giving organizations a ramp toward accelerated AI adoption and value creation at scale. This blog breaks down how your MLOps architecture can ensure your pipelines are built for production at scale.
If you’ve already invested in data prep/engineering and model experimentation/training capabilities, your next investment should be in operationalizing your AI/ML. Fortunately, the proverbial “wheel” has already been invented, and there are many resources to help you figure out the right approach for your organization. One of those resources is the AI Infrastructure Alliance’s reference architecture for a complete MLOps tech stack.
The primary components to consider when designing your MLOps tech stack include:
The saying goes that an AI model is only as good as the data upon which it was trained; more eloquently, garbage in, garbage out. Data labeling and preparation is one of the most time-intensive and involved parts of your MLOps stack, and with good reason. Prior to training and experimentation, you need balanced, clean, and labeled data, in a quantity that’s appropriate for the model type. Consider how you can use programmatic labeling tools to enable scalability, adaptability, and governability in the creation of high-quality training datasets.
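To make programmatic labeling concrete, here is a minimal sketch in plain Python. It assumes a toy spam/not-spam task; the labeling functions, label names, and the simple majority vote are all hypothetical illustrations, not the API of any particular labeling tool. The idea is that small, reviewable rules vote on each example, and examples with no votes or a tie are left unlabeled for human review.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label values for a toy spam/not-spam task.
SPAM, HAM = "spam", "ham"

# A labeling function returns a label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


def lf_contains_link(text: str) -> Optional[str]:
    """Vote SPAM when the text contains a URL."""
    return SPAM if "http://" in text or "https://" in text else None


def lf_mentions_prize(text: str) -> Optional[str]:
    """Vote SPAM when the text mentions a prize."""
    return SPAM if "prize" in text.lower() else None


def lf_short_reply(text: str) -> Optional[str]:
    """Vote HAM for very short messages (four words or fewer)."""
    return HAM if len(text.split()) <= 4 else None


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_reply,
]


def weak_label(text: str) -> Optional[str]:
    """Combine labeling-function votes by simple majority.

    Returns None when every function abstains or when the top
    two labels tie, so the example can be routed to a human.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Because each rule is ordinary code, it can be versioned, reviewed, and re-run when the data changes, which is where the scalability and governability come from. Production tools typically replace the majority vote with a learned model of how reliable each rule is.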
Once you have access to high-quality, labeled training data, you’re ready to move into model experimentation and training. Your models are a reflection of your data and should be treated as your most prized asset; they underpin your organization’s competitive advantage. Consider solutions or approaches that allow for easy training and experimentation to yield the best quality models as quickly as possible. Most importantly, your teams should adopt tools and techniques that enable flexibility, easy integration, and easy containerization for production deployment.
Flexibility above all: There are lots of frameworks and training tools out there. By leveraging an API-driven solution, you’ll not only avoid the vendor lock-in challenges that come with an end-to-end solution, but you’ll also give yourself the freedom to adjust, update, and add new connections in the future.
After you’ve run your experiments and trained your models, they need to be prepared for deployment and use. Several free and open-source projects, like the Open Model Interface and chassis.ml, make packaging simple: they help you containerize models from different training tools and frameworks to a common container specification that exposes your models as API endpoints. From there, your models are ready to be deployed and integrated into production systems in the cloud, on-premises, or at the edge through a DevOps process or CI/CD pipeline. An approach like this allows you to process data as pre-scheduled batches or in real-time streams; both options give you greater control over how your infrastructure is used, which means you can keep a better handle on costs.
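The value of a common interface can be sketched in a few lines of Python. This is an illustration only: the PackagedModel class, its JSON payload shape, and the run_batch and run_stream helpers are hypothetical names invented for this example, and they are not the Open Model Interface specification or the chassis.ml API. The point is that once every model accepts and returns the same kind of payload, the same artifact can serve both batch and streaming workloads.

```python
import json
from typing import Callable, Iterable, Iterator, List, Sequence


class PackagedModel:
    """Wrap any prediction function behind one bytes-in, bytes-out contract.

    Hypothetical payload shape: {"features": [...]} in,
    {"model": ..., "version": ..., "prediction": ...} out.
    """

    def __init__(
        self,
        name: str,
        version: str,
        predict_fn: Callable[[Sequence[float]], float],
    ) -> None:
        self.name = name
        self.version = version
        self._predict_fn = predict_fn

    def predict(self, payload: bytes) -> bytes:
        features = json.loads(payload)["features"]
        result = self._predict_fn(features)
        response = {
            "model": self.name,
            "version": self.version,
            "prediction": result,
        }
        return json.dumps(response).encode("utf-8")


def run_batch(model: PackagedModel, payloads: Iterable[bytes]) -> List[bytes]:
    """Pre-scheduled batch: score everything, then return all results."""
    return [model.predict(p) for p in payloads]


def run_stream(model: PackagedModel, payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Real-time stream: yield each result as soon as its input arrives."""
    for p in payloads:
        yield model.predict(p)
```

In a real deployment the predict method would sit behind an HTTP or gRPC endpoint inside a container, but the calling code for batch and streaming would look much the same.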
Production deployment doesn’t stop once your models are running. You need to give data scientists and users a way to detect when model performance has drifted, and give infrastructure managers a way to monitor the associated costs. Depending on regulatory or industry-specific rules, you may be required to demonstrate or explain how models reach their decisions, otherwise known as explainability or observability. The ability to produce explanations for model results can also be helpful during model retraining efforts by providing a labeled dataset to use for new model experimentation and training.
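One common way to quantify drift is the Population Stability Index (PSI), which compares how a feature or score was distributed at training time with how it is distributed in production. The sketch below is a minimal, standard-library version; the function names, the bin edges passed in by the caller, and the 0.2 alert threshold (a widely used rule of thumb, not a standard) are assumptions made for illustration.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Return the share of values falling in each bin defined by edges.

    Values below the first edge or above the last edge are counted
    in the first or last bin respectively.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect_right(edges, v) - 1
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    floor: float = 1e-6,
) -> float:
    """PSI = sum over bins of (c - b) * ln(c / b).

    Empty bins are floored at a small value to avoid log(0).
    """
    b = [max(p, floor) for p in bin_proportions(baseline, edges)]
    c = [max(p, floor) for p in bin_proportions(current, edges)]
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def has_drifted(psi: float, threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds an assumed rule-of-thumb threshold."""
    return psi > threshold
```

A scheduled job could compute this per feature against a stored training baseline and raise an alert, or trigger a retraining review, whenever has_drifted returns True.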
Outside of model-specific monitoring, processes and controls should be in place to govern the usage and maintenance of your AI, as well as detect and inspect auditable events.
Although it may seem overwhelming to get started building an MLOps tech stack, considering how each of the discrete parts connects to and enables your MLOps capability will set you up for success in the long run. Just as organizations wanted best-of-breed DevOps components, there’s a similar trend in the MLOps space. There are a lot of reasons for wanting composability and interoperability in both your business workflows and tech stacks. By considering all these functions together, you’ll:
MLOps is the missing link between teams experimenting and organizations getting value from AI, and should be the anchor point for any AI tech stack. The next step is to decide whether you have the resources to build your MLOps platform or should buy one, which we cover in the next post in this series, Why a Hybrid Approach to MLOps is Best.