ML Model Production Deployment Checklist

Production deployment is the first step to getting trained ML models out of the lab and running at scale.

Today I’d like to introduce you to a piece of technology that revolutionized aviation. It’s contributed to a reduction in accidents and fatalities, it’s easy to build, easy to use, yet, surprisingly, has only been around since the mid-1930s. I’m talking about checklists!

Before we dive in: the checklist itself is available on GitHub. If you’re in the business of building, running, or using ML models, consider subscribing to our YouTube channel, where we post regular videos about working with ML in the wild.

Now let’s get back to the humble checklist. So where did it come from? In 1935, the Boeing Model 299 was undergoing a flight test when a gust lock, a device that holds the flight controls in place while the plane is on the ground, was accidentally left engaged, resulting in the plane’s crash and the deaths of two pilots. Instead of cancelling the program, the US Army Air Corps responded by creating the first known checklist, meant to help new pilots avoid similarly costly errors. The plane was eventually adopted into service as the B-17 bomber, used heavily during WWII.

It's worth noting that aviation checklist use was particularly important in take-off, approach and landing. ‘Although these segments comprise only 27% of average flight duration, they account for 76% of accidents.’

So the idea for today is to try and apply this philosophy and build a production ML checklist from first principles that minimizes the risks inherent in deploying a new model to production with the least amount of effort possible. You can use this approach to build your own checklist, or you can use the one we’ve put together on GitHub.

What makes for a good checklist?

So, first things first, what makes for a good checklist? Well, according to author, surgeon, and checklist enthusiast Dr. Atul Gawande, who literally wrote the book on checklists, a good checklist should be precise, to the point, and easy to use. It shouldn’t try to spell out everything. Instead, it should focus on reminders of the most critical steps that might get missed. More than anything, it should be practical.

So, with that in mind, let’s build a simple checklist that maximizes risk reduction. We’re about to min/max AI.

I’m going to start by proclaiming that there are 5 things we should all want out of a production model:

  1. It's returning predictions
  2. It can talk to other software
  3. It's as fast and responsive as business rules dictate
  4. It's about as accurate as you expected
  5. Nobody hacked it

If all five of those conditions are true, then you’ll be in a pretty great spot. So, let’s devise some error traps that we can turn into checklist items to help make our dream a reality. If you’ve done this before, make a list of the issues that your team has run into in the past. Here are some common issues that pop up when you’re deploying models to production.

Issue 1: Deploying the wrong model version to prod

If you’re managing even a few models, it’s incredibly easy to accidentally deploy the wrong version. Is the model behaving strangely because you accidentally deployed an old version? Or perhaps your other software is pointed to the wrong version number?
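One way to trap this error is to compare what you meant to deploy against what is actually running. Here is a minimal sketch in Python. The names `ModelRef` and `check_deployed_version` are made up for illustration, and how you look up the deployed name and version depends entirely on your serving platform, so that part is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical record identifying a model by name and version."""
    name: str
    version: str


def check_deployed_version(expected: ModelRef, deployed: ModelRef) -> List[str]:
    """Return a list of mismatches; an empty list means the right model is live."""
    problems = []
    if expected.name != deployed.name:
        problems.append(
            f"name mismatch: expected {expected.name!r}, deployed {deployed.name!r}"
        )
    if expected.version != deployed.version:
        problems.append(
            f"version mismatch: expected {expected.version!r}, "
            f"deployed {deployed.version!r}"
        )
    return problems


if __name__ == "__main__":
    # Assumed values; in practice "deployed" would come from your serving platform.
    expected = ModelRef(name="churn-classifier", version="1.2.0")
    deployed = ModelRef(name="churn-classifier", version="1.1.0")
    for problem in check_deployed_version(expected, deployed):
        print(problem)
```

Running the same comparison at each step of the release process (build, staging, production) is what turns it into a repeated double-check rather than a one-off.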

Issue 2: You’re missing key model info

When devs are trying to connect to a model, it’s easy to miss important info such as data format, response times, etc. Who you gonna call?
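A small, structured record of the key facts can make gaps obvious before anyone tries to integrate. The sketch below is one possible shape, not a standard: the `ModelInfo` fields are assumptions based on the items mentioned above (data format, response time, and someone to call), and you would swap in whatever your team considers essential.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ModelInfo:
    """Hypothetical 'key info' record; every field starts out unfilled."""
    name: Optional[str] = None
    version: Optional[str] = None
    input_format: Optional[str] = None
    output_format: Optional[str] = None
    expected_latency_ms: Optional[float] = None
    owner_contact: Optional[str] = None


def missing_fields(info: ModelInfo) -> List[str]:
    """List the fields that are still empty, in declaration order."""
    return [f.name for f in fields(info) if getattr(info, f.name) in (None, "")]


if __name__ == "__main__":
    info = ModelInfo(name="churn-classifier", version="1.2.0")
    print("Still undocumented:", missing_fields(info))
```

The checklist item then becomes "`missing_fields` returns an empty list", which is easy to verify at a glance.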

Issue 3: Unexpected model performance

Sometimes our models don’t work quite right in production. Was the model tested on the same kind of hardware that you’ve got in production? On the same volume and type of data? Is it experiencing the same amount of traffic as it was tested on?
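To make "as fast as business rules dictate" testable, you can time the model on production-like inputs and compare a high percentile against a latency budget. This is a rough sketch under stated assumptions: `predict` stands in for whatever function calls your model, the 95th percentile and the budget are example choices, and the numbers only mean something if the test runs on hardware and data that resemble production.

```python
import math
import time
from typing import Any, Callable, Iterable, List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def measure_latencies_ms(predict: Callable[[Any], Any], inputs: Iterable[Any]) -> List[float]:
    """Call predict once per input and record each call's duration in milliseconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def meets_latency_budget(latencies_ms: Sequence[float], budget_ms: float, pct: float = 95.0) -> bool:
    """True if the chosen percentile of observed latencies is within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms


if __name__ == "__main__":
    # Stand-in model: replace with a real call to your deployed endpoint.
    fake_predict = lambda x: sum(range(1000))
    observed = measure_latencies_ms(fake_predict, range(200))
    print("p95 within 50 ms budget:", meets_latency_budget(observed, budget_ms=50.0))
```

This sketch sends requests one at a time, so it says nothing about behavior under concurrent traffic; a real load test would need to simulate that too.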

Issue 4: Stuff is breaking and I don’t know why!

Sometimes stuff just breaks. Are you able to figure out why your model isn’t working? Is it slow/wrong/offline? Did the input data change? You’re going to want to be able to get answers fast so you can fix the glitch.
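Two cheap habits help here: checking each input against the shape you expect, and logging enough about every prediction to reconstruct what happened. The sketch below shows both. Everything in it is illustrative: the schema fields, the logger name `model_audit`, and the log entry layout are assumptions, not part of any particular platform.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger("model_audit")  # hypothetical logger name

# Hypothetical input schema: field name -> expected Python type.
EXPECTED_SCHEMA = {"age": int, "plan": str, "monthly_spend": float}


def schema_issues(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Describe every way the record differs from the expected schema."""
    issues = []
    for field, expected_type in schema.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            issues.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in record:
        if field not in schema:
            issues.append(f"unexpected field: {field}")
    return issues


def log_prediction(model_name: str, model_version: str, record: Dict[str, Any],
                   prediction: Any, latency_ms: float) -> str:
    """Emit one JSON log line per prediction and return it for inspection."""
    entry = {
        "model_name": model_name,
        "model_version": model_version,
        "input_issues": schema_issues(record, EXPECTED_SCHEMA),
        "prediction": prediction,
        "latency_ms": latency_ms,
    }
    line = json.dumps(entry, sort_keys=True)
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_prediction("churn-classifier", "1.2.0", {"age": "41", "plan": "pro"}, 0.83, 12.5)
```

With entries like these, the questions above (slow, wrong, or offline? did the input change?) become searches over the logs rather than guesswork.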

The Checklist

So, we’ll try to build some error traps for these common issues, and a few others, into our checklist. Drum roll please, here’s what that checklist might look like:

The full checklist is available on GitHub: https://github.com/modzy/model-deployment-checklist

Here you can see that we’ve built in traps to:

  • Document the most important info about the model
  • Double-check the model name and version at each step in the process
  • Test the model on the same hardware, data, and workload as it will experience in production
  • Ensure no one can touch the model without permission
  • Make sure we always have enough info to quickly debug model issues and rollback changes if needed

Now, how you choose to put these checklist items in place is a whole different story.

One more aviation example for you: the Convair B-36 “Peacemaker” had a pre-flight checklist that took over 6 hours to complete. Not only did this cumbersome process hamper the Air Force’s ability to deploy the aircraft quickly, but you can imagine that during this 6-hour slog, most pilots had turned their brains off. As a result, flight risk remained high because it was easy to check off items that hadn’t been completed successfully, or had been completed haphazardly.

Lessons Learned

The lesson here is to look for opportunities to automate parts of the process that do not benefit from human intervention. MLOps tools provide a way to automate low-risk items so they no longer need to be a part of your checklist. Things like checking the model name, ID, and version number can happen automatically if you’re using a model serving platform like Modzy. Similarly, you might choose to automate a set of tests that simulate a production workload in your test environment, or use a CI/CD pipeline that automatically documents a model’s author, training data, and validation metrics, all within GitHub or some other centrally available resource. The point is to design a checklist that takes advantage of these tools to automate tasks that do not benefit from human oversight.
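As a rough illustration of pulling items off the human checklist, here is a tiny runner that executes automated checks and reports which ones passed. It is a sketch, not a real MLOps tool: `run_automated_checks`, `ready_to_deploy`, and the example checks are all hypothetical, and in practice each check would call your serving platform, test suite, or CI pipeline.

```python
from typing import Callable, List, NamedTuple, Tuple


class CheckResult(NamedTuple):
    description: str
    passed: bool


def run_automated_checks(checks: List[Tuple[str, Callable[[], bool]]]) -> List[CheckResult]:
    """Run each check; a check that raises is recorded as a failure."""
    results = []
    for description, check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False
        results.append(CheckResult(description, passed))
    return results


def ready_to_deploy(results: List[CheckResult]) -> bool:
    """Only ready when every automated check passed."""
    return all(result.passed for result in results)


if __name__ == "__main__":
    # Placeholder checks with hard-coded outcomes, for demonstration only.
    checks = [
        ("deployed version matches release", lambda: True),
        ("p95 latency within budget", lambda: True),
        ("model documentation complete", lambda: False),
    ]
    results = run_automated_checks(checks)
    for result in results:
        print(("PASS" if result.passed else "FAIL"), result.description)
    print("Ready to deploy:", ready_to_deploy(results))
```

Whatever a runner like this covers can come off the human checklist, which keeps the remaining list short enough that people actually read it.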

If you use these principles to build a production checklist, you can reduce errors, improve uptime, and still give engineers the freedom and flexibility to creatively solve hard problems.

If you have other advice on building good checklists for production ML, let us know in our Discord server.

References


https://www.flightsafetyaustralia.com/2018/11/one-thing-at-a-time-a-brief-history-of-the-checklist/

https://github.com/modzy/model-deployment-checklist