As organizations around the world continue to adopt artificial intelligence (AI) and machine learning (ML), they face two tasks: (1) creating or acquiring these cutting-edge technologies, and (2) deploying these AI models into production. Often, this second task – moving AI models out of the lab and into production – presents an entirely new set of challenges for organizations to overcome. Modzy’s™ solution makes this deployment and scaling challenge not only feasible, but also streamlined and simple for data scientists. Once an ML model is trained, tested, and fine-tuned, it is packaged in a specific yet straightforward way and imported into the Modzy platform. At that point, the model becomes accessible via the Modzy APIs and Software Development Kits (SDKs).

Modzy Container Requirements

In our developer documentation center, we outline a few methods that can serve as starting points for packaging models to meet the API specification. Namely, we provide a Python-specific template repository, an example that writes a RESTful interface from scratch, and the raw API endpoint specifications. While the Python-specific template is an excellent resource for models written in Python, it is important to note that Modzy places no limitations on the model development process. The programming language, machine learning framework, and other development-specific choices are entirely up to the user, as long as the end product can be containerized. Put simply, all model containers in Modzy must adhere to the following criteria:

  • Be fully self-contained
  • Contain a generic Web server
  • Expose an HTTP API that implements the three Modzy API routes

Fully Self-Contained – Securing AI Models

To guard against a number of security vulnerabilities, Modzy requires that all models adhere to a fully self-contained format. This means containers cannot make any external calls to other APIs, other containers, outside databases, or other potential model dependencies. Moreover, all model code, model weights, and model architectures must reside within the model container. As an example, consider a simple image classification model built with the Keras framework that downloads pre-trained weights from the Keras website. For this model to be containerized in a Modzy-compatible manner, the weights need to be downloaded ahead of time and saved inside the container, as sketched below.
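
Here is a minimal sketch of that workflow using TensorFlow’s Keras API; the ResNet50 architecture and the file paths are illustrative choices, not requirements:

```python
from pathlib import Path
from tensorflow.keras.applications import ResNet50

# One-time step, run with internet access before the container image is built:
# download the pre-trained weights and save them to a local file.
Path("weights").mkdir(exist_ok=True)
model = ResNet50(weights="imagenet")  # fetches the weights over the network
model.save_weights("weights/resnet50_imagenet.h5")  # bake this file into the image

# Inside the container, load the weights from the local copy instead;
# no network call is ever made at inference time.
model = ResNet50(weights=None)
model.load_weights("weights/resnet50_imagenet.h5")
```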

Contain a Generic Web Server

Modzy serves model containers as microservices through our API, and to fit this deployment paradigm, model containers must include a generic Web server that creates the HTTP Web application. Fortunately for developers, these Web servers exist for many common programming languages. The Python-specific template we provide on the developer documentation page leverages Flask, a microframework native to Python; however, users could use Spring Boot for Java, Sinatra or Unicorn for Ruby, dance.jl for Julia, and many others that work with different programming languages. Satisfying this requirement ensures model containers can in fact be accessed by the Modzy API services.
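
To make this concrete, here is a minimal skeleton assuming Flask is the chosen server; the port number is illustrative, and sketches of the three route handlers appear in the endpoint section below:

```python
from flask import Flask

app = Flask(__name__)

# The three required routes (GET /status, POST /run, POST /shutdown) are
# registered on this app object; sketches of each handler appear in the
# endpoint section below.

if __name__ == "__main__":
    # Listen on all interfaces so the server is reachable from outside the
    # container; the port number here is illustrative.
    app.run(host="0.0.0.0", port=8080)
```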

Once a model container meets these criteria, it can be uploaded and deployed to the Modzy platform. At this stage, the model container must be accompanied by a model metadata file (a YAML file) that defines important information about the inputs, outputs, and hardware requirements our API uses to run the container when users submit jobs.
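
The exact schema is defined in our developer documentation; the fragment below is only a hypothetical sketch of the kind of information such a file captures, and every field name in it is illustrative rather than part of the actual specification:

```yaml
# Illustrative fragment only; consult the Modzy developer documentation
# for the actual metadata schema and field names.
name: my-image-classifier
version: 1.0.0
inputs:
  image:
    acceptedMediaTypes:
      - image/jpeg
      - image/png
outputs:
  results.json:
    mediaType: application/json
resources:
  memory: 2Gi
  cpu: 1
  gpu: 0
```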

Expose Modzy API Endpoints

Modzy expects each model container to respond to three RESTful API endpoints: GET /status, POST /run, and POST /shutdown. Each endpoint handles a different part of the inference lifecycle for AI models.

GET /status

  • The Modzy API hits this endpoint when a model container is first spun up, and it executes only once for the duration of the user's job requests to the model. As a result, all model initialization should be mapped to this endpoint: loading model weights, defining any instance variables, and creating the labels or configuration files used during inference (see the sketch below).
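
Continuing the Flask sketch, a status handler might look like the following; initialize_model is a hypothetical helper standing in for whatever loading logic a given model needs:

```python
from flask import jsonify

model = None  # populated once, the first time /status is called

@app.route("/status", methods=["GET"])
def status():
    # One-time initialization: load model weights, labels, and any
    # configuration files baked into the container at build time.
    global model
    if model is None:
        model = initialize_model("weights/model.h5")  # hypothetical helper
    return jsonify({"status": "OK"}), 200
```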

POST /run

  • The second API route is called when a user submits a job to a model container. The request passes a JSON payload containing a few pieces of information, including the model container ID, model version, input type, and the filesystem directory path from which the model should read the input data file(s). At this point, the model code should run inference on the user-specified input and write the prediction results to the user-specified filesystem output directory path (see the sketch below).
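
A corresponding run handler, again as a sketch: the JSON key names and the predict helper are illustrative stand-ins, not the exact Modzy payload specification:

```python
import json
from pathlib import Path
from flask import jsonify, request

@app.route("/run", methods=["POST"])
def run():
    # The JSON payload tells the model where to read inputs and write
    # outputs; these key names are illustrative, not the exact Modzy spec.
    job = request.get_json()
    input_dir = Path(job["inputPath"])
    output_dir = Path(job["outputPath"])

    # Run inference on each input file and write the predictions to the
    # user-specified output directory.
    results = {f.name: predict(model, f) for f in input_dir.iterdir()}  # predict() is hypothetical
    (output_dir / "results.json").write_text(json.dumps(results))
    return jsonify({"status": "OK"}), 200
```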

POST /shutdown

  • The last API route shuts down the container once the user is finished submitting jobs; the model server process inside the container should then exit with an exit code of zero (see the sketch below).
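
One way to sketch a shutdown handler: deferring the exit briefly lets the HTTP response go out before the process terminates with code zero.

```python
import os
import threading
from flask import jsonify

@app.route("/shutdown", methods=["POST"])
def shutdown():
    # Acknowledge the request, then terminate shortly afterward so the
    # HTTP response is sent before the process exits with code zero.
    threading.Timer(1.0, lambda: os._exit(0)).start()
    return jsonify({"status": "OK"}), 200
```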

Figure 1. This diagram showcases the structure of a Modzy-compatible Docker container that exposes the three Modzy API endpoints.

Streamlined Model Deployment with Standardized Packaging

If your organization struggles with productionizing powerful AI capabilities, Modzy’s standardized and streamlined solution is a great option for you. Our model container requirements allow data scientists to take their well-performing models and deploy them at scale for their enterprise in a matter of hours, even without deep expertise in web frameworks or containerization.