A Guide to Building LLM-Powered Applications with Code Llama

This guide walks you through the process of using Code Llama on an edge device to build a customized dashboard application.

If you’re a developer looking to accelerate your workflows or improve your skills, you’ve likely heard of Code Llama, a large language model (LLM) fine-tuned for generating and discussing code. Code Llama is well suited to AI-assisted coding and is already being used by many developers for AI experimentation and copilot-style programming. Citizen developers and experts alike can use it to accelerate and uplevel their coding skills.

One of the ways that Code Llama has the potential to really take off is by empowering analysts in remote, restricted environments, where connectivity and compute capacity are minimal, to build their own applications. This combines the power of two hot topics today – LLMs and edge computing. In this tutorial, we’ll walk you through how even a novice developer can run Code Llama on an edge device in a remote location to build a customized dashboard application. But first, let’s learn more about Code Llama and the other tools you’ll need to get going.

What are Large Language Models (LLMs)?

LLMs are a class of deep learning models that excel at generating human-like text based on the input they receive. These models are typically built on the transformer architecture, a breakthrough in the field of natural language processing (NLP) that captures context and meaning by identifying connections between elements in a sequence. LLMs are characterized by their vast scale: they are often trained on billions of words from diverse sources and equipped with billions of parameters. Models like GPT-3 have garnered significant attention for their remarkable ability to comprehend and produce text that closely resembles human language. They are versatile tools with applications across domains, including writing assistance, translation, and programming support (e.g., Code Llama), and they are quickly becoming integral components for tasks such as content generation, language translation, answering questions, and more.

These LLMs are redefining the way we interact with machines and leverage artificial intelligence. They function as intelligent language processors that can understand and respond to textual inputs in a manner that is often indistinguishable from human-generated text. The implications of LLMs are far-reaching, and they are instrumental in developing applications like ChatGPT and various writing assistants, enabling more natural and contextually relevant interactions with users.

What is Code Llama?

Released by Meta in August 2023, Code Llama builds upon the foundation of Llama 2 and is specifically fine-tuned for the purpose of generating and discussing code. The model was initially developed by training Llama 2 on code-specific datasets, which led to its proficiency in understanding and generating code-specific content. It is available for both research and commercial use at no cost, and represents a significant leap forward in the realm of AI-driven coding assistance.

Code Llama uses text prompts as a means of interacting with an LLM. You can provide prompts containing code and/or natural language, and the model responds by generating code and providing natural language explanations or discussions about the code. This unique approach not only makes coding workflows more efficient for developers, but it also serves as an invaluable learning resource for people learning to code. Code Llama has the potential to revolutionize the way developers approach their work, making it a powerful tool for enhancing coding productivity and comprehension.
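To make this concrete, here’s a minimal sketch of what prompting Code Llama can look like using the Hugging Face Transformers library; the prompt and generation settings are illustrative, not prescriptive.

# A minimal sketch of prompting Code Llama via Hugging Face Transformers.
# Assumes the transformers and torch packages are installed and that the
# "codellama/CodeLlama-7b-hf" weights are available locally or via the Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="codellama/CodeLlama-7b-hf")

# Prompt with a natural-language comment plus a function signature;
# Code Llama completes the implementation.
prompt = "# Python function that returns the nth Fibonacci number\ndef fibonacci(n):"
completion = generator(prompt, max_new_tokens=64, do_sample=False)
print(completion[0]["generated_text"])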

You can download a version of Code Llama here.

MLOps Enabled by Chassisml.io and Modzy

Chassis is your go-to, open-source solution for automatically containerizing ML and AI models, streamlining the process of deploying them into production environments. With Chassis, it’s possible to package models like Code Llama into containerized prediction APIs in minutes so that they can be used anywhere – in the cloud, on-prem, or even at the edge, as we’ll walk through in just a minute. Chassis abstracts away the complexity of model containerization, enabling ease, speed, and reliability, all of which will be crucial for building our LLM-enabled application with Code Llama.

Containerization is an important step in the machine learning operations (MLOps) or LLMOps (MLOps for LLMs) lifecycle. MLOps refers to the deployment and maintenance of machine learning models in production. Containers offer portability, immutability, versioning, scalability, dependency management, and resource efficiency. They ensure machine learning models and their dependencies can be packaged and deployed consistently across different environments, facilitating version control and reproducibility. Containers also aid in scaling workloads, especially in applications with varying demands for machine learning inference. Additionally, they play a pivotal role in CI/CD pipelines by automating testing, validation, and deployment, streamlining the management of machine learning models and optimizing resource utilization.

You can learn all about the newest features of the Chassis v1.5 beta release in this blog post.

For the purposes of today’s tutorial, we’ll be using Modzy to deploy, run, and scale our Code Llama model. Modzy is a software platform that allows you to deploy, connect, and serve machine learning models in the enterprise and at the edge. Modzy’s software platform brings the power of advanced analytics and machine learning to the edge, enabling AI-powered solutions for monitoring and diagnostics, predictive maintenance, and safety and security use cases. Modzy’s software is ideally suited for OEMs, systems integrators, and end customers in manufacturing, pharmaceuticals, telecommunications, energy and utilities, infrastructure, retail, as well as smart cities and buildings. For more on Modzy, you can visit modzy.com.

Tutorial

Now that we’ve broken down the tools involved, let’s use a Code Llama model to build a dashboard that visualizes data. For the sake of example, consider a geologist working in a remote oil field who could benefit from a dashboard to help with the maintenance and monitoring of oil production. Much of a geologist’s day-to-day job revolves around analyzing well data, sensor readings, and 3D models of oil fields to determine a field’s profitability. Oil fields are typically located in remote, network-restricted environments, which prevents on-site users from leveraging analytical tools to synthesize this data more efficiently.

1 – Getting Started with Code Llama

Code Llama can help build dashboards to interact with and analyze data in real time, even in network-restricted environments. To facilitate this, in our example, the geologist would likely have an instance of their MLOps platform (Modzy) running in the cloud or in a private data center, where compute and connectivity are more robust, to manage their custom library of LLMs. To get the Code Llama model running locally, we’ll use Modzy’s edge product, called Modzy core, to deploy and serve the LLM workload on a remote device.

Let’s begin with the model itself: specifically, the 7B-parameter version of Code Llama in Hugging Face Transformers format.
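If you’d like to follow along, here’s a minimal sketch of fetching that checkpoint; it assumes the huggingface_hub package is installed.

# A minimal sketch of downloading the 7B Code Llama checkpoint in
# Hugging Face Transformers format (assumes the huggingface_hub package).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="codellama/CodeLlama-7b-hf")
print(f"Model weights downloaded to: {local_dir}")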

2 – Containerize the Model with Chassisml.io

From here, we can use Chassisml.io to containerize the model. To do so, we will need a Python environment with the Chassis package installed. Next, we simply load Code Llama into memory and define a single predict (inference) function, and Chassis will handle the rest, including building a container with the requirements and metadata using the local Docker daemon.

Check out the code here.
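To give a sense of the shape of that code, here is a hedged sketch. The Transformers calls are standard, but the ChassisModel and DockerBuilder interfaces are assumptions based on the Chassis v1.5 beta SDK, and the output key mirrors the "generated_text" field that the inference script in step 4 parses; treat the linked code above as the authoritative version.

# A hedged sketch of containerizing Code Llama with Chassis.
# NOTE: the ChassisModel/DockerBuilder interfaces are assumptions based on
# the Chassis v1.5 beta SDK; consult the linked code for the real version.
import json
from typing import Mapping

from transformers import AutoTokenizer, AutoModelForCausalLM
from chassis.builder import DockerBuilder  # assumed v1.5 beta import path
from chassis.models import ChassisModel    # assumed v1.5 beta import path

# load Code Llama into memory
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", device_map="auto"
)

# define a single predict (inference) function
def predict(input_data: Mapping[str, bytes]) -> dict[str, bytes]:
    prompt = input_data["input"].decode()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # JSON-encode the completion to match the "generated_text" output
    # that the inference script in step 4 expects
    return {"generated_text": json.dumps(text).encode()}

# Chassis handles the rest: build a container with the local Docker daemon
chassis_model = ChassisModel(process_fn=predict)
DockerBuilder(chassis_model).build_image(name="code-llama-7b", tag="1.0.0")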


3 – Deploy the Model

Once our Code Llama model container is built, we will deploy it into our central model library, in this case, Modzy. A central model library gives us the flexibility to use LLMs alongside other ML models. Clicking on the Code Llama model from Hugging Face shows that it supports the most popular programming languages, such as Python.

Code Llama model from Hugging Face in Modzy

Next, we will use Modzy's edge deployment feature and device groups to deploy the model to the remote oil field facilities.

Deploy Code Llama model to remote oil field facility.

The Python API makes it easy, standard, and consistent to access any model.

Access Code Llama model with Python API.

4 – Build the Dashboard

Note: we highly recommend running Code Llama with accelerated hardware for optimal performance. This demo was run on hardware with a T4 GPU onboard.
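If you want to confirm the GPU is visible before serving the model, a quick check (a sketch, assuming PyTorch is installed) looks like this:

# quick check that a CUDA-capable GPU is visible to PyTorch
import torch

print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g., "Tesla T4"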

With our model deployed to our remote device, let’s put Code Llama to work! First, we open a development environment in VS Code (use whichever IDE you prefer!). In our VS Code window, we open three terminals:

  1. In the first, we will run Modzy core
  2. In the second, we will run our inference script that drives the creation of the dashboard by interacting with Code Llama being served via Modzy core
  3. In the third, we will launch our Streamlit dashboard to view the progress of our work in the browser

In our first terminal, we will start Modzy core. Doing so will spin up the Code Llama container and make it accessible via an API server.

./modzy-core server --resume --model.runtime nvidia --model.timeout 200


Next, in our third terminal, we will use Streamlit to open the dashboard file with a simple command. Note: to start, this file will be blank, so when we open this in our browser, we should expect the dashboard to be a blank white page.

streamlit run dashboard.py

With Modzy core running and our blank dashboard in our browser, we will do all development in the second terminal by running the below inference script. This Python script will read the dashboard.py file as input, submit it for inference to Code Llama using Modzy’s edge client, and overwrite the file with the output from Code Llama. This means that as we iterate and interact with Code Llama, we will simply add sequential code comments in the dashboard.py file to prompt our model the way we see fit.

import json

from modzy import EdgeClient
from modzy.edge import InputSource

# define Modzy-specific variables and establish connection to edge client
MODEL_ID = "tuaod1v68l"  # replace with your model ID
MODEL_VERSION = "1.0.0"  # replace with your model version
client = EdgeClient("localhost", 55000)
client.connect()

# open input file and create input object
with open("dashboard.py", "r") as input_file:
    input_data = input_file.read()
input_object = InputSource(
    key="input",
    data=input_data.encode()
)

# submit inference and wait for the result
inference = client.inferences.perform_inference(MODEL_ID, MODEL_VERSION, [input_object])
results = client.inferences.block_until_complete(inference.identifier, timeout=60)
result = json.loads(results.result.outputs["generated_text"].data)

# write results back to the Python file
with open("dashboard.py", "w") as out:
    out.write(result)

For our first prompt, we will ask Code Llama to create a Streamlit dashboard with some basic configuration.

# import streamlit and pandas apps, and then create streamlit application that is configured with "wide" layout, title of "Oil Well Analysis", and two columns.

After a few minor edits to Code Llama’s output and a second code comment, we’ll prompt the model to add a widget in the sidebar for us to upload a CSV file.

# import streamlit and pandas apps, and then create streamlit application that is configured with "wide" layout, title of "Oil Well Analysis", and two columns.
import streamlit as st
import pandas as pd

st.set_page_config(layout="wide")
st.title("Oil Well Analysis")
st.markdown("""
This app is designed to help you analyze oil wells.
""")

# add widget in sidebar to upload a CSV file

This time, Code Llama fills in a lot of information for us! We again will make some minor tweaks to the raw output, and add a third code comment.

# import streamlit and pandas apps, and then create streamlit application that is configured with "wide" layout, title of "Oil Well Analysis", and two columns.
import streamlit as st
import pandas as pd

st.set_page_config(layout="wide")
st.title("Oil Well Analysis")
st.markdown("""
This app is designed to help you analyze oil wells.
""")

# add widget in sidebar to upload a CSV file.
uploaded_file = st.sidebar.file_uploader("Upload a CSV file", type=["csv"])

# if a file is uploaded, read the file and create a dataframe.
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.sidebar.success("File uploaded successfully!")
# if a file is not uploaded, display a message.
else:
    st.sidebar.warning("Please upload a file!")

# create two columns with st.columns

At this point, Code Llama seems to be catching on to our aspirations for this dashboard. Not only does the model add two columns to our dashboard, but it also predicts what we’d like to display in them. As much as we love the model’s proactivity, let’s again provide very specific instructions.

# import streamlit and pandas apps, and then create streamlit application that is configured with "wide" layout, title of "Oil Well Analysis", and two columns.
import streamlit as st
import pandas as pd

st.set_page_config(layout="wide")
st.title("Oil Well Analysis")
st.markdown("""
This app is designed to help you analyze oil wells.
""")

# add widget in sidebar to upload a CSV file.
uploaded_file = st.sidebar.file_uploader("Upload a CSV file", type=["csv"])

# if a file is uploaded, read the file and create a dataframe.
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.sidebar.success("File uploaded successfully!")
# if a file is not uploaded, display a message.
else:
    st.sidebar.warning("Please upload a file!")

# create two columns with st.columns.
col1, col2 = st.columns(2)

# with col1, create subheader "Dataframe", display the data, and calculate summary statistics

This looks to do exactly what we were hoping it would and then some! Let’s make one final edit and prompt the model to visualize some data in our dashboard’s second column.

# import streamlit and pandas apps, and then create streamlit application that is configured with "wide" layout, title of "Oil Well Analysis", and two columns.
import streamlit as st
import pandas as pd

st.set_page_config(layout="wide")
st.title("Oil Well Analysis")
st.markdown("""
This app is designed to help you analyze oil wells.
""")

# add widget in sidebar to upload a CSV file.
uploaded_file = st.sidebar.file_uploader("Upload a CSV file", type=["csv"])

# if a file is uploaded, read the file and create a dataframe.
if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.sidebar.success("File uploaded successfully!")
# if a file is not uploaded, display a message.
else:
    st.sidebar.warning("Please upload a file!")

# create two columns with st.columns.
col1, col2 = st.columns(2)

# with col1, create subheader "Dataframe", display the data, and calculate summary statistics.
with col1:
    st.subheader("Dataframe")
    st.dataframe(df)
    st.write("Summary statistics:")
    st.write(df.describe())

# with col2, create subheader "Visualizations", and display three line charts, where each uses the parameters data=df, x="Date", and the y axis is different for each of the three, using the columns "Oil volume (m3/day)", "Gas volume (m3/day)", "Water volume (m3/day)"

Now that Code Llama has programmed our entire dashboard for us, let’s check it out in the browser.

Oil Well Analysis Dashboard Generated by Code Llama

It looks like we can add some error handling to prevent this message from appearing before we upload our data, but for now, let’s upload our sample oil well data and see what the dashboard does with it.

Oil Well Data Visualized

And just like that, we have a robust dashboard that helps us analyze data more efficiently.
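As for the error handling we noted a moment ago, one lightweight fix (a sketch, assuming the generated code above) is to gate the column content on a successful upload so the dashboard renders cleanly before any data arrives:

# a sketch of simple error handling for the generated dashboard: only
# render the dataframe and summary statistics once a CSV has been uploaded
if uploaded_file is not None:
    with col1:
        st.subheader("Dataframe")
        st.dataframe(df)
        st.write("Summary statistics:")
        st.write(df.describe())
else:
    st.info("Upload a CSV file in the sidebar to populate the dashboard.")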

Wrapping Up

This tutorial provides a comprehensive guide to leveraging Code Llama and other tools to create a customized dashboard for oil field monitoring. As you can see from this walkthrough, Code Llama helped generate code tailored to our specific use case, reducing the need to turn to Google or Stack Overflow for additional support. To get started using Code Llama, chassisml.io, and Modzy to build your own dashboard, book a demo.