Sep 2, 2024 | 8 min read
There are two ways to run a generative image model. The first is to use an online service: you enter a prompt into a UI, the prompt is sent to an image-generation API, and the service uses massive, powerful GPUs in a data center to generate an image and send it back to you.
This is currently how models like DALL-E, Midjourney, and Gemini work, as well as a number of services that run Stable Diffusion (or newer models like Flux) on the backend.
While this democratizes access for individuals who don't have a powerful computer, there are downsides. Your data and images aren't private. You typically have to pay for every image you generate, which adds up quickly, especially when GPUs are in short supply. And lastly, the massive data centers and GPUs are power hungry.
That’s not to say that by generating an image of a corgi riding a unicorn you’re contributing to the climate crisis, but at the grand scale of training plus generation, the costs and power add up.
A corgi riding a unicorn - hopefully not contributing to the climate crisis
The second way to generate an image is by running a model entirely locally on your computer, using your machine's own processor and RAM to turn a prompt into an image. There are a few minimum requirements: the larger the model, the more RAM it requires, and the more RAM you have, the faster your images will generate. But there are also big advantages. Your data is completely private and secure.
You won’t have any usage limits, and you don’t even need internet access. And you won’t rely on massive, power-hungry GPUs to see results.
Running a model locally
To understand how that works, let’s take a look at the steps for running a model locally (each step maps onto a few lines in the code sketch after this list):
Download a model
The first step is finding a model you want to use. With programs like Odyssey, models are downloaded automatically when you download the software, but you can also just download a model yourself. Different pieces of software have different requirements. For Odyssey, an image model will typically be a compressed file that contains a text encoder, a UNet, a decoder, an encoder, a merges.txt file, and a vocab.json file.
Input processing
Depending on what software you’re using and how you’re prompting the model, this input will be different (typically either text or an image). But the basic premise is that you provide an input that your computer preprocesses into a format the model understands.
Inference
The beating heart of an AI model is inference - the process of using a model to make predictions or decisions based on the input data. The magic of a locally run model is that inference happens entirely on your device, typically on your computer’s CPU or GPU. There are different ways to optimize how fast this occurs on your computer, and the size of the model will typically determine how quickly it can run.
Post-processing
Once a result is generated, some post-processing occurs to get the result into a format that a human can understand. This step is less intensive than inference, but it is ultimately the piece that makes a result usable.
Display
The magical end result is displayed.
Differences Between Local and Cloud-based Inference
Let’s take a look at a few key differences between locally run models and models run through the cloud.
Data Flow and Privacy
Local: All data stays on your device and nothing is sent over the internet.
Cloud-based: Data is sent to a remote server, processed there, and results are sent back. This introduces potential privacy concerns.
Advantage: Local
Processing Power
Local: Limited to your device's CPU/GPU capabilities. This can be a constraint for large models.
Cloud-based: Can leverage powerful server-grade hardware, allowing for larger and more complex models (e.g. ChatGPT or Midjourney).
Advantage: Cloud
Latency
Local: Generally lower latency as there's no network communication. Good for real-time applications.
Cloud-based: Higher latency due to network transmission. Can be problematic for time-sensitive applications.
Advantage: Local
Internet Dependency
Local: Works offline. No internet connection required after initial model download.
Cloud-based: Requires a stable internet connection for every inference.
Advantage: Local
Scalability
Local: Limited by device resources. Upgrading means replacing hardware.
Cloud-based: Easily scalable. Can handle varying loads by adjusting server resources.
Advantage: Cloud
Model Updates
Local: Requires downloading new model versions to the device, though programs like Odyssey can deploy updates that include updated models.
Cloud-based: Updates can be deployed instantly on the server side.
Advantage: Tie
Device Resource Usage
Local: Consumes device memory and processing power, which can affect battery life and overall performance.
Cloud-based: Minimal local resource usage, as heavy computations happen on servers.
Advantage: Tie
Cost Structure
Local: One-time cost (if any) for the model or app. No ongoing costs for inference.
Cloud-based: Often involves ongoing costs based on usage (API calls, compute time).
Advantage: Local
Model Size and Complexity
Local: Often uses smaller, optimized models to fit device constraints.
Cloud-based: Can use larger, more complex models without device limitations.
Advantage: Cloud
Deployment and Maintenance
Local: Users are responsible for keeping the model updated on their devices, but this gives them stronger version control.
Cloud-based: Centralized management allows for easier updates and maintenance, but models can change behavior unexpectedly.
Advantage: Tie
Software
Local: Limited, with emerging tools like Odyssey making local inference easier
Cloud-based: A number of established web-based applications
Advantage: Tie
Which is better?
Local is better for privacy, latency, internet dependency, and cost structure
Cloud-based models are better for processing power, scalability, and model size
There isn't a clear advantage for model updates, resource usage, deployment and maintenance, or software