Image Generation

What does it mean to run an AI model locally?

Sep 2, 2024




A diagram of running a model locally

There are two ways to run a generative image model. The first is to use an online service: you enter a prompt into a UI, the prompt is sent to an online image generator API, and that service uses massive, powerful GPUs to generate an image and send it back to you.

This is currently how models like DALL-E, Midjourney, and Gemini work - as well as a number of services that run Stable Diffusion (or newer models like Flux) on the backend.
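As a rough illustration of that round trip, here is a minimal Python sketch of the client side. The endpoint URL, the `prompt` field name, and the bearer-token header are all assumptions made up for this example; every real service defines its own API. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only. Real services differ.
API_URL = "https://api.example.com/v1/images"


def build_image_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Package a prompt as a JSON POST request for a remote image generator."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_image_request("a corgi riding a unicorn", api_key="YOUR_KEY")
# Sending it would be urllib.request.urlopen(request). That call is left out
# because the endpoint above does not exist.
```

The point to notice is that the prompt leaves your machine inside the request body, which is where the privacy and cost questions discussed next come from.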

While this democratizes access for people who don’t have a powerful computer, there are downsides. Your prompts and images aren’t private. You typically have to pay for every image you generate, which can add up quickly, especially when GPUs are in short supply. And lastly, the massive data centers and GPUs are power hungry.

That’s not to say by generating an image of a corgi riding a unicorn you’re contributing to the climate crisis, but when looking at the grand scale of training + generating, the costs and power add up.

A corgi riding a unicorn - hopefully not contributing to the climate crisis

The second way to generate an image is by running a model completely locally on your computer. This approach uses your computer’s own memory and processor to take a prompt and output an image. There are a few minimum requirements: the larger the model, the more RAM it requires, and having more RAM generally helps your images generate faster. But there are also big advantages. Your data is completely private and secure.

You won’t have any usage limits and you don’t even need internet access. And you won’t rely on massive, power hungry GPUs to see results.
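To make the "larger model, more RAM" relationship concrete, here is a back-of-the-envelope sketch in Python. It assumes that the memory needed for a model's weights is roughly the parameter count times the bytes used per parameter. The parameter counts, the 2-byte (16-bit) default, and the 70% headroom factor are illustrative assumptions, not figures from any particular model or app. Real usage is higher once intermediate data is included.

```python
def model_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate memory needed just to hold a model's weights, in GB."""
    return num_parameters * bytes_per_parameter / 1e9


def fits_in_memory(num_parameters: int, available_gb: float,
                   bytes_per_parameter: int = 2, headroom: float = 0.7) -> bool:
    """Rough check: do the weights fit in a share of the available memory?

    headroom is an assumed fraction of memory left for the model after the
    operating system and other apps take their share.
    """
    needed = model_memory_gb(num_parameters, bytes_per_parameter)
    return needed <= available_gb * headroom


# Hypothetical model sizes, chosen only to show the arithmetic.
small_model = 1_000_000_000    # 1 billion parameters
large_model = 12_000_000_000   # 12 billion parameters

print(model_memory_gb(small_model))                   # weights-only estimate
print(fits_in_memory(small_model, available_gb=8))
print(fits_in_memory(large_model, available_gb=16))
```

Treat the result as a lower bound when deciding whether a model is likely to run on a given machine.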

Running a model locally

To understand how that works, let’s take a look at the steps for running a model locally:

  1.  Download a model

    The first step is finding a model you want to use. With programs like Odyssey, models are downloaded automatically when you download the software, but you can also download a model yourself. Different pieces of software have different requirements. For Odyssey, an image model will typically be a compressed file that contains a text encoder, a UNet, a decoder, an encoder, a merges.txt file, and a vocab.json file.

  2. Input processing

    Depending on what software you’re using and how you’re prompting the model, this input will be different (typically either text or an image). But the basic premise is that you provide an input that your computer preprocesses into a format the model understands.

  3. Inference

    The beating heart of an AI model is inference, the process of using a model to make predictions or decisions based on the input data. The magic of a locally run model is that inference happens entirely on your device, typically on your computer’s CPU or GPU. There are different ways to optimize how fast this runs, and the size of the model will typically determine how quickly it completes.

  4. Post-processing

    Once a result is generated, some post-processing occurs to get the result into a format that a human can understand. This step is less intensive than inference, but it is ultimately the piece that makes a result usable.

  5. Display

    The magical end result is displayed.
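Steps 2 through 5 above can be sketched as a toy pipeline in Python. This is an illustration of the data flow only, not how Odyssey or any real model works. The five-word vocabulary, the whitespace tokenizer, and the "inference" loop (which simply pulls random noise toward a value derived from the prompt) are all stand-ins invented for this example. A real model would load its vocabulary from files such as vocab.json and run a neural network at the inference step.

```python
import random

# Toy vocabulary, invented for this sketch.
VOCAB = {"<unk>": 0, "a": 1, "corgi": 2, "riding": 3, "unicorn": 4}


def preprocess(prompt: str) -> list[int]:
    """Step 2: turn text into token IDs the model can consume."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in prompt.lower().split()]


def run_inference(token_ids: list[int], width: int, height: int,
                  steps: int = 4) -> list[float]:
    """Step 3: stand-in for the model.

    Starts from random noise and repeatedly moves each value halfway toward
    a target derived from the prompt. A real model runs a network here.
    """
    rng = random.Random(sum(token_ids))          # seeded so runs are repeatable
    target = (sum(token_ids) % 10) / 10 * 2 - 1  # a value in [-1, 0.8]
    values = [rng.uniform(-1.0, 1.0) for _ in range(width * height)]
    for _ in range(steps):
        values = [v + 0.5 * (target - v) for v in values]
    return values


def postprocess(values: list[float], width: int) -> list[list[int]]:
    """Step 4: map floats in [-1, 1] to rows of 0-255 grayscale pixels."""
    pixels = [round((max(-1.0, min(1.0, v)) + 1.0) / 2.0 * 255) for v in values]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def generate(prompt: str, width: int = 4, height: int = 4) -> list[list[int]]:
    """Run preprocessing, inference, and post-processing in order."""
    token_ids = preprocess(prompt)
    raw = run_inference(token_ids, width, height)
    return postprocess(raw, width)


image = generate("a corgi riding a unicorn")
for row in image:  # Step 5: display (here, just print the pixel rows)
    print(row)
```

Nothing in this flow touches the network, which is the property that the comparisons in the next section rely on.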

Differences Between Local and Cloud-based Inference

Let’s take a look at a few key differences between models that are run locally and models that are run through the cloud.

Data Flow and Privacy

  • Local: All data stays on your device and nothing is sent over the internet

  • Cloud-based: Data is sent to a remote server, processed there, and results are sent back. This introduces potential privacy concerns.

Advantage: Local

Processing Power

  • Local: Limited to your device's CPU/GPU capabilities. This can be a constraint for large models.

  • Cloud-based: Can leverage powerful server-grade hardware, allowing for larger and more complex models (e.g. ChatGPT or Midjourney).

Advantage: Cloud


Latency

  • Local: Generally lower latency as there's no network communication. Good for real-time applications.

  • Cloud-based: Higher latency due to network transmission. Can be problematic for time-sensitive applications.

Advantage: Local
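One way to picture the latency comparison is as a sum of terms, sketched below in Python. The breakdown into network round trip, queue wait, and compute time is an assumption made for illustration, and the timings are invented rather than measured.

```python
def cloud_latency_s(network_round_trip_s: float, queue_wait_s: float,
                    server_compute_s: float) -> float:
    """Time from sending a prompt to receiving an image from a remote service."""
    return network_round_trip_s + queue_wait_s + server_compute_s


def local_latency_s(device_compute_s: float) -> float:
    """Local generation has no network or queue term, only on-device compute."""
    return device_compute_s


# Hypothetical timings in seconds, chosen only to illustrate the comparison.
cloud = cloud_latency_s(network_round_trip_s=0.5, queue_wait_s=2.0,
                        server_compute_s=3.0)
local = local_latency_s(device_compute_s=5.0)
print(f"cloud: {cloud:.1f}s, local: {local:.1f}s")
```

In this made-up case the local run wins even though the device computes more slowly than the server, because it skips the network and queue terms. On a much slower device the comparison could go the other way.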

Internet Dependency

  • Local: Works offline. No internet connection required after initial model download.

  • Cloud-based: Requires a stable internet connection for every inference.

Advantage: Local


Scalability

  • Local: Limited by device resources. Upgrading means replacing hardware.

  • Cloud-based: Easily scalable. Can handle varying loads by adjusting server resources.

Advantage: Cloud

Model Updates

  • Local: Requires downloading new model versions to the device, though programs like Odyssey can deploy updates that include updated models

  • Cloud-based: Updates can be deployed instantly on the server side.

Advantage: Tie

Device Resource Usage

  • Local: Consumes device memory and processing power, which can affect battery life and overall performance.

  • Cloud-based: Minimal local resource usage, as heavy computations happen on servers.

Advantage: Tie

Cost Structure

  • Local: One-time cost (if any) for the model or app. No ongoing costs for inference.

  • Cloud-based: Often involves ongoing costs based on usage (API calls, compute time).

Advantage: Local
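The cost trade-off can be worked through with a simple break-even calculation, sketched here in Python. The prices are hypothetical and given in whole cents to keep the arithmetic exact. The sketch also ignores electricity and the cost of the computer itself, so treat it as a simplification.

```python
def break_even_images(one_time_cost_cents: int, cents_per_cloud_image: int) -> int:
    """Number of images at which total cloud spending reaches the one-time cost."""
    if cents_per_cloud_image <= 0:
        raise ValueError("cents_per_cloud_image must be positive")
    # Ceiling division using integers only.
    return -(-one_time_cost_cents // cents_per_cloud_image)


# Hypothetical prices: a $50 one-time purchase versus 4 cents per cloud image.
print(break_even_images(one_time_cost_cents=5000, cents_per_cloud_image=4))
```

Past that number of images, each additional local image costs nothing extra under these assumptions, while each cloud image keeps adding to the bill.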

Model Size and Complexity

  • Local: Often uses smaller, optimized models to fit device constraints.

  • Cloud-based: Can use larger, more complex models without device limitations.

Advantage: Cloud

Deployment and Maintenance

  • Local: Users are responsible for keeping the model updated on their devices, but they get stronger version control.

  • Cloud-based: Centralized management allows for easier updates and maintenance, but models can change behavior unexpectedly.

Advantage: Tie


Software

  • Local: Limited, with emerging tools like Odyssey making local inference easier

  • Cloud-based: A number of established web-based applications

Advantage: Tie

Which is better?

  • Local is better for privacy, latency, internet dependency, and cost structure

  • Cloud-based models are better for processing power, scalability, and model size

  • There isn't a clear advantage for model updates, resource usage, deployment and maintenance, or software
