Sep 2, 2024 | 8 min read
There are two ways to run a generative image model. The first is to use an online service: you enter a prompt into a UI, the prompt is sent to an image-generation API, and the service uses massive, powerful GPUs in a data center to generate an image and send it back to you.
This is currently how models like DALL-E, Midjourney, and Gemini work, as well as a number of services that run Stable Diffusion (or newer models like Flux) on the backend.
While this democratizes access for individuals who don't have a powerful computer, there are downsides. Your data and images aren't private. You typically have to pay for every image you generate, which adds up quickly, especially when GPUs are in short supply. And lastly, the massive data centers and GPUs are power hungry.
That’s not to say that by generating an image of a corgi riding a unicorn you’re contributing to the climate crisis, but at the grand scale of training plus generation, the costs and power add up.
A corgi riding a unicorn - hopefully not contributing to the climate crisis
The second way to generate an image is by running a model entirely locally on your computer, using your machine's own processor and RAM to turn a prompt into an image. There are a few minimum requirements: the larger the model, the more RAM it requires, and the more RAM you have, the faster your images will generate. But there are also big advantages. Your data is completely private and secure.
You won’t have any usage limits, and you don’t even need internet access. And you won’t rely on massive, power-hungry GPUs to see results.
Running a model locally
To understand how that works, let’s take a look at the steps for running a model locally (each step maps onto a few lines in the code sketch after this list):
Download a model
The first step is finding a model you want to use. With programs like Odyssey, models are downloaded automatically when you download the software, but you can also just download a model yourself. Different pieces of software have different requirements. For Odyssey, an image model will typically be a compressed file that contains a text encoder, a UNet, a decoder, an encoder, a merges.txt file, and a vocab.json file.
Input processing
Depending on what software you’re using and how you’re prompting the model, this input will be different (typically either text or an image). But the basic premise is that you provide an input that your computer preprocesses into a format the model understands.
Inference
The beating heart of an AI model is inference - the process of using a model to make predictions or decisions based on the input data. The magic of a locally run model is that inference happens entirely on your device, typically on your computer’s CPU or GPU. There are different ways to optimize how fast this occurs on your computer, and the size of the model will typically determine how quickly it can run.
Post-processing
Once a result is generated, some post-processing occurs to get the result into a format that a human can understand. This step is less intensive than inference, but it is ultimately the piece that makes a result usable.
Display
The magical end result is displayed.
Differences Between Local and Cloud-based Inference
Let’s take a look at a few key differences between locally run models and models run through the cloud.
Data Flow and Privacy
Local: All data stays on your device and nothing is sent over the internet.
Cloud-based: Data is sent to a remote server, processed there, and results are sent back. This introduces potential privacy concerns.
Advantage: Local
Processing Power
Local: Limited to your device's CPU/GPU capabilities. This can be a constraint for large models.
Cloud-based: Can leverage powerful server-grade hardware, allowing for larger and more complex models (e.g. ChatGPT or Midjourney).
Advantage: Cloud
Latency
Local: Generally lower latency as there's no network communication. Good for real-time applications.
Cloud-based: Higher latency due to network transmission. Can be problematic for time-sensitive applications.
Advantage: Local
Internet Dependency
Local: Works offline. No internet connection required after initial model download.
Cloud-based: Requires a stable internet connection for every inference.
Advantage: Local
Scalability
Local: Limited by device resources. Upgrading means replacing hardware.
Cloud-based: Easily scalable. Can handle varying loads by adjusting server resources.
Advantage: Cloud
Model Updates
Local: Requires downloading new model versions to the device, though programs like Odyssey can deploy updates that include updated models.
Cloud-based: Updates can be deployed instantly on the server side.
Advantage: Tie
Device Resource Usage
Local: Consumes device memory and processing power, which can affect battery life and overall performance.
Cloud-based: Minimal local resource usage, as heavy computations happen on servers.
Advantage: Tie
Cost Structure
Local: One-time cost (if any) for the model or app. No ongoing costs for inference.
Cloud-based: Often involves ongoing costs based on usage (API calls, compute time).
Advantage: Local
Model Size and Complexity
Local: Often uses smaller, optimized models to fit device constraints.
Cloud-based: Can use larger, more complex models without device limitations.
Advantage: Cloud
Deployment and Maintenance
Local: Users are responsible for keeping the model updated on their devices, but this gives them stronger version control.
Cloud-based: Centralized management allows for easier updates and maintenance, but models can change behavior unexpectedly.
Advantage: Tie
Software
Local: Limited, with emerging tools like Odyssey making local inference easier
Cloud-based: A number of established web-based applications
Advantage: Tie
Which is better?
Local is better for privacy, latency, internet dependency, and cost structure
Cloud-based models are better for processing power, scalability, and model size
There isn't a clear advantage for model updates, resource usage, deployment and maintenance, or software