Five Reasons to Run AI Models Locally

Aug 22, 2024 | 5 min read

There are two primary ways to run generative AI models: through the cloud or locally on your computer.

Cloud-based models take your prompt and settings through an API or a UI like ChatGPT and generate an image, text, video, song, etc. These models leverage powerful GPUs that live in some far-off data center.

For most consumers, this is the most popular way to run AI models…for now. Models like GPT-4o, Midjourney, and RunwayML are so big and memory-intensive that they need to be hosted on large GPU clusters to generate results.

But the way AI models are run is changing. Generative AI models are getting more efficient, and consumer hardware is evolving to meet the demands of the generative AI boom.

Just look at Apple's latest WWDC presentation - while some models run on Apple's own secure cloud, the focus was often on small, on-device models that only require your device's memory.

A preview of Apple Intelligence on-device models, courtesy of Apple

The reason for that shift is that there are very clear advantages to locally run models. We've compiled what we believe are the five best reasons to run AI models locally.


Control

Many of the locally run generative AI models are open source, which means you can download them for free and run them directly on your computer.

Model families like Stable Diffusion, Flux, Llama, and Mistral are all released in a way that lets developers, and consumers with the right tools, change granular settings, fine-tune models on their own data, and "peek under the hood" to gain a sense of how everything works.
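To make that concrete, here's a minimal sketch of what running an open-source image model locally can look like, using the diffusers library in Python. The checkpoint name, step count, guidance scale, and seed are illustrative choices, not a prescription - the point is that every one of them is yours to change.

```python
# Minimal sketch: running an open-source image model locally with diffusers.
# The checkpoint and settings below are examples - swap in whatever you prefer.
import torch
from diffusers import StableDiffusionPipeline

# Download the weights once; after that, generation happens on your own machine.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint ID
    torch_dtype=torch.float16,
)
pipe = pipe.to("mps")  # "mps" on Apple Silicon, "cuda" on Nvidia, "cpu" otherwise

# Granular control: steps, guidance scale, seed - all tunable by you.
image = pipe(
    "a lighthouse at dusk, oil painting",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]

image.save("lighthouse.png")
```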

When you run a model through the cloud, this simply isn't the case. You are at the whims of a service provider - which means that as models are updated, or changes are made to how they respond to a prompt, your experience can change and models can potentially get worse.

Having control over how your models run and work is a major advantage - and one of the biggest reasons we advocate for running models locally.

Cost

Running a model through a cloud provider means you will either be paying for each generation via an API, paying to run the model yourself on a rented, powerful GPU, or paying a subscription that adds some margin on top of either of those costs.

If you're running a model locally, you'll be paying for your computer and whatever software you use to run the model…and that's it. You won't be paying per generation or buying into fluctuating, and expensive, GPU costs. And you won't be at the whims of a service provider's API.

LLM cost comparison chart - courtesy of AI Chat Makers

For truly scalable, bulk generative AI, locally run models are the most inexpensive option.
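As a back-of-the-envelope illustration, here's a quick break-even sketch in Python. Every number in it is an assumption, not a quote from any provider - plug in your own API pricing and hardware cost to see where local generation starts paying for itself.

```python
# Back-of-the-envelope break-even sketch. All numbers are assumptions for
# illustration only - substitute your real API pricing and hardware cost.

api_cost_per_image = 0.04      # assumed cloud cost per generation, in dollars
local_hardware_cost = 2000.0   # assumed one-time cost of a capable machine
local_marginal_cost = 0.0      # electricity ignored for simplicity

# Number of generations at which buying local hardware breaks even.
break_even = local_hardware_cost / (api_cost_per_image - local_marginal_cost)
print(f"Local hardware pays for itself after ~{break_even:,.0f} generations")
```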

Privacy

When you run a model via the cloud, your data is sent to the model provider by necessity. If, for example, you're using a model to analyze sensitive customer data or generating client images, whatever cloud provider you're using can potentially access that data and use it for training purposes.

Just look at the controversy around tools like Figma's AI design assistant. Anything a user sent to the model could in turn be used to train it, which, in creative fields, could mean your work is being leveraged without compensation.

In contrast, the risk of privacy violations with locally run models is very limited. Unless you're using software that has security concerns (like the recent issue with one of ComfyUI's custom nodes), there's little risk of any of your data being shared with a third party.

Efficiency

Many people's AI experiences rely on very large, general-purpose models that react to any sort of request. Yet it's often small, focused models that both perform the best and respond the quickest.

Especially with LLMs, running local models can be incredibly efficient and, when the model knows the context of the task, the results can be instantaneous.
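As an example, here's a minimal sketch of a small, quantized LLM running entirely on your own machine with the llama-cpp-python library. The model file and prompt are assumptions - any small GGUF checkpoint you've downloaded will work the same way.

```python
# Minimal sketch: a small, focused LLM running locally with llama-cpp-python.
# The model path is an assumption - point it at any GGUF checkpoint you have.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,      # context window
    verbose=False,
)

# A narrow task with the context baked into the prompt tends to return quickly.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Summarize the user's note in one sentence."},
        {"role": "user", "content": "Met with the design team; shipping the new onboarding flow next sprint."},
    ],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])
```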

If we look again at Apple's approach to Apple Intelligence, they have a number of small LLMs that are focused on specific tasks. By taking this approach, Apple can embed AI into contextual workflows on local devices - which in turn saves on the cost and power consumption of cloud GPUs. We call that a win-win-win.

It's the future

When we first started building Odyssey, our Mac app for building workflows powered by local AI models, generating a Stable Diffusion 1.5 image took about a minute and thirty seconds on an M1 Mac. Over the course of a year, Stable Diffusion's models improved, Lightning models that required fewer steps to generate an image were released, and Apple released Macs with the M3 chip. Now, generating a high-quality SDXL image can happen in less than 20 seconds. On Nvidia hardware, it can happen in real time.

The reality is that these types of models will only get more efficient and the hardware will only get better at running them.

That's why we've invested so much in Odyssey around running models locally: we believe that the benefits far outweigh the cons, and that in just a few years locally run models will be the default - especially when there's software that makes it as easy as Odyssey does.

Try Odyssey for free

Download for Mac
