Aug 27, 2024 | 7 min read
There are a ton of advantages to running AI models locally. While Stable Diffusion has been the gold standard for open source, locally run AI models, a range of Stable Diffusion versions, fine-tuned custom models, and even competitors have emerged.
While sites like CivitAI seem to have a model for everything, let's take a look at our four favorite locally run image models.
Stable Diffusion 1.5
Released in October 2022 and open source, Stable Diffusion 1.5 (SD1.5) became the foundational model the Stable Diffusion ecosystem was built around. At the time, SD1.5 was state of the art, producing some of the highest quality imagery in the burgeoning space. The real advantage, however, came from the model being open source.
By releasing the model under the CreativeML OpenRAIL-M license, developers could not only download the model but build on top of it. This led to the development of extensions such as ControlNet, which lets you precisely control the composition of an image. Also, because of how long SD1.5 has been around, there are a large number of fine-tuned models that use SD1.5 as a base to build a better model, such as:
Realistic Vision - an excellent model that improves on SD1.5 dramatically. By combining Realistic Vision with ControlNet, you can generate realistic images with incredible control
Prompt: RAW photo of a [subject] 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3
Dreamshaper - a more "all-purpose" model than Realistic Vision, leaning less heavily toward realism. It ranges from 3D renders to anime styles and can similarly leverage ControlNet.
Prompt 1: gorgeous realistic portrait of a half woman half cyborg, terrified cyborg in a bright white room, realism, 8k, UHD
Prompt 2: gorgeous watercolor of a beautiful pond at sunset, beautiful colors, brushstrokes, incredible quality
This is why we still think Stable Diffusion 1.5 is one of the best local models you can use - the model runs incredibly fast locally, there's a rich ecosystem of custom models (like the ones we support on HuggingFace), and ControlNet support is in-depth.
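As a concrete illustration of the SD1.5 + ControlNet workflow described above, here is a minimal sketch using Hugging Face's diffusers library. The checkpoint IDs (a Realistic Vision checkpoint and a Canny-edge ControlNet) and the input file name are illustrative assumptions; swap in whichever fine-tune and control image you actually use.

```python
# Common SD1.5 sampling defaults (community conventions, not mandates).
SD15_SETTINGS = {
    "num_inference_steps": 25,
    "guidance_scale": 7.5,
    "width": 512,   # SD1.5 was trained at 512x512
    "height": 512,
}

def generate():
    # Heavy imports live inside the function so the settings above can be
    # inspected without a GPU or the diffusers library installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # A Canny-edge ControlNet paired with a Realistic Vision fine-tune.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "SG161222/Realistic_Vision_V5.1_noVAE",  # illustrative checkpoint
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    edges = load_image("edges.png")  # your pre-processed Canny edge map
    image = pipe(
        "RAW photo of a lighthouse, 8k uhd, dslr, soft lighting, film grain",
        image=edges,
        **SD15_SETTINGS,
    ).images[0]
    image.save("lighthouse.png")

if __name__ == "__main__":
    generate()
```

The edge map constrains composition while the prompt and fine-tuned checkpoint drive style - which is exactly the combination that makes SD1.5 so controllable.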
Stable Diffusion XL
Released in July of 2023, SDXL was a big leap in quality over SD1.5. Even the base model could generate better images from much simpler prompts.
A collection of SDXL images - courtesy of Stable Diffusion
SDXL is also open source - though the ecosystem built around it is still somewhat nascent. Primarily, ControlNet support is not yet as robust as it is for SD1.5 - though IP-Adapters have emerged as a popular way to control SDXL images.
One of the biggest upsides has been the release of Turbo and Lightning models. These models drastically reduce the number of steps it takes to generate an image, which makes image generation extremely fast.
A few of our favorite SDXL fine tuned models also have Lightning versions - giving exceptional image quality without a lot of latency:
Realistic Vision XL Lightning - the Realistic Vision series has become our go-to due to its ability to generate people and a variety of photography styles. With the Lightning models, generation time is extremely fast and an image can be produced in just ten steps
Prompt 1: extremely detailed, landscape of an unknown planet, monolith, lake, cloudy weather, unreal engine 5, perfect composition, vibrant, rtx, hbao
Prompt 2: instagram photo, portrait photo of (man:2.1), 28 y.o, perfect face, natural skin, hard shadows, film grain
Juggernaut X Lightning - Juggernaut models are massive and cover a wide array of image types. We like the Lightning models here as well, as they can bring image generation down to just 4 steps
Prompt 1: 3D render of a sci-fi baroque concept design of anatomically correct brain device with terrarium, steampunk, intricate details, scientific, hyper detailed, photorealistic
Prompt 2: instagram photograph of a bungalow, sunset, ocean, architectural lighting, modern design, wooden walkway, reflection on water, blue hour, clear sky, fujifilm, 35mm
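The Lightning workflow above can be sketched with diffusers as well. This is a hedged example, not a definitive recipe: the checkpoint ID is an assumed Lightning fine-tune, and the scheduler tweak reflects the common recommendation for distilled SDXL models.

```python
# Distilled (Lightning) settings: very few steps, no classifier-free guidance.
LIGHTNING_SETTINGS = {
    "num_inference_steps": 4,
    "guidance_scale": 0.0,
}

def generate():
    # Imports kept inside the function so the settings above are importable
    # without a GPU or diffusers installed.
    import torch
    from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "SG161222/RealVisXL_V4.0_Lightning",  # illustrative Lightning fine-tune
        torch_dtype=torch.float16,
    ).to("cuda")
    # "Trailing" timestep spacing is the usual recommendation for Lightning.
    pipe.scheduler = EulerDiscreteScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )
    image = pipe(
        "instagram photo, portrait of a man, natural skin, film grain",
        **LIGHTNING_SETTINGS,
    ).images[0]
    image.save("portrait.png")

if __name__ == "__main__":
    generate()
```

Dropping guidance to zero is what lets the step count fall so low - the distillation bakes the guidance behavior into the model itself.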
Flux
The Flux series of models took the AI art community by storm and has quickly become one of the most popular locally run options available. Image quality is consistent, spans a wide range of styles, and handles difficult details like hands and text extremely well. Prompt adherence is also an advantage, which puts the Flux series squarely at the top for locally run AI models.
Since Flux was built by Black Forest Labs (a well-funded company composed of some of the earliest Stable Diffusion engineers), the architecture is very similar to Stable Diffusion's - which means custom versions of Flux, and even ControlNets, are already starting to emerge.
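Running Flux locally looks much like the pipelines above. A minimal sketch with diffusers, assuming the distilled FLUX.1-schnell variant (the settings mirror its few-step, guidance-free design); adjust for your hardware:

```python
# schnell is a timestep-distilled Flux variant: few steps, no guidance.
FLUX_SETTINGS = {
    "num_inference_steps": 4,
    "guidance_scale": 0.0,
    "max_sequence_length": 256,
}

def generate():
    # Imports kept inside the function so the settings above are importable
    # without a GPU or diffusers installed.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # Flux is large; offload to fit in VRAM

    # Flux handles text in images unusually well, so a sign prompt is a
    # reasonable smoke test.
    image = pipe(
        "a hand holding a sign that reads LOCAL MODELS",
        **FLUX_SETTINGS,
    ).images[0]
    image.save("flux.png")

if __name__ == "__main__":
    generate()
```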
For the first time, this means that Stable Diffusion has true competition on the market - which we think can only be a good thing, given all the chaos Stability AI has experienced over the past few months.
Stable Diffusion 3 (maybe)
The last model we want to highlight is Stable Diffusion 3 - with one big caveat: the jury is still out. When Stability AI first released Stable Diffusion 3, they released a smaller model that had some major issues dealing with human anatomy. Bodies looked contorted, limbs were often askew, and the model really struggled with the kinds of generations that earlier models excelled at.
There were also confusing legal terms associated with running the models locally, which is one of the main reasons why Odyssey didn't include Stable Diffusion 3 as an option at launch.
When you ran the larger model via API, though, those issues were nowhere to be seen. In fact, through the API, Stable Diffusion 3 was state of the art and looked like a technological breakthrough for prompt adherence, generating text, and baseline image quality.
A collection of SD3 images via API, courtesy of Stability AI
There have been rumors that Stability AI is working on releasing a model update that performs closer to the API version and has updated legal guidelines. If that's the case, we expect SD3 to be right there in the conversation with Flux as the best locally run model available.
Conclusion
Different models have different benefits - and some unknowns. By comparing models across quality, speed, control, and availability (particularly in apps like Odyssey), you can make an informed decision about which locally run image model to use.