On Wednesday, Apple released optimizations that allow the Stable Diffusion AI image generator to run on Apple Silicon using Core ML, Apple’s proprietary framework for machine learning models. The optimizations will allow app developers to use Apple Neural Engine hardware to run Stable Diffusion about twice as fast as previous Mac-based methods.
Stable Diffusion (SD), which launched in August, is an open source AI image synthesis model that generates novel images using text input. For example, typing “astronaut on a dragon” into SD will typically create an image of exactly that.
By releasing the new SD optimizations (available as conversion scripts on GitHub), Apple wants to unlock the full potential of image synthesis on its devices, as it notes on the Apple Research announcement page: “With the growing number of applications of Stable Diffusion, ensuring that developers can leverage this technology effectively is important for creating apps that creatives everywhere will be able to use.”
Apple also mentions privacy and avoiding cloud computing costs as advantages to running an AI generation model locally on a Mac or Apple device.
“The privacy of the end user is protected because any data the user provided as input to the model stays on the user’s device,” says Apple. “Second, after initial download, users don’t require an internet connection to use the model. Finally, locally deploying this model enables developers to reduce or eliminate their server-related costs.”
Currently, Stable Diffusion generates images fastest on high-end GPUs from Nvidia when run locally on a Windows or Linux PC. For example, generating a 512×512 image at 50 steps on an RTX 3060 takes about 8.7 seconds on our machine.
In comparison, the conventional method of running Stable Diffusion on an Apple Silicon Mac is far slower, taking about 69.8 seconds to generate a 512×512 image at 50 steps using Diffusion Bee in our tests on an M1 Mac Mini.
According to Apple’s benchmarks on GitHub, Apple’s new Core ML SD optimizations can generate a 512×512 50-step image on an M1 chip in 35 seconds. An M2 does the task in 23 seconds, and Apple’s most powerful Apple Silicon chip, the M1 Ultra, can achieve the same result in only nine seconds. That’s a dramatic improvement: on the M1, it cuts generation time almost in half compared with the 69.8 seconds we measured using Diffusion Bee.
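To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python that computes each result's speedup relative to the 69.8-second Diffusion Bee run. The timings are the ones quoted above. The function and variable names are ours, chosen for illustration. The comparison mixes our own measurements with Apple's published benchmarks, which were taken on different machines, so treat the ratios as approximate.

```python
# Timings in seconds for one 512x512 image at 50 steps, as quoted in this article.
# Caveat: these come from different machines and test setups, so the ratios are rough.
DIFFUSION_BEE_M1_SECONDS = 69.8  # our test, M1 Mac Mini, pre-Core ML method

timings_seconds = {
    "M1 (Core ML, Apple benchmark)": 35.0,
    "M2 (Core ML, Apple benchmark)": 23.0,
    "M1 Ultra (Core ML, Apple benchmark)": 9.0,
    "RTX 3060 (our test)": 8.7,
}


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new timing is than the baseline."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return baseline_seconds / new_seconds


for label, seconds in timings_seconds.items():
    ratio = speedup(DIFFUSION_BEE_M1_SECONDS, seconds)
    print(f"{label}: {seconds:.1f} s, about {ratio:.1f}x faster than Diffusion Bee on M1")
```

Only the first row is a like-for-like chip comparison (M1 versus M1). The other rows also reflect faster hardware, not just the Core ML optimizations.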
Apple’s GitHub release is a Python package that converts Stable Diffusion models from PyTorch to Core ML and includes a Swift package for model deployment. The optimizations work for Stable Diffusion 1.4, 1.5, and the newly released 2.0.
At the moment, setting up Stable Diffusion with Core ML locally on a Mac is aimed at developers and requires some basic command-line skills, but Hugging Face published an in-depth guide to setting up Apple’s Core ML optimizations for those who want to experiment.
For those less technically inclined, the previously mentioned app called Diffusion Bee makes it easy to run Stable Diffusion on Apple Silicon, but it does not integrate Apple’s new optimizations yet. Also, you can run Stable Diffusion on an iPhone or iPad using the Draw Things app.