Ollama has stepped beyond the terminal with a clean desktop app for macOS and Windows, making it easy to install, run, and switch between models with a few clicks.

[Screenshot: the Ollama desktop app, a dark window with the llama logo centred and a message input field at the bottom]

Recent updates add some practicality: you can now drag and drop PDFs, text, or code files into a chat, feed images to multimodal models like Gemma 3, and even adjust the context window size to handle larger documents. Performance gains continue quietly in the background (overlapping CPU/GPU work, faster flash attention, and snappier switching between chats), so things just feel quicker.
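
The context-length setting is not only a desktop toggle; if you talk to Ollama's local REST API instead, the same knob appears as the num_ctx option on a request. Here is a minimal sketch in Python, assuming the requests package is installed and a model such as llama3.2 has already been pulled (both are illustrative choices, not requirements):

import requests

# Ask the local Ollama server for a single (non-streaming) completion,
# raising the context window via options.num_ctx so a longer document fits.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",           # any locally pulled model works
        "prompt": "Summarise this document: ...",
        "stream": False,               # one JSON object instead of a token stream
        "options": {"num_ctx": 8192},  # per-request context window size
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])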

The model library has grown too, with additions such as EmbeddingGemma for semantic tasks, alongside the familiar Llama, Mistral, and Gemma families.
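
To give a flavour of the kind of semantic task EmbeddingGemma targets, here is a small sketch against the local embeddings endpoint, comparing two sentences by cosine similarity. It assumes the requests package and the embeddinggemma model tag (run ollama pull embeddinggemma first); treat it as an illustration rather than the canonical usage:

import math
import requests

def embed(texts):
    # The local /api/embed endpoint takes a list of inputs and
    # returns one embedding vector per input.
    r = requests.post(
        "http://localhost:11434/api/embed",
        json={"model": "embeddinggemma", "input": texts},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["embeddings"]

def cosine(a, b):
    # Plain cosine similarity; fine for a quick check without numpy.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a, vec_b = embed(["Ollama runs models locally.",
                      "You can run models on your own machine."])
print(f"similarity: {cosine(vec_a, vec_b):.3f}")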

Ollama is also experimenting with Turbo, an early preview of their upcoming cloud inference service, currently free to try with GPT-OSS models.

What else is similar to it

There are other tools in this space. LM Studio offers a desktop app that looks and feels quite similar, with an easy way to download and try models. GPT4All is another approachable option, positioned as a privacy-first chatbot that works offline on ordinary laptops. At the other end of the spectrum is llama.cpp, a high-performance engine in C and C++ that underpins many of these friendlier interfaces, though it appeals more to tinkerers than casual users.

Why it’s different

Ollama’s appeal lies in its simplicity. You can be chatting to a model within minutes of downloading, and you don’t need a cloud account or an internet connection once the model is installed. Its app is clean, its command-line is clear, and its local API means you can point other software at it as though you were using a hosted service. The licence is MIT, which makes it open, adaptable and free to use. What makes this especially compelling is that Ollama has built up an ecosystem of integrations, so that frameworks like LangChain or Spring AI can work with it immediately.
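
As an example of how little glue that integration needs, here is a hedged sketch of the LangChain route; it assumes the langchain-ollama package is installed and that a model such as llama3.2 has been pulled:

from langchain_ollama import ChatOllama

# LangChain talks to the local Ollama server, so there is no API key;
# the model name just has to match something you have already pulled.
llm = ChatOllama(model="llama3.2", temperature=0)

reply = llm.invoke("In one sentence, what is Ollama?")
print(reply.content)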

How to get started

  • Download Ollama for macOS, Windows or Linux at ollama.com (the desktop app covers macOS and Windows; Linux installs the command-line version).
  • Try the simplest command: ollama run llama3.2 "Hello" to see a model reply.
  • Developers can connect to the local API at http://localhost:11434 (see the sketch after this list).
  • Docker users can start an instance with:
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
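
To round off the list, here is a sketch of a chat request against that local API; it assumes the requests package and a pulled llama3.2 model, and it works the same whether Ollama was installed natively or started with the Docker command above:

import requests

# Send a chat-style request to the local Ollama server and print the reply.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",  # swap in whichever model you have pulled
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,      # a single JSON response rather than a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])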
