Ollama: the first 30 minutes
How to install Ollama, pick a model, and make your first request.
Installation
Linux: curl -fsSL https://ollama.com/install.sh | sh
macOS: brew install ollama, or the .dmg from the website.
Windows: the official installer.
First run
ollama run llama3.1:8b downloads the model and drops you into a chat.
From then on, Ollama serves an HTTP API at http://localhost:11434, with OpenAI-compatible endpoints under the /v1 path. That means many clients that speak the OpenAI API can be repointed at a local model by changing the base URL.
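As a rough illustration of that repointing, here is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port, that the OpenAI-style chat route is /v1/chat/completions, and that the model has already been pulled. The helper names build_chat_request and send are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port, and the
# OpenAI-compatible routes live under /v1.
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model: str, user_text: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running Ollama with the model already pulled):
# print(send(build_chat_request("llama3.1:8b", "Say hello in one sentence.")))
```

With an existing OpenAI client library, the equivalent change is usually just the base URL setting. Some clients also insist on an API key, in which case any placeholder string should do for a local server.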
What to try on day one
- llama3.1:8b — general-purpose and fast.
- qwen2.5:7b — excellent multilingual support.
- deepseek-coder:6.7b — code.
- nomic-embed-text — embeddings.
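The embedding model in the list above is used differently from the chat models: it returns vectors rather than text. A minimal sketch, assuming a recent Ollama version that exposes the /api/embed route on the default port (check the API docs for your version), and with embed and cosine_similarity as illustrative helper names:

```python
import json
import math
import urllib.request

# Assumption: Ollama is running on the default port and nomic-embed-text is pulled.
EMBED_URL = "http://localhost:11434/api/embed"


def embed(texts: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    """Return one embedding vector per input text."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        EMBED_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embeddings"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Usage (needs a running Ollama):
# first, second = embed(["how do I reset my password", "password recovery steps"])
# print(cosine_similarity(first, second))
```

Texts with similar meaning should score closer to 1.0 than unrelated ones, which is the basis for simple semantic search over your own documents.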
Hardware
- 8B models — comfortable on 16GB RAM + an 8GB+ GPU, or Apple Silicon M1 Pro.
- 13B models — 24GB RAM + a 12GB+ GPU, or M2 Pro and up.
- 70B+ — a serious workstation or a server.
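A back-of-the-envelope calculation helps explain these tiers. The sketch below estimates the memory needed for the weights alone as parameters times bits per weight. The bit widths are nominal assumptions: real quantized files use slightly more, and the context cache and runtime add overhead on top, so treat the numbers as lower bounds.

```python
def weights_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal bit widths; actual quantization formats vary somewhat.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    for size in (8, 13, 70):
        print(f"{size}B @ {label}: ~{weights_memory_gb(size, bits):.1f} GB")
```

By this estimate an 8B model needs roughly 16 GB at 16 bits but about 4 GB at 4 bits, which is why it fits an 8GB GPU once quantized.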
The principle
Start with a small model. If it can't handle your tasks, then move up in size. Don't size up before you have evidence you need it.
Install Ollama. Run llama3.1:8b. Make 5 of your typical requests. Compare the answers against ChatGPT/Claude on the same prompts.
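One way to run those five requests in a repeatable way is a small script that sends each prompt to the local model and saves the answers for side-by-side comparison. This is a sketch, assuming the default port and the native /api/chat route with streaming turned off. The function names and the output file name are made up for illustration.

```python
import json
import urllib.request
from typing import Callable

CHAT_URL = "http://localhost:11434/api/chat"  # assumption: default local port


def ask_ollama(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send one prompt to the local model and return the full reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    request = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]


def collect_answers(prompts: list[str], ask: Callable[[str], str]) -> list[dict]:
    """Run every prompt through `ask` and keep prompt/answer pairs together."""
    return [{"prompt": p, "local_answer": ask(p)} for p in prompts]


# Usage (needs a running Ollama; replace the prompts with your own):
# rows = collect_answers(["<prompt 1>", "<prompt 2>"], ask_ollama)
# with open("local_answers.json", "w", encoding="utf-8") as f:
#     json.dump(rows, f, ensure_ascii=False, indent=2)
```

Passing the ask function as a parameter keeps the collection step independent of the backend, so the same prompts can later be run through a different model for comparison.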
A request template for the Ollama localhost API. Copy it and adapt it to your context, replacing the text in angle brackets.
POST http://localhost:11434/api/chat
{
  "model": "qwen2.5:7b",
  "messages": [
    { "role": "system", "content": "You are my development assistant. Be concise and concrete." },
    { "role": "user", "content": "<task>" }
  ]
}
Common mistakes
- Running a huge model on a weak GPU and getting 1 token per second.
- Comparing it to GPT-5 and being disappointed when expectations aren't met.
- Never moving to quantized versions of the models.
Tips
- A quantized build (Q4_K_M) gives almost the same quality for a fraction of the memory.
- Apple Silicon is a great fit for small and mid-size models.
- The combo: a local model for sensitive data, plus the cloud for tasks that need a stronger model.
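To check whether a model is actually usable on your hardware rather than crawling at 1 token per second, you can compute generation speed from the timing fields that Ollama includes in a completed response. A small sketch, assuming the eval_count (generated tokens) and eval_duration (nanoseconds) fields of the native API. The sample values are invented for illustration.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the timing fields of a completed Ollama response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


# Hypothetical numbers for illustration, not a real measurement.
sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # prints 30.0 tokens/s
```

If the figure is in the low single digits, a smaller or more heavily quantized model is probably the better choice for that machine.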
A local model fits privacy-sensitive scenarios, experiments, and cheap processing of large volumes.
If your hardware is weak and privacy isn't critical, the cloud is cheaper to operate.