Local LLM: Ollama, LM Studio, private stack · Lesson 1

Ollama: the first 30 minutes

How to install Ollama, pick a model, and make your first request.

Installation

Linux: curl -fsSL https://ollama.com/install.sh | sh
Mac: brew install ollama, or the dmg from the website.
Windows: the official installer from the website.

First run

ollama run llama3.1:8b downloads the model and drops you into a chat.

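From then on, Ollama serves an HTTP API at http://localhost:11434: its native endpoints live under /api, and OpenAI-compatible endpoints live under /v1. That means any client that speaks the OpenAI API can be repointed at a local model with one config change: set its base URL to http://localhost:11434/v1. Most clients still insist on an API key, but Ollama ignores its value.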
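
As a minimal sketch of that repointing, the snippet below builds an OpenAI-style chat request against the local server using only the Python standard library. The helper names (build_chat_request, extract_reply, ask) are made up for illustration, and it assumes Ollama is running on its default port with llama3.1:8b already pulled.

```python
import json
import urllib.request

# Assumption: Ollama runs locally on its default port and serves the
# OpenAI-compatible routes under /v1.
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(prompt, model="llama3.1:8b", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # OpenAI-style clients send a key; Ollama does not check it.
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response dict."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, model="llama3.1:8b"):
    """Send the request. Needs a running Ollama server with the model pulled."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        return extract_reply(json.load(resp))
```

With the server up, ask("Say hello in one sentence.") should return the model's reply as a string. The same base URL and a dummy key are what you would put into an existing OpenAI client's configuration.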

What to try on day one

llama3.1:8b — general-purpose and fast.
qwen2.5:7b — excellent multilingual support.
deepseek-coder:6.7b — code.
nomic-embed-text — embeddings.
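
The embedding model in that list is not used through chat. A rough sketch of how it might be called is below: it assumes the native /api/embed endpoint, which takes an "input" field and returns an "embeddings" list (older Ollama versions may only offer /api/embeddings with a different shape). The embed and cosine_similarity helpers are illustrative names, not part of Ollama.

```python
import json
import math
import urllib.request


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def embed(text, model="nomic-embed-text", host="http://localhost:11434"):
    """Return one embedding vector for `text`.

    Assumption: the server exposes /api/embed and replies with
    {"embeddings": [[...]]}. Needs a running Ollama with the model pulled.
    """
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"][0]
```

Comparing cosine_similarity(embed("cat"), embed("kitten")) with cosine_similarity(embed("cat"), embed("invoice")) is a quick way to see whether the embeddings behave sensibly on your own vocabulary.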

Hardware

8B models — comfortable on 16GB RAM + an 8GB+ GPU, or Apple Silicon M1 Pro.
13B models — 24GB RAM + a 12GB+ GPU, or M2 Pro and up.
70B+ — a serious workstation or a server.
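
A back-of-the-envelope way to sanity-check these tiers is parameters times bits per weight, divided by 8. The sketch below does only that arithmetic. It covers the weights alone (context cache and runtime overhead come on top), and the 4.8 bits per weight used for a 4-bit quantized model is a rough assumed figure, since actual files vary.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Ignores the context (KV) cache and runtime overhead, so treat the
    result as a lower bound rather than a requirement.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed figures: 16 bits for unquantized half precision,
# about 4.8 bits as a rough average for a 4-bit quantization.
for size in (8, 13, 70):
    fp16 = weight_memory_gb(size, 16)
    q4 = weight_memory_gb(size, 4.8)
    print(f"{size}B: about {fp16:.1f} GB at 16-bit, about {q4:.1f} GB at 4-bit")
```

The gap between the two columns is the reason an 8B model fits on an 8GB GPU at all: at 16-bit the weights alone would already exceed it.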

The principle

Start with a small model. If it can't handle your tasks, then move up in size. Don't size up before you have evidence you need it.

Practical exercise

What to do after this lesson

Install Ollama. Run llama3.1:8b. Make 5 of your typical requests. Compare the answers against ChatGPT/Claude on the same prompts.
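
If you prefer to run the five prompts in one go rather than typing them into the chat, something like the sketch below could collect the local answers into rows you then fill in with the cloud answers by hand. It assumes the native /api/chat endpoint with "stream" set to false. The ask_ollama and collect_answers names and the row layout are illustrative choices.

```python
import json
import urllib.request


def ask_ollama(prompt, model="llama3.1:8b", host="http://localhost:11434"):
    """Send one prompt to the local model and return the reply text.

    Needs a running Ollama server with the model pulled.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


def collect_answers(prompts, ask=ask_ollama):
    """Run every prompt through `ask` and return rows for comparison."""
    return [
        # cloud_answer is left empty on purpose: paste it in manually.
        {"prompt": p, "local_answer": ask(p), "cloud_answer": ""}
        for p in prompts
    ]
```

Passing your own function as ask makes it easy to swap in another model or to dry-run the script without a server. Dumping the rows with json.dumps(rows, indent=2) gives a file you can read side by side.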

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

(Through the Ollama localhost API)

POST http://localhost:11434/api/chat
{
  "model": "qwen2.5:7b",
  "messages": [
    { "role": "system", "content": "You are my development assistant. Be concise and concrete." },
    { "role": "user", "content": "<task>" }
  ]
}
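
One way to send this template from Python is sketched below. Because the template does not set "stream", the native API is expected to reply with newline-delimited JSON chunks, so the sketch joins them back into one string. The function names are illustrative, and the task text is whatever you substitute for the placeholder.

```python
import json
import urllib.request

SYSTEM_PROMPT = "You are my development assistant. Be concise and concrete."


def build_payload(task, model="qwen2.5:7b"):
    """Fill the lesson's template with a concrete task."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task},
        ],
    }


def join_stream(lines):
    """Concatenate the content of newline-delimited JSON chat chunks."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk carries done=true
    return "".join(parts)


def run_template(task, host="http://localhost:11434"):
    """POST the template to /api/chat. Needs a running Ollama server."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_payload(task)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return join_stream(resp)  # the response object iterates line by line
```

If you would rather get a single JSON object back, adding "stream": false to the payload should avoid the chunk handling entirely.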

Common mistakes

What people get wrong

Running a huge model on a weak GPU and getting 1 token per second.
Expecting a 7-8B local model to match GPT-5, then being disappointed when it doesn't.
Never moving to quantized versions of the models.
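
To check whether you are in the "1 token per second" situation, you can compute the generation speed from the timing fields that a completed native API response is expected to carry: eval_count (generated tokens) and eval_duration (nanoseconds). A minimal sketch, with an illustrative function name:

```python
def tokens_per_second(response):
    """Generation speed from a completed /api/chat or /api/generate reply.

    Assumes the reply dict carries `eval_count` (tokens generated) and
    `eval_duration` (time spent generating, in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)
```

A single-digit result on an 8B model is usually a hint that the model does not fit on the GPU and is spilling to CPU, which is a good moment to try a smaller or more heavily quantized model.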

Pro tips

What works but no one documents

A Q4_K_M quantized model gives almost the same quality at roughly a third of the memory of 16-bit weights.
Apple Silicon is a great fit for small and mid-size models.
The combo: a local model for sensitive data + the cloud for the most demanding tasks.
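
The local-plus-cloud combo can be sketched as a small router that decides where a prompt goes before anything leaves your machine. Everything here is hypothetical: the cloud URL is a placeholder, and the two regular expressions are a deliberately crude stand-in for whatever "sensitive" means in your context.

```python
import re

LOCAL_URL = "http://localhost:11434/v1"
CLOUD_URL = "https://api.example.com/v1"  # hypothetical placeholder

# Crude illustrative checks only; a real policy would be stricter.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # something that looks like an email
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),    # a long card-like digit run
]


def looks_sensitive(text):
    """True if any of the illustrative patterns matches the text."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)


def pick_backend(text):
    """Keep sensitive prompts local, send everything else to the cloud."""
    return LOCAL_URL if looks_sensitive(text) else CLOUD_URL
```

The design choice worth keeping even if the checks change is the default direction: when in doubt, it is safer to have the router fall back to the local model than to the cloud.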

When to use

Privacy scenarios, experiments, cheap processing of large volumes.

When not to use

If your hardware is weak and privacy isn't critical — the cloud is cheaper to operate.

Official sources

Ollama

Quiz — 2 questions

1. Why is Ollama good for getting started?

2. A quantized model is:
