Local LLM: Ollama, LM Studio, private stack · Lesson 2

Hardware: what to buy and why

A no-nonsense guide to picking a GPU or Mac for running models locally.

The one rule

VRAM (or unified memory on Apple Silicon) is the hard limit. The model has to fit entirely in that memory; if part of it spills over into slower system RAM or swap, generation slows to a crawl.

Options by budget

Apple Silicon (M2/M3/M4 Pro or Max). 16-32GB of unified memory is comfortable for 7B-13B models. The best all-in-one option: quieter and more power-efficient than a GPU tower.
NVIDIA GPU (RTX 4070 to 4090). 12-24GB VRAM. Very fast inference, but expensive and loud.
Server GPUs (A100, H100). Only worth it if you have a real business workload; the price tag matches.
CPU-only. Possible for small models (Phi-3, Gemma-2-2B), but slow.

What NOT to buy

Old GPUs with 8GB VRAM in the hope of running 70B models: even at Q4 a 70B model needs roughly 40-45GB, so it won't fit.
Professional cards you don't actually need — pure overhead.

Memory calculator

| Model | FP16 (full) | Q4 (quantized) |
| --- | --- | --- |
| 7B | ~14GB | ~4-5GB |
| 13B | ~26GB | ~7-8GB |
| 30B | ~60GB | ~17-20GB |
| 70B | ~140GB | ~40-45GB |
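
The figures in the table follow from simple arithmetic: weights take roughly (parameters × bits per weight) / 8 bytes. Here is a minimal sketch of that calculation. It assumes 16 bits per weight for FP16 and about 4.5 effective bits per weight for a typical 4-bit quantization (an assumption; real formats vary), and it counts weights only. The function name `weights_gb` is illustrative.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for size in (7, 13, 30, 70):
    fp16 = weights_gb(size, 16)
    q4 = weights_gb(size, 4.5)  # assumed effective bits for 4-bit quantization
    print(f"{size}B: FP16 ~{fp16:.0f} GB, Q4 ~{q4:.1f} GB")
```

The Q4 results land at or slightly below the low end of the table's ranges, because the KV cache and runtime overhead add more on top of the weights.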

The principle

Buy hardware for the tasks you actually have, not "for the future". If the tasks don't pan out, the cloud is cheaper than sitting on an unused 4090.
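
One way to test the "cloud is cheaper" claim against your own situation is a break-even estimate: how many hours of use it takes before owning the hardware costs less than renting. This is a rough sketch; the function name and all figures are placeholders for illustration, not real prices, and it ignores resale value and setup time.

```python
def break_even_hours(hardware_cost: float,
                     cloud_rate_per_hour: float,
                     power_cost_per_hour: float = 0.0) -> float:
    """Hours of use after which owning becomes cheaper than renting."""
    saving_per_hour = cloud_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("owning never pays off: cloud rate does not exceed running cost")
    return hardware_cost / saving_per_hour


# Placeholder figures only; substitute your own quotes and electricity price.
hours = break_even_hours(hardware_cost=2000.0,
                         cloud_rate_per_hour=0.50,
                         power_cost_per_hour=0.10)
print(f"Break-even after about {hours:.0f} hours of use")
```

If your realistic usage over the next year or two is well below the break-even figure, renting is likely the cheaper route.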

Practical exercise

What to do after this lesson

Build a small table: your tasks → the models you'd expect to need → the memory required. Compare it against your current hardware.
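
If you prefer to build the table in code, a sketch like the following may help. The task rows are hypothetical examples, and the constants (4.5 effective bits per weight at Q4, a 2GB allowance for KV cache and runtime) are assumptions you should adjust.

```python
# Hypothetical example rows: replace with your own tasks and models.
tasks = [
    ("Code autocomplete", "7B", 7),
    ("Document summarization", "13B", 13),
    ("Long-form reasoning", "70B", 70),
]

AVAILABLE_GB = 24   # your VRAM or unified memory
Q4_BITS = 4.5       # assumed effective bits per weight at 4-bit quantization
OVERHEAD_GB = 2.0   # assumed allowance for KV cache and runtime


def required_gb(params_billion: float) -> float:
    """Estimated memory at Q4: weights plus the fixed overhead allowance."""
    return params_billion * Q4_BITS / 8 + OVERHEAD_GB


def build_rows(tasks, available_gb):
    """Return (task, model, needed GB, fits?) for each task."""
    rows = []
    for task, model, params in tasks:
        need = required_gb(params)
        rows.append((task, model, round(need, 1), need <= available_gb))
    return rows


for task, model, need, fits in build_rows(tasks, AVAILABLE_GB):
    verdict = "fits" if fits else "does not fit"
    print(f"{task:<24} {model:<5} ~{need:>5} GB  {verdict}")
```

Set `AVAILABLE_GB` to your current hardware and look at which rows do not fit; those are the tasks that would justify an upgrade or a cloud rental.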

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

Help me pick hardware for running a local LLM.

Budget: <…>
Tasks: <…>
The model sizes I need: <…>
Noise / heat / power consumption matter: <…>

Give me a recommendation: Apple Silicon / NVIDIA / CPU-only.

Common mistakes

What people get wrong

Buying an 8GB card to run a 30B model: even at Q4 it needs roughly 17-20GB, so it just won't work.
Ignoring power draw and noise, then never actually using the machine.
Buying "top of the line for everything" where a mid-tier card would have done the job.

Pro tips

What works but no one documents

The Mac Mini M2 Pro with 32GB is an underrated workhorse.
A used RTX 3090 24GB is good value if the noise doesn't scare you.
Rented cloud GPUs (Vast.ai, Runpod) cover one-off heavy jobs.

When to use

Systematic, recurring local workloads.

When not to use

Occasional one-off use — the cloud is cheaper.

Quiz — 2 questions

1. What is the main limit for a local LLM?

2. What is a good all-in-one option for most developers?
