Hardware: what to buy and why
A no-nonsense guide to picking a GPU or Mac for running models locally.
The one rule
VRAM (or unified memory on Apple Silicon) is the hard limit. The model has to fit entirely in that memory; if it spills over into system RAM or disk, generation slows to a crawl.
Options by budget
- Apple Silicon (M2/M3/M4 Pro or Max). 16-32GB of unified memory is comfortable for quantized 7B-13B models. The best all-in-one option: quieter and more power-efficient than a desktop with a discrete GPU.
- NVIDIA GPU (RTX 4070-4090). 12-24GB VRAM. Very fast inference, but expensive and loud.
- Server GPUs (A100, H100). Only if you have a real business workload. The price tag matches.
- CPU-only. Possible for small models (Phi-3, Gemma-2-2B), but slow.
What NOT to buy
- Old GPUs with 8GB VRAM in the hope of running 70B models: even at Q4, a 70B model needs ~40-45GB, so it won't fit.
- Professional cards you don't actually need: pure overhead.
Memory calculator
| Model | FP16 (unquantized) | Q4 (4-bit quantized) |
| --- | --- | --- |
| 7B | ~14GB | ~4-5GB |
| 13B | ~26GB | ~7-8GB |
| 30B | ~60GB | ~17-20GB |
| 70B | ~140GB | ~40-45GB |
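The figures in the table follow from simple arithmetic: parameter count times bits per weight, divided by 8. Here is a minimal sketch of that calculation. The function name `weights_gb` and the 4.5-5.0 effective bits per weight for Q4 are assumptions for illustration (4-bit formats store some extra scaling data). The result covers the weights only, so leave headroom for context and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# The Q4 bits-per-weight range below is an assumption, not a measured value.


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


FP16_BITS = 16.0
# Assumed effective range for 4-bit formats once scaling data is counted.
Q4_BITS_LOW, Q4_BITS_HIGH = 4.5, 5.0

for size in (7, 13, 30, 70):
    fp16 = weights_gb(size, FP16_BITS)
    q4_low = weights_gb(size, Q4_BITS_LOW)
    q4_high = weights_gb(size, Q4_BITS_HIGH)
    print(f"{size}B: FP16 ~{fp16:.0f}GB, Q4 ~{q4_low:.1f}-{q4_high:.1f}GB")
```

For sizes not listed in the table, plug in the parameter count and the quantization you plan to use.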
The principle
Buy hardware for the tasks you actually have, not "for the future". If the tasks don't pan out, the cloud is cheaper than sitting on an unused 4090.
Build a small table: your tasks → the models you'd expect to need → the memory required. Compare it against your current hardware.
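One way to build that table is a few lines of Python. This is a sketch under stated assumptions: the task names, the model sizes assigned to them, the 4.5 bits per weight for Q4, the 2GB overhead allowance, and the 24GB of available memory are all hypothetical placeholders to replace with your own.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    model_params_billion: float
    bits_per_weight: float  # 16 for FP16; ~4.5 is an assumed figure for Q4


def required_gb(task: Task, overhead_gb: float = 2.0) -> float:
    """Weights plus a flat allowance for context and runtime (assumed 2GB)."""
    return task.model_params_billion * task.bits_per_weight / 8 + overhead_gb


def fit_report(tasks: list[Task], available_gb: float) -> list[tuple[str, float, bool]]:
    """One row per task: (name, memory needed in GB, fits in available memory)."""
    return [(t.name, required_gb(t), required_gb(t) <= available_gb) for t in tasks]


# Hypothetical tasks; replace with your own.
tasks = [
    Task("code completion", 7, 4.5),
    Task("document Q&A", 13, 4.5),
    Task("long-form drafting", 70, 4.5),
]

for name, need, fits in fit_report(tasks, available_gb=24):
    print(f"{name:20s} needs ~{need:.1f}GB  {'fits' if fits else 'does not fit'}")
```

Any row marked "does not fit" is a candidate for a smaller model, stronger quantization, or a one-off cloud rental rather than a hardware purchase.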
Prompt template: copy it and adapt it to your context. Replace the text in angle brackets.
Help me pick hardware for running a local LLM.
Budget: <…>
Tasks: <…>
The model sizes I need: <…>
Noise / heat / power consumption matter: <…>
Give me a recommendation: Apple Silicon / NVIDIA / CPU-only.
Common mistakes
- Buying a card with 8GB to run a 30B model: it won't fit, even at Q4.
- Ignoring power draw and noise, then never actually using the machine.
- Buying "top of the line for everything" where a mid-tier card would have done the job.
Worth considering
- The Mac Mini M2 Pro with 32GB is an underrated workhorse.
- A used RTX 3090 24GB is good value if the noise doesn't scare you.
- The cloud (Vast.ai, Runpod) for one-off heavy jobs.
When buying makes sense: systematic, recurring local workloads.
When it doesn't: occasional one-off use, where the cloud is cheaper.
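The buy-versus-rent question can be put into rough numbers with a break-even estimate. The sketch below is an assumption-laden illustration: `break_even_hours` is a hypothetical helper, and the prices in the example are placeholders, not quotes from any vendor or cloud provider.

```python
def break_even_hours(
    hardware_cost: float,
    cloud_rate_per_hour: float,
    power_cost_per_hour: float = 0.0,
) -> float:
    """Hours of use at which owning the hardware costs the same as renting."""
    saving_per_hour = cloud_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("renting is never more expensive per hour with these inputs")
    return hardware_cost / saving_per_hour


# Placeholder prices; substitute real quotes for your region and provider.
hours = break_even_hours(hardware_cost=1500.0, cloud_rate_per_hour=0.5)
print(f"Break-even after about {hours:.0f} hours of actual use")
```

If your realistic usage over the hardware's lifetime falls well short of that number, renting is likely the cheaper choice.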