Question 1

What is the best local LLM for a CPU only / integrated graphics in 2026?

Accepted Answer

Qwen3 Coder 30B A3B is our top overall pick on the CPU only / integrated graphics: Purpose-built agentic coder. Best local fill-in-the-middle and tool-calling under 70B; useless at small talk.

Question 2

How many local LLMs fit in 0 GB of VRAM?

Accepted Answer

At Q4_K_M quantization and 32K context, 0 of our 30 tracked models fit entirely in the CPU only / integrated graphics's 0 GB of VRAM, and 0 more MoE models run via expert offload with enough system RAM.

Question 3

Can a CPU only / integrated graphics run a 70B model like Llama 3.3?

Accepted Answer

Yes — Llama 3.3 70B runs on the CPU only / integrated graphics as "CPU only" at Q4_K_M, around 1 tokens/sec.

Question 4

Can a CPU only / integrated graphics run DeepSeek R1?

Accepted Answer

Not the full 671B model — its Q2_K weights alone exceed 200 GB. The R1-Distill-Qwen 14B/32B models are the practical local alternative on this card.

Question 5

How much VRAM do I need for 32K context?

Accepted Answer

The KV cache is separate from the weights and grows linearly with context. For a typical 8-14B dense model at 32K and f16 KV, budget 2-4 GB extra on top of the weights; MLA models like DeepSeek R1 need far less, and quantized KV (q8_0) halves it.

Model	8K	16K	32K	64K	128K
Llama 3.1 8B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 9B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 4B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Gemma 3 4B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Mistral Nemo 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Phi 4 14B	Q8_0	Q8_0	—	—	—
Gemma 4 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3 14B	Q8_0	Q8_0	Q8_0	—	—
Gemma 3 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
DeepSeek R1 Distill Qwen 14B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
GPT OSS 20B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Rocinante 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 27B	Q8_0	Q8_0	Q8_0	Q8_0	Q5_K_M
Qwen3.5 35B A3B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3 Coder 30B A3B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Gemma 3 27B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Gemma 4 26B A4B	Q8_0	Q8_0	Q8_0	Q8_0	Q5_K_M
Mistral Small 3.2 24B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
DeepSeek R1 Distill Qwen 32B	Q8_0	Q8_0	Q8_0	Q8_0	Q4_K_M
Cydonia 24B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Llama 3.3 70B	Q5_K_M	Q4_K_M	Q4_K_M	—	—
Llama 4 Scout	Q3_K_M	—	—	—	—
GPT OSS 120B	—	—	—	—	—
GLM 4.5 Air	—	—	—	—	—
Qwen3.5 122B A10B	—	—	—	—	—
Qwen3.5 397B A17B	—	—	—	—	—
DeepSeek R1	—	—	—	—	—
Mistral Large 3	—	—	—	—	—
Kimi K2.5	—	—	—	—	—
Anubis 70B	Q5_K_M	Q4_K_M	Q4_K_M	—	—

Best local LLMs for the CPU only / integrated graphics (2026)

Fit grid by context length

Top pick per use case

Qwen3 Coder 30B A3B Q8_0

Rocinante 12B Q8_0

Qwen3.5 27B Q8_0

Qwen3.5 35B A3B Q8_0

Gemma 3 27B Q8_0

Almost fits

What an upgrade unlocks

Frequently asked questions

What is the best local LLM for a CPU only / integrated graphics in 2026?

How many local LLMs fit in 0 GB of VRAM?

Can a CPU only / integrated graphics run a 70B model like Llama 3.3?

Can a CPU only / integrated graphics run DeepSeek R1?

How much VRAM do I need for 32K context?

Related guides