Question 1

What is the best local LLM for an Apple M2 Ultra (128GB) in 2026?

Accepted Answer

Qwen3 Coder 30B A3B is our top overall pick on the Apple M2 Ultra (128GB). It is a purpose-built agentic coder with the best local fill-in-the-middle and tool-calling under 70B, but it is useless at small talk.

Question 2

How many local LLMs fit in 96 GB of VRAM?

Accepted Answer

At Q4_K_M quantization and 32K context, 25 of our 30 tracked models fit entirely in the Apple M2 Ultra (128GB)'s 96 GB of VRAM. No additional MoE models become runnable via expert offload, even given enough system RAM.

Question 3

Can an Apple M2 Ultra (128GB) run a 70B model like Llama 3.3?

Accepted Answer

Yes. Llama 3.3 70B is rated "Fits on GPU" on the Apple M2 Ultra (128GB) at Q8_0, at around 7 tokens/sec. In the fit grid it holds Q8_0 up to 64K context and drops to Q5_K_M at 128K.
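As a rough sanity check on the tokens/sec figure, decode speed for a dense model is bounded by memory bandwidth divided by the bytes read per token, since every weight is read once per generated token. The sketch below assumes about 70.6B parameters, 8.5 bits per weight for Q8_0, and 800 GB/s peak bandwidth for the M2 Ultra; none of these numbers come from the answer above, and the function name is illustrative.

```python
def decode_tokens_per_sec_upper_bound(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed for a dense model: each generated token
    reads every weight once, so speed <= bandwidth / weight size."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumptions (not from the answer above): ~70.6B parameters, ~8.5 bits per
# weight for Q8_0, and 800 GB/s peak memory bandwidth on the M2 Ultra.
params_billion = 70.6
bits_per_weight = 8.5
model_gb = params_billion * bits_per_weight / 8
bound = decode_tokens_per_sec_upper_bound(model_gb, bandwidth_gb_s=800.0)
print(f"weights ~ {model_gb:.0f} GB, upper bound ~ {bound:.1f} tokens/sec")
```

Under these assumptions the bound lands a little above 10 tokens/sec, so a measured rate of around 7 is plausible once real-world overheads are included.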

Question 4

Can an Apple M2 Ultra (128GB) run DeepSeek R1?

Accepted Answer

Not the full 671B model: its Q2_K weights alone exceed 200 GB. The R1-Distill-Qwen 14B/32B models are the practical local alternative on this machine.
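The arithmetic behind that size claim can be sketched as parameters times bits per weight, divided by 8. The bits-per-weight values here are the nominal llama.cpp block sizes (2.625 for Q2_K, 8.5 for Q8_0) and are an assumption; real GGUF files mix quantization types per tensor and usually come out larger, so treat the results as lower bounds.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Assumed nominal bits per weight of the raw block formats; actual GGUF
# files keep some tensors at higher precision, so real sizes run larger.
Q2_K_BPW = 2.625
Q8_0_BPW = 8.5

full_r1 = weights_gb(671, Q2_K_BPW)
print(f"DeepSeek R1 671B at Q2_K: at least ~{full_r1:.0f} GB")

# Nominal distill sizes taken from the model names (illustrative only).
for size in (14, 32):
    print(f"R1-Distill-Qwen {size}B at Q8_0: ~{weights_gb(size, Q8_0_BPW):.0f} GB")
```

Even at this lower bound the full model is more than twice the 96 GB budget, while both distills fit at Q8_0 with room left for the KV cache.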

Question 5

How much VRAM do I need for 32K context?

Accepted Answer

The KV cache is separate from the weights and grows linearly with context length. For a typical 8-14B dense model at 32K context with f16 KV, budget 2-4 GB on top of the weights. MLA models like DeepSeek R1 need far less, and quantizing the KV cache to q8_0 roughly halves it.
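A minimal sketch of the estimate for standard (non-MLA) attention: the cache holds one K and one V vector per layer, per KV head, per token. The layer count, KV head count, and head dimension below describe a hypothetical 8B-class model with grouped-query attention; they are assumptions for illustration, not values from this page.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_element):
    """KV cache size for standard (non-MLA) attention: one K and one V
    vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


F16_BYTES = 2.0
Q8_0_BYTES = 34 / 32  # q8_0 block: 32 int8 values plus one f16 scale

# Hypothetical 8B-class config with grouped-query attention (assumed).
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=32 * 1024)

GIB = 1024 ** 3
f16 = kv_cache_bytes(**cfg, bytes_per_element=F16_BYTES) / GIB
q8 = kv_cache_bytes(**cfg, bytes_per_element=Q8_0_BYTES) / GIB
print(f"f16 KV: {f16:.2f} GiB, q8_0 KV: {q8:.2f} GiB")
```

For this config the f16 cache comes to 4 GiB at 32K and the q8_0 cache to just over 2 GiB. Doubling `context_tokens` doubles both, which is the linear growth described above.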

Best local LLMs for the Apple M2 Ultra (128GB) (2026)

Fit grid by context length

Top pick per use case

Qwen3 Coder 30B A3B Q8_0

Rocinante 12B Q8_0

Qwen3.5 27B Q8_0

Qwen3.5 35B A3B Q8_0

Gemma 3 27B Q8_0

Almost fits

Fit grid by context length (quantization listed per model and context size)

Model	8K	16K	32K	64K	128K
Llama 3.1 8B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 9B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 4B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Gemma 3 4B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Mistral Nemo 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Phi 4 14B	Q8_0	Q8_0	—	—	—
Gemma 4 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3 14B	Q8_0	Q8_0	Q8_0	—	—
Gemma 3 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
DeepSeek R1 Distill Qwen 14B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
GPT OSS 20B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Rocinante 12B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 27B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3.5 35B A3B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Qwen3 Coder 30B A3B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Gemma 3 27B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Gemma 4 26B A4B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Mistral Small 3.2 24B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
DeepSeek R1 Distill Qwen 32B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Cydonia 24B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
Llama 3.3 70B	Q8_0	Q8_0	Q8_0	Q8_0	Q5_K_M
Llama 4 Scout	Q6_K	Q6_K	Q6_K	Q5_K_M	Q4_K_M
GPT OSS 120B	Q8_0	Q8_0	Q8_0	Q8_0	Q8_0
GLM 4.5 Air	Q6_K	Q5_K_M	Q5_K_M	Q5_K_M	Q4_K_M
Qwen3.5 122B A10B	Q5_K_M	Q5_K_M	Q5_K_M	Q5_K_M	Q4_K_M
Qwen3.5 397B A17B	—	—	—	—	—
DeepSeek R1	—	—	—	—	—
Mistral Large 3	—	—	—	—	—
Kimi K2.5	—	—	—	—	—
Anubis 70B	Q8_0	Q8_0	Q8_0	Q8_0	Q5_K_M