Best local LLMs for the AMD Radeon RX 7900 XT (2026)
The AMD Radeon RX 7900 XT has 20 GB of VRAM and 800 GB/s of memory bandwidth. That fits 11 of our 30 tracked models entirely on the GPU at Q4_K_M and 32K context, and 6 more via MoE expert offload. Every figure below is computed from weights + KV cache + overhead, not guessed.
All figures assume an f16 KV cache, a 0.6 GB display reserve on the GPU, and 64 GB of DDR5 system RAM for the offload tiers. Tune these in the calculator.
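As a rough illustration of the "weights + KV cache + overhead" budget described above, the following is a minimal sketch and not the calculator's actual code. The bits-per-weight values are approximate averages for GGUF quantizations, and the 1 GB runtime `overhead_gb` is a hypothetical placeholder; only the 20 GB VRAM and 0.6 GB display reserve come from this page.

```python
# Approximate bits per weight for a few GGUF quantizations (ASSUMPTION:
# Q8_0 and Q6_K follow from their block layouts; Q4_K_M is a rough average
# that varies by model). These are not figures taken from this page.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625, "Q4_K_M": 4.85}


def weights_gb(params_billion: float, quant: str) -> float:
    """Weight size in GB (10^9 bytes) for a parameter count in billions."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_on_gpu(
    params_billion: float,
    quant: str,
    kv_cache_gb: float,
    vram_gb: float = 20.0,            # RX 7900 XT, from this page
    display_reserve_gb: float = 0.6,  # display reserve, from this page
    overhead_gb: float = 1.0,         # hypothetical runtime overhead
) -> bool:
    """True if weights + KV cache + overhead fit in the usable VRAM."""
    needed = weights_gb(params_billion, quant) + kv_cache_gb + overhead_gb
    return needed <= vram_gb - display_reserve_gb


if __name__ == "__main__":
    # An 8B dense model at Q8_0 with a 4 GB KV cache: 8.5 + 4 + 1 = 13.5 GB.
    print(fits_on_gpu(8.0, "Q8_0", kv_cache_gb=4.0))
    # A 70B dense model at Q4_K_M: the weights alone are about 42 GB.
    print(fits_on_gpu(70.0, "Q4_K_M", kv_cache_gb=4.0))
```

The sketch only answers the "Fits on GPU" question; the offload tiers additionally depend on how much of the model can live in system RAM, which is not modeled here.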
Fit grid by context length
| Model | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|
| Llama 3.1 8B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | IQ4_XS |
| Qwen3.5 9B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | IQ4_XS |
| Qwen3.5 4B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | IQ4_XS |
| Gemma 3 4B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q8_0 |
| Mistral Nemo 12B | Q8_0 | Q8_0 | Q8_0 | Q4_K_M | Q8_0 |
| Phi 4 14B | Q8_0 | Q8_0 | — | — | — |
| Gemma 4 12B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q5_K_M |
| Qwen3 14B | Q8_0 | Q8_0 | Q6_K | — | — |
| Gemma 3 12B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q5_K_M |
| DeepSeek R1 Distill Qwen 14B | Q8_0 | Q8_0 | Q6_K | Q4_K_M | Q8_0 |
| GPT OSS 20B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q8_0 |
| Rocinante 12B | Q8_0 | Q8_0 | Q8_0 | Q4_K_M | Q8_0 |
| Qwen3.5 27B | Q4_K_M | IQ4_XS | IQ4_XS | IQ4_XS | Q5_K_M |
| Qwen3.5 35B A3B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q8_0 |
| Qwen3 Coder 30B A3B | Q4_K_M | IQ4_XS | Q8_0 | Q8_0 | Q8_0 |
| Gemma 3 27B | Q4_K_M | Q4_K_M | IQ4_XS | IQ4_XS | IQ4_XS |
| Gemma 4 26B A4B | Q4_K_M | IQ4_XS | Q8_0 | Q6_K | Q5_K_M |
| Mistral Small 3.2 24B | Q5_K_M | Q5_K_M | IQ4_XS | IQ4_XS | Q8_0 |
| DeepSeek R1 Distill Qwen 32B | Q4_K_M | Q4_K_M | Q4_K_M | Q4_K_M | Q4_K_M |
| Cydonia 24B | Q5_K_M | Q5_K_M | Q4_K_M | Q4_K_M | Q8_0 |
| Llama 3.3 70B | IQ4_XS | IQ4_XS | IQ4_XS | — | — |
| Llama 4 Scout | Q4_K_M | Q4_K_M | Q4_K_M | IQ4_XS | — |
| GPT OSS 120B | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q8_0 |
| GLM 4.5 Air | IQ4_XS | IQ4_XS | IQ4_XS | IQ4_XS | — |
| Qwen3.5 122B A10B | Q4_K_M | Q4_K_M | Q4_K_M | Q4_K_M | — |
| Qwen3.5 397B A17B | — | — | — | — | — |
| DeepSeek R1 | — | — | — | — | — |
| Mistral Large 3 | — | — | — | — | — |
| Kimi K2.5 | — | — | — | — | — |
| Anubis 70B | IQ4_XS | IQ4_XS | IQ4_XS | — | — |
Legend: Fits on GPU · Expert offload · Partial offload · CPU only
Top pick per use case
Coding · 32K
Qwen3 Coder 30B A3B Q8_0
Expert offload · ≈ 14 tok/s
Purpose-built agentic coder. Best local fill-in-the-middle and tool-calling under 70B; useless at small talk.
Roleplay & writing · 16K
Rocinante 12B Q8_0
Fits on GPU · ≈ 40 tok/s
The budget roleplay king. Lowest slop-per-token of anything under 24 GB; the community keeps it alive for a reason.
Summarization · 32K
Gemma 3 27B IQ4_XS
Fits on GPU · ≈ 35 tok/s
Sliding-window attention keeps long context cheap, and the vision stack still beats most of 2026's newcomers.
RAG & documents · 16K
Qwen3.5 35B A3B Q8_0
Expert offload · ≈ 13 tok/s
The meta pick, full stop. Near-dense-30B quality at 3B-active speed, and expert offload puts it on 8 GB cards.
Vision / image input · 16K
Gemma 3 27B Q4_K_M
Fits on GPU · ≈ 31 tok/s
Sliding-window attention keeps long context cheap, and the vision stack still beats most of 2026's newcomers.
Almost fits
These models can't run well on 20 GB at 32K: Phi 4 14B, Qwen3.5 27B, DeepSeek R1 Distill Qwen 32B, Cydonia 24B, Llama 3.3 70B, Qwen3.5 122B A10B and 5 more.
What an upgrade unlocks
Stepping up to an Apple M2 Ultra (128GB) (96 GB) unlocks 6 more models on GPU or expert offload at 32K, including Qwen3.5 27B, DeepSeek R1 Distill Qwen 32B, and Cydonia 24B.
Frequently asked questions
What is the best local LLM for an AMD Radeon RX 7900 XT in 2026?
Qwen3 Coder 30B A3B is our top overall pick on the AMD Radeon RX 7900 XT. It is a purpose-built agentic coder with the best local fill-in-the-middle and tool-calling under 70B, though it is useless at small talk.
How many local LLMs fit in 20 GB of VRAM?
At Q4_K_M quantization and 32K context, 11 of our 30 tracked models fit entirely in the AMD Radeon RX 7900 XT's 20 GB of VRAM, and 6 more MoE models run via expert offload with enough system RAM.
Can an AMD Radeon RX 7900 XT run a 70B model like Llama 3.3?
Yes — Llama 3.3 70B runs on the AMD Radeon RX 7900 XT as "Partial offload" at IQ4_XS.
Can an AMD Radeon RX 7900 XT run DeepSeek R1?
Not the full 671B model — its Q2_K weights alone exceed 200 GB. The R1-Distill-Qwen 14B/32B models are the practical local alternative on this card.
How much VRAM do I need for 32K context?
The KV cache is separate from the weights and grows linearly with context. For a typical 8-14B dense model at 32K and f16 KV, budget 2-4 GB extra on top of the weights; MLA models like DeepSeek R1 need far less, and quantized KV (q8_0) halves it.
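To make the linear growth concrete, here is a minimal sketch of the usual KV cache estimate for a standard full-attention (GQA) transformer. It is an illustration under stated assumptions, not the calculator's code: the function name is hypothetical, the example uses Llama 3.1 8B's published shape (32 layers, 8 KV heads, head dimension 128), and q8_0 is approximated as 8.5 bits per element. Sliding-window and MLA models need a different formula.

```python
GIB = 1024 ** 3


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_element: float = 2.0,  # f16; ~1.0625 for q8_0 (assumption)
) -> float:
    """KV cache size in bytes: one key and one value vector per layer,
    per KV head, per token (hence the leading factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


if __name__ == "__main__":
    llama_3_1_8b = dict(n_layers=32, n_kv_heads=8, head_dim=128)

    f16 = kv_cache_bytes(**llama_3_1_8b, context_tokens=32 * 1024)
    q8 = kv_cache_bytes(**llama_3_1_8b, context_tokens=32 * 1024,
                        bytes_per_element=1.0625)

    print(f16 / GIB)  # 4.0
    print(q8 / GIB)   # 2.125
```

For this model shape the f16 cache at 32K lands at the top of the 2-4 GB range quoted above, and the q8_0 cache is roughly half of it. Doubling the context to 64K doubles both numbers.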