Intelligence index
66 / 100
vs all models: 36th percentile
Composite of MMLU, GPQA, MATH & HumanEval
Speed
200 tok/s
vs all models: 86th percentile
Median across providers, steady state
Blended price
$0.27 / 1M tokens
vs all models: 86th percentile
3:1 input:output blend
At a glance
- Context window: 128k tokens
- Max output: 4k tokens
- Input price: $0.23 / 1M tokens
- Output price: $0.40 / 1M tokens
- Time to first token: 0.4s
- Input modalities: text
- Output modalities: text
- License: Open source
- Provider: Meta
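The blended price and the speed figures above lend themselves to quick back-of-envelope estimates. The sketch below assumes the 3:1 blend is a plain weighted average of the input and output prices, and that total latency is roughly time to first token plus output tokens divided by steady-state speed. The function names are illustrative and not part of any provider API.

```python
# Figures copied from the listing above; function names are illustrative.
INPUT_PRICE = 0.23    # USD per 1M input tokens
OUTPUT_PRICE = 0.40   # USD per 1M output tokens
SPEED = 200.0         # output tokens per second (median, steady state)
TTFT = 0.4            # seconds to first token


def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens for an input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def request_cost(input_tokens, output_tokens):
    """Cost in USD of one request at the listed per-token prices."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


def request_latency(output_tokens):
    """Rough wall-clock seconds: time to first token plus steady-state generation."""
    return TTFT + output_tokens / SPEED


print(f"blended: ${blended_price(INPUT_PRICE, OUTPUT_PRICE):.4f} / 1M tokens")
print(f"cost:    ${request_cost(3000, 1000):.6f} for 3,000 in / 1,000 out")
print(f"latency: {request_latency(1000):.1f} s for 1,000 output tokens")
```

Under these assumptions the weighted average comes to $0.2725, which rounds to the listed $0.27 blended price. Real latency varies by provider and load, so treat the latency figure as a rough lower bound rather than a guarantee.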
Benchmark scores
Public scores from each provider; bars compare this model against the leader in each benchmark.
- MMLU: 86.0 (leader: 91.8). General knowledge across 57 subjects
- MMLU Pro: 68.9 (leader: 80.0). Harder MMLU successor with more reasoning
- GPQA: 50.5 (leader: 78.0). Graduate-level science Q&A
- MATH: 77.0 (leader: 94.8). Competition mathematics
- HumanEval: 88.4 (leader: 95.8). Python code generation pass@1
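The comparison against each benchmark leader can be sketched in a few lines. The page does not say how its bars are scaled, so the example below assumes the simplest reading: bar length proportional to this model's score divided by the leader's score. The helper names are hypothetical.

```python
# (this model's score, leader's score) pairs copied from the listing above.
BENCHMARKS = {
    "MMLU": (86.0, 91.8),
    "MMLU Pro": (68.9, 80.0),
    "GPQA": (50.5, 78.0),
    "MATH": (77.0, 94.8),
    "HumanEval": (88.4, 95.8),
}


def relative_to_leader(score, leader):
    """Fraction of the leader's score that this model reaches."""
    return score / leader


def gaps(benchmarks):
    """Rows of (name, point gap, ratio), sorted by largest gap first."""
    rows = [
        (name, leader - score, relative_to_leader(score, leader))
        for name, (score, leader) in benchmarks.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, gap, ratio in gaps(BENCHMARKS):
    bar = "#" * round(ratio * 20)  # 20-character bar, assumed linear scale
    print(f"{name:<10} {bar:<20} {ratio:6.1%}  (gap {gap:.1f} pts)")
```

On these numbers the largest gap to the leader is on GPQA (27.5 points) and the smallest is on MMLU (5.8 points).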
Strengths
- Open weights
- Fast on Groq / Cerebras
- Cheap
Weaknesses
- No vision
- Smaller context than peers
Best for
- Self-hosting
- EU data residency
- Cost-sensitive workloads
Models you should also evaluate
Meta
Llama 3.1 405B
The original "open-source GPT-4" — largest publicly-released weights.
64 intel · 32 tok/s · $2.70 / 1M
DeepSeek
DeepSeek R1
Open-weights reasoning model that matches o1 at 1/25 the price.
73 intel · 60 tok/s · $0.96 / 1M
DeepSeek
DeepSeek V3
Frontier-class quality at fast-tier prices — and open weights.
67 intel · 90 tok/s · $0.48 / 1M
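To weigh these alternatives against Llama 3.3 70B, the headline figures can be placed in one small structure and sorted by whichever metric matters. This is a minimal sketch using only the numbers shown on this page; the `Model` class and `rank` helper are illustrative names, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intel: int        # intelligence index, out of 100
    tok_per_s: float  # output speed in tokens per second
    price: float      # blended USD per 1M tokens


# Figures copied from the cards on this page.
MODELS = [
    Model("Llama 3.3 70B", 66, 200, 0.27),
    Model("Llama 3.1 405B", 64, 32, 2.70),
    Model("DeepSeek R1", 73, 60, 0.96),
    Model("DeepSeek V3", 67, 90, 0.48),
]


def rank(models, key, descending=True):
    """Return models sorted by the named attribute."""
    return sorted(models, key=lambda m: getattr(m, key), reverse=descending)


for m in rank(MODELS, "price", descending=False):
    print(f"{m.name:<15} {m.intel:>3} intel  {m.tok_per_s:>5.0f} tok/s  ${m.price:.2f} / 1M")
```

Sorting by `intel` puts DeepSeek R1 first, while sorting by price or speed puts Llama 3.3 70B first among these four.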
Llama 3.3 70B — frequently asked questions
Llama 3.3 70B is a large language model from Meta, released on 6 December 2024. It is an open-weights 70B-parameter model that matches GPT-4o on most benchmarks.