Intelligence index
44 / 100
vs all models: 0th percentile
Composite of MMLU, GPQA, MATH & HumanEval
Speed
750 tok/s
vs all models: 95th percentile
Median across providers, steady state
Blended price
$0.06 / 1M tokens
vs all models: 100th percentile
3:1 input:output blend
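The 3:1 blend can be read as a weighted average of the input and output prices. Here is a minimal Python sketch, assuming the blend weights three input tokens for every output token; the function name `blended_price` is illustrative and not taken from this page.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,
    output_weight: float = 1.0,
) -> float:
    """Weighted-average price per 1M tokens (assumed reading of a 3:1 blend)."""
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Both list prices for this model are $0.06 per 1M tokens.
print(f"${blended_price(0.06, 0.06):.2f} / 1M tokens")
```

Because input and output tokens are priced identically here, the blended figure equals the list price whatever ratio is chosen.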
At a glance
- Context window: 128k tokens
- Max output: 4k tokens
- Input price: $0.06 / 1M tokens
- Output price: $0.06 / 1M tokens
- Time to first token: 0.2s
- Input modalities: text
- Output modalities: text
- License: Open weights (Llama 3.1 Community License)
- Provider: Meta
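These figures allow a rough estimate of per-request latency and cost. The sketch below assumes a deliberately simple model: total time is the time to first token plus output tokens divided by throughput, and cost is linear in token counts. The constants come from this page; the function names are hypothetical, and real latency varies by provider and load.

```python
TTFT_SECONDS = 0.2           # time to first token reported on this page
TOKENS_PER_SECOND = 750.0    # median speed reported on this page
INPUT_PRICE_PER_M = 0.06     # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.06    # USD per 1M output tokens
MAX_OUTPUT_TOKENS = 4_000    # "4k tokens"; the exact limit is an assumption


def estimate_latency(output_tokens: int) -> float:
    """Approximate seconds to stream a full response (simplified model)."""
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens outside the assumed output limit")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one request at the listed prices."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Example: a 2,000-token prompt with a 300-token reply.
print(f"{estimate_latency(300):.2f} s, ${estimate_cost(2_000, 300):.6f}")
```

Under these assumptions the time to first token dominates short replies, which is why it matters for real-time use.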
Benchmark scores
Public scores as reported by each provider, shown next to the leading score on each benchmark.
- MMLU (general knowledge across 57 subjects): 73.0, leader: 91.8
- MMLU Pro (harder MMLU successor with more reasoning): 48.3, leader: 80.0
- GPQA (graduate-level science Q&A): 30.4, leader: 78.0
- MATH (competition mathematics): 51.9, leader: 94.8
- HumanEval (Python code generation, pass@1): 72.6, leader: 95.8
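One way to read the comparison against each benchmark leader is as a ratio of scores. Here is a small Python sketch using the numbers listed above; treating the comparison as score divided by leader score is an assumption, and `relative_to_leader` is an illustrative name.

```python
# (model score, leader score) pairs as listed on this page.
SCORES = {
    "MMLU": (73.0, 91.8),
    "MMLU Pro": (48.3, 80.0),
    "GPQA": (30.4, 78.0),
    "MATH": (51.9, 94.8),
    "HumanEval": (72.6, 95.8),
}


def relative_to_leader(score: float, leader: float) -> float:
    """Fraction of the leader's score achieved by this model."""
    if leader <= 0:
        raise ValueError("leader score must be positive")
    return score / leader


for name, (score, leader) in SCORES.items():
    print(f"{name:<10} {relative_to_leader(score, leader):.0%} of leader")
```

Read this way, the gap is smallest on MMLU and HumanEval and widest on GPQA, which is consistent with the weakness on hard reasoning noted below.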
Strengths
- Extremely fast
- Cheapest production-grade model
- Open weights
Weaknesses
- Weak on hard reasoning
Best for
- Real-time UX
- Classification
- Edge deployment
Models you should also evaluate
Meta
Llama 3.3 70B
Open-weights 70B that matches GPT-4o on most benchmarks.
66 intel, 200 tok/s, $0.27 / 1M
Meta
Llama 3.1 405B
The original "open-source GPT-4" — largest publicly-released weights.
64 intel, 32 tok/s, $2.70 / 1M
Mistral
Mistral Small 3
Excellent latency / cost ratio — open weights, production-ready.
52 intel, 150 tok/s, $0.30 / 1M
Llama 3.1 8B — frequently asked questions
Llama 3.1 8B is a large language model from Meta, released on 23 July 2024. Tiny but capable; serves at >500 tok/s on Groq.