Intelligence index: 62 / 100 (23rd percentile vs all models)
Composite of MMLU, GPQA, MATH & HumanEval
Speed: 55 tok/s (14th percentile vs all models)
Median across providers, steady state
Blended price: $0.40 / 1M tokens (77th percentile vs all models)
3:1 input:output blend
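The blended figure is a weighted average of the input and output prices. Below is a minimal sketch, assuming the blend is a simple 3:1 weighted mean of the per-1M-token prices; the function name `blended_price` and its parameters are hypothetical, not part of any published API.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted mean of per-1M-token prices (assumed 3:1 input:output blend)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Qwen 2.5 72B: $0.40 input and $0.40 output per 1M tokens
print(f"${blended_price(0.40, 0.40):.2f} / 1M tokens")  # prints "$0.40 / 1M tokens"
```

Because input and output are priced identically here, the blend is $0.40 regardless of the weights; the ratio only matters when the two prices differ.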
At a glance
- Context window: 128k tokens
- Max output: 8k tokens
- Input price: $0.40 / 1M tokens
- Output price: $0.40 / 1M tokens
- Time to first token: 0.5 s
- Input modalities: text
- Output modalities: text
- License: Open source
- Provider: Alibaba
Benchmark scores
Scores as published by each provider, shown against the top score (leader) in each benchmark.
- MMLU (general knowledge across 57 subjects): 85.3; leader 91.8
- MMLU Pro (harder MMLU successor with more reasoning): 71.1; leader 80.0
- GPQA (graduate-level science Q&A): 49.0; leader 78.0
- MATH (competition mathematics): 83.1; leader 94.8
- HumanEval (Python code generation, pass@1): 86.6; leader 95.8
Strengths
- Open weights
- Strong Chinese-language performance
- Strong math performance (83.1 on MATH)
Weaknesses
- Less English fine-tuning data than Llama
Best for
- Self-hosting
- Chinese / multilingual apps
Models you should also evaluate
DeepSeek
DeepSeek R1
Open-weights reasoning model that matches o1 at 1/25 the price.
Intelligence 73 · 60 tok/s · $0.96 / 1M tokens
DeepSeek
DeepSeek V3
Frontier-class quality at fast-tier prices — and open weights.
Intelligence 67 · 90 tok/s · $0.48 / 1M tokens
Meta
Llama 3.3 70B
Open-weights 70B that matches GPT-4o on most benchmarks.
Intelligence 66 · 200 tok/s · $0.27 / 1M tokens
Qwen 2.5 72B — frequently asked questions
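Qwen 2.5 72B is a large language model from Alibaba, released on 19 September 2024. Its weights are openly available (under the Qwen license; Apache 2.0 covers most of the smaller Qwen 2.5 sizes, but not the 72B), and it is particularly strong at multilingual tasks and math.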