What is the “intelligence index”?

A 0–100 composite of public benchmarks (MMLU, MMLU Pro, GPQA, MATH, HumanEval). Higher is better.
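The page does not say how the five benchmarks are combined, so the following is only a rough sketch that assumes an unweighted mean of per-benchmark scores already normalized to 0–100. The function name `intelligence_index` and the equal weighting are assumptions for illustration, not the catalog's documented formula.

```python
from statistics import mean

# Benchmarks named in the answer above.
BENCHMARKS = ("MMLU", "MMLU Pro", "GPQA", "MATH", "HumanEval")


def intelligence_index(scores: dict[str, float]) -> float:
    """Combine per-benchmark scores (each 0-100) into one 0-100 number.

    ASSUMPTION: equal weights. The real index may weight or normalize
    the benchmarks differently.
    """
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    for name in BENCHMARKS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score must be within 0-100")
    return round(mean(scores[name] for name in BENCHMARKS), 1)
```

Called with a dictionary holding one score per benchmark, the function returns a single value where higher is better, matching the description above.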

How often is this updated?

Monthly. Each model has a lastUpdated stamp. Open an issue if you spot something stale.

Best Text-to-Speech AI Models

Generate natural-sounding speech audio from text.


Text-to-Speech AI Models

3 models matched.

Model	Provider	License	Price (USD)	Notes
Cartesia Sonic	Cartesia	Proprietary	$0.07	Sub-100ms time-to-first-byte — built for realtime voice agents.
OpenAI TTS (HD)	OpenAI	Proprietary	$0.03	Cheap, natural-sounding TTS bundled with the OpenAI API.
ElevenLabs Multilingual v2	ElevenLabs	Proprietary	$0.18	The industry standard for expressive cloned voices.

Prices are USD per 1M tokens unless noted otherwise. Estimates are marked with *.
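For readers working from a text copy of the table, where the sortable columns are not available, here is a minimal sketch of the same sort-by-column behaviour. The three rows are copied from the table above; the `TtsModel` type and the `sort_by` helper are hypothetical names introduced for illustration and are not part of any catalog API.

```python
from typing import NamedTuple


class TtsModel(NamedTuple):
    name: str
    provider: str
    license: str
    price_usd: float  # unit as stated in the table's price note


# Rows copied from the table above.
MODELS = [
    TtsModel("Cartesia Sonic", "Cartesia", "Proprietary", 0.07),
    TtsModel("OpenAI TTS (HD)", "OpenAI", "Proprietary", 0.03),
    TtsModel("ElevenLabs Multilingual v2", "ElevenLabs", "Proprietary", 0.18),
]


def sort_by(models: list[TtsModel], column: str, descending: bool = False) -> list[TtsModel]:
    """Return a new list sorted by the named column (hypothetical helper)."""
    return sorted(models, key=lambda m: getattr(m, column), reverse=descending)


if __name__ == "__main__":
    # List the models from cheapest to most expensive.
    for model in sort_by(MODELS, "price_usd"):
        print(f"{model.name:<28} ${model.price_usd:.2f}")
```

Passing `"name"` or `"provider"` as the column sorts alphabetically instead, and `descending=True` reverses the order.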

Browse AI Models by category

Drill into a slice of the catalog — open-source models, video models, or all models from one provider.

By License

Open Source

Models with publicly released weights you can self-host or fine-tune.

Proprietary

Closed-weights, API-only models from major labs.

By Purpose

Frontier

The most capable models from each major lab.

Reasoning

Long chain-of-thought models built for hard math, code, and planning.

Fast & Cheap

Production-grade workhorses with the best balance of speed and cost.

Code

Models specialised for software engineering tasks.

Multimodal

Models that natively understand text, images, and beyond.

By Modality

Text-to-Image

Generate still images from a text prompt.

Text-to-Video

Generate short video clips from a text prompt.

By Provider

OpenAI

All models from OpenAI — GPT, o-series and beyond.

Anthropic

Claude family of models from Anthropic.

Google

Gemini family of models from Google DeepMind.

Mistral

Models from Mistral AI — EU-hosted, multilingual.

DeepSeek

Frontier-class open-weights models from DeepSeek.

xAI

Grok models from xAI.

Alibaba

Qwen family of open-weights models from Alibaba.

Midjourney

Midjourney text-to-image models.

Black Forest Labs

FLUX image-generation models from Black Forest Labs.

Stability AI

Stable Diffusion image models from Stability AI.

Runway

Gen-series text-to-video models from Runway.

Kuaishou

Kling text-to-video models from Kuaishou.

ElevenLabs

Text-to-speech voice models from ElevenLabs.

Cartesia

Sonic low-latency text-to-speech models from Cartesia.

Frequently asked questions

Which text-to-speech model should I pick?

ElevenLabs leads on voice cloning and emotional range. OpenAI TTS (HD) is the cheapest of the three and still sounds natural. Cartesia Sonic has by far the lowest latency, so pick it if you’re building realtime voice agents.

Explore the full catalog

See every AI model in one place — intelligence, speed and price on a single sortable table.
