Chatterbox TTS — Voice Cloning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
What is Chatterbox TTS API?
Chatterbox TTS API is an API wrapper that allows for voice cloning and text-to-speech, serving as a direct substitute for the OpenAI Speech API endpoint.
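Because the endpoint shape matches OpenAI's speech API, an existing OpenAI-compatible client or script can be pointed at it by swapping the base URL. A minimal sketch, assuming the API is running locally on its default port 4123 (the voice name is a placeholder for one of your cloned voices):

```shell
# Same request body as OpenAI's /v1/audio/speech; only the base URL differs.
BASE_URL="http://localhost:4123/v1"   # instead of https://api.openai.com/v1
BODY='{"model": "tts-1", "voice": "my-cloned-voice", "input": "Hello from Chatterbox!"}'

# Writes speech.wav if the server is up; prints a hint otherwise.
curl -fsS -X POST "$BASE_URL/audio/speech" \
  -H "Content-Type: application/json" \
  -d "$BODY" \
  --output speech.wav \
  || echo "No response from $BASE_URL (is the API running?)"
```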
Key Features
- Zero-shot voice cloning — only ~10 seconds of any voice sample needed
- Outperforms ElevenLabs (per the developers' own evaluations)
- Watermarked outputs for responsible voice cloning
- 0.5B Llama backbone
- Custom Voice Library management
- Streaming support for fast generation
- Advanced memory management and automatic cleanup
- Optional frontend for easy management and usage
Hardware Recommendations
- Memory: 4GB minimum, 8GB+ recommended
- GPU: CUDA (Nvidia), Apple M-series (MPS)
- CPU: Works but slower — GPU recommended for production
Chatterbox can use a good deal of memory and has hardware requirements that might be higher than you're used to with other local TTS solutions. If you have trouble meeting the requirements, you might find OpenAI Edge TTS or Kokoro-FastAPI to be suitable replacements.
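Before installing, a quick hardware check can save time. This sketch uses standard Linux tools (nvidia-smi is only present with NVIDIA drivers installed; /proc/meminfo is Linux-only, so the RAM line is skipped elsewhere):

```shell
# Detect an NVIDIA GPU (CUDA); otherwise Chatterbox falls back to CPU,
# or MPS on Apple Silicon.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_STATUS="cuda"
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  GPU_STATUS="none"
  echo "No NVIDIA GPU detected"
fi

# Total RAM in GB (compare against the 4GB minimum / 8GB recommended above).
awk '/MemTotal/ {printf "Total RAM: %.1f GB\n", $2/1024/1024}' /proc/meminfo 2>/dev/null || true
```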
⚡️ Quick start
🐍 Using Python
Option A: Using uv (recommended: faster installs, better dependency resolution)
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies with uv (automatically creates venv)
uv sync
# Copy and customize environment variables
cp .env.example .env
# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py
💡 Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See migration guide →
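Whichever install route you take, the .env file controls runtime behavior. The variable names below are illustrative assumptions only; check the repository's .env.example for the actual keys and their defaults:

```shell
# Hypothetical .env sketch; confirm names against .env.example
PORT=4123                             # port the API listens on
VOICE_SAMPLE_PATH=./voice-sample.mp3  # reference audio used for cloning
DEVICE=auto                           # auto / cuda / mps / cpu
```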
Option B: Using pip (Traditional)
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Setup environment — using Python 3.11
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy and customize environment variables
cp .env.example .env
# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3
# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py
Ran into issues? Check the troubleshooting section
🐳 Docker (Recommended)
# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Use Docker-optimized environment variables
cp .env.example.docker .env # Docker-specific paths, ready to use
# Or: cp .env.example .env # Local development paths, needs customization
# Choose your deployment method:
# API Only (default)
docker compose -f docker/docker-compose.yml up -d # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d # CPU-only
# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d # uv + GPU + Frontend
# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f
# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello from Chatterbox TTS!"}' \
--output test.wav
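The first startup can take a while as model weights download and load, so a short polling loop is handy for knowing when the API has come up. A sketch, assuming a /health endpoint exists (adjust the path to whatever route the API actually exposes):

```shell
# Poll the API a few times before giving up (endpoint path is assumed).
READY=0
for i in 1 2 3; do
  if curl -fsS http://localhost:4123/health >/dev/null 2>&1; then
    READY=1
    break
  fi
  sleep 1
done

if [ "$READY" = "1" ]; then
  echo "API is up"
else
  echo "API not ready yet; check: docker logs chatterbox-tts-api -f"
fi
```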
🚀 Running with the Frontend Interface
Setting up Open WebUI to use Chatterbox TTS API
We recommend running with the frontend interface so you can upload the audio files for the voices you'd like to use before configuring Open WebUI's settings. If started correctly (see guide above), you can visit http://localhost:4321 to access the frontend.
To use Chatterbox TTS API with Open WebUI, follow these steps:
- Open the Admin Panel and go to Settings → Audio
- Set your TTS Settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: http://localhost:4123/v1 (alternatively, try http://host.docker.internal:4123/v1)
  - API Key: none
  - TTS Model: tts-1 or tts-1-hd
  - TTS Voice: the name of a voice you've cloned (aliases defined in the frontend also work)
  - Response splitting: Paragraphs
The default API key is the string none (no API key required)
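Before testing from Open WebUI's UI, you can replay the same request it will send to confirm the settings work (the voice name my-voice is a placeholder for one of your cloned voices):

```shell
# Mirrors Open WebUI's OpenAI-style speech request; no API key required.
BASE_URL="http://localhost:4123/v1"
curl -fsS -X POST "$BASE_URL/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "my-voice", "input": "Testing Open WebUI settings"}' \
  --output owui-test.wav \
  || echo "Request failed: is the API running, and does the voice name exist?"
```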
Please ⭐️ star the repo on GitHub to support development
Need help?
Chatterbox can be challenging to get running the first time; if one install method gives you trouble, try another.
For more information on chatterbox-tts-api, you can visit the GitHub repo
- 📖 Documentation: See API Documentation and Docker Guide
- 💬 Discord: Join the Discord for this project
Troubleshooting
Memory Requirements
Chatterbox has higher memory requirements than other TTS solutions:
- Minimum: 4GB RAM
- Recommended: 8GB+ RAM
- GPU: NVIDIA CUDA or Apple M-series (MPS) recommended
If you experience memory issues, consider using a lighter alternative like OpenAI Edge TTS or Kokoro-FastAPI.
Docker Networking
If Open WebUI can't connect to Chatterbox:
- Docker Desktop: Use http://host.docker.internal:4123/v1
- Docker Compose: Use http://chatterbox-tts-api:4123/v1
- Linux: Use your host machine's IP address
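For the Linux case, the host IP to use is typically the address on your machine's primary interface. One way to find it, assuming the Linux-only hostname -I is available:

```shell
# First address reported by hostname -I is usually the primary interface.
HOST_IP=$(hostname -I 2>/dev/null | awk '{print $1}')
echo "Try this Base URL in Open WebUI: http://${HOST_IP}:4123/v1"
```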
First-Time Startup
The first TTS request takes significantly longer as the model loads. Check logs with:
docker logs chatterbox-tts-api -f
For more troubleshooting tips, see the Audio Troubleshooting Guide.