Chatterbox TTS — Voice Cloning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
What is Chatterbox TTS API?
Chatterbox TTS API is an API wrapper that allows for voice cloning and text-to-speech, serving as a direct substitute for the OpenAI Speech API endpoint.
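Because the endpoint shape matches OpenAI's speech API, an existing OpenAI-compatible client or script can be pointed at it by swapping the base URL. A minimal sketch, assuming the API is running locally on its default port 4123 (the voice name is a placeholder for one of your cloned voices):

```shell
# Same request body as OpenAI's /v1/audio/speech; only the base URL differs.
BASE_URL="http://localhost:4123/v1"   # instead of https://api.openai.com/v1
BODY='{"model": "tts-1", "voice": "my-cloned-voice", "input": "Hello from Chatterbox!"}'

# Writes speech.wav if the server is up; prints a hint otherwise.
curl -fsS -X POST "$BASE_URL/audio/speech" \
  -H "Content-Type: application/json" \
  -d "$BODY" \
  --output speech.wav \
  || echo "No response from $BASE_URL (is the API running?)"
```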
Key Features
- Zero-shot voice cloning — only ~10 seconds of any voice sample needed
- Outperforms ElevenLabs (per the developers' own evaluations)
- Watermarked outputs for responsible voice cloning
- 0.5B Llama backbone
- Custom Voice Library management
- Streaming support for fast generation
- Advanced memory management and automatic cleanup
- Optional frontend for easy management and usage
Hardware Recommendations
- Memory: 4GB minimum, 8GB+ recommended
- GPU: CUDA (Nvidia), Apple M-series (MPS)
- CPU: Works but slower — GPU recommended for production
Chatterbox can use a good deal of memory and has hardware requirements that might be higher than you're used to with other local TTS solutions. If you have trouble meeting the requirements, you might find OpenAI Edge TTS or Kokoro-FastAPI to be suitable replacements.
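Before installing, a quick hardware check can save time. This sketch uses standard Linux tools (nvidia-smi is only present with NVIDIA drivers installed; /proc/meminfo is Linux-only, so the RAM line is skipped elsewhere):

```shell
# Detect an NVIDIA GPU (CUDA); otherwise Chatterbox falls back to CPU,
# or MPS on Apple Silicon.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_STATUS="cuda"
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  GPU_STATUS="none"
  echo "No NVIDIA GPU detected"
fi

# Total RAM in GB (compare against the 4GB minimum / 8GB recommended above).
awk '/MemTotal/ {printf "Total RAM: %.1f GB\n", $2/1024/1024}' /proc/meminfo 2>/dev/null || true
```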
⚡️ Quick start
🐍 Using Python
Option A: Using uv (recommended: faster installs, better dependency resolution)
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies with uv (automatically creates venv)
uv sync
# Copy and customize environment variables
cp .env.example .env
# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py
💡 Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See migration guide →
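Whichever install route you take, the .env file controls runtime behavior. The variable names below are illustrative assumptions only; check the repository's .env.example for the actual keys and their defaults:

```shell
# Hypothetical .env sketch; confirm names against .env.example
PORT=4123                             # port the API listens on
VOICE_SAMPLE_PATH=./voice-sample.mp3  # reference audio used for cloning
DEVICE=auto                           # auto / cuda / mps / cpu
```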
Option B: Using pip (Traditional)
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Setup environment — using Python 3.11
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy and customize environment variables
cp .env.example .env
# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3
# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py
Ran into issues? Check the troubleshooting section
🐳 Docker (Recommended)
# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Use Docker-optimized environment variables
cp .env.example.docker .env # Docker-specific paths, ready to use
# Or: cp .env.example .env # Local development paths, needs customization
# Choose your deployment method:
# API Only (default)
docker compose -f docker/docker-compose.yml up -d # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d # CPU-only
# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d # uv + GPU + Frontend
# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f
# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello from Chatterbox TTS!"}' \
--output test.wav
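The first startup can take a while as model weights download and load, so a short polling loop is handy for knowing when the API has come up. A sketch, assuming a /health endpoint exists (adjust the path to whatever route the API actually exposes):

```shell
# Poll the API a few times before giving up (endpoint path is assumed).
READY=0
for i in 1 2 3; do
  if curl -fsS http://localhost:4123/health >/dev/null 2>&1; then
    READY=1
    break
  fi
  sleep 1
done

if [ "$READY" = "1" ]; then
  echo "API is up"
else
  echo "API not ready yet; check: docker logs chatterbox-tts-api -f"
fi
```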
🚀 Running with the Frontend Interface
Setting up Open WebUI to use Chatterbox TTS API
We recommend running with the frontend interface so you can upload the audio files for the voices you'd like to use before configuring Open WebUI's settings. If started correctly (see guide above), you can visit http://localhost:4321 to access the frontend.
To use Chatterbox TTS API with Open WebUI, follow these steps:
- Open the Admin Panel and go to Settings → Audio
- Set your TTS Settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: http://localhost:4123/v1 (alternatively, try http://host.docker.internal:4123/v1)
  - API Key: none
  - TTS Model: tts-1 or tts-1-hd
  - TTS Voice: the name of a voice you've cloned (aliases defined in the frontend also work)
  - Response splitting: Paragraphs
The default API key is the string none (no API key required)
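Before testing from Open WebUI's UI, you can replay the same request it will send to confirm the settings work (the voice name my-voice is a placeholder for one of your cloned voices):

```shell
# Mirrors Open WebUI's OpenAI-style speech request; no API key required.
BASE_URL="http://localhost:4123/v1"
curl -fsS -X POST "$BASE_URL/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "my-voice", "input": "Testing Open WebUI settings"}' \
  --output owui-test.wav \
  || echo "Request failed: is the API running, and does the voice name exist?"
```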
Please ⭐️ star the repo on GitHub to support development
Need help?
Chatterbox can be challenging to get running the first time; if one install method gives you trouble, try another.
For more information on chatterbox-tts-api, you can visit the GitHub repo
- 📖 Documentation: See API Documentation and Docker Guide
- 💬 Discord: Join the Discord for this project
Troubleshooting
Memory Requirements
Chatterbox has higher memory requirements than other TTS solutions:
- Minimum: 4GB RAM
- Recommended: 8GB+ RAM
- GPU: NVIDIA CUDA or Apple M-series (MPS) recommended
If you experience memory issues, consider using a lighter alternative like OpenAI Edge TTS or Kokoro-FastAPI.
Docker Networking
If Open WebUI can't connect to Chatterbox:
- Docker Desktop: Use http://host.docker.internal:4123/v1
- Docker Compose: Use http://chatterbox-tts-api:4123/v1
- Linux: Use your host machine's IP address
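For the Linux case, the host IP to use is typically the address on your machine's primary interface. One way to find it, assuming the Linux-only hostname -I is available:

```shell
# First address reported by hostname -I is usually the primary interface.
HOST_IP=$(hostname -I 2>/dev/null | awk '{print $1}')
echo "Try this Base URL in Open WebUI: http://${HOST_IP}:4123/v1"
```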
First-Time Startup
The first TTS request takes significantly longer as the model loads. Check logs with:
docker logs chatterbox-tts-api -f
For more troubleshooting tips, see the Audio Troubleshooting Guide.