Kokoro-FastAPI Setup Guide

What is Kokoro-FastAPI?

Kokoro-FastAPI is a project that provides the high-quality, lightweight TTS (Text-to-Speech) model "Kokoro" as an easy-to-use Web API.
Beyond plain text reading, its key feature is fine-grained control over pronunciation and rhythm through a Markdown-like syntax.

Key Features and Use Cases

Using the control tags below allows for natural, human-like speech and precise reading of technical terminology.

Custom Pronunciation: Use [Word](/ipa/) to strictly define pronunciation using IPA (International Phonetic Alphabet).
Pause Control: Insert a [pause:Ns] tag, where N is the duration in seconds (for example, [pause:0.5s]), to create silence of any length.
※ Note: The Hugging Face Spaces Web UI for the base model (Kokoro-82M by hexgrad) supports IPA phonemes but does not natively support pause control, so Kokoro-FastAPI most likely implements the pause tag itself.

Example text with control tags:

The gNodeB utilized [MIMO](/maɪmoʊ/) techniques to achieve high throughput for enhanced mobile broadband services. [pause:0.5s] A 5G network allows operators to optimize a [CORESET](/ˈkɔːɹˌsɛt/) configuration based on available bandwidth and traffic loads.
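
The two tags can also be generated by code instead of being typed by hand. The following is a minimal Python sketch, not part of Kokoro-FastAPI: the helper names pronounce and pause are hypothetical, and the tag formats are copied from the example text above.

```python
def pronounce(word: str, ipa: str) -> str:
    """Wrap a word in the [Word](/ipa/) custom-pronunciation syntax."""
    # Accept IPA with or without surrounding slashes.
    return f"[{word}](/{ipa.strip('/')}/)"


def pause(seconds: float) -> str:
    """Build a pause tag in the [pause:0.5s] form used in this guide."""
    if seconds < 0:
        raise ValueError("pause duration must be non-negative")
    # The 'g' format drops a trailing ".0", so 2.0 becomes "2".
    return f"[pause:{seconds:g}s]"


# Rebuild a shortened version of the example sentence from its parts.
text = " ".join([
    f"The gNodeB utilized {pronounce('MIMO', 'maɪmoʊ')} techniques.",
    pause(0.5),
    f"Operators can optimize a {pronounce('CORESET', 'ˈkɔːɹˌsɛt')} configuration.",
])
print(text)
```

Keeping tag construction in small helpers like these makes it harder to produce a malformed tag, such as a missing slash or unit suffix.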

Automation Workflow Integration (n8n)

This API truly shines when combined with no-code tools like n8n or AI agents.
For instance, using an AI Agent node, you can automate complex processes to build a sophisticated voice generation pipeline:

Dictionary-based auto-correction: Automatically insert pronunciation tags for technical terms (e.g., MIMO → [MIMO](/maɪmoʊ/)) based on a predefined dictionary.
Context-aware performance: The AI analyzes the text context and inserts "pause" tags at appropriate moments for natural rhythm.
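
The dictionary-based step can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow's actual implementation: the PRONUNCIATIONS dictionary and the add_pronunciation_tags function are hypothetical names, and matching is case-sensitive on whole words only.

```python
import re

# Hypothetical dictionary: technical term -> IPA pronunciation.
PRONUNCIATIONS = {
    "MIMO": "maɪmoʊ",
    "CORESET": "ˈkɔːɹˌsɛt",
}


def add_pronunciation_tags(text: str, dictionary: dict[str, str]) -> str:
    """Replace each known term with its [Term](/ipa/) tag."""
    if not dictionary:
        return text
    # Try longer terms first so a short term never shadows a longer one.
    terms = sorted(dictionary, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in terms)
    # The lookarounds skip partial words and terms that are already
    # wrapped in square brackets, so running the function twice is safe.
    pattern = re.compile(r"(?<![\[\w])(" + alternatives + r")(?![\w\]])")
    return pattern.sub(
        lambda m: f"[{m.group(1)}](/{dictionary[m.group(1)]}/)", text
    )


tagged = add_pronunciation_tags(
    "The gNodeB utilized MIMO techniques.", PRONUNCIATIONS
)
print(tagged)
```

A deterministic pass like this pairs well with an AI agent: the dictionary handles known terms exactly, and the agent only needs to decide where pauses belong.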

Setup Guide: Prerequisites

This guide explains how to run "Kokoro-FastAPI" on a Windows 11 environment.

1. Prerequisites

Windows 11 Home does not include the necessary tools to run Linux by default. First, let's set up the foundation.

① Enable Virtualization (BIOS/UEFI)

This is essential for running WSL2 and Docker.

Restart your PC and enter the BIOS/UEFI settings (usually by pressing Del, F2, or F10 during boot).
Find the CPU settings and enable "Virtualization", "SVM Mode" (AMD), or "Intel VT-x" (Intel).
Save changes and restart.

② Install WSL2

This allows you to run Linux directly on Windows.

Right-click the Start button and open "Terminal (Admin)" or "PowerShell (Admin)".
Enter the following command and press Enter:

wsl --install

Once installed, restart your PC as instructed.
After the reboot, the "Ubuntu" setup window will open automatically. Set your username and password.

③ Install NVIDIA Drivers (GPU only)

Required only if you want to accelerate performance using an NVIDIA GPU (GeForce, etc.).

Download and install the latest drivers from the official NVIDIA website.
※ No special configuration is needed for Docker to use the GPU, but keeping drivers updated (v510+ recommended) helps prevent issues.

④ Install Docker Desktop

The tool for managing containers (application execution environments).

Download the installer from the official Docker website and run it.
During installation, ensure "Use WSL 2 instead of Hyper-V" is checked (usually on by default).
After installation, launch Docker Desktop and confirm there are no errors.
Check Settings: Go to Docker Desktop Settings (gear icon) > Resources > WSL Integration, and ensure the switch for Ubuntu is turned ON.

⑤ Install Git (Option 2 only)

Required to download the source code.

Download "Git for Windows" from git-scm.com and install it with default settings.
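
Before moving on, it can help to confirm that the command-line tools from the steps above are reachable. The following Python sketch only checks whether each command is on the PATH; it does not verify versions or that Docker Desktop is running. The function name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("wsl", "docker", "git")) -> list[str]:
    """Return the commands from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("wsl, docker and git are all available.")
```

Git is only needed for Option 2, so a missing git entry can be ignored when using the pre-built image.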

Choosing a Setup Method (Option 1 vs Option 2)

Choose the method that best fits your needs.

Feature	Option 1: Use Pre-built Image (Recommended)	Option 2: Build from Source
Difficulty	★☆☆ (Easy)	★★☆ (Intermediate)
Best For	Quick start, users who want minimal hassle.	Developers, users who want to modify the code.
Process	Start with a single command.	Clone via Git and build locally.
Data Persistence	Model data is lost if the container is removed (requires re-download).	Easier to persist data via config files.

Step-by-Step: Option 1 (Start Immediately with Pre-built Image)

The easiest method. Run this in Command Prompt or PowerShell.

Install and Run

【CPU Only】

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest

【With NVIDIA GPU】

docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest

First Run: The image download (several GB) will start automatically.
Confirmation: Once logs appear and the server has started, open http://localhost:8880/web in your browser to use the Web UI.
Stop: Press Ctrl + C in the terminal.
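
Because the first start includes a large download, a script that depends on the server may need to wait until it responds. The sketch below polls a URL until it returns HTTP 200. The function name wait_until_ready and the timeout values are assumptions chosen for illustration; the /docs URL is the documentation page mentioned later in this guide.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, HTTP errors and timeouts all end up here
            # (urllib's URLError and HTTPError are OSError subclasses).
            pass
        time.sleep(interval)
    return False


# Example call (commented out so that importing this file does not block):
# ready = wait_until_ready("http://localhost:8880/docs")
```

If the function returns False, check the container logs in the terminal where docker run was started.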

Update Procedure

If new features are added, update to the latest version using these steps:

Pull the latest image:

# For CPU
docker pull ghcr.io/remsky/kokoro-fastapi-cpu:latest

# For GPU
docker pull ghcr.io/remsky/kokoro-fastapi-gpu:latest

Re-run: Execute the same docker run command used above to start the updated version.

Step-by-Step: Option 2 (Build from Source)

This method involves downloading the code to your local machine using Git.

Installation

Clone (Download) the repository:

git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI

Navigate to the directory: Go to the folder matching your hardware.
- For CPU: cd docker/cpu
- For GPU: cd docker/gpu

Run

Build and start the container with the following command:

docker compose up --build

The initial build and model download will take some time.
Once started, access http://localhost:8880.
Background Mode: To run in the background without showing logs, use docker compose up -d.

Update Procedure

Get latest code: Return to the project root folder (Kokoro-FastAPI) and run:

git pull

Rebuild and Launch: Navigate back to docker/cpu or docker/gpu and run:

docker compose up --build

The --build flag ensures the container is rebuilt with the latest code.

Kokoro TTS API Verification & Usage Tips

API Documentation:
After startup, visit http://localhost:8880/docs to see the Swagger UI. You can test voice generation directly using the "Try it out" button.
Troubleshooting:
- Model download: If model files are missing at startup, they are downloaded automatically. This may take a few minutes.
- GPU errors: If the GPU version fails, run nvidia-smi in Ubuntu on WSL to verify that the GPU is recognized.
- Port conflict: If Docker reports "port is already allocated", check whether another application is using port 8880, or change the host-side (left) port number in the command (e.g., -p 9000:8880). After changing it, use the new port in every URL (e.g., http://localhost:9000).
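
For the port conflict case, a short Python sketch can tell whether something is already listening on port 8880 and suggest a free host port. The function names port_in_use and first_free_port are hypothetical, and the check only covers TCP on the local machine.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8880, attempts: int = 20) -> int:
    """Return the first port at or above `start` that nothing listens on."""
    for port in range(start, min(start + attempts, 65536)):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {attempts} attempts from {start}")


if port_in_use(8880):
    print("Port 8880 is busy; try -p", f"{first_free_port(8881)}:8880")
else:
    print("Port 8880 is free.")
```

Note that a running Kokoro-FastAPI container also makes port 8880 appear busy, so run the check while the container is stopped.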

n8n Integration Example (HTTP Request Node)

Here is an example configuration for calling Kokoro-FastAPI from n8n.
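If n8n itself runs in a Docker container, localhost refers to the n8n container rather than your PC, so use host.docker.internal as the hostname to reach the port published by the Kokoro-FastAPI container.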

Method: POST
URL: http://host.docker.internal:8880/v1/audio/speech (※ Port 8880 is based on this guide. Adjust if you changed your port settings)
Authentication: None
Body Content Type: Raw (application/json)

JSON Body:

{
  "model": "kokoro",
  "input": "{{ $json.output }}",
  "voice": "af_heart",
  "response_format": "mp3",
  "download_format": "mp3",
  "speed": 1,
  "stream": true,
  "return_download_link": false,
  "lang_code": "a",
  "volume_multiplier": 1,
  "normalization_options": {
    "normalize": true,
    "unit_normalization": false,
    "url_normalization": true,
    "email_normalization": true,
    "optional_pluralization_normalization": true,
    "phone_normalization": true,
    "replace_remaining_symbols": true
  }
}
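
For testing outside n8n, the same request can be sent from a short Python script. This is a sketch under assumptions: it uses a subset of the fields from the JSON body above, the function names build_speech_request and synthesize are hypothetical, and it assumes the endpoint returns the audio as the raw response body.

```python
import json
import urllib.request


def build_speech_request(
    text: str, host: str = "localhost", port: int = 8880, voice: str = "af_heart"
) -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
        "speed": 1,
        "stream": True,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/audio/speech",
        # json.dumps escapes quotes and newlines inside the input text.
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech.mp3", **kwargs) -> str:
    """Send the request and save the response body (assumed to be MP3 audio)."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
    return out_path


# Example call (requires a running server, so it is left commented out):
# synthesize("Hello from Kokoro. [pause:0.5s] This is a test.")
```

From a script running inside another container, pass host="host.docker.internal" for the reason described above.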

