Real-time voice AI agents — speech-in/speech-out conversational systems with Whisper, ElevenLabs, OpenAI Realtime API, and Anthropic voice tools.
Voice agent tools are AI-powered software for building and running speech-driven agents. Most are published as open-source projects on GitHub and can be integrated into existing workflows via MCP (Model Context Protocol), Claude Skills, or standalone agent frameworks. On Agent Skills Hub, we index 30 quality-scored voice agent tools across languages including Python, Rust, and TypeScript.
In 2026, the AI agent ecosystem is maturing rapidly. Voice agent tools can boost development efficiency by automating repetitive tasks, reducing human error, and providing intelligent suggestions. The 30 listed tools average about 4,061 GitHub stars, a figure driven largely by two projects (unsloth at 65.0k and airi at 39.4k); the three tools ranked first, jarvis, bolna, and adk-rust, have 845, 649, and 333 stars respectively. 26 of the listed tools come with clear open-source licenses, ensuring freedom to use and modify.
When choosing a voice agent tool, consider these factors: 1) Community activity: GitHub stars and recent commit frequency indicate reliability; 2) Integration method: check whether it supports MCP, Claude, or your preferred agent framework; 3) Language compatibility: the most common language in this list is Python; 4) Quality score: Agent Skills Hub's composite score evaluates code quality, documentation completeness, and maintenance activity. Our recommendation: start with jarvis, the first-ranked entry on this list, but compare it against the table below, where unsloth and airi have far more stars and elevenlabs-mcp and LLM-Agents-Ecosystem-Handbook share the highest quality score (51).
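The selection factors above can be turned into a simple filter-and-sort. The following is a minimal sketch, not the hub's actual ranking logic: the `Tool` class and `shortlist` function are hypothetical names, the rows are a handful copied from the comparison table on this page, and sorting by score first and stars second is an assumed tie-breaking choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tool:
    name: str
    stars: int
    language: str
    license: Optional[str]  # None when the listing shows no license
    score: int


# A few rows copied from the comparison table on this page.
TOOLS = [
    Tool("jarvis", 845, "Python", None, 38),
    Tool("bolna", 649, "Python", "MIT", 39),
    Tool("adk-rust", 333, "Rust", None, 42),
    Tool("elevenlabs-mcp", 1400, "Python", "MIT", 51),
    Tool("openai-agents-js", 3100, "TypeScript", "MIT", 46),
    Tool("voicemode", 1200, "Python", "MIT", 41),
]


def shortlist(tools: List[Tool], language: str, require_license: bool = True) -> List[Tool]:
    """Keep tools in the given language, then sort by score and stars (descending)."""
    matches = [
        t for t in tools
        if t.language == language and (t.license is not None or not require_license)
    ]
    return sorted(matches, key=lambda t: (t.score, t.stars), reverse=True)


if __name__ == "__main__":
    for tool in shortlist(TOOLS, "Python"):
        print(f"{tool.name}: score {tool.score}, {tool.stars} stars, {tool.license}")
```

Passing `require_license=False` keeps entries whose license column is empty, which is useful for a first look but means the license still has to be checked by hand.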
A 100% private AI voice assistant that lives on your computer (works offline). Talk naturally as if Jarvis is a third person in the room, and get conversational responses. It remembers everything, knows location and time, can check the web, control Chrome, track nutrition, and more with support for unlimited MCPs / tools without context rot.
Rust Agent Development Kit (ADK-Rust): a flexible framework for building AI agents in Rust with modular components for models, tools, memory, realtime voice, and more. Model-agnostic, deployment-agnostic, and optimized for frontier AI models.
💖🧸 Self-hosted, user-owned Grok Companion: a container for the souls of waifus and cyber beings that brings them into our world, with the aim of reaching Neuro-sama's level. Capable of realtime voice chat and of playing Minecraft and Factorio. Web / macOS / Windows supported.
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.
A lightweight, powerful framework for multi-agent workflows and voice agents
AI-native video production toolkit for Claude Code
Unreal Engine plugin for LLM/GenAI models & MCP UE5 server. OpenAI GPT-5, Deepseek R1, Claude Opus/Sonnet, Gemini 3, Grok 4, Alibaba Qwen, Kimi, ElevenLabs TTS, Inworld, OpenRouter, Groq, GLM, Ollama, Local, Meshy, Tripo, Hunyuan3D, Rodin, fal, Dashscope, Seedream. NPC AI, agentic, chat, 3D gen, TTS, multimodal, image gen. UnrealMCP/UnrealClaude
Like the macOS say command, but with a modern voice.
```bash
brew install steipete/tap/sag # auto-taps steipete/tap
```
A powerful Rust library and CLI tool to unify and orchestrate multiple LLM, Agent and voice backends (OpenAI, Claude, Gemini, Ollama, ElevenLabs...) with a single, extensible API. Build, chain, evaluate, and serve complex multi-step AI workflows — including speech-to-text, text-to-speech, completions, vision, and reasoning.
AI video generation SDK — JSX for videos. One API for Kling, Flux, ElevenLabs, Sora. Built on Vercel AI SDK.
Collection of agent skills for AI coding assistants
Swift SDK to stream ElevenLabs Voices
```swift
import ElevenLabsKit

let client = ElevenLabsTTSClient(apiKey: "<api-key>")
let request = ElevenLabsTTSRequest(
    text: "Hello",
    modelId: "eleven_v3",
    outputFormat: "pcm_44100")
let stream = client.streamSynthesize(voiceId: "<voice-id>", request: request)
// let sampleRate = TalkTTS… (snippet truncated in the source listing)
```
Automatically generate engaging AI podcasts from nothing but an episode title.
Open-source AI pipeline that turns any topic into a publish-ready YouTube/Instagram/TikTok Short — research, script, voiceover, visuals, music, captions, and assembly in one command.
GoHighLevel MCP Server — 520+ tools across 40 categories. Voice AI, Proposals, Contacts, Calendars, Conversations, Opportunities, Invoices, Payments, Workflows, Social Media, and more. MCP SDK 1.26, Streamable HTTP, tool annotations.
Real-time web cockpit for OpenClaw: voice conversations, agent automated kanban board, workspace/file control, sub-agent sessions, inline charts, and usage visibility.
Open-source real-time digital human agent platform. Build voice-first AI agents with WebRTC, persona memory, tools, RAG, and optional digital-human video.
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
Speech-to-text input for Claude Code with live streaming dictation
Official one-stop shop for AI Agents and developers building with Telnyx.
AgentCall lets AI Agents join meetings with voice, video & screen-share to build together. Supports Google Meet, Teams, Zoom (Beta)
Persistent agents for Claude Code as a plugin, not a harness. Memory, personality, messaging across WhatsApp, Telegram, and Discord, plus a service mode for 24/7 runs. Imports from OpenClaw.
TalkiTo lets developers interact with AI systems through speech across multiple channels (terminal, API, phone). It can be used as both a command-line tool and a Python library.
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
| Tool | Stars | Language | License | Score |
|---|---|---|---|---|
| jarvis | ★ 845 | Python | — | 38 |
| bolna | ★ 649 | Python | MIT | 39 |
| adk-rust | ★ 333 | Rust | — | 42 |
| airi | ★ 39.4k | TypeScript | MIT | 48 |
| OpenMontage | ★ 3.5k | Python | AGPL-3.0 | 49 |
| openai-agents-js | ★ 3.1k | TypeScript | MIT | 46 |
| elevenlabs-mcp | ★ 1.4k | Python | MIT | 51 |
| claude-code-video-toolkit | ★ 1.2k | Python | MIT | 46 |
| UnrealGenAISupport | ★ 563 | C++ | MIT | 41 |
| sag | ★ 300 | Go | MIT | 48 |
| llm | ★ 339 | Rust | MIT | 38 |
| sdk | ★ 285 | TypeScript | Apache-2.0 | 40 |
| ai-skills | ★ 209 | Python | Apache-2.0 | 48 |
| ElevenLabsKit | ★ 103 | Swift | MIT | 43 |
| podcast-llm | ★ 142 | Python | — | 26 |
| OpenReels | ★ 64 | TypeScript | MIT | 34 |
| mcp-tts | ★ 54 | Go | MIT | 38 |
| Go-High-Level-MCP-2026-Complete | ★ 50 | TypeScript | — | 38 |
| voicemode | ★ 1.2k | Python | MIT | 41 |
| openclaw-nerve | ★ 821 | TypeScript | MIT | 44 |
| CyberVerse | ★ 701 | Python | GPL-3.0 | 47 |
| LLM-Agents-Ecosystem-Handbook | ★ 516 | Python | MIT | 51 |
| claude-stt | ★ 363 | Python | MIT | 39 |
| ai | ★ 173 | Shell | MIT | 38 |
| skills | ★ 164 | Shell | MIT | 37 |
| telnyx-skills | ★ 150 | Shell | MIT | 36 |
| agentcall | ★ 72 | Python | MIT | 44 |
| ClawCode | ★ 56 | TypeScript | MIT | 36 |
| talkito | ★ 54 | Python | AGPL-3.0 | 30 |
| unsloth | ★ 65.0k | Python | Apache-2.0 | 49 |
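The Stars column mixes plain integers with abbreviated values such as 39.4k, so any average over it has to normalize the labels first. The sketch below shows one way to do that with the values copied from the table; `parse_stars` is a hypothetical helper, and because abbreviated values are rounded in the listing, the results are approximate.

```python
from statistics import mean

# Star labels copied from the table, in table order.
STAR_LABELS = [
    "★ 845", "★ 649", "★ 333", "★ 39.4k", "★ 3.5k", "★ 3.1k",
    "★ 1.4k", "★ 1.2k", "★ 563", "★ 300", "★ 339", "★ 285",
    "★ 209", "★ 103", "★ 142", "★ 64", "★ 54", "★ 50",
    "★ 1.2k", "★ 821", "★ 701", "★ 516", "★ 363", "★ 173",
    "★ 164", "★ 150", "★ 72", "★ 56", "★ 54", "★ 65.0k",
]


def parse_stars(label: str) -> int:
    """Convert a label such as '★ 845' or '★ 39.4k' to an integer star count."""
    value = label.replace("★", "").strip().lower()
    if value.endswith("k"):
        # Abbreviated values are rounded in the listing, so this is approximate.
        return round(float(value[:-1]) * 1000)
    return int(value)


stars = [parse_stars(label) for label in STAR_LABELS]

if __name__ == "__main__":
    print(f"tools: {len(stars)}")
    print(f"mean over all rows: {mean(stars):.1f}")
    print(f"mean over first three rows: {mean(stars[:3]):.1f}")
```

Comparing the two means shows how strongly a couple of very large repositories pull the overall average upward.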
The top-ranked voice agent tools in 2026 are jarvis, bolna, and adk-rust. Agent Skills Hub ranks 30 options by GitHub stars, quality score (6 dimensions including completeness, examples, and agent readiness), and recent activity. The list is rebuilt every 8 hours from live GitHub data.
jarvis (845 stars), written in Python, is the first-ranked choice for general voice agent workflows. bolna (649 stars) is a strong alternative. Pick by your existing stack: match the language and runtime your team already uses to minimize integration cost. If unsure, start with jarvis, then check the table above for projects with larger communities or higher quality scores in your language.
Avoid pre-built voice agents tools when (1) your use case requires deep customization that the tool's plugin system doesn't support, (2) you have strict compliance requirements that ban third-party dependencies, (3) the tool's maintenance is inactive (last commit >6 months ago), or (4) your data volume is small enough that a 50-line custom script is cheaper than learning the tool. For most production workflows above 100 requests/day, the time savings from a maintained tool outweigh the customization loss.
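Criterion (3) above is the easiest one to automate. Here is a minimal sketch of a staleness check, assuming you already have the date of the last commit from some source; `is_stale` and the 183-day cutoff are illustrative choices for "more than 6 months", not values defined by Agent Skills Hub.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed cutoff: roughly six months.
STALE_AFTER = timedelta(days=183)


def is_stale(last_commit: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the last commit is older than the cutoff."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_commit > STALE_AFTER


if __name__ == "__main__":
    reference = datetime(2026, 1, 1, tzinfo=timezone.utc)
    old_commit = datetime(2025, 1, 1, tzinfo=timezone.utc)
    recent_commit = datetime(2025, 12, 1, tzinfo=timezone.utc)
    print(is_stale(old_commit, now=reference))
    print(is_stale(recent_commit, now=reference))
```

Both datetimes should be timezone-aware (or both naive); Python raises a `TypeError` when subtracting one kind from the other.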
Voice Agents focuses specifically on real-time voice AI agents: speech-in/speech-out conversational systems built with Whisper, ElevenLabs, the OpenAI Realtime API, and Anthropic voice tools. Text-to-Speech & Voice is a related but distinct category; see https://agentskillshub.top/best/text-to-speech/ for those tools. The two often appear in the same agent pipeline but solve different problems: choose Voice Agents when you need a full conversational loop with speech in and speech out, and Text-to-Speech & Voice when you mainly need speech synthesis or other voice features inside a broader workflow.
For most teams, using an existing tool is a better choice than building your own. jarvis has 845 stars' worth of community testing, handles edge cases you haven't thought of, and ships with documentation. Build your own only when (1) your requirements are deeply non-standard, (2) you have a security or compliance reason to avoid OSS dependencies, or (3) the code you would maintain is small enough (<200 lines) that owning it saves time long-term. The break-even point is usually around 2-3 weeks of dev time saved.
Most voice agents tools listed are open source under permissive licenses (MIT, Apache 2.0). A handful offer paid managed/cloud versions on top of free self-hosted core. Always check the LICENSE file on each tool's GitHub repository before commercial use — some use AGPL or non-commercial restrictions that may not fit your deployment model.
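A first-pass license triage can be scripted before reading each LICENSE file. The sketch below groups the license identifiers that appear in the table on this page; `classify_license` and the three bucket names are hypothetical, the grouping is a rough convention rather than legal advice, and any identifier not listed falls through to manual review.

```python
from typing import Optional

# Identifiers as they appear in the table on this page.
PERMISSIVE = {"MIT", "Apache-2.0"}
COPYLEFT = {"GPL-3.0", "AGPL-3.0"}


def classify_license(license_id: Optional[str]) -> str:
    """Bucket a license identifier; None means the listing shows no license."""
    if license_id is None:
        return "unknown"
    if license_id in PERMISSIVE:
        return "permissive"
    if license_id in COPYLEFT:
        return "copyleft"
    return "review"


if __name__ == "__main__":
    rows = {
        "bolna": "MIT",
        "OpenMontage": "AGPL-3.0",
        "CyberVerse": "GPL-3.0",
        "jarvis": None,
    }
    for name, license_id in rows.items():
        print(f"{name}: {classify_license(license_id)}")
```

Anything that lands in `unknown`, `copyleft`, or `review` is a prompt to read the repository's LICENSE file before commercial use, not a verdict on whether the tool can be used.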