by IBM · Agent Tool · ★ 61
🔷 VAKRA: A Benchmark for Evaluating Multi-Hop, Multi-Source Tool-Calling in AI Agents

VAKRA (eValuating API and Knowledge Retrieval Agents using multi-hop, multi-source dialogues) is a tool-grounded, executable benchmark designed to evaluate how well AI agents reason end-to-end in enterprise-like settings. Rather than testing isolated skills, VAKRA measures compositional reasoning across APIs and documents, using full execution traces to assess whether agents can reliably complete multi-step workflows, not just individual steps.

VAKRA provides an executable environment where agents interact with over 8,000 locally hosted APIs backed by real databases spanning 62 domains, along with domain-aligned document collections.

Resources: Leaderboard · Dataset · Blog

What VAKRA Provides

- An executable benchmark environment with 8,000+ locally hosted APIs backed by real databases across 62 domains
- Domain-aligned document collections for retrieval-augmented, cross-source reasoning
- Tasks that require 3-7 step reasoning chains acros…
| Field | Value |
|---|---|
| Stars | 61 |
| Forks | 4 |
| Language | Python |
| Category | Agent Tool |
| Quality Score | 59.8/100 |
| Open Issues | 6 |
| Last Updated | 2026-05-21 |
| Created | 2026-02-25 |
| Platforms | Python |
| Est. Tokens | ~665k |
VAKRA is a benchmark for evaluating multi-hop, multi-source tool calling in AI agents. It is categorized as an Agent Tool and is written primarily in Python.
Installation instructions and usage details are in the VAKRA GitHub repository at github.com/IBM/vakra, which has 61 stars and 4 forks.