by THUDM · Agent Tool · ★ 256
Indexed by AgentSkillsHub · Auto-synced every 8h
VisualAgentBench (VAB)

🌐 Website · 🗂️ VAB Training (ModelScope)

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

VisualAgentBench (VAB) is the first benchmark designed to systematically evaluate and develop large multimodal models (LMMs) as visual foundation agents. It comprises 5 distinct environments across 3 types of representative visual agent tasks (Embodied, GUI, and Visual Design):

- VAB-OmniGibson (Embodied)
- VAB-Minecraft (Embodied)
- VAB-Mobile (GUI)
- VAB-WebArena-Lite (GUI, based on WebArena and VisualWebArena)
- VAB-CSS (Visual Design)

Demo video: https://github.com/user-attachments/assets/4a1a5980-48f9-4a70-a900-e5f58ded69b4

Compared to its predecessor AgentBench, VAB emphasizes visual inputs and supports the development of foundation agent capabilities by training open LLMs/LMMs on trajectories.

Table of Contents: Quick Start · Dataset Summary · Leaderboard · Quick Start · Acknowledgement · [Cita
| Metric | Value |
| --- | --- |
| Stars | 256 |
| Forks | 9 |
| Language | Python |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 39.2/100 |
| Open Issues | 16 |
| Last Updated | 2025-04-24 |
| Created | 2024-08-08 |
| Platforms | python |
| Est. Tokens | ~378k |
Looking for a VisualAgentBench alternative? If you're comparing VisualAgentBench with other tools in the Agent Tool category, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction:
- Low code tool to rapidly build and coordinate multi-agent teams
- A curated list of OpenClaw resources, tools, skills, tutorials & articles. OpenClaw (formerly Moltbot / Clawdb
- Delegate tasks to Codex and Gemini directly from within Claude Code.
- Shell and coding agent on mcp clients
- Structured deep research skill for Claude Code/Open Code/Codex with human-in-the-loop control
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allow
VisualAgentBench (VAB) is a benchmark for evaluating and developing large multimodal models (LMMs) as visual foundation agents. It is categorized as an Agent Tool and has 256 GitHub stars.
VisualAgentBench is primarily written in Python. It covers topics such as gpt, llm-agent, multimodal-large-language-models.
You can find installation instructions and usage details in the VisualAgentBench GitHub repository at github.com/THUDM/VisualAgentBench. The project has 256 stars, 9 forks, and 16 open issues.
VisualAgentBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to VisualAgentBench on Agent Skills Hub include tribe, awesome-openclaw, and claude-delegator. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.