by Asaf-Yehudai · Agent Tool · ★ 89
Evaluation of LLM-based Agents: A Reading List

Based on the survey paper: Survey on Evaluation of LLM-based Agents (arXiv 2025)

Asaf Yehudai¹², Lilach Eden², Alan Li³, Guy Uziel², Yilun Zhao³, Roy Bar-Haim², Arman Cohan³, Michal Shmueli-Scheuer²

¹The Hebrew University of Jerusalem, ³Yale University

About This Repository

This repository is a companion to the survey paper "Survey on Evaluation of LLM-based Agents". It organizes evaluation methodologies, benchmarks, and frameworks following the structure of the paper, with the aim of providing a comprehensive resource for researchers and practitioners working on LLM-based agents.

The selection covers works discussed in the survey, in the following areas:

- Fundamental Agent Capabilities: Planning, Tool Use, Self-Reflection, Memory.
- Application-Specific Domains: Web, Software Engineering, Scientific, and Conversational Agents.
- Generalist Agent Evaluation.
- Evaluation Frameworks.

Our goal is to map the rapidly evolving landscape of agent evaluation, highlight key trends, and identify current limitations as discussed in the survey.
| Field | Value |
|---|---|
| Stars | 89 |
| Forks | 12 |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 47.1/100 |
| Last Updated | 2025-10-21 |
| Created | 2025-04-28 |
| Est. Tokens | ~3087k |
Looking for an alternative to LLM-Agent-Evaluation-Survey? If you're comparing it with other agent tools, these 4 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
- Claude Code Skills for software engineering workflows: Git automation, testing, and code review
- A Claude Code skill that turns PDFs, docs, and codebases into Obsidian study vaults
- 86 product management skills from Lenny's Podcast for Claude Code and AI agents. Hiring, user research, strate…
- Power rename/refactor tool (now with agent skill support!)
LLM-Agent-Evaluation-Survey is a curated list of top papers on the evaluation of LLM-based agents. It is categorized as an Agent Tool and has 89 GitHub stars.
The full reading list and usage details are available in the LLM-Agent-Evaluation-Survey GitHub repository at github.com/Asaf-Yehudai/LLM-Agent-Evaluation-Survey. The project has 89 stars and 12 forks.
LLM-Agent-Evaluation-Survey is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to LLM-Agent-Evaluation-Survey on Agent Skills Hub include claude-skills-marketplace, tutor-skills, and lenny-skills. You can compare them side by side by stars, quality score, and community activity.