LLM-Agent-Evaluation-Survey — Agent Tool by Asaf-Yehudai

Last updated: 2025-10-21 · Indexed by AgentSkillsHub · Auto-synced every 8h

About LLM-Agent-Evaluation-Survey

Evaluation of LLM-based Agents: A Reading List

Based on the Survey Paper: Survey on Evaluation of LLM-based Agents (arXiv 2025)

Asaf Yehudai¹², Lilach Eden², Alan Li³, Guy Uziel², Yilun Zhao³, Roy Bar-Haim², Arman Cohan³, Michal Shmueli-Scheuer²

¹The Hebrew University of Jerusalem ³Yale University

About This Repository

This repository serves as a companion to the survey paper "Survey on Evaluation of LLM-based Agents". It organizes evaluation methodologies, benchmarks, and frameworks according to the structure presented in the paper, aiming to provide a comprehensive resource for researchers and practitioners in the field of LLM-based agents.

The selection criteria focus on works discussed within the survey, covering:

Fundamental Agent Capabilities: Planning, Tool Use, Self-Reflection, Memory.
Application-Specific Domains: Web, Software Engineering, Scientific, Conversational Agents.
Generalist Agent Evaluation.
Evaluation Frameworks.

Our goal is to map the rapidly evolving landscape of agent evaluation, highlight key trends, and identify current limitations as discussed in the sur

Quick Facts

Stars	89
Forks	12
Category	Agent Tool
License	MIT
Quality Score	47.1/100
Last Updated	2025-10-21
Created	2025-04-28
Est. Tokens	~3087k

LLM-Agent-Evaluation-Survey alternative? Top 4 similar tools

Looking for an LLM-Agent-Evaluation-Survey alternative? If you're comparing LLM-Agent-Evaluation-Survey with other Agent Tool projects, these 4 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

claude-skills-marketplace by mhattingpete · ⭐ 442
Claude Code Skills for software engineering workflows - Git automation, testing, and code review
tutor-skills by RoundTable02 · ⭐ 400
A Claude Code skill that turns PDFs, docs, and codebases into Obsidian study vaults
lenny-skills by RefoundAI · ⭐ 382
86 product management skills from Lenny's Podcast for Claude Code and AI agents. Hiring, user research, strate
repren by jlevy · ⭐ 371
Power rename/refactor tool (now with agent skill support!)

Frequently Asked Questions

What is LLM-Agent-Evaluation-Survey?

LLM-Agent-Evaluation-Survey is a reading list of top papers related to LLM-based agent evaluation. It is categorized as an Agent Tool and has 89 GitHub stars.

How do I install or use LLM-Agent-Evaluation-Survey?

LLM-Agent-Evaluation-Survey is a reading list, so there is nothing to install. You can browse it, along with any usage details, in the GitHub repository at github.com/Asaf-Yehudai/LLM-Agent-Evaluation-Survey. The project has 89 stars and 12 forks.

What license does LLM-Agent-Evaluation-Survey use?

LLM-Agent-Evaluation-Survey is released under the MIT license, making it free to use and modify according to the license terms.

What are the best alternatives to LLM-Agent-Evaluation-Survey?

The top alternatives to LLM-Agent-Evaluation-Survey on Agent Skills Hub are claude-skills-marketplace, tutor-skills, lenny-skills, and repren. You can compare them side by side by stars, quality score, and community activity.
