by PAIR-code · Agent Tool · ★ 521
Indexed by AgentSkillsHub · Auto-synced every 8h
LLM Comparator

LLM Comparator is an interactive visualization tool, with a Python library, for analyzing side-by-side LLM evaluation results. It is designed to help people qualitatively analyze how responses from two models differ at the example and slice levels. Users can interactively discover insights like "Model A's responses are better than B's on email rewriting tasks because Model A tends to generate bulleted lists more often."

Using LLM Comparator

You can try LLM Comparator at https://pair-code.github.io/llm-comparator/. You can either select one of the provided example files or upload your own JSON file (e.g., the minimal example file) that follows the project's format.

Example Demo for Comparing Gemma 1.1 and 1.0

The project provides an example file comparing the responses of Gemma 1.1 and Gemma 1.0 to prompts from the Chatbot Arena Conversations dataset. You can open it with the link below:

https://pair-code.github.io/llm-comparator/?resultspath=https://pair-code.github.io/llm-comparator/data/examplearena.json

The tool helps you analyze when and why Gemma 1.1 is better or worse than 1.0, and how the responses from the two models differ.
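The demo link above passes the location of the results file to the hosted app as a query parameter. A minimal sketch of building such a link for a different results file might look like the following. The helper name `build_comparator_url` is hypothetical, and the parameter name `resultspath` is copied from the link exactly as it appears in this text; the hosted app may expect a different spelling, so check the repository before relying on it. Whether the app can actually fetch a file from your own host is also an assumption.

```python
from urllib.parse import urlencode

# Base URL of the hosted tool, as given in the text above.
BASE_URL = "https://pair-code.github.io/llm-comparator/"


def build_comparator_url(results_url: str, param_name: str = "resultspath") -> str:
    """Return a link that asks the hosted tool to load `results_url`.

    `param_name` defaults to the spelling seen in the demo link in this
    text (an assumption; verify it against the project repository).
    """
    # Keep ':' and '/' readable so the result matches the style of the
    # demo link; other reserved characters are percent-encoded.
    query = urlencode({param_name: results_url}, safe=":/")
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_comparator_url(
        "https://pair-code.github.io/llm-comparator/data/examplearena.json"
    ))
```

Passing the parameter name as an argument keeps the sketch usable even if the real query key differs from the one shown here.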
| Stars | 521 |
| Forks | 50 |
| Language | JavaScript |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 62.9/100 |
| Open Issues | 4 |
| Last Updated | 2025-02-11 |
| Created | 2024-05-07 |
| Platforms | node |
| Est. Tokens | ~706k |
Looking for an llm-comparator alternative? If you're comparing llm-comparator with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
This repository contains a collection of Agent Skills developed by GudaStudio, enabling seamless collaboration
Supercharge Claude Code with 11 AI agents, 36 commands & 15 skills — the claude-code plugin framework inspired
Skill to give Claude Code (and any coding agent) the ability to generate beautiful and practical Excalidraw di
A collection of Agent skills and Claude Code plugins for HashiCorp products.
A collection of standardized Agent Skills to teach GitHub Copilot, Claude, Gemini and Cursor about modern Andr
Claude Code Skill Factory — A powerful open-source toolkit for building and deploying production-ready Claude
llm-comparator (LLM Comparator) is an interactive data visualization tool for evaluating and analyzing LLM responses side by side, developed by the PAIR team. It is categorized as an Agent Tool and has 521 GitHub stars.
llm-comparator is primarily written in JavaScript.
You can find installation instructions and usage details in the llm-comparator GitHub repository at github.com/PAIR-code/llm-comparator. The project has 521 stars and 50 forks, indicating an active community.
llm-comparator is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to llm-comparator on Agent Skills Hub include skills, claude-forge, and excalidraw-diagram-skill. Each takes a different approach to the same problem space, and you can compare them side by side by stars, quality score, and community activity.