by jeinlee1991 · Agent Tool · ★ 6.2k
Indexed by AgentSkillsHub · Auto-synced every 8h
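NoneLinear (非线智能) - ReLE Benchmark: Capability Evaluation of Chinese AI Large Models (continuously updated)

ReLE (Really Reliable Live Evaluation for LLM), formerly named CLiB, currently covers 384 large models. These include commercial models such as chatgpt, gpt-5.5, Google gemini-3.1-pro, Claude-4.8, Wenxin ERNIE-X1.1, ERNIE-5.1, qwen3.7-max, qwen3.7-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.7-flash, kimi-k2.6, ernie4.5, MiniMax-M3, deepseek-v4, Qwen3.6, llama4, Zhipu GLM-5.1, MiMo-V2, LongCat, gemma4, and mistral.

It supports multi-dimensional capability evaluation across 7 domains (education; medicine and mental health; finance; law and public administration; reasoning and mathematical computation; language and instruction following; agents and tool use), subdivided into 300 fine-grained dimensions (e.g., dentistry, high-school Chinese…). For details, see our technical report, ReLE: A Scalable System and Structured Benchmark for Diagnosing Capability Anisotropy in Chinese LLMs.

Media coverage (Synced / 机器之心): "Hands-on test of 304 Chinese LLMs worldwide: no 'all-round champion'; ReLE resolves the evaluation dilemma with a 70% cost-reduction approach."

Beyond leaderboards, the project also provides a defect library of more than 2 million LLM failure cases to help the community study, analyze, and improve large models.

We offer free evaluation for your private models. Contact us (the NoneLinear ReLE benchmark team): add us on WeChat.

Contents: 🔄 Recent updates · ⚓ Popular LLM evaluation projects on GitHub · 📝 Basic model information · 📊 Leaderboards · 0. Multimodal leaderboard · 1. Overall capability leaderboard · 1.1 Reasoning model leaderboard · 1.2 Commercial model leaderboard (including paid APIs of open-source models) · 1.3 Open-source model leaderboard · 2. Education leaderboard · 2.1 Primary-school subjects · 2.3 High-school entrance exam (zhongkao) TODO · 2.4 High-school subjects · 2.6 Higher education TODO · 2.7 Graduate entrance exam TODO · 2.8 Teacher certification TODO · 3. Medicine and mental health leaderboard · 3.1 Physician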

| Metric | Value |
| --- | --- |
| Stars | 6,223 |
| Forks | 254 |
| Category | Agent Tool |
| Quality Score | 50.96/100 |
| Open Issues | 15 |
| Last Updated | 2026-06-27 |
| Created | 2023-06-04 |
| Platforms | claude-code, gemini |
| Est. Tokens | ~28k |

Looking for a chinese-llm-benchmark alternative? If you're comparing chinese-llm-benchmark with other Agent Tool projects, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
Pocket Flow: 100-line LLM framework. Let Agents build Agents!
Comprehensive resources on Generative AI, including a detailed roadmap, projects, use cases, interview prepara
Agentic-RAG explores advanced Retrieval-Augmented Generation systems enhanced with AI LLM agents.
Build, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce.
Build and run agents you can see, understand and trust.
📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for w
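chinese-llm-benchmark is NoneLinear's (非线智能) ReLE Benchmark, a continuously updated capability evaluation of Chinese AI large models. It currently covers 374 large models, including commercial models such as chatgpt, gpt-5.4, Google gemini-3.1-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3.6-max, qwen3.6-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as step3.5-flash, kimi-k2… It is categorized as an Agent Tool and has 6.2k GitHub stars.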
You can find installation instructions and usage details in the chinese-llm-benchmark GitHub repository at github.com/jeinlee1991/chinese-llm-benchmark. The project has 6.2k stars and 254 forks, suggesting broad community interest.
The top alternatives to chinese-llm-benchmark on Agent Skills Hub include PocketFlow, generative-ai, and AgenticRAG-Survey. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.