<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Benchmark on 查拉图的数字花园</title><link>https://www.chalatu.xyz/tags/benchmark/</link><description>Recent content in Benchmark on 查拉图的数字花园</description><generator>Hugo</generator><language>zh-CN</language><lastBuildDate>Sat, 11 Apr 2026 11:00:00 +0800</lastBuildDate><atom:link href="https://www.chalatu.xyz/tags/benchmark/index.xml" rel="self" type="application/rss+xml"/><item><title>How Are Large Models Judged? A Complete Guide to the LLM Evaluation Landscape</title><link>https://www.chalatu.xyz/posts/solo-company/2026-04-11-llm-evaluation-complete-guide/</link><pubDate>Sat, 11 Apr 2026 11:00:00 +0800</pubDate><guid>https://www.chalatu.xyz/posts/solo-company/2026-04-11-llm-evaluation-complete-guide/</guid><description>From evaluation dimensions and core metrics to authoritative institutions and benchmarking sites, this post systematically maps out how the capabilities of large AI models are assessed, helping you see through the logic and pitfalls behind each leaderboard.</description></item></channel></rss>