<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Reviews on 查拉图的数字花园</title><link>https://www.chalatu.xyz/tags/%E6%B5%8B%E8%AF%84/</link><description>Recent content in Reviews on 查拉图的数字花园</description><generator>Hugo</generator><language>zh-CN</language><lastBuildDate>Tue, 14 Apr 2026 12:31:35 +0800</lastBuildDate><atom:link href="https://www.chalatu.xyz/tags/%E6%B5%8B%E8%AF%84/index.xml" rel="self" type="application/rss+xml"/><item><title>Zhipu GLM-5.1 Full Review: Open Source Takes the Global No. 1 Spot on SWE-Bench Pro, but at What Cost?</title><link>https://www.chalatu.xyz/posts/solo-company/2026-04-14-glm-5-1-full-review-2026/</link><pubDate>Tue, 14 Apr 2026 12:31:35 +0800</pubDate><guid>https://www.chalatu.xyz/posts/solo-company/2026-04-14-glm-5-1-full-review-2026/</guid><description>GLM-5.1 scored 58.4 on SWE-Bench Pro, surpassing Claude Opus 4.6 and GPT-5.4 and becoming the first open-source model to top this coding benchmark. But how much is this "first place" really worth? Where does the model actually stand on long-horizon tasks, multimodality, and reasoning? Drawing on the latest public benchmark data, this article offers an unfiltered assessment.</description></item></channel></rss>