LLM Agents

Deep Agents From LangGraph

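Over the past year, the evolution of AI agents has shown two very important trends. First, agents are becoming more general (Generalist): they can take on an ever wider range of tasks. Second, agents are handling longer tasks (Long-horizon): they can carry out complex tasks spanning dozens or even hundreds of consecutive steps. According to METR's benchmarks, the length of human tasks (measured by how long they take a person) that AI can complete autonomously roughly doubles every 7 months. This means agents are evolving from "short-conversation assistants" into "autonomous systems that can run for hundreds or even thousands of steps." ...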

Introduction to training LLMs for AI agents

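Most of you are probably already familiar with LLMs. Roughly two to three years ago, models such as ChatGPT, Claude, Llama, and DeepSeek appeared one after another and arguably changed the world. But as we use these powerful tools, one core question is worth exploring: how are these models actually trained? ...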

Memory, Reasoning, and Planning of Language Agents

Language agents have emerged as one of the most exciting research directions in AI over the past two years. This article explores three core components: long-term memory via HippoRAG, reasoning capabilities with Grokked Transformers, and world modeling through WebDreamer. Why Agents Again? Russell & Norvig in "Artificial Intelligence: A Modern Approach" define an agent as "anything that can perceive its environment through sensors and act upon that environment through actions." (@ArtificialIntelligenceModern) ...

LLM Agents: Brief History and Overview

Introduction. To understand LLM agents, we need to break the term into two foundational components: Large Language Models (LLMs) and agents. While LLMs have gained widespread recognition, the concept of "agent" in this context requires deeper exploration. What Is an Agent? In artificial intelligence, an agent is an "intelligent" system that perceives and interacts with an "environment" to achieve specific goals. The classification of agents varies based on their operational environment: ...

Self-Improvement and the Evolution of Reasoning in Large Language Models (Jason Weston, Meta)

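This article is based on a talk given by Jason Weston (Meta) in UC Berkeley's Advanced Large Language Model Agents course, which explores how to improve the reasoning abilities of large language models. The lecture content follows: AI capabilities are advancing rapidly, as shown by the breakthroughs that models such as O1 and R1 have made on reasoning benchmarks. This article focuses on the models' capacity for self-improvement. ...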

How Can We Improve the Reasoning Ability of Large Language Models (LLMs)?

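In 2024, large language models made significant breakthroughs in reasoning. Take the O-series models as an example: on the ARC-AGI evaluation they delivered impressive results [1]. The O3 model reached 87.5% accuracy, although the compute cost per task was high (over $1,000); by contrast, conventional LLMs that do not use special reasoning techniques typically score below 25%. Fig.1: O-Series Performance. How to use effective prompting to elicit deeper reasoning from large language models has long been a central concern for researchers and developers. The main methods for triggering it are as follows: ...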