AI | Mig's Blog

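Moving Forward Amid Uncertainty: 5 Principles

Notes on reading Jaclyn Konzelmann's "Looking Ahead into 2026". This year she dropped her usual list of AI predictions, because things are changing too fast for predictions to mean much. Instead, she set out a few principles: ...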

Deep Agents From LangGraph

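Over the past year, AI agents have shown two important trends. Agents are becoming more general (Generalist), able to take on a growing range of task types. Agents are also handling longer tasks (Long-horizon), able to carry out complex tasks spanning dozens or even hundreds of consecutive steps. According to METR's benchmarks, the length of human tasks (measured by how long a human would take) that AI can complete autonomously has been doubling roughly every 7 months. In other words, agents are evolving from "short-conversation assistants" into "autonomous systems that can run for hundreds or even thousands of steps." ...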
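The METR trend above is plain exponential growth. The following is a minimal Python sketch that assumes the doubling time holds steady at 7 months, which is an extrapolation rather than a guarantee. The function names and the starting horizon are hypothetical and used only for illustration:

```python
def horizon_growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth of the task horizon after `months`,
    assuming (hypothetically) a constant doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    # One doubling every `doubling_months` months: factor = 2^(t / T)
    return 2.0 ** (months / doubling_months)


def projected_horizon(current_minutes: float, months: float,
                      doubling_months: float = 7.0) -> float:
    """Project a hypothetical current task horizon (in human-minutes) forward."""
    return current_minutes * horizon_growth_factor(months, doubling_months)


if __name__ == "__main__":
    for m in (7, 12, 24):
        print(f"after {m} months: x{horizon_growth_factor(m):.2f}")
```

For example, `projected_horizon(60, 14)` turns a hypothetical 60-minute horizon into 240 minutes after two doubling periods.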

Introduction to training LLMs for AI agents

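Most of you are probably already familiar with LLMs. About two or three years ago, models such as ChatGPT, Claude, Llama, and DeepSeek appeared one after another and arguably changed the world. But as we use these powerful tools, one core question is worth exploring: how are these models actually trained? ...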

Context Engineering

By 2025, existing models have already become remarkably intelligent. However, even the smartest system cannot perform effectively without understanding what it is being asked to do. Prompt engineering refers to the practice of phrasing tasks optimally for chatbots built on large language models. Context engineering, on the other hand, represents the next stage: it aims to automate this process within dynamic systems. What is Context Engineering? Tobi from Shopify shared an interesting post in which he expressed his appreciation for the term “Context Engineering.” Later, Karpathy followed up with a brilliant definition: ...

How to Use Reasoning Models?

The following insights are drawn from the "Reasoning with o1" video course by DeepLearning.AI. This article explores how to effectively prompt and use the new generation of reasoning models. Models released over the past year have made remarkable progress on reasoning and planning tasks. OpenAI has heavily optimized chain-of-thought (CoT) reasoning, using reinforcement learning to train models so that they automatically build step-by-step reasoning into how they respond. ...

Open Training Recipes for Reasoning in Language Models

In today’s rapidly evolving AI landscape, much of the remarkable progress we’ve witnessed can be attributed to open scientific research and fully open models. Over time, however, research and development work has become increasingly closed. We still need to understand more deeply how language models work, improve their capabilities, and make them safer, more efficient, and more reliable. At the same time, we need to extend language models beyond text into domains such as healthcare, science, and complex decision-making. Most importantly, we must bring these models into real-world applications, ensuring they are deployable and interpretable and that they effectively mitigate biases and risks. ...

Memory, Reasoning, and Planning of Language Agents

Language Agents have emerged as one of the most exciting research directions in AI over the past two years. This article explores three core components: long-term memory via HippoRAG, reasoning capabilities with Grokked Transformers, and world modeling through WebDreamer. Why Agents Again? In “Artificial Intelligence: A Modern Approach,” Russell & Norvig define an agent as “anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.” (@ArtificialIntelligenceModern) ...

LLM Agents: Brief History and Overview

Introduction To understand LLM agents, we need to break the term into two foundational components: Large Language Models (LLMs) and Agents. While LLMs have gained widespread recognition, the concept of “agent” in this context requires deeper exploration. What is an Agent? In artificial intelligence, an agent is an “intelligent” system that perceives and interacts with an “environment” to achieve specific goals. The classification of agents varies based on their operational environment: ...

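Self-Improvement and the Evolution of Reasoning in Large Language Models (Jason Weston, Meta)

This post is based on a talk by Jason Weston (Meta) in the UC Berkeley Advanced Large Language Model Agents course, which discusses how to improve the reasoning abilities of large language models. The lecture content follows. AI capabilities are advancing rapidly, as shown by the breakthroughs that models such as o1 and R1 have made on reasoning benchmarks. This post focuses on models' capacity for self-improvement. ...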

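How Can We Optimize the Reasoning Ability of Large Language Models (LLMs)?

In 2024, large language models made notable breakthroughs in reasoning. The o-series models, for example, showed impressive performance on the ARC-AGI evaluation [1]: o3 reached 87.5% accuracy, albeit at a high compute cost per task (over $1,000), whereas conventional LLMs without special reasoning techniques typically scored below 25%. Fig. 1: o-series performance. How to use effective prompting to elicit deeper reasoning from large language models has long been a central concern for researchers and developers. The main triggering methods are as follows: ...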