How AI agents remember
A field guide to agent memory: the kinds of memory an agent keeps, where each one lives in Postgres, and when the agent reads or writes it during a single turn.
The gist
- Agent memory splits into three classic kinds borrowed from cognitive science (semantic, episodic, procedural) and four the agent era added (working, entity/graph, reflective, shared).
- Storage follows the access pattern: exact lookups in a SQL table, semantic recall (including which tools to use) in a pgvector index, relationships in a graph, all inside one Postgres.
- Most reads are deterministic and run before the model thinks; expensive reads are just-in-time; durable writes happen only when the turn ends.
- An agent that curates its own memory by forgetting the trivial and reflecting on hard runs stays sharp instead of drowning in its own history.
An agent with no memory is a brilliant stranger. It reasons well, calls tools, returns an answer, and then forgets everything the moment the turn ends. Memory is the fourth capability, alongside perception, reasoning, and action, and it is the one that turns a clever demo into a system you can rely on across days and sessions. This article is a map of three questions: what an agent stores, where each kind of memory lives in a database, and when the agent reaches for it.
The stateless default
A stateless agent reads its input, reasons over it, and produces output, with nothing carried between calls. That is fine for a one-shot tool and useless for anything real. Without memory an agent cannot run a task that spans many steps, cannot recognize a user who returns tomorrow, and cannot learn from a mistake it made an hour ago. The usual patch, pasting the entire history into every prompt, only delays the failure: the context window fills, latency climbs, cost grows with every turn, and eventually the window overflows.
Memory is the alternative to stuffing. Instead of carrying everything in the prompt, the agent persists what matters outside the context window and retrieves only the slice a given turn needs. The rest of this piece is about how to do that well.
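A little arithmetic shows why stuffing only delays the failure. The sketch below compares prompt size per turn when the whole history is pasted in against a prompt that carries only a fixed slice. The figures (400 tokens per turn, a 5-turn slice, 50 turns) are made-up assumptions, and "the last few turns" is a simplification of what real retrieval would pick.

```python
def stuffed_prompt_tokens(turns: int, tokens_per_turn: int) -> list[int]:
    """Prompt size at each turn when the full history is pasted in."""
    return [tokens_per_turn * t for t in range(1, turns + 1)]


def retrieved_prompt_tokens(turns: int, tokens_per_turn: int, window: int) -> list[int]:
    """Prompt size at each turn when only `window` turns' worth of memory is loaded."""
    return [tokens_per_turn * min(t, window) for t in range(1, turns + 1)]


stuffed = stuffed_prompt_tokens(turns=50, tokens_per_turn=400)
sliced = retrieved_prompt_tokens(turns=50, tokens_per_turn=400, window=5)

# Last prompt size, then total tokens sent over the whole conversation.
print(stuffed[-1], sum(stuffed))  # 20000 510000
print(sliced[-1], sum(sliced))    # 2000 96000
```

Under these assumptions the stuffed prompt grows linearly per turn, so the total sent grows quadratically, while the sliced prompt levels off once the slice is full.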
Seven kinds of memory
"Memory" is not one thing. It helps to split it into kinds, because each kind answers a different question and, as we will see, wants a different home in the database.
Three kinds are borrowed straight from how people remember. Semantic memory holds facts: what is true about the user, the domain, and the world. Episodic memory holds experiences: a time-ordered record of what happened and what the agent did. Procedural memory holds skills: how to carry out a task without re-deriving it each time.
Four more kinds were added by the practice of building agents. Working memory is the scratchpad for the task in flight. Entity/graph memory is a map of the things an agent has seen and how they connect. Reflective memory is the lessons an agent draws from its own past runs. Shared memory is a pool that several agents read and write in common.
- Semantic memory. Durable facts about the user, the domain, and the world. Stable preferences and attributes pulled from the chat. "The user is vegetarian and prefers metric units." A knowledge base the agent retrieves over (RAG).
- Episodic memory. A time-ordered record of what happened and what was done. A short note per turn or event, kept in order. "Yesterday you helped me draft an email to Sam." A run log of past executions, for replay and audit.
- Procedural memory. How to carry out a task, plus the catalog of tools. Reusable steps and workflows that worked before. The system prompt and persona rules. A learned tool sequence (workflow memory) and the toolbox.
- Working memory. The scratchpad for the task currently in flight. The current goal, plan, and intermediate results. The active conversation in the context window. Plan and partial results held across loop steps.
- Entity/graph memory. Entities and the typed edges between them. People, systems, and orgs, and how they connect. "Sam is your manager; the thread was about Q3." A knowledge graph the agent traverses (Graphiti, Zep).
- Reflective memory. Lessons the agent draws from its own past runs. What went wrong, and the rule that avoids it next time. "Ask for the deadline before drafting." A lessons file loaded before similar tasks (Reflexion).
- Shared memory. A pool several agents read and write in common. Facts and state shared across the team. A team-wide FAQ every bot reads from. Multi-agent crew state, a shared blackboard.
These seven kinds are not seven databases. They collapse into a handful of storage shapes, and a single turn touches several of them at once, which is what the rest of this piece shows.
Where memory lives
Here is the engineering core, and it is simpler than the catalog of kinds suggests. The access pattern picks the backend. Ask one question of every read: do I know exactly which rows I want, or only that they should resemble something? The answer routes the data to one of three homes, all of which are just Postgres.
Exact lookups go in a SQL table. Conversational memory is the clearest case. You want this thread's most recent messages, in order. That is an index range scan, not a similarity search.

    create table messages (
      id         bigint generated always as identity primary key,
      thread_id  text not null,
      role       text not null,  -- 'user' | 'assistant'
      content    text not null,
      created_at timestamptz not null default now()
    );
    create index on messages (thread_id, created_at desc);

Reading the recent turns is a single indexed query, no embeddings involved:

    select role, content
    from messages
    where thread_id = $1
    order by created_at desc
    limit 20;

A tool log or event log has the same shape: rows you fetch by id and time, for replay and audit.
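The same read can be exercised end to end without a Postgres server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the column types differ from the schema above (SQLite has no timestamptz or identity columns), and the helper names write_message and recent_messages are illustrative. It also adds id as a tie-breaker for equal timestamps and flips the rows into chronological order, which is an assumption about how the prompt gets assembled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table messages (
        id         integer primary key autoincrement,
        thread_id  text not null,
        role       text not null,
        content    text not null,
        created_at text not null  -- ISO-8601 strings sort chronologically
    );
    create index messages_thread_time on messages (thread_id, created_at desc);
""")


def write_message(thread_id, role, content, created_at):
    conn.execute(
        "insert into messages (thread_id, role, content, created_at) values (?, ?, ?, ?)",
        (thread_id, role, content, created_at),
    )


def recent_messages(thread_id, limit=20):
    """Fetch the newest `limit` messages of one thread, returned oldest-first."""
    rows = conn.execute(
        "select role, content from messages "
        "where thread_id = ? order by created_at desc, id desc limit ?",
        (thread_id, limit),
    ).fetchall()
    return rows[::-1]  # SQL returns newest-first; the prompt reads oldest-first


write_message("t1", "user", "Draft an email to Sam.", "2024-01-01T10:00:00")
write_message("t1", "assistant", "Here is a draft.", "2024-01-01T10:00:05")
write_message("t2", "user", "Unrelated thread.", "2024-01-01T10:00:06")
write_message("t1", "user", "Make it shorter.", "2024-01-01T10:01:00")

print(recent_messages("t1", limit=2))
# [('assistant', 'Here is a draft.'), ('user', 'Make it shorter.')]
```

The other thread's message never appears: the thread filter and the ordering do all the work, with no embedding in sight.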
Semantic recall goes in a vector index. When you only know the meaning you want, store an embedding of the content and search by distance. The pgvector extension adds a vector column type and distance operators to plain Postgres.

    create extension if not exists vector;
    create table knowledge (
      id        bigint generated always as identity primary key,
      content   text not null,
      embedding vector(768) not null,
      metadata  jsonb not null default '{}'
    );
    -- An approximate-nearest-neighbor index (HNSW) over cosine distance.
    create index on knowledge using hnsw (embedding vector_cosine_ops);

Retrieval pulls the top-K rows closest to the query embedding. The <=> operator is cosine distance:

    select content, metadata
    from knowledge
    order by embedding <=> $1  -- $1 is the query embedding
    limit 5;

Because it is still SQL, you can scope the search with an ordinary filter on the JSONB metadata in the same query:

    select content
    from knowledge
    where metadata @> '{"source": "papers"}'
    order by embedding <=> $1
    limit 5;

Knowledge, learned workflows, the tool catalog, entity descriptions, and reflective lessons all share this layout: a content column, an embedding, and some metadata.
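To make the <=> ordering concrete, here is a brute-force sketch of what the filtered top-K query computes. Cosine distance is one minus cosine similarity. The three-dimensional vectors, the sample rows, and the top_k helper are invented for illustration. A real HNSW index is approximate, so it can return slightly different rows than this exact scan.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity for non-zero vectors: 0 = same direction, 1 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(rows, query, k=5, source=None):
    """Exact scan: optional metadata filter, then order by distance, then limit k."""
    if source is not None:
        rows = [r for r in rows if r["metadata"].get("source") == source]
    return sorted(rows, key=lambda r: cosine_distance(r["embedding"], query))[:k]


rows = [
    {"content": "pgvector adds a vector type", "embedding": [1.0, 0.0, 0.0],
     "metadata": {"source": "docs"}},
    {"content": "HNSW graphs for ANN search", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"source": "papers"}},
    {"content": "A recipe for lentil soup", "embedding": [0.0, 0.0, 1.0],
     "metadata": {"source": "papers"}},
]
query = [1.0, 0.0, 0.0]

print([r["content"] for r in top_k(rows, query, k=2)])
print([r["content"] for r in top_k(rows, query, k=2, source="papers")])
```

The first call ranks the two database-related rows ahead of the recipe. The second drops the "docs" row before ranking, so the recipe comes back even though it points in an unrelated direction: a filter narrows the candidates, it does not make them relevant.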
Relationships go in a graph. Entity memory is not really about similarity, it is about connection. Model entities and typed edges as two tables, then walk the graph with a recursive query.
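As a rough mental model of that walk, the sketch below does a depth-bounded traversal in plain Python. It assumes the edges are held in memory as (src, rel, dst) triples. The entity names and the neighborhood helper are made up for illustration, and a database-backed agent would run the traversal in SQL instead.

```python
from collections import deque

# Hypothetical typed edges: (source entity, relation, destination entity).
edges = [
    ("you", "reports_to", "Sam"),
    ("Sam", "works_on", "Q3 planning"),
    ("Q3 planning", "tracked_in", "roadmap doc"),
    ("Sam", "manages", "you"),
]


def neighborhood(start, edges, max_depth=2):
    """Breadth-first walk along outgoing edges; maps each entity to its shortest depth."""
    outgoing = {}
    for src, _rel, dst in edges:
        outgoing.setdefault(src, []).append(dst)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the hop limit
        for nxt in outgoing.get(node, []):
            if nxt not in depth:  # first visit is the shortest path
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


print(neighborhood("you", edges))
# {'you': 0, 'Sam': 1, 'Q3 planning': 2}
```

With the sample edges, the walk from "you" reaches Sam at one hop and Q3 planning at two, and stops before the roadmap doc. The hop limit is also what keeps the cycle between "you" and Sam from looping forever.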

    create table entities (
      id        bigint generated always as identity primary key,
      name      text not null,
      kind      text,
      embedding vector(768)  -- so an entity can also be found by meaning
    );
    create table edges (
      src bigint references entities(id),
      rel text not null,
      dst bigint references entities(id)
    );

Pulling the two-hop neighborhood around an entity is a recursive CTE:

    with recursive nbrhood as (
      select id, name, 0 as depth
      from entities
      where name = $1
      union all
      select e.dst, t.name, n.depth + 1
      from nbrhood n
      join edges e on e.src = n.id
      join entities t on t.id = e.dst
      where n.depth < 2
    )
    select distinct name, depth from nbrhood order by depth;

Tools are memory, too. As an agent gains tools, sending every tool's schema to the model on every call is the same mistake as stuffing the prompt: it is noise the model has to read past. Treat the tool catalog as procedural memory. Store each tool's description as an embedding, exactly like the knowledge table, and retrieve only the few tools whose description is closest to the task at hand.

Example catalog, with the three retrieved tools marked kept:
- book_flight: reserve a seat on a flight (kept)
- find_hotel: search hotels by city and date (kept)
- convert_currency: convert between currencies (kept)
- send_email: send a message to a recipient
- run_sql: query the production database
- search_papers: find academic papers by topic
- summarize_doc: condense a long document
- create_ticket: open a tracking ticket

The retrieval is the same <=> nearest-neighbor search you already run for knowledge; only the rows differ. A catalog of two hundred tools costs the model nothing if the agent loads the three it needs, and the model reasons better when it is not handed a wall of schemas it will never call.
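Tool selection can be sketched with the eight tools from the catalog above. The embed function here is a toy stand-in, a bag-of-words count vector, used only so the example runs without an embedding model. A real agent would embed with a model and let the database do the nearest-neighbor search. The select_tools helper and the sample task are assumptions for illustration.

```python
import math
import re
from collections import Counter

TOOLS = {
    "book_flight": "reserve a seat on a flight",
    "find_hotel": "search hotels by city and date",
    "convert_currency": "convert between currencies",
    "send_email": "send a message to a recipient",
    "run_sql": "query the production database",
    "search_papers": "find academic papers by topic",
    "summarize_doc": "condense a long document",
    "create_ticket": "open a tracking ticket",
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(task, k=3):
    """Return the names of the k tools whose descriptions sit closest to the task."""
    query = embed(task)
    ranked = sorted(
        TOOLS,
        key=lambda name: cosine_similarity(embed(TOOLS[name]), query),
        reverse=True,
    )
    return ranked[:k]


task = "reserve a flight seat, search hotels in the city, and convert currencies"
print(select_tools(task))
# ['book_flight', 'find_hotel', 'convert_currency']
```

Only those three schemas would go to the model. Word counts are a crude proxy for meaning, so a task phrased with synonyms would need real embeddings to land on the same tools.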
One database, not three
Notice that nothing above leaves Postgres. The vector index is an extension, the graph is two ordinary tables, and the conversation log is a plain table with a B-tree. You do not need a dedicated vector store or a separate graph engine to ship a memory-aware agent; you need one Postgres and a clear idea of which question each table answers.
It is tempting to embed everything and search by similarity for all of it. Resist that. Asking a vector index for "this thread's last ten messages, in order" is both slow and wrong: it returns rows that are merely similar, not the ones you asked for. Exact, ordered reads are a B-tree's job. Similarity is the vector index's job. Connection is the graph's job. Match the backend to the question and most of the hard decisions disappear.
When the agent remembers
Knowing where memory lives is half the design. The other half is timing, and it is where most agents go wrong. A single turn touches memory at three distinct moments, and keeping them separate is what makes an agent both cheap and debuggable.
First, deterministic reads, before the model thinks. An agent cannot ask for what it does not know exists, so the core stores load on every turn by a fixed rule, not by the model's choice: recent conversation by thread, relevant knowledge by similarity, similar past workflows, and the entities named in the query. This is the context the model starts from.
Second, just-in-time reads, inside the loop. Expensive or conditional fetches wait until the model actually asks. A web search fires only when local memory comes up thin, and a full document loads only when the model decides it needs the detail. A lean prompt reasons better than a stuffed one, so the agent pulls detail just in time rather than just in case.
Third, durable writes, at the stop condition. When the turn produces a final answer, the agent persists what it learned: the exchange, the workflow it followed, any new entities, and a reflective lesson. Writing only at the end keeps half-finished reasoning out of long-term memory.
The walkthrough below traces one full turn through all three phases: the deterministic reads load from each store, the loop reaches outside only when it must, and the durable writes flow back when the turn ends.
- One agent, one Postgres, three stores. Everything the agent remembers lives in three shapes inside a single database: a SQL table for exact, ordered reads, a vector index for meaning, and a graph for connection.
- Deterministic reads load the context. Before the model thinks, core memory loads by a fixed rule, not by the model's choice: recent conversation from the SQL table, relevant knowledge from the vector index, named entities from the graph. This is the context the turn starts from.
- The model reasons, and reaches out just in time. Inside the loop the model reasons over that context. Only when local memory comes up thin does it reach outside, a web search or a document fetch, just in time rather than just in case. A lean prompt reasons better than a stuffed one.
- Durable writes, only when the turn ends. When the turn produces a final answer, the agent writes what it learned back to the stores: the exchange, the workflow it followed, any new entities, and a reflective lesson. Writing only at the stop condition keeps half-finished reasoning out of long-term memory.
- The tools live in the vector index too. The catalog of tools is procedural memory. Each tool's description is an embedding in the same vector index, so the agent retrieves the few tools that match the task instead of sending every schema to the model. The catalog can grow without bloating the prompt.
In code, the three phases are the top, middle, and bottom of one function. The model chooses only what happens inside the loop; everything around it is fixed.

    def run_turn(query, thread_id):
        # 1. Deterministic reads: assemble context before the model thinks.
        context = assemble(
            recent=sql_recent_messages(thread_id),           # exact, by thread
            knowledge=knn("knowledge", embed(query), k=5),   # semantic top-K
            workflow=knn("workflow", embed(query), k=3),
            entities=graph_neighborhood(query),              # graph traversal
        )
        write_message(thread_id, "user", query)              # durable, every turn

        # 2. The loop: reason, act, fetch just in time.
        messages = [system_prompt(), user(context, query)]
        for _ in range(MAX_STEPS):
            step = model(messages)
            if step.is_final:
                break
            result = call_tool(step)  # e.g. expand_summary(id), web_search(q)
            messages += [step, tool_result(result)]

        # 3. Stop condition: persist what the turn learned.
        write_message(thread_id, "assistant", step.text)
        upsert_workflow(query, steps_taken=messages)
        upsert_entities(extract_entities(step.text))
        write_lesson(reflect(query, step.text))  # reflective memory
        return step.text

Read the structure, not the syntax: reads at the top, a tight loop in the middle, writes at the bottom. Because the reads and writes are deterministic, every run starts and ends the same way, which is exactly what makes a misbehaving agent possible to debug.
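To watch that ordering happen, here is a runnable skeleton in which every store and the model are replaced by stubs that only record what was touched. All names in it (Step, fake_model, read, write, log) are hypothetical. Raising an error when the loop runs out of steps is this sketch's own choice, not something the function above specifies.

```python
from dataclasses import dataclass

MAX_STEPS = 4
log = []  # the order in which memory and tools are touched


@dataclass
class Step:
    text: str = ""
    tool: str = ""
    is_final: bool = False


def read(store):
    log.append(f"read:{store}")
    return f"<{store}>"


def write(store):
    log.append(f"write:{store}")


def fake_model(messages):
    """Stub model: asks for one web search, then gives a final answer."""
    if "tool_result" not in messages:
        return Step(tool="web_search")
    return Step(text="final answer", is_final=True)


def run_turn(query, thread_id):
    # 1. Deterministic reads, always the same four stores.
    context = [read("messages"), read("knowledge"), read("workflow"), read("entities")]
    write("messages:user")

    # 2. The loop: the only part the model steers.
    messages = ["system", *context, query]
    for _ in range(MAX_STEPS):
        step = fake_model(messages)
        if step.is_final:
            break
        log.append(f"tool:{step.tool}")
        messages += [step.tool, "tool_result"]
    else:
        raise RuntimeError("no final answer within MAX_STEPS")

    # 3. Durable writes, only after a final answer exists.
    for store in ("messages:assistant", "workflow", "entities", "lessons"):
        write(store)
    return step.text


answer = run_turn("What did Sam ask for?", "t1")
print(answer)
print(log)
```

The log comes out as four reads, the user-message write, one tool call, then four writes. Swapping in a different stub model changes only the middle of the log, never the two ends.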
Keeping memory from rotting
A memory that only grows eventually buries the agent in its own history. So writing needs discipline, the same way human memory keeps what matters and lets the trivial fade.
Forgetting. Not every fact earns a permanent place. A durable preference is worth a write; a one-off aside is not. Deciding what to keep takes judgment, so it is one of the few memory operations that belongs to the model rather than to a rule.
Forgetting is a feature
The instinct is to save everything, just in case. But noise retrieved is noise reasoned over. An agent that writes selectively keeps its future retrievals sharp; an agent that hoards slowly poisons its own context.
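One way to picture selective writing is a gate in front of the store. In the sketch below the judgment call is a stub, stub_judge, that matches a few keywords. It stands in for asking the model whether a fact is durable and is not a recommended heuristic. The persist_selectively helper and the sample facts are likewise illustrative assumptions.

```python
def stub_judge(fact: str) -> bool:
    """Stand-in for a model call that decides whether a fact is worth keeping."""
    durable_markers = ("prefers", "always", "is vegetarian", "works at")
    return any(marker in fact.lower() for marker in durable_markers)


def persist_selectively(candidates, store, judge=stub_judge):
    """Write only facts the judge keeps, skipping ones already in the store."""
    kept = []
    for fact in candidates:
        normalized = " ".join(fact.lower().split())  # collapse case and spacing
        if normalized in store or not judge(fact):
            continue
        store.add(normalized)
        kept.append(fact)
    return kept


memory = set()
turn_facts = [
    "The user is vegetarian.",
    "The user said the weather is nice today.",
    "The user prefers metric units.",
    "The user  prefers metric units.",  # same fact, stray double space
]

kept = persist_selectively(turn_facts, memory)
print(kept)
# ['The user is vegetarian.', 'The user prefers metric units.']
```

The aside about the weather and the repeated preference never reach the store, so later retrievals have two rows to rank instead of four.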
Reflection. After a hard task, the agent writes down what it learned. The next time a similar task begins, that lesson loads with the rest of memory, and the agent avoids repeating the mistake. This is the loop that lets an agent get better at a job over time instead of starting fresh each morning.
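The reflection loop is a write after the run and a read before the next similar one. This sketch assumes lessons are matched to new tasks by word overlap (Jaccard similarity), a toy substitute for the embedding search used elsewhere in this piece. The function names, the 0.3 threshold, and the second lesson are invented for illustration.

```python
import re

lessons = []  # each entry: (token set of the originating task, lesson text)


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def write_lesson(task, lesson):
    """Called at the stop condition of a hard run."""
    lessons.append((tokens(task), lesson))


def load_lessons(task, min_overlap=0.3):
    """Called before a new task: return lessons from sufficiently similar past tasks."""
    query = tokens(task)
    hits = []
    for task_tokens, lesson in lessons:
        union = query | task_tokens
        score = len(query & task_tokens) / len(union) if union else 0.0
        if score >= min_overlap:
            hits.append(lesson)
    return hits


write_lesson("draft an email to Sam", "Ask for the deadline before drafting.")
write_lesson("convert prices to euros", "Confirm the exchange-rate date first.")

print(load_lessons("draft an email to the team"))
# ['Ask for the deadline before drafting.']
```

The drafting lesson loads for the new email task and the currency lesson stays out of the prompt, which is the whole point: lessons are retrieved like any other memory, not pasted in wholesale.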
The takeaway
Three questions carry the whole design. The taxonomy answers what to store: seven kinds, three of them borrowed from how people remember and four added by building agents. Postgres answers where it goes: exact reads in a SQL table, semantic reads (knowledge, and the tools to act on it) in a pgvector index, relationships in a graph, all in one database. The loop answers when to touch it: deterministic reads before the model thinks, just-in-time reads while it works, durable writes when it stops. Get those three right and you have crossed most of the distance between a demo that forgets you and a system that does not.