This is the second part of a multi part series covering AI agents. Today we are going to build one from scratch. If you haven't read Part 1, feel free to do so. I'll wait here. Done? Good. Let's get our hands dirty.
What About Bot?
We start with the simplest chat bot you can imagine. It cannot do much — just replying to what you wrote — but it's ours.
import OpenAI from "openai";
import * as readline from "readline";
const client = new OpenAI();
const terminal = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
function main() {
terminal.question("You: ", async (input) => {
const completion = await client.chat.completions.create({
messages: [{ role: "user", content: input }],
model: "gpt-52-chat",
});
const reply = completion.choices[0].message.content;
terminal.write(`Assistant: ${reply}\n`);
main();
});
}
main();It works, but it's also a little weird. It's not a conversation, since the LLM only knows the latest message. It does not have knowledge of your previous messages. It cannot remember, because every call starts from scratch. Imagine waking up every morning at 6 am — listening to the same radio station — having the same conversation with the same people over and over again. This is what talking to our current chat bot implementation feels like. Let's fix that.
I Know What You Did Last Summer
Remembering the name of a colleague you just met and casually bringing it up in a conversation or being haunted right before you fall asleep by an embarrassing memory from when you were a kid. Memories come in all shapes and sizes. We, as in we humans, have a short-term memory and a long-term memory. We also have an ultrashort-term memory, but for the sake of this article, we will only look at short- and long-term storage.
Memento mori
In our short-term memory we hold everything we need to recall in the moment — and forget it once it's no longer needed. The sentence that you just read and the one that you are reading now are stored in short-term memory. You will probably have forgotten about it, when you finished this article, or already after the next paragraph. It's also limited and rather small when it comes to storage capacity. If we want our chat bot to feel less like a stranger and more like someone who actually pays attention, we need to give it something similar. For LLMs, this is called the context window. Let's add it to our chat bot.
import OpenAI from "openai";
import * as readline from "readline";
const client = new OpenAI();
const terminal = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const messages = [];
function main() {
terminal.question("You: ", async (input) => {
messages.push({ role: "user", content: input });
const completion = await client.chat.completions.create({
messages: messages,
model: "gpt-52-chat",
});
const reply = completion.choices[0].message.content;
messages.push({ role: "assistant", content: reply });
terminal.write(`Assistant: ${reply}\n`);
main();
});
}
main();Nice, with the context window the LLM remembers all of our messages and its responses. It also can store quite a bit of data. But just like our short-term memory, it's not endless either. We will look into how we can solve that in the next chapter. Let's first fix a potential amnesia issue.
Stay with me
One downside of the current implementation is that our chat bot will forget everything once you close the terminal. All messages are only stored in memory and terminating the process, gets rid of all messages. Instead of building a sophisticated session system — similar to what Open Claw uses — we will store all conversations in one single file and load it on startup.
import OpenAI from "openai";
import * as readline from "readline";
import * as fs from "fs";
const client = new OpenAI();
const terminal = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const memoryFilePath = "messages.jsonl";
function loadMessages() {
if (!fs.existsSync(memoryFilePath)) {
return [];
}
return fs
.readFileSync(memoryFilePath, "utf-8")
.split("\n")
.filter(Boolean)
.map(JSON.parse);
}
function appendMessage(message) {
fs.appendFileSync(memoryFilePath, JSON.stringify(message) + "\n");
}
const messages = loadMessages();
function main() {
terminal.question("You: ", async (input) => {
const userMessage = { role: "user", content: input };
messages.push(userMessage);
appendMessage(userMessage);
const completion = await client.chat.completions.create({
messages: messages,
model: "gpt-52-chat",
});
const reply = completion.choices[0].message.content;
const assistantMessage = { role: "assistant", content: reply };
messages.push(assistantMessage);
appendMessage(assistantMessage);
terminal.write(`Assistant: ${reply}\n`);
main();
});
}
main();Great, we have a basic working conversation memory. Every single interaction of our conversation is now persisted as JSON Lines in the messages.jsonl on disk. Once we restart the terminal, the chat bot will have the full conversation present.
We use JSON Lines instead of a regular JSON array because it lets us append new messages without reading and rewriting the entire file.
Context compaction
While the context windows of recent models got bigger over time, they are not endless. Just as our short-term memory starts to forget things when someone just throws stuff at us — do you remember the first sentence from the Memento Mori paragraph? — an LLM will simply refuse to accept more input once its context window is full. This is an issue that all agents with long-running conversations have and one solution is context compaction: summarize old messages and keep the most recent ones. Our implementation is again simple, for illustration purposes.
import OpenAI from "openai";
import * as readline from "readline";
import * as fs from "fs";
const client = new OpenAI();
const terminal = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const memoryFilePath = "messages.jsonl";
function loadMessages() {
if (!fs.existsSync(memoryFilePath)) {
return [];
}
return fs
.readFileSync(memoryFilePath, "utf-8")
.split("\n")
.filter(Boolean)
.map(JSON.parse);
}
function appendMessage(message) {
fs.appendFileSync(memoryFilePath, JSON.stringify(message) + "\n");
}
function saveMessages(messages) {
fs.writeFileSync(
memoryFilePath,
messages.map(JSON.stringify).join("\n") + "\n",
);
}
async function compactMessages(messages) {
// Naively estimate token count based on character length, assuming 4 characters per token on average
const estimatedTokens =
messages.reduce(
(count, message) => count + JSON.stringify(message).length,
0,
) / 4;
// Current models typically support 128K to 200K tokens, so we can set a threshold to trigger compaction before hitting limits
if (estimatedTokens < 105_000) {
return messages;
}
terminal.write("Compacting conversation ...");
// Split messages into two halves: older messages to summarize, recent messages to keep intact
const split = Math.floor(messages.length / 2);
const old = messages.slice(0, split);
const recent = messages.slice(split);
const summary = await client.chat.completions.create({
messages: [
{
role: "user",
content: `Condense the following conversation into a brief summary.
Focus on:
- Who the user is and what they care about
- Decisions made and their rationale
- Outstanding tasks or unresolved questions
- Any specific details (names, numbers, paths) that would be lost without explicit mention
Drop all conversational filler.
${JSON.stringify(old, null, 2)}`,
},
],
model: "gpt-52-chat",
});
const compacted = [
{
role: "user",
content: `## Previous Conversation Summary\n ${summary.choices[0].message.content}`,
},
...recent,
];
saveMessages(compacted);
return compacted;
}
let messages = loadMessages();
function main() {
terminal.question("You: ", async (input) => {
messages = await compactMessages(messages);
const userMessage = { role: "user", content: input };
messages.push(userMessage);
appendMessage(userMessage);
const completion = await client.chat.completions.create({
messages: messages,
model: "gpt-52-chat",
});
const reply = completion.choices[0].message.content;
const assistantMessage = { role: "assistant", content: reply };
messages.push(assistantMessage);
appendMessage(assistantMessage);
terminal.write(`Assistant: ${reply}\n`);
main();
});
}
main();We added compactMessages which takes care of checking if we are still in the context window limit. Once we start exceeding it, we will create a summary with the first half of your conversation. Together with the most recent messages this will be our new short-term memory, or context window. We also had to add saveMessages as we have to overwrite message.jsonl and we check and compact on every turn.
Total Recall
How would you feel, if your brain put everything directly into long-term memory once you experienced something? I'd argue things would get overwhelming and intense pretty quickly. When it comes to long-term memory, storing a memory becomes a bit more complicated. While short-term is mostly automatic, long-term is more selective, and there are a few mechanisms that determine what we deem worthy of storing. It's a bouncer deciding who is coming in and who's not. Our chat bot has to become a bouncer, deciding for itself which information to keep and which to discard, and since long-term memories are not automatically part of the context window, it also needs the ability to recall said information. In short: we need to give it tools.
But tools deserve their own chapter, and so does the question of how an agent decides what to remember and when to recall it. We'll tackle both in Part 3.
Until next time.
P.S.: If you have questions in the meantime, feel free to reach out. I'm always happy to help and support where I can!