The Agent File Pattern

Why monolithic system prompts fail, and the framework-agnostic split that fixes them: SOUL (identity), AGENTS (behavior), HEARTBEAT (proactivity) — plus USER and MEMORY.

agents patterns ~14 min read
Contents
  1. 1.The monolithic prompt problem
  2. 2.The pattern: split by concern
  3. 3.Why splitting wins
  4. 4.SOUL.md — identity
  5. 5.AGENTS.md — behavior
  6. 6.HEARTBEAT.md — proactivity
  7. 7.USER.md + MEMORY.md
  8. 8.Loading order & token budget
  9. 9.Adapting to your framework
  10. 10.Worked refactor
  11. 11.Anti-patterns
  12. 12.Rules & pitfalls

1. The monolithic prompt problem #

The default way to configure an LLM agent is a single large system prompt. Everything goes in: who the agent is, what it does, what tools it can use, what tone it takes, what it does on a schedule, what it remembers about you, what it should refuse. One blob. Often thousands of tokens. Usually called system.md or shoved into the system field of a JSON config.

This works until it doesn't. The failure modes are predictable:

All of these are one underlying problem: different concerns live in one file because the framework made that the path of least resistance.

The core claim Agent instructions have at least three distinct concerns — identity, behavior, and proactivity — plus two common side concerns (user context and curated memory). Each wants its own file. This guide argues for a specific split and names each file.

2. The pattern: split by concern #

Five files. Each answers one question:

SOUL.md

Who are you? Persona, values, tone, boundaries.

Stable across workflows.

AGENTS.md

What do you do? SOPs, tool usage, response format, rules of engagement.

Evolves with your system.

HEARTBEAT.md

What do you do when no one prompts you? Cron rules, cadence, anti-spam.

Optional — only if scheduled.

USER.md

Who are you talking to? Name, role, preferences, constraints.

Human-editable context.

MEMORY.md

What do you remember? Curated, durable facts — not a transcript dump.

Agent-managed, human-auditable.

Loaded in this order, these files reconstruct on every turn what a single monolithic prompt used to cram into one string — but now with separable units of change.

3. Why splitting wins #

You can edit one concern without disturbing others

Want a terser voice? Edit SOUL.md. You are guaranteed not to accidentally rewrite a tool-usage rule, because tool usage doesn't live there. Want to add a new SOP? Edit AGENTS.md. Your persona is untouched.

You can share files across agents

Two agents can reuse the same AGENTS.md (same workflows) with different SOUL.md files (different personas). When the workflow changes, you edit one file and both agents pick it up. Before the split, you maintained two drifting monoliths.

You get natural change propagation rules

EditedTakes effect
SOUL.md, AGENTS.md, HEARTBEAT.md, USER.mdNext turn — they're re-read from disk
MEMORY.mdNext turn, but the agent usually manages it
Tool registrations, model configFramework restart

You stop confusing personality with operations

This is the most underrated benefit. When everything is in one file, a sentence like "be direct" can mean "don't waffle in responses" (persona) or "prefer imperative commit messages" (operational). The split forces you to decide which file it goes in, and the decision itself clarifies the rule.

4. SOUL.md — identity #

Who the agent is when no task is pending.

What goes in

What does not

# You are Atlas

You are a focused technical assistant. You prefer terse, concrete
answers over long explanations. You flag uncertainty instead of
guessing. You dislike filler: no “great question”, no apologizing,
no sign-offs. Direct is not rude.

## Values

- Clarity over politeness
- Concrete over abstract
- Short answers unless depth is requested
- Never fabricate file paths, commands, or API shapes

## Voice

- First person, confident, mildly understated
- No emojis unless the user uses them first
- Code in fenced blocks, explanations in prose, never both mixed
The test If you can rewrite SOUL.md without mentioning any specific tool, system, or task, you're doing it right. Identity is persistent; systems come and go.

5. AGENTS.md — behavior #

What the agent does when a task shows up.

What goes in

What does not

# Operating Rules

## Tool Usage

- Read before you claim. Use the read tool to verify a file exists
  before referencing it.
- Search before you read a whole folder. Keyword search is cheap;
  dumping a directory is expensive.
- If a tool fails twice, stop and report. Do not improvise around
  repeated failures.

## Response Format

- Default to 3-5 sentences for prose answers.
- Use bullet lists for anything enumerable.
- Wrap shell commands in fenced code blocks.
- When uncertain, say so explicitly. Do not hedge with
  weasel-words.

## Safety Rails

- Never run destructive commands (rm, force-push, drop table,
  kill) without a clear user directive.
- Confirm before writing to shared systems (git remotes, deploys,
  databases).

## Escalation

If a request falls outside your area, say so clearly and suggest
where it should go. Do not attempt tasks you are not configured
for.

Notice what's not here: no persona, no "Atlas is a focused technical assistant". Reading AGENTS.md in isolation should feel like reading a job description. Reading SOUL.md in isolation should feel like reading a personality profile.

6. HEARTBEAT.md — proactivity #

What the agent does when nobody prompts it.

Include HEARTBEAT.md only if the agent actually runs scheduled or cron-driven turns. If your agent is purely reactive (user messages โ†’ agent responds), skip this file.

What goes in

# Proactive Behavior

You are invoked periodically by a scheduler. These rules govern
what you do when there is no user message.

## Cadence

- Morning check: 08:00 local. Ask about the day's priorities
  if the user hasn't volunteered them.
- Evening check: 19:00 local. Summarize the day and prompt
  for logging anything missed.
- Quiet hours: no proactive messages between 22:00 and 07:00.

## Trigger Conditions (act only if)

- The user has opened the app in the last 24 hours.
- The scheduled slot has not already fired today.
- There is something concrete to say — never generic check-ins.

## Anti-Spam

- Maximum 3 proactive messages per day across all slots.
- Never repeat yourself within 24 hours.
- If the user hasn't responded to the last 2 proactive messages,
  pause proactive behavior for 48 hours.

## Message Style

- Lead with the reason you're reaching out.
- Keep it under 3 sentences.
- End with a question or an explicit “no reply needed”.
The HEARTBEAT trap Without explicit anti-spam rules, scheduled agents degrade fast into noise. A good heartbeat file spends more words on when NOT to speak than on when TO speak. Model it as "what would a respectful colleague do?", not "what could I check in about?".

7. USER.md + MEMORY.md #

USER.md — who you're serving

A short file with facts about the human. Kept up-to-date by the user or by the agent after explicit confirmation.

# User

- Name: Alex
- Role: Backend engineer
- Timezone: America/New_York
- Prefers: terse responses, no emojis, commands as code blocks
- Current focus: migrating auth service to new database
- Off-limits: personal tasks, scheduling, calendar

USER.md keeps the agent grounded. Without it, the agent re-derives user context every conversation or (worse) assumes. With it, the first sentence of every response can be tuned to the reader.

MEMORY.md — curated, not comprehensive

Long-term facts the agent has chosen to remember. Not a transcript. Not a log. A curated list — the agent earns entries by deciding they're worth keeping, and the human can audit and prune them.

# Memory

## Preferences

- User prefers `jq` over `grep` for JSON
- User has rejected emojis in code comments twice — stop suggesting
- User's deploy workflow uses rsync + systemd, not Docker

## Active Projects

- Auth migration to Postgres 16 — in progress, blocker on schema review
- Internal dashboard rewrite — paused since last Thursday

## Decisions Made

- Chose TypeScript over Go for the new service (2026-02-14)
- Decided NOT to adopt GraphQL (2026-03-02)

The key test: a new person reading MEMORY.md should be able to infer what this person is working on and how they like to work. That's what "curated" means — selected for future usefulness, not archived for completeness.

8. Loading order & token budget #

On every turn, the agent's context starts with these files concatenated in order:

  1. SOUL.md — sets voice before anything else
  2. USER.md — grounds in who's being served
  3. AGENTS.md — loads rules of engagement
  4. HEARTBEAT.md — only when triggered by the scheduler
  5. MEMORY.md — adds durable context
  6. Conversation history — the actual turn-by-turn exchange

Typical costs for a mature agent:

FileTypical tokensScaling strategy
SOUL.md200-400Prune ruthlessly. Persona doesn't need verbosity.
USER.md100-200Stays small by design.
AGENTS.md500-1500Move deep reference into companion files loaded on demand.
HEARTBEAT.md200-400Keep cadence rules tight.
MEMORY.md300-800Agent self-prunes old entries; enforce size cap.

Total baseline: 1,300 – 3,300 tokens of instructions. For context, that's less than a single 8K GPT-4 call and trivial on modern long-context models. The cost is real but bounded.

When files get too big

Signs AGENTS.md has outgrown the pattern:

The fix is usually to move deep detail into a companion file and have AGENTS.md reference it: "For the full deployment procedure, see docs/DEPLOY.md; read only when the user asks to deploy". The skill is teaching the agent when to pull in more context, not loading everything up front.

9. Adapting to your framework #

The pattern is framework-agnostic. What changes is where the files live and how they get concatenated into the prompt.

Claude-based systems (Claude Code, Anthropic SDK)

Simplest case. The five files live in a directory; a thin wrapper reads them and prepends them to the system parameter of the Messages API call. Some agent frameworks built on Claude (OpenClaw and similar) do this natively — the files are injected on every turn without any glue code.

OpenAI Assistants / Responses API

The instructions field is your single system prompt. Concatenate the five files at agent-build time. Treat MEMORY.md as mutable — update the assistant's instructions when memory changes, or store memory in a thread's metadata and inject at run-time.

LangChain / CrewAI / custom orchestrators

These frameworks usually expose a system_message or agent configuration object. Build a helper that reads the five files, joins them with delimiters, and returns the string. Everything upstream of the LLM stays generic.

Local inference (Ollama, llama.cpp)

Same story — build the system prompt from five files at runtime. If you're hot-swapping models, the pattern is even more valuable: persona stays in SOUL.md and ports to the new model without editing.

10. Worked refactor #

Taking a monolithic prompt and splitting it. A compressed before-and-after.

Before — one file

# System Prompt

You are Atlas, a focused coding assistant. You are terse, direct,
and dislike filler. Help the user with their Node.js backend project.

Always read files before claiming they exist. Use grep to search.
Do not run destructive commands without approval. Write code in
TypeScript unless told otherwise.

Every morning at 8am, greet the user and ask about priorities.
Send an evening summary at 7pm. Don't send more than 3 proactive
messages per day.

The user is Alex, a backend engineer in New York who prefers no
emojis. Alex is currently migrating auth to Postgres.

Alex has decided NOT to adopt GraphQL (March 2026). Alex prefers
rsync + systemd over Docker.

Everything jammed together. Editing "be less terse" means re-reading the whole thing to find the identity bit. Adding a new SOP means editing a file that also contains the user's timezone.

After — five files

# SOUL.md
You are Atlas. Focused, terse, direct. You dislike filler —
no “great question”, no apologies, no sign-offs. Direct is not rude.

Values: clarity over politeness, concrete over abstract.

# USER.md
- Name: Alex
- Role: Backend engineer, New York
- Prefers: no emojis, terse responses
- Current focus: migrating auth to Postgres

# AGENTS.md
## Tool Usage
- Read files before claiming they exist.
- Use search before reading whole folders.
- Never run destructive commands without approval.

## Defaults
- Write code in TypeScript unless told otherwise.

# HEARTBEAT.md
## Cadence
- Morning greeting: 08:00, ask about priorities.
- Evening summary: 19:00.
- Max 3 proactive messages/day. Quiet hours 22:00-07:00.

# MEMORY.md
## Decisions
- NOT adopting GraphQL (2026-03-02)

## Preferences
- Prefers rsync + systemd over Docker

Same information, separate concerns. Now:

11. Anti-patterns #

Mixing persona into SOPs

AGENTS.md lines like "Be warm and empathetic when the user is frustrated" are persona bleed. Move the emotional posture into SOUL.md (the agent is warm) or keep it as an operational rule ("when the user's tone signals frustration, acknowledge before answering") — but not mid-SOP where it confuses both concerns.

HEARTBEAT with no cadence rules

A proactive agent without explicit anti-spam rules will degrade into noise within a week. If you can't write HEARTBEAT.md with more words spent on when not to speak than when to speak, you don't have a heartbeat policy — you have a timer.

MEMORY as transcript

MEMORY.md is not a log. It's curated. If every conversation dumps raw notes into memory, the file becomes a transcript with the agent's knowledge-retrieval costs scaling linearly. The rule: an entry earns its place by being useful in a future conversation.

USER.md as CRM

Don't turn USER.md into a full profile with every fact the agent has ever learned. Keep it to what's relevant now: role, focus, preferences, constraints. Historical facts belong in MEMORY.md.

The "one big file" regression

After a few months, some teams drift back toward a single file "for simplicity". This is a false economy — the simplicity is illusory because nothing actually got simpler; you just pushed complexity back inside one file. Resist.

12. Rules & pitfalls #

The deeper point The value of this pattern isn't the file names — it's the discipline of separating concerns in agent instructions. Call the files whatever you want. What matters is that identity, behavior, and proactivity are three different problems, and collapsing them into one file is a design decision, not a default.