1. The monolithic prompt problem #
The default way to configure an LLM agent is a single large system prompt. Everything goes in: who the agent is, what it does, what tools it can use, what tone it takes, what it does on a schedule, what it remembers about you, what it should refuse. One blob. Often thousands of tokens. Usually called system.md or shoved into the system field of a JSON config.
This works until it doesn't. The failure modes are predictable:
- You want to tweak the tone — say, make the agent terser — and end up re-reading 400 lines of SOPs looking for the personality bits.
- You add a new tool and realize you're editing a file that also contains the agent's childhood backstory. Every change is a diff across unrelated concerns.
- You want two agents that share the same workflows but have different personalities. You copy-paste the prompt, duplicate the SOP section, and now you have two drifting copies to maintain.
- The agent starts behaving weirdly on Tuesday. You bisect the prompt line-by-line because there's no natural unit of change.
- You add scheduled / cron behavior. Now the same prompt has to describe both "what to do when the user messages you" and "what to do when nobody is there". The two bleed into each other.
All of these are one underlying problem: different concerns live in one file because the framework made that the path of least resistance.
2. The pattern: split by concern #
Five files. Each answers one question:
SOUL.md
Who are you? Persona, values, tone, boundaries.
Stable across workflows.AGENTS.md
What do you do? SOPs, tool usage, response format, rules of engagement.
Evolves with your system.HEARTBEAT.md
What do you do when no one prompts you? Cron rules, cadence, anti-spam.
Optional — only if scheduled.USER.md
Who are you talking to? Name, role, preferences, constraints.
Human-editable context.MEMORY.md
What do you remember? Curated, durable facts — not a transcript dump.
Agent-managed, human-auditable.Loaded in this order, these files reconstruct on every turn what a single monolithic prompt used to cram into one string — but now with separable units of change.
3. Why splitting wins #
You can edit one concern without disturbing others
Want a terser voice? Edit SOUL.md. You are guaranteed not to accidentally rewrite a tool-usage rule, because tool usage doesn't live there. Want to add a new SOP? Edit AGENTS.md. Your persona is untouched.
You can share files across agents
Two agents can reuse the same AGENTS.md (same workflows) with different SOUL.md files (different personas). When the workflow changes, you edit one file and both agents pick it up. Before the split, you maintained two drifting monoliths.
You get natural change propagation rules
| Edited | Takes effect |
|---|---|
SOUL.md, AGENTS.md, HEARTBEAT.md, USER.md | Next turn — they're re-read from disk |
MEMORY.md | Next turn, but the agent usually manages it |
| Tool registrations, model config | Framework restart |
You stop confusing personality with operations
This is the most underrated benefit. When everything is in one file, a sentence like "be direct" can mean "don't waffle in responses" (persona) or "prefer imperative commit messages" (operational). The split forces you to decide which file it goes in, and the decision itself clarifies the rule.
4. SOUL.md — identity #
Who the agent is when no task is pending.
What goes in
- Persona / character / voice
- Core values (honesty, terseness, caution — whatever the agent should be biased toward)
- Tone markers (formal, playful, direct)
- Hard boundaries that aren't tool-specific (e.g., "never fabricate facts", "flag uncertainty")
What does not
- Tool names or API specifics
- Step-by-step workflows
- Schedule or cron rules
- User-specific details
# You are Atlas
You are a focused technical assistant. You prefer terse, concrete
answers over long explanations. You flag uncertainty instead of
guessing. You dislike filler: no “great question”, no apologizing,
no sign-offs. Direct is not rude.
## Values
- Clarity over politeness
- Concrete over abstract
- Short answers unless depth is requested
- Never fabricate file paths, commands, or API shapes
## Voice
- First person, confident, mildly understated
- No emojis unless the user uses them first
- Code in fenced blocks, explanations in prose, never both mixed
SOUL.md without mentioning any specific tool, system, or task, you're doing it right. Identity is persistent; systems come and go.
5. AGENTS.md — behavior #
What the agent does when a task shows up.
What goes in
- Standard operating procedures (SOPs) for recurring tasks
- Tool usage rules ("before claiming a file exists, read it with the read tool")
- Response format conventions (length, bullet style, when to use tables)
- Decision rules ("if tool X fails twice, stop and report")
- Safety rails ("never run destructive commands without explicit approval")
What does not
- Persona or voice rules (those go in
SOUL.md) - Scheduled / cron behavior (that goes in
HEARTBEAT.md) - Facts about the user (
USER.md)
# Operating Rules
## Tool Usage
- Read before you claim. Use the read tool to verify a file exists
before referencing it.
- Search before you read a whole folder. Keyword search is cheap;
dumping a directory is expensive.
- If a tool fails twice, stop and report. Do not improvise around
repeated failures.
## Response Format
- Default to 3-5 sentences for prose answers.
- Use bullet lists for anything enumerable.
- Wrap shell commands in fenced code blocks.
- When uncertain, say so explicitly. Do not hedge with
weasel-words.
## Safety Rails
- Never run destructive commands (rm, force-push, drop table,
kill) without a clear user directive.
- Confirm before writing to shared systems (git remotes, deploys,
databases).
## Escalation
If a request falls outside your area, say so clearly and suggest
where it should go. Do not attempt tasks you are not configured
for.
Notice what's not here: no persona, no "Atlas is a focused technical assistant". Reading AGENTS.md in isolation should feel like reading a job description. Reading SOUL.md in isolation should feel like reading a personality profile.
6. HEARTBEAT.md — proactivity #
What the agent does when nobody prompts it.
Include HEARTBEAT.md only if the agent actually runs scheduled or cron-driven turns. If your agent is purely reactive (user messages โ agent responds), skip this file.
What goes in
- Cadence rules (how often to act proactively, quiet hours)
- Trigger conditions (what counts as a reason to speak)
- Anti-spam rules (max messages per day, dedup logic)
- Scheduled workflow templates (morning check, evening summary, weekly review)
# Proactive Behavior
You are invoked periodically by a scheduler. These rules govern
what you do when there is no user message.
## Cadence
- Morning check: 08:00 local. Ask about the day's priorities
if the user hasn't volunteered them.
- Evening check: 19:00 local. Summarize the day and prompt
for logging anything missed.
- Quiet hours: no proactive messages between 22:00 and 07:00.
## Trigger Conditions (act only if)
- The user has opened the app in the last 24 hours.
- The scheduled slot has not already fired today.
- There is something concrete to say — never generic check-ins.
## Anti-Spam
- Maximum 3 proactive messages per day across all slots.
- Never repeat yourself within 24 hours.
- If the user hasn't responded to the last 2 proactive messages,
pause proactive behavior for 48 hours.
## Message Style
- Lead with the reason you're reaching out.
- Keep it under 3 sentences.
- End with a question or an explicit “no reply needed”.
7. USER.md + MEMORY.md #
USER.md — who you're serving
A short file with facts about the human. Kept up-to-date by the user or by the agent after explicit confirmation.
# User
- Name: Alex
- Role: Backend engineer
- Timezone: America/New_York
- Prefers: terse responses, no emojis, commands as code blocks
- Current focus: migrating auth service to new database
- Off-limits: personal tasks, scheduling, calendar
USER.md keeps the agent grounded. Without it, the agent re-derives user context every conversation or (worse) assumes. With it, the first sentence of every response can be tuned to the reader.
MEMORY.md — curated, not comprehensive
Long-term facts the agent has chosen to remember. Not a transcript. Not a log. A curated list — the agent earns entries by deciding they're worth keeping, and the human can audit and prune them.
# Memory
## Preferences
- User prefers `jq` over `grep` for JSON
- User has rejected emojis in code comments twice — stop suggesting
- User's deploy workflow uses rsync + systemd, not Docker
## Active Projects
- Auth migration to Postgres 16 — in progress, blocker on schema review
- Internal dashboard rewrite — paused since last Thursday
## Decisions Made
- Chose TypeScript over Go for the new service (2026-02-14)
- Decided NOT to adopt GraphQL (2026-03-02)
The key test: a new person reading MEMORY.md should be able to infer what this person is working on and how they like to work. That's what "curated" means — selected for future usefulness, not archived for completeness.
8. Loading order & token budget #
On every turn, the agent's context starts with these files concatenated in order:
SOUL.md— sets voice before anything elseUSER.md— grounds in who's being servedAGENTS.md— loads rules of engagementHEARTBEAT.md— only when triggered by the schedulerMEMORY.md— adds durable context- Conversation history — the actual turn-by-turn exchange
Typical costs for a mature agent:
| File | Typical tokens | Scaling strategy |
|---|---|---|
SOUL.md | 200-400 | Prune ruthlessly. Persona doesn't need verbosity. |
USER.md | 100-200 | Stays small by design. |
AGENTS.md | 500-1500 | Move deep reference into companion files loaded on demand. |
HEARTBEAT.md | 200-400 | Keep cadence rules tight. |
MEMORY.md | 300-800 | Agent self-prunes old entries; enforce size cap. |
Total baseline: 1,300 – 3,300 tokens of instructions. For context, that's less than a single 8K GPT-4 call and trivial on modern long-context models. The cost is real but bounded.
When files get too big
Signs AGENTS.md has outgrown the pattern:
- You're using headers to hide structure — tables of contents inside a single file.
- You're tempted to write "if you're handling scenario X, skip ahead to section 4".
- Different parts of the file contradict each other because you forgot what was already there.
The fix is usually to move deep detail into a companion file and have AGENTS.md reference it: "For the full deployment procedure, see docs/DEPLOY.md; read only when the user asks to deploy". The skill is teaching the agent when to pull in more context, not loading everything up front.
9. Adapting to your framework #
The pattern is framework-agnostic. What changes is where the files live and how they get concatenated into the prompt.
Claude-based systems (Claude Code, Anthropic SDK)
Simplest case. The five files live in a directory; a thin wrapper reads them and prepends them to the system parameter of the Messages API call. Some agent frameworks built on Claude (OpenClaw and similar) do this natively — the files are injected on every turn without any glue code.
OpenAI Assistants / Responses API
The instructions field is your single system prompt. Concatenate the five files at agent-build time. Treat MEMORY.md as mutable — update the assistant's instructions when memory changes, or store memory in a thread's metadata and inject at run-time.
LangChain / CrewAI / custom orchestrators
These frameworks usually expose a system_message or agent configuration object. Build a helper that reads the five files, joins them with delimiters, and returns the string. Everything upstream of the LLM stays generic.
Local inference (Ollama, llama.cpp)
Same story — build the system prompt from five files at runtime. If you're hot-swapping models, the pattern is even more valuable: persona stays in SOUL.md and ports to the new model without editing.
10. Worked refactor #
Taking a monolithic prompt and splitting it. A compressed before-and-after.
Before — one file
# System Prompt
You are Atlas, a focused coding assistant. You are terse, direct,
and dislike filler. Help the user with their Node.js backend project.
Always read files before claiming they exist. Use grep to search.
Do not run destructive commands without approval. Write code in
TypeScript unless told otherwise.
Every morning at 8am, greet the user and ask about priorities.
Send an evening summary at 7pm. Don't send more than 3 proactive
messages per day.
The user is Alex, a backend engineer in New York who prefers no
emojis. Alex is currently migrating auth to Postgres.
Alex has decided NOT to adopt GraphQL (March 2026). Alex prefers
rsync + systemd over Docker.
Everything jammed together. Editing "be less terse" means re-reading the whole thing to find the identity bit. Adding a new SOP means editing a file that also contains the user's timezone.
After — five files
# SOUL.md
You are Atlas. Focused, terse, direct. You dislike filler —
no “great question”, no apologies, no sign-offs. Direct is not rude.
Values: clarity over politeness, concrete over abstract.
# USER.md
- Name: Alex
- Role: Backend engineer, New York
- Prefers: no emojis, terse responses
- Current focus: migrating auth to Postgres
# AGENTS.md
## Tool Usage
- Read files before claiming they exist.
- Use search before reading whole folders.
- Never run destructive commands without approval.
## Defaults
- Write code in TypeScript unless told otherwise.
# HEARTBEAT.md
## Cadence
- Morning greeting: 08:00, ask about priorities.
- Evening summary: 19:00.
- Max 3 proactive messages/day. Quiet hours 22:00-07:00.
# MEMORY.md
## Decisions
- NOT adopting GraphQL (2026-03-02)
## Preferences
- Prefers rsync + systemd over Docker
Same information, separate concerns. Now:
- Switching Atlas to a warmer voice touches only
SOUL.md. - Adding a new SOP touches only
AGENTS.md. - Disabling proactive messages during vacation touches only
HEARTBEAT.md. - Updating the user's project focus touches only
USER.md.
11. Anti-patterns #
Mixing persona into SOPs
AGENTS.md lines like "Be warm and empathetic when the user is frustrated" are persona bleed. Move the emotional posture into SOUL.md (the agent is warm) or keep it as an operational rule ("when the user's tone signals frustration, acknowledge before answering") — but not mid-SOP where it confuses both concerns.
HEARTBEAT with no cadence rules
A proactive agent without explicit anti-spam rules will degrade into noise within a week. If you can't write HEARTBEAT.md with more words spent on when not to speak than when to speak, you don't have a heartbeat policy — you have a timer.
MEMORY as transcript
MEMORY.md is not a log. It's curated. If every conversation dumps raw notes into memory, the file becomes a transcript with the agent's knowledge-retrieval costs scaling linearly. The rule: an entry earns its place by being useful in a future conversation.
USER.md as CRM
Don't turn USER.md into a full profile with every fact the agent has ever learned. Keep it to what's relevant now: role, focus, preferences, constraints. Historical facts belong in MEMORY.md.
The "one big file" regression
After a few months, some teams drift back toward a single file "for simplicity". This is a false economy — the simplicity is illusory because nothing actually got simpler; you just pushed complexity back inside one file. Resist.
12. Rules & pitfalls #
- One concern per file. Identity, behavior, proactivity, user, memory. Each in its own file.
- Write SOUL.md such that it references no specific tool or system. That's the test.
- AGENTS.md reads like a job description, not a biography.
- Include HEARTBEAT.md only when needed — unused files still cost tokens.
- MEMORY.md is curated, not archived. Prune ruthlessly.
- When a file exceeds ~1500 tokens, move detail into companion files and reference them.
- Share AGENTS.md across agents that do similar work — one source of truth.
- Put anti-spam rules in HEARTBEAT.md first, action rules second.
- Review USER.md monthly. Stale user context is worse than none.
- If you can't tell which file a rule belongs in, the rule is probably overloaded. Split it.