Building a Knowledge Vault in Obsidian

This guide documents the structure, conventions, and workflows behind a vault that scales to 1,000+ notes, stays useful for years, and works seamlessly with LLM agents via CLI. All personal identifiers have been stripped. Adapt the examples to your own domain.

The Two-Audience Principle

Every design decision in this vault serves two readers simultaneously:

Reader	What they need
You (scanning on a phone at a coffee shop)	Answer on the first line. Tables, not walls of text. Bold the thing that matters.
An AI agent (parsing with CLI access)	Queryable frontmatter. Wikilinks that form a graph. Tags as faceted filters. Predictable structure.

If a formatting choice helps one audience but hurts the other, redesign it until it helps both. This constraint is the foundation of everything below.

💡

The Litmus Test

Before adding anything to a note, ask: Can a busy human find this in 3 seconds? Can an agent extract this programmatically? If either answer is no, restructure.

Folder Architecture

Top-level folders are numbered for deterministic sort order. The numbers are a UX affordance, not semantic.

00-Dashboard/       Command center, Maps of Content, active overviews
01-Daily/           Daily notes (YYYY-MM-DD.md)
02-Projects/        One folder per project, with subfolders
03-People/          Contact pages with backlinks to projects
04-Domain/          Your primary subject (Research, Strategies, Data Sources)
05-Infrastructure/  Servers, services, hardware, incidents
06-Meetings/        Meeting notes with attendee links
07-Reference/       Long-lived guides, how-tos, education
08-AI/              Agent configs, prompt libraries, skill definitions
09-AI-Context/      Tiny fact-dense index files for agents
10-Engineering/     Architecture docs, patterns, runbooks
data/               Large archives (CSVs, database dumps)
Templates/          Note templates

ℹ️

Why These Folders Exist

00-Dashboard/ — Entry points, not notes. Curated indexes (Dataview queries or hand-maintained link lists) that surface what matters today. Essential once you pass ~100 notes and start forgetting what you have.

02-Projects/ — Projects accumulate artifacts: features, fixes, meeting notes, postmortems. Colocate them. Use subfolders (Fixes/, Features/, Research/) once you have 3+ files of a type.

09-AI-Context/ — Small, high-density context files (projects.md, infrastructure.md, research-queue.md). An agent loads this entire folder at session start without blowing its context window, then drills into specific notes as needed.

Filing Decision Tree

Core rule: colocate with the subject, not by note-type. A research note about a specific project goes in that project's folder, even if it is research. A hardware troubleshooting writeup goes with the hardware, not in a generic Reference bucket. 07-Reference/ is reserved for cross-cutting material: business, legal, patterns that span multiple projects.

Apply in order. Stop at the first match:

#	Question	Target
1	About one specific project?	`02-Projects/<Project>/<subfolder>/`
2	About one piece of hardware or service?	`05-Infrastructure/<area>/<thing>/`
3	About one person?	`03-People/`
4	Research tied to one strategy or signal?	`04-Domain/Research/` or `04-Domain/Signals/`
5	Meeting?	`06-Meetings/`
6	Daily log?	`01-Daily/`
7	AI agent or skill config?	`08-AI/`
8	Fact-dense index for agents?	`09-AI-Context/` (max ~50 lines, facts only)
9	Engineering patterns spanning projects?	`10-Engineering/`
10	None of the above. Genuinely cross-cutting.	`07-Reference/<subfolder>/` (prefer a subfolder over root)
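The first-match rule in the table can be sketched as an ordered rule list. This is a minimal illustration, not part of the original vault tooling: the `Note` fields and the `kind` labels are hypothetical, and the project target omits the `<subfolder>` level for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Note:
    """Hypothetical description of a note that still needs a home."""
    kind: str = ""        # e.g. "research", "meeting", "daily"
    project: str = ""     # set when the note is about one specific project
    infra_path: str = ""  # e.g. "Hardware/router" for one device or service
    person: str = ""      # set when the note is about one person


# (question, target) pairs in the same order as the table: first match wins.
RULES: List[Tuple[Callable[[Note], bool], Callable[[Note], str]]] = [
    (lambda n: bool(n.project), lambda n: f"02-Projects/{n.project}/"),
    (lambda n: bool(n.infra_path), lambda n: f"05-Infrastructure/{n.infra_path}/"),
    (lambda n: bool(n.person), lambda n: "03-People/"),
    (lambda n: n.kind == "research", lambda n: "04-Domain/Research/"),
    (lambda n: n.kind == "signal", lambda n: "04-Domain/Signals/"),
    (lambda n: n.kind == "meeting", lambda n: "06-Meetings/"),
    (lambda n: n.kind == "daily", lambda n: "01-Daily/"),
    (lambda n: n.kind == "agent-config", lambda n: "08-AI/"),
    (lambda n: n.kind == "ai-context", lambda n: "09-AI-Context/"),
    (lambda n: n.kind == "engineering", lambda n: "10-Engineering/"),
]


def choose_folder(note: Note) -> str:
    """Walk the rules top to bottom and stop at the first match."""
    for matches, target in RULES:
        if matches(note):
            return target(note)
    return "07-Reference/"  # genuinely cross-cutting material only
```

Because the project rule comes first, `choose_folder(Note(kind="research", project="Atlas"))` lands in the project folder rather than in `04-Domain/Research/`, which is the colocation rule in action.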

💡

The Wikilink Self-Check

Before writing, ask: if the note's primary wikilinks all point into one folder, does the note belong in that folder too? If yes, file it there. This one check catches most misfilings.
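An agent could run the same self-check mechanically. The sketch below assumes a hypothetical `folder_of` lookup (note name to folder) built elsewhere, and it treats every wikilink as "primary", which is a simplification.

```python
import re
from typing import Dict, Optional

# Captures the link target and stops before any "|alias" or "#heading" part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def suggested_folder(note_text: str, folder_of: Dict[str, str]) -> Optional[str]:
    """Return the single folder all resolvable wikilinks point into, else None."""
    targets = [target.strip() for target in WIKILINK.findall(note_text)]
    folders = {folder_of[t] for t in targets if t in folder_of}
    return folders.pop() if len(folders) == 1 else None
```

A `None` result means the links disagree (or none resolve), so the decision tree still has to settle the location.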

Common misfilings to avoid:

Hardware troubleshooting research goes under 05-Infrastructure/Hardware/<device>/, not 07-Reference/
Single-project postmortems go under 02-Projects/<Project>/, not a global Lessons/ folder (that folder is only for cross-project learnings)
Signal validation tied to one strategy goes under 04-Domain/Signals/, not 07-Reference/

If an AI agent or a classifier hook suggests 07-Reference/ but the note is clearly about one subject, override the suggestion and file in the subject folder. The tree beats the hint.

File Naming

Type	Format	Example
Daily note	`YYYY-MM-DD.md`	`2026-04-16.md`
Meeting	`YYYY-MM-DD Topic.md`	`2026-04-14 Q2 Planning.md`
Research	Descriptive or `Topic-YYYY-MM-DD.md`	`VPIN-Research-Report.md`
Person	Title Case	`Jane Doe.md`
Incident	`System-Type-YYYY-MM-DD.md`	`VPS-SSH-Incident-2026-04-16.md`

Rules: Title Case for notes. Hyphens over spaces for programmatic files. ISO dates always (YYYY-MM-DD). Filename matches the # Title inside.
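The date-bearing formats in the table are regular enough to check automatically. The following is a sketch under stated assumptions: the `PATTERNS` names and the exact regular expressions are illustrative, not a convention the vault itself defines.

```python
import re
from datetime import date

PATTERNS = {
    "daily": re.compile(r"^(\d{4}-\d{2}-\d{2})\.md$"),
    "meeting": re.compile(r"^(\d{4}-\d{2}-\d{2}) \S.*\.md$"),
    # System-Type-YYYY-MM-DD.md: at least two hyphenated words before the date.
    "incident": re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+-(\d{4}-\d{2}-\d{2})\.md$"),
}


def filename_ok(kind: str, filename: str) -> bool:
    """Check the shape of the name and that the embedded date really exists."""
    match = PATTERNS[kind].match(filename)
    if not match:
        return False
    try:
        date.fromisoformat(match.group(1))  # rejects dates such as 2026-13-45
    except ValueError:
        return False
    return True
```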

Frontmatter

Every note opens with YAML frontmatter. This is what makes your vault queryable — Dataview reads it, agents parse it, and you filter by it.

Minimum

---
title: Note Title
tags: [topic, type, status]
created: 2026-04-16
updated: 2026-04-16
---

Extended (for research)

---
title: Research Note Title
tags: [domain, research, validated]
created: 2026-04-16
updated: 2026-04-16
status: validated
confidence: HIGH
---

Queryable Fields

Field	Values
`status`	`active` `archived` `draft` `blocked` `validated` `dead`
`confidence`	`HIGH` `MEDIUM` `LOW`
`tags`	domain + type + status (keep under 50 active tags)
`created` / `updated`	`YYYY-MM-DD`
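To show what "queryable" means for an agent, here is a minimal standard-library sketch that reads the flat frontmatter shown above and checks it against the field table. It is deliberately not a full YAML parser (it only understands `key: value` pairs and inline `[a, b]` lists); a real setup would more likely use a YAML library. The function names are hypothetical.

```python
import re
from datetime import date

ALLOWED_STATUS = {"active", "archived", "draft", "blocked", "validated", "dead"}
ALLOWED_CONFIDENCE = {"HIGH", "MEDIUM", "LOW"}
REQUIRED = ("title", "tags", "created", "updated")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_frontmatter(text: str) -> dict:
    """Parse flat frontmatter; return {} when the block is missing or unclosed."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value.strip("'\"")
    return {}


def validate(fields: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing field: {name}" for name in REQUIRED if name not in fields]
    for name in ("created", "updated"):
        value = fields.get(name)
        if value is None:
            continue
        try:
            if not ISO_DATE.match(value):
                raise ValueError(value)
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"{name} is not YYYY-MM-DD: {value!r}")
    if "status" in fields and fields["status"] not in ALLOWED_STATUS:
        problems.append(f"unknown status: {fields['status']!r}")
    if "confidence" in fields and fields["confidence"] not in ALLOWED_CONFIDENCE:
        problems.append(f"unknown confidence: {fields['confidence']!r}")
    return problems
```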

⚠️

Discipline

Update `updated:` on every edit. Never delete notes; set `status: archived` instead. Resist one-off tags.
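The first rule is easy to forget by hand, so an agent or an editor hook could apply it. The sketch below is one possible approach, with a hypothetical `touch_updated` helper; it only rewrites the frontmatter block and leaves the body alone.

```python
import re
from datetime import date


def touch_updated(text: str, today: date) -> str:
    """Set the frontmatter `updated:` field to `today`; return text unchanged if no frontmatter."""
    if not text.startswith("---\n"):
        return text
    end = text.find("\n---", 3)  # closing delimiter of the frontmatter block
    if end == -1:
        return text
    head, rest = text[:end], text[end:]
    stamp = f"updated: {today.isoformat()}"
    head, count = re.subn(r"(?m)^updated:.*$", stamp, head, count=1)
    if count == 0:
        head += "\n" + stamp  # field was missing, so append it
    return head + rest
```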

The Note Skeleton

Every non-trivial note follows this anatomy:

---
{frontmatter}
---

# Title

> **One-line summary — the bottom line a busy reader needs.**

---

## 1. Context

Brief setup. What is this, why does it matter.

## 2. Key Findings

Tables, data, evidence.

## 3. Implications

What to do with this information.

## 4. Next Steps

Actionable items.

---

## See Also
- [[Related Note 1]]
- [[Related Note 2]]

Three things make this work:

The blockquote summary — written last, read first. If someone reads one line, this is it.
Numbered sections — enable precise cross-references like [[Note#3. Implications]].
Horizontal rules (---) — visual breathing room between major sections.

Formatting Toolkit

Headers

#    Note title (one per note, matches filename)
##   Major sections
###  Subsections

Never skip levels (# directly to ###).
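A skipped level is simple to detect in a lint pass. This is a minimal sketch with a hypothetical `skipped_levels` helper; it ignores fenced code blocks but would misread a `# comment` line inside YAML frontmatter as a heading.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")


def skipped_levels(markdown: str) -> list:
    """Return (line_number, heading) pairs where a heading jumps more than one level."""
    problems = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every opening or closing fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level > previous + 1:
            problems.append((number, line.strip()))
        previous = level
    return problems
```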

Callouts

The primary tool for information that must not be missed:

> [!warning] Critical Risk
> This approach has a known failure mode.

> [!tip] Key Insight
> The correlation holds only on the 1h timeframe.

> [!success] Validated
> 70 of 108 tests pass correction.

> [!failure] Dead
> Zero tests survive. Signal does not exist.

> [!note] Context
> Based on 30 days of data. Re-validate at 90.

Full list: note, tip, info, success, warning, failure, danger, bug, example, quote, abstract, todo, question

Tables

The default format for structured data. Bold the column that matters most.

| Item  | Verdict | Metric | Action          |
|-------|---------|--------|-----------------|
| **A** | PASS    | 0.368  | Use in prod     |
| B     | FAIL    | -0.357 | Do not use      |

Status Markers

In tables	In prose	Meaning
PASS	"validated"	Confirmed
FAIL	"dead"	Disproven
MIXED	"fragile"	Needs more data

Code Blocks

Always fenced, always with a language tag:

```bash
ssh server "systemctl status my-service"
```

```python
result = df["column"].rolling(100).mean()
```

Numbers

4x10^-12 (scientific) | 41,000 (commas) | 65.4% (one decimal) | 0.368 (three decimals)
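When an agent writes numbers into a note, these conventions map directly onto Python format specifications. The helper names below are hypothetical; only the format strings matter.

```python
def sci(value: float, digits: int = 0) -> str:
    """Render a float as mantissa x 10^exponent, e.g. 4e-12 -> '4x10^-12'."""
    mantissa, exponent = f"{value:.{digits}e}".split("e")
    return f"{mantissa}x10^{int(exponent)}"


def thousands(value: int) -> str:
    """Group digits with commas."""
    return f"{value:,}"


def percent(fraction: float) -> str:
    """Render a fraction as a percentage with one decimal."""
    return f"{fraction:.1%}"


def metric(value: float) -> str:
    """Render a metric with three decimals."""
    return f"{value:.3f}"


print(sci(4e-12), thousands(41000), percent(0.654), metric(0.368))
# prints: 4x10^-12 41,000 65.4% 0.368
```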

Internal Links

[[WikiLinks]] for all internal references
Link on first mention only (not every occurrence)
Display text for clarity: [[Full-Report|the report]]
Section anchors: [[Report#Methodology]]

Diagrams

ASCII box diagrams are durable and render everywhere:

┌──────────┐     ┌──────────┐     ┌──────────┐
│  Source  │ ──> │  Engine  │ ──> │  Output  │
└──────────┘     └──────────┘     └──────────┘

Note Types

Daily Notes

Folder: 01-Daily/ | Filename: YYYY-MM-DD.md

Open it first thing in the morning. Use it as a running log: what you worked on, decisions made, blockers encountered, links to new notes. Close it at end of day with the blockquote summary.

Research Notes

Folder: 04-Domain/Research/ | Extra frontmatter: confidence: HIGH | MEDIUM | LOW

Required sections: Summary (blockquote verdict), Methodology, Results (tables), Verdict (pass/fail), Limitations, Files (for reproducibility).

Infrastructure Notes

Folder: 05-Infrastructure/

Must include: service name, location, management method (systemd, docker-compose), how to check status, how to restart, dependencies, ports.

Project Notes

Folder: 02-Projects/<Project>/

Root <Project>.md must include: status, owner, start date, one-paragraph overview, links to related services/repos/people, current priorities.

Incident Notes

Folder: 05-Infrastructure/Incidents/ | Filename: System-Type-YYYY-MM-DD.md

Sections: Symptom, Root Cause, Fix Applied, Verification, Lessons Learned.

Tags

Tags are a faceted filter layered on top of the folder structure. A note lives in one folder but matches many tags.

Category	Examples
Domain	`research` `infrastructure` `engineering` `business`
Type	`daily` `meeting` `reference` `strategy` `education`
Status	`active` `archived` `draft` `blocked` `validated` `dead`
Project	lowercase project slugs
Source	`notion` `external` `imported`

Keep under 50 active tags. Avoid deep nesting (#project/sub/phase). Review quarterly; merge or kill stragglers.
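The quarterly review can start from a quick count. This sketch assumes the tags of each note have already been collected into a list (for example from frontmatter); `audit_tags` and its report keys are hypothetical names.

```python
from collections import Counter


def audit_tags(notes_tags: list, limit: int = 50) -> dict:
    """Summarize tag usage: total count, one-off tags, and deeply nested tags."""
    # set() so a tag repeated inside one note is still counted once per note.
    counts = Counter(tag for tags in notes_tags for tag in set(tags))
    return {
        "total": len(counts),
        "over_limit": len(counts) > limit,
        "one_off": sorted(tag for tag, n in counts.items() if n == 1),
        "nested": sorted(tag for tag in counts if tag.count("/") >= 2),
    }
```

One-off tags are merge-or-kill candidates; anything listed under `nested` has two or more levels of nesting, like `project/sub/phase`.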

Links and Maps of Content

Wikilinks Build the Graph

Every person, project, and concept gets a [[WikiLink]] on first mention. This gives you automatic backlinks, a navigable graph, and fuzzy search that respects intent.

MOCs Are Curated Indexes

A Map of Content is a hand-maintained hub page that groups notes thematically:

# Projects MOC

> Hand-maintained index of all active and archived projects.

## Active
- [[Project A]] — description, status
- [[Project B]] — description, status

## Archived
- [[Project C]] — closed 2025-12, reason

MOCs beat folder listings because you control ordering, grouping, and annotations. At 100+ notes they become essential.

Dataview for Dynamic Indexes

The Dataview plugin lets you query frontmatter like a database:

```dataview
TABLE status, updated
FROM "02-Projects"
WHERE status = "active"
SORT updated DESC
```
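Dataview only runs inside Obsidian, so a CLI agent needs its own equivalent. As a rough sketch, assuming each note has already been reduced to a dict with hypothetical `path`, `status`, and `updated` keys, the same query looks like this:

```python
def active_projects(notes: list) -> list:
    """Mirror the Dataview query: FROM 02-Projects WHERE status = active SORT updated DESC."""
    rows = [
        (note["path"], note["status"], note["updated"])
        for note in notes
        if note["path"].startswith("02-Projects/") and note.get("status") == "active"
    ]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The string sort is one practical reason the guide insists on ISO dates everywhere.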

AI Agent Integration

The CLAUDE.md Pattern

Place a CLAUDE.md at the vault root. This is the single entry point for any AI agent. Include:

Vault owner (name, timezone)
Folder structure with one-line descriptions
Filename conventions and content style guide
Note type requirements
Tag taxonomy
Key projects list

When an agent opens the vault, "read CLAUDE.md" is all the instruction it needs.

Two-Phase Retrieval

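Agents should locate before they read: search or browse an index first (phase one), then open only the notes that matter (phase two). Loading the full vault wastes context.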

1. Search    → vault-search "query" or grep
2. Browse    → .vault-index.md (if maintained)
3. Read      → specific note, only when needed

Budget: 3-5 files per turn unless the task demands more.
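The search-then-read split can be sketched with the standard library alone. This stands in for the `vault-search` command or grep; the function names and the simple occurrence-count ranking are assumptions made for illustration.

```python
from pathlib import Path


def search(vault: Path, query: str, limit: int = 5) -> list:
    """Phase 1: rank notes by how often the query appears; return paths, not contents."""
    needle = query.lower()
    hits = []
    for path in vault.rglob("*.md"):
        count = path.read_text(encoding="utf-8", errors="ignore").lower().count(needle)
        if count:
            hits.append((count, path))
    hits.sort(key=lambda hit: (-hit[0], str(hit[1])))  # most mentions first, then by path
    return [path for _, path in hits[:limit]]


def read(paths: list) -> dict:
    """Phase 2: load only the shortlisted notes into context."""
    return {str(path): path.read_text(encoding="utf-8") for path in paths}
```

The default `limit=5` matches the 3-5 files per turn budget.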

Auto-Memory

Agents can persist learnings across sessions in a dedicated memory folder (e.g., ~/.claude/projects/<vault>/memory/). Each memory is a small markdown file with type: user | feedback | project | reference. An index file lists them all. This is how agents accumulate context without bloating prompts.

Multi-Agent Systems with Obsidian Memory

The vault isn't just for you and one AI assistant. It becomes dramatically more powerful as the shared memory layer for multiple specialized agents — each with its own personality, domain expertise, and tools, all reading and writing to the same Obsidian-structured markdown files.

The Core Idea

Instead of one general-purpose AI that does everything, you run a fleet of specialists. Each agent has its own workspace with identity files, but they all share the vault as their knowledge base. The vault is the connective tissue.

┌──────────────────────────────────────────────┐
│         Agent Gateway / Orchestrator         │
│  ├─ Messaging (Telegram, Slack, Discord)     │
│  ├─ Memory (vector search over vault)        │
│  ├─ Scheduling (cron, heartbeats)            │
│  └─ Tool execution (CLI, API, web)           │
└──────┬───────────────────────────────────────┘
       │
       ├── Coordinator Agent
       │   Reads: vault MOCs, project notes
       │   Writes: daily notes, decision logs
       │
       ├── Infrastructure Agent
       │   Reads: infra notes, incident history
       │   Writes: incident reports, deploy logs
       │
       ├── Research Agent
       │   Reads: research notes, data sources
       │   Writes: analysis results, findings
       │
       └── [Your custom agents...]
           Each agent has its own SOUL.md
           All share the Obsidian vault as memory

Agent Workspace Structure

Each agent gets its own workspace directory with markdown files that define who it is and what it remembers:

File	Purpose
`SOUL.md`	Agent identity, personality, communication style, few-shot examples
`USER.md`	Context about who the agent is helping (preferences, goals, constraints)
`MEMORY.md`	Curated long-term memory (distilled insights, not raw logs)
`memory/YYYY-MM-DD.md`	Daily interaction logs (raw session data)
`memory/preferences.md`	Durable preferences that never decay
`HEARTBEAT.md`	Rules for proactive check-ins (when to reach out, what to check)
`AGENTS.md`	Operating manual: startup sequence, tracking rules, memory system

💡

Why Markdown for Agent Memory?

Markdown is human-readable, version-controlled with git, searchable with grep, and parseable by any LLM. Unlike a database, you can open an agent's memory in Obsidian and read what it knows. When something goes wrong, you debug by reading files, not querying opaque embeddings.

The SOUL.md Pattern

The most important file. It defines who the agent IS — not what it does, but how it thinks and talks. Write it in second person.

# SOUL — [Agent Name]

You are [Name] — a [role description].

## Communication Style
- Direct. Short sentences. No filler.
- Numbers-forward when reporting data.
- Quick confirmations, not essays.

## Attitude by Context
When someone asks for help: [how you respond]
When something breaks: [how you respond]
When someone does great work: [how you respond]

## Few-Shot Examples
User: [example input]
Agent: [example response in character]

User: [another input]
Agent: [response showing personality]

How Agents Connect to the Vault

Each agent's workspace is its private memory. The Obsidian vault is the shared knowledge base. The connection works through file access:

# Agent reads vault for context
~/agents/infra/          ← agent workspace (private)
~/vault/05-Infrastructure/ ← vault section (shared)

# Agent writes back to vault
Agent creates: ~/vault/05-Infrastructure/Incidents/API-Outage-2026-04-17.md
Agent updates: ~/vault/01-Daily/2026-04-17.md (adds incident summary)

The vault's frontmatter, wikilinks, and folder structure mean agents can search, filter, and cross-reference just like a human browsing in Obsidian.

Agent Types

Organize agents into tiers by responsibility:

Tier	Examples	Purpose
Core	Coordinator, Infrastructure, Research	Always-on, mission-critical
Extended	Marketing, Analytics, Creative	On-demand specialists
Lifestyle	Fitness coach, Reading tracker, Journal companion	Personal agents with personality

Automated Check-ins (Heartbeats)

Agents can proactively reach out at scheduled times using cron-triggered heartbeats. The HEARTBEAT.md file defines the rules:

# Example: Infrastructure agent heartbeat

## Morning Check (08:00)
1. Read yesterday's incident notes from vault
2. Check service status
3. If anything is degraded: alert with summary
4. If all clear: HEARTBEAT_OK (stay silent)

## Evening Check (18:00)
1. Summarize today's deployments and incidents
2. Flag any unresolved issues
3. Write daily summary to vault
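The "alert or stay silent" branch of the morning check reduces to a small decision function. The sketch below assumes service states have already been gathered into a dict, with `"active"` meaning healthy (as reported by `systemctl is-active`); the `ALERT` message format is invented for illustration.

```python
def morning_check(service_status: dict) -> str:
    """Return HEARTBEAT_OK when everything is healthy, else a one-line alert."""
    degraded = {
        name: state
        for name, state in sorted(service_status.items())
        if state != "active"
    }
    if not degraded:
        return "HEARTBEAT_OK"  # the orchestrator treats this as "send nothing"
    summary = ", ".join(f"{name}: {state}" for name, state in degraded.items())
    return f"ALERT {len(degraded)} degraded: {summary}"
```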

Example Prompts by Agent Role

The real power of a multi-agent system is talking to the right specialist. Here are example prompts organized by agent role to show how each one thinks differently.

Coordinator Agent

The hub. Routes tasks, delegates to specialists, maintains the big picture.

# Status check
"What's the current state of all active projects? Summarize each in one line."

# Delegation
"Ask the research agent to pull the latest data on X, then have the writer agent draft a summary."

# Triage
"Three things broke overnight. Prioritize by business impact and assign to the right agent."

Infrastructure Agent

Monitors servers, services, deployments. Thinks in uptime, logs, and systemd.

# Health check
"Run a full health check on the VPS. Report service status, disk usage, and any errors in the last 24h."

# Incident response
"SSH is timing out to the production server. Diagnose whether it's a network issue or fail2ban."

# Deployment
"Deploy the latest changes from main to production. Verify the service restarts cleanly."

Research Agent

Deep analysis, data validation, literature review. Thinks in data and evidence.

# Hypothesis testing
"Test whether users who complete onboarding in under 5 minutes have higher 30-day retention. Report effect size and significance."

# Literature review
"Find recent papers on retrieval-augmented generation for domain-specific QA. Summarize the top 3."

# Comparison study
"Compare PostgreSQL vs DynamoDB for our read-heavy workload. Benchmark latency, cost, and ops burden."

Lifestyle Agent (e.g., Fitness Coach)

Personal agents with strong personality. Tracks habits, holds you accountable.

# Workout logging
"Did legs today. Goblet squats 4x8 @ 53lb, RDLs 3x10 @ 50lb, split squats 3x10 each @ 25lb"

# Quick macro tracking
"40p"  # shorthand: logs 40g protein, shows running tally

# Morning readiness
"4 3 3 4"  # sleep/soreness/energy/stress scores → agent adjusts workout intensity
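Shorthand like this only works if the agent parses it deterministically before the model reasons about it. Here is a minimal sketch of the two shorthands above; the function names and the returned shapes are hypothetical.

```python
import re
from typing import Optional

READINESS_KEYS = ("sleep", "soreness", "energy", "stress")


def parse_macro(message: str) -> Optional[int]:
    """'40p' -> 40 (grams of protein); anything else -> None."""
    match = re.fullmatch(r"(\d+)p", message.strip())
    return int(match.group(1)) if match else None


def parse_readiness(message: str) -> Optional[dict]:
    """'4 3 3 4' -> scores keyed by sleep, soreness, energy, stress; else None."""
    parts = message.split()
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        return None
    return dict(zip(READINESS_KEYS, map(int, parts)))
```

Messages that match neither pattern return `None` and can fall through to normal conversation handling.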

💡

Prompt Design Principles

Be specific about output format. "Report as a table" or "Keep it under 200 words" prevents agents from over-generating.

Give context, not instructions. Instead of "Use pandas to...", say "Analyze this data for..." and let the agent choose its tools.

One task per message. Multi-part requests get better results when broken into sequential messages.

Reference prior work. "Using the same methodology as last week's retention study..." activates agent memory for consistency.

Walk-Through: Building a Fitness Agent

Want to see this pattern in action? The full step-by-step tutorial builds a fitness coaching agent from scratch — SOUL.md personality, USER.md context, heartbeat check-ins, macro tracking, streak accountability, and a dedicated Obsidian vault for permanent records.

Step-by-step tutorial: Building a Fitness Agent (CoachPulse)

Templates

Enable the Templates core plugin (core plugins ship with Obsidian, so there is nothing to install) and point its template folder setting at Templates/.

Daily Note

---
title: '{{date:YYYY-MM-DD}}'
tags: [daily]
created: {{date:YYYY-MM-DD}}
updated: {{date:YYYY-MM-DD}}
---

# {{date:YYYY-MM-DD}}

> **Summary** (fill at end of day).

---

## What I worked on

## Decisions

## Blockers

## Next

---

## See Also

Research Note

---
title:
tags: [research, domain]
created: {{date:YYYY-MM-DD}}
updated: {{date:YYYY-MM-DD}}
status: draft
confidence: LOW
---

# Title

> **One-line verdict.**

---

## 1. Context

## 2. Methodology

## 3. Results

## 4. Verdict

## 5. Limitations

## 6. Files

---

## See Also

Meeting Note

---
title: '{{date:YYYY-MM-DD}} Topic'
tags: [meeting]
created: {{date:YYYY-MM-DD}}
updated: {{date:YYYY-MM-DD}}
attendees: []  # e.g. ["[[Jane Doe]]"]; quote wikilinks so the YAML stays valid
---

# {{date:YYYY-MM-DD}} Topic

> **One-line outcome.**

---

## Agenda

## Notes

## Decisions

## Action Items

- [ ] Owner — task

---

## See Also

Daily Workflow

Morning     Open today's daily note. Plan the day.
Throughout  Capture in the daily note. Link out to project notes as you create them.
End of day  Write the blockquote summary. Close items. Note what ships tomorrow.
Weekly      Scan the last 7 dailies. Update project statuses. Archive stale drafts.
Quarterly   Tag audit. Kill unused tags. Review orphan notes. Test your backup restore.

Editing Existing Notes

Always update `updated:` in frontmatter
Don't delete content; move outdated sections to `## Archive` at the bottom
Add `> [!info] Updated YYYY-MM-DD` for significant changes

Link Hygiene

Periodic backlink audits: orphan notes (zero backlinks) are candidates for linking or archiving
Fix broken wikilinks or create the stub note
Every new person, project, or concept gets a link on first mention
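Both audits fall out of one pass over the wikilinks. The sketch assumes notes are held in a dict of note name (filename without `.md`) to Markdown text, and resolves links by exact name only; `link_report` is a hypothetical helper.

```python
import re

# Captures the link target and stops before any "|alias" or "#heading" part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def link_report(notes: dict) -> dict:
    """Find orphans (no incoming links) and broken links (targets with no note)."""
    linked = set()
    broken = set()
    for name, text in notes.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target == name:
                continue  # a self-link does not rescue a note from being an orphan
            if target in notes:
                linked.add(target)
            else:
                broken.add(target)
    return {"orphans": sorted(set(notes) - linked), "broken": sorted(broken)}
```

Entry points such as MOCs often show up as orphans because nothing links to them; that is expected and worth whitelisting.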

Sync and Backup

Method	Best for	Trade-off
iCloud	Apple ecosystem	Automatic; no Linux client, unreliable on Windows
Syncthing	Cross-platform	Requires always-on peer
Git	Version history	Manual commits, learning curve
Obsidian Sync (paid)	Official, encrypted	Monthly cost

💡

The Durable Pattern

Git for history + one live-sync method for cross-device access. Git gives you git log as an audit trail. Live sync gives you phone + desktop parity.

Git Setup

cd /path/to/vault
git init
echo ".obsidian/workspace*" >> .gitignore
echo ".trash/" >> .gitignore
echo ".DS_Store" >> .gitignore
git add .
git commit -m "Initial vault commit"

Automate daily snapshots with a cron job:

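#!/usr/bin/env bash
# Stop if the vault is missing, so git never runs in the wrong directory.
cd /path/to/vault || exit 1
# Stage everything, including deletions.
git add -A
# Commit only when something is staged; otherwise exit quietly.
git diff --staged --quiet || git commit -m "Daily vault snapshot $(date +%Y-%m-%d)"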

Backup

Beyond sync, keep a cold backup: Time Machine (or equivalent) plus a quarterly off-site copy (encrypted). Test the restore path during the same quarterly review.

Recommended Plugins

Core (enable in Settings)

Daily notes, Templates, Backlinks, Outgoing links, Graph view, Tags view, File recovery.

Community

Plugin	Purpose
Dataview	Query frontmatter like a database
Templater	Advanced templates with scripting
Calendar	Sidebar calendar for daily notes
Periodic Notes	Weekly/monthly review notes
Advanced Tables	Keyboard-friendly table editing

⚠️

Plugin Discipline

Don't install everything. Start with Dataview + Templater. Add others only when you hit a concrete need. Every plugin is a maintenance burden.

Anti-Patterns

Don't do this	Why it hurts
One giant note with everything	Can't link to it, can't retrieve from it
Deeply nested folders (5+ levels)	You'll never navigate them
One-off tags	Tag list becomes garbage
Copying content instead of linking	Goes stale immediately
No frontmatter	Not queryable, not filterable
Deleting instead of archiving	Loses history and context
Decorative formatting	Only format what passes the two-audience test
Bulk-importing from another tool	Noise. Bring only what you reference.

Scaling

Vault size	What breaks	Fix
0-100	Nothing	Enjoy it
100-500	Folder navigation	Introduce MOCs
500-1,500	Tag sprawl, search noise	Tag audit + Dataview indexes
1,500+	Graph view is noise	Focused graph filters, per-project subgraphs
Any size	Stale notes	Quarterly status review

Getting Started Checklist

Create the numbered folder structure
Write your CLAUDE.md at the vault root
Set up 3 templates (daily, research, meeting)
Enable core plugins (Daily notes, Templates, Backlinks, Graph view)
Install Dataview + Templater
Initialize git and add .gitignore
Write your first daily note
Create one MOC (Projects MOC or Infrastructure MOC)
Work in the vault for 2 weeks before importing anything

Further Reading

Official Documentation

Obsidian Help — help.obsidian.md

Plugin Documentation

Dataview Plugin Docs — blacksmithgu.github.io

Philosophy

Andy Matuschak's Evergreen Notes — notes.andymatuschak.org