Beads, the Backbone
The CLI issue tracker that gives an agent a memory that survives the next compaction · ~18 min read · Suggested by Bob engineerpm
Agents have the memory of goldfish without scaffolding. Beads is how we give them a memory that survives compaction, restart, and the next morning. This is the cairn that explains why beads is required, what the daily loop looks like, and the specific gotchas that bite Strike contributors in their first week.
The agent-memory problem
A typical agent session at the start of the day reconstructs what it can from the repo, the open PR, and a few commit messages — plausible, but often subtly wrong. Rejected constraints get re-proposed; deferred decisions turn back into open questions. The agent’s memory is the union of the repo, the current context window, and anything it can read on demand; the first two are bounded, and the third is the lever to pull.
The agent’s memory across sessions is whatever it can read from a stable, durable place. Beads is that place for “what work is in flight, and what is the state of each piece.”
Beads, in one page
Beads is a CLI issue tracker with two non-obvious design choices. Work is a graph — issues have dependencies, and bd ready returns work whose blockers are cleared. State lives in two layers: an embedded Dolt database (gitignored cache) and .beads/issues.jsonl (committed, source of truth). A pre-commit hook auto-exports; a post-merge hook imports. Issues travel with the code in the same history and PR. Required, minimum 1.0.3.
Why JSONL + git beats a SaaS tracker for agents
If you have used issue trackers before, beads will look familiar in concepts and strange in surface. The real difference is what happens when the agent updates the tracker: SaaS status changes are network round-trips with payloads pushed back into context. Local git-backed JSONL is essentially free per write. The bigger win is multi-agent concurrency: parallel agents each operate against a local copy, and conflicts surface as merge conflicts in the JSONL — the same skill the team uses everywhere else.
Bringing a session up to speed
The first thing an agent does in a new session is run bd prime — though you almost never type it. Coding agents reach for beads proactively, and Strike’s AGENTS.md embeds bd onboard’s short context snippet inline so the agent has the gist before it runs bd prime for the full graph. After priming, bd ready shows work whose blockers are cleared; ask “what’s ready?” and the agent runs it and summarizes. Most contributors talk to beads through their agent, the way most people talk to git through commands rather than a client.
The daily loop
For a typical work session, the loop is short and you mostly do it by talking. “Pick up the next ready bead” — the agent runs bd ready, claims one, starts work. “Close os-pvw9, os-mn3k, and os-7zfx — they all landed in this PR” — one shot. Natural language is the surface; the CLI is the contract the agent implements against. The plan-then-claim pattern is the higher-leverage version: brief the agent on a four-hour goal, it creates beads with explicit dependencies, you confirm the breakdown, the graph self-drives.
For non-trivial features, write the bead description as if you are writing for a colleague who will pick it up tomorrow morning. The bead is the durable artifact, and the agent’s draft is something you should read before accepting.
T-shirt sizing: how we estimate complexity
Once a spec is ready, beads get broken down on a XS / S / M / L / XL scale. XS is a one-line tweak; S is contained with clear scope; M spans a few files but is well-understood; L involves real architectural decisions or unknowns; XL is a signal to split. The value is relative comparison, not absolute precision. Sizing drives review cadence: XS/S lean on the gates plus the pre-merge review pass; M+ earn a structured review at close.
Do not let the agent estimate in time. Claude will fall back to human hours if you do not redirect — not realizing it will be the one doing the work, often an order of magnitude or two faster than the reference frame it learned from. Stick to t-shirts.
Persistent memory: hard-won facts that should survive
Some learnings are not work-to-do; they are knowledge the next session will need — a library quirk that cost an hour, a convention that bites every new contributor. Beads provides bd remember and bd memories for these, reached through conversation: “remember that X”, “do we have a note about X?” The boundary: issues are for work; bd remember is for knowledge. We do not use TodoWrite, MEMORY.md, or in-chat todo lists for cross-session work — they do not survive resets reliably enough.
Strike-specific gotchas (so you can spot them when the agent slips)
A handful of conventions bite every new contributor. Issue prefix is os- (lowercase, not configurable). Priority is numeric 1–4 (P1 highest); --priority high is rejected. Labels are comma-separated only on create; post-creation, use bd label add <id> <name> once per label — passing "api,feature" creates one literal label. The 1.0.3 close-flush quirk on macOS sometimes leaves .beads/issues.jsonl lagging until the next commit; committed state is always correct, and bd export -o .beads/issues.jsonl force-flushes. Do not run bd hooks install — it writes to .git/hooks/, which git ignores when core.hooksPath = .githooks/.
Beads and the GitHub Project
Beads is the engineer’s execution surface; a single GitHub Project (Osprey Product Development) is the team’s coordination surface. The bd github sync/push/pull bridge exists in the CLI but Strike deliberately doesn’t auto-enable it — bidirectional sync would either pollute bd ready with auto-review noise or flood the Project board with sub-day bead churn, so the workflow uses a manual seam where beads get minted at claim time from issues labeled discovered or epic/*. The full coordination story is in Where the Work Lives, the next cairn in the trail.
- How We Build Here — The trail's opening cairn. The "agents need scaffolding" framing applied directly.
- The Workshop — The trail's tool map. Beads sits in the required three.
- Three Memories, One Q — On the layered memory model: short-term context, mid-term beads, long-term cairns.
- Beads on GitHub — Source code, install instructions, changelog.
- Dolt — The SQL database with git-style branching that beads uses for its local cache.
Agents forget. Not in the dramatic way computers used to forget — bluescreen, lost work, start over — but in the quieter, costlier way that comes from finite context windows. A session opens, fills with the thread of what you are working on, and at some point the harness compacts older messages to make room for new ones. Restart, /clear, the next morning’s session — every one of those resets the working memory of your collaborator to roughly zero.
If we relied on the agent to remember anything between those resets, we would lose hours every week to re-deciding decisions that were already settled. That is not a hypothesis; it is what the team did before beads. The cost is real and it compounds.
Beads is how we stopped paying it.
The agent-memory problem
A typical agent session at the start of the day looks like this. You open a chat, type “what were we doing yesterday?”, and watch the agent stitch together what it can from the repo, the open PR, and a few commit messages. The reconstruction is plausible. It is also, in practice, often subtly wrong: a constraint that was discussed and rejected gets re-proposed; a half-finished investigation gets restarted from scratch; a “we decided to defer this” turns back into “should we do this now?”
The mechanism is straightforward. Anything that was only ever in the chat thread is gone. The agent’s memory is the union of three things: what is in the repo on disk, what is in the current context window, and what is anywhere it can read on demand. The first two are bounded; the third is the lever we want to pull.
Claude Code’s native rewind and resume features help inside a single session: a context window that has compacted can be restored to an earlier point. They do not help across sessions, across days, or across machines, which is what a persistent issue graph fixes.
The agent’s memory across sessions is whatever the agent can read from a stable, durable place. Beads is that place for “what work is in flight, and what is the state of each piece.”
Beads, in one page
Beads is a CLI issue tracker with two non-obvious design choices. First, it represents work as a graph rather than a list — every issue can have dependencies, blockers, and parents, and the tool’s primary query (bd ready) returns work whose blockers are cleared. Second, it stores its state in two layers: an embedded Dolt database for fast local queries, and .beads/issues.jsonl as the git-native source of truth.
The distinction matters. The Dolt directory (.beads/embeddeddolt/) is gitignored — a local cache. The JSONL file is committed — the canonical record. When you commit, a pre-commit hook auto-exports the current state to JSONL and stages it; when you pull, a post-merge hook imports the changes. Issues travel with the code, in the same git history, in the same review, in the same pull request.
Beads is required, minimum version 1.0.3. Older versions are missing the export defaults the team’s hooks rely on.
bd is the CLI command. beads is the project. The .beads/ directory holds both the cache (gitignored) and the source-of-truth JSONL (committed). When in doubt, the JSONL is the truth.
Why JSONL + git beats a SaaS tracker for agents
If you have used issue trackers before — Jira, Linear, GitHub Issues, Asana — beads will look both familiar and strange. Familiar because the concepts are the same: an issue has a title, a body, a status, dependencies, labels. Strange because the surface is a CLI and the storage is a file in your repo.
Most of those tools have CLIs too — that is not the differentiator. The real difference is what happens when the agent updates the tracker. Every status change in a SaaS tool is a network round-trip: HTTP latency on the way out, response payloads pushed back into the agent’s context window, retry logic when the API rate-limits, and the running noise of “issue updated” responses cluttering every turn. Local git-backed JSONL is essentially free per write — disk speed, no network, no payload, no rate limit. The per-command cost difference is small; over a quarter of agent-driven work, it is the difference between an issue tracker that rides along quietly and one that drags.
The bigger win is multi-agent concurrency, which is central to beads’ design. When two agents are working in parallel (one in your main checkout, one in a worktree, or a pair on different machines collaborating through git), they each operate against a local copy of the graph. Conflicts surface as merge conflicts in .beads/issues.jsonl at commit time, and resolving them takes the same skill the team already uses everywhere else. There is no “but my view says X” mismatch, and no central tracker becoming a bottleneck or a source of inconsistency between agents.
The same project that makes beads also runs Gastown Hall, an experiment in multi-agent collaboration where the issue graph is the coordination surface. Worth a look as a curiosity, even if you never need that pattern.
Bringing a session up to speed
The first thing an agent does at the start of a new session is run bd prime. In practice, you almost never type that command yourself. Modern coding agents are trained to reach for beads proactively, and Strike’s AGENTS.md is already pointed at bd prime via bd onboard. Open a fresh Claude Code session in this repo and the agent primes itself before you finish your second sentence.
Two commands are worth knowing exist anyway:
- bd prime: full workflow context, designed to be read by an agent. Includes the project’s session-close protocol, the daily-loop semantics, and team conventions. If you ever want to inspect what the agent is seeing, run it yourself.
- bd onboard: a much shorter (~10 line) snippet suitable for pasting into AGENTS.md or CLAUDE.md. Strike already has this committed.
After priming, bd ready is what shows the work whose blockers are cleared — your queue. Again, you rarely type this. Ask the agent: “what’s ready?” or “what should we work on?” and it will run bd ready and summarize. If the queue is empty and you do not already know what you are doing, that is a useful signal in itself.
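To make "blockers are cleared" concrete, here is a minimal sketch of a ready computation over a toy table. The three-column layout (id, status, comma-separated blockers) and the ids are made up for illustration; this is not the real .beads/issues.jsonl schema, and the real bd ready may apply further rules that this sketch ignores.

```shell
# Toy ready-queue: an issue is ready when it is open and every blocker is closed.
# The table format is hypothetical, not the real beads storage format.
ready_issues() {
  # stdin: lines of "<id> <status> <comma-separated blockers, or - for none>"
  awk '
    { status[$1] = $2; blockers[$1] = $3; order[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        id = order[i]
        if (status[id] != "open") continue
        ok = 1
        if (blockers[id] != "-") {
          n = split(blockers[id], b, ",")
          for (j = 1; j <= n; j++)
            if (status[b[j]] != "closed") ok = 0
        }
        if (ok) print id
      }
    }
  '
}

ready_issues <<'EOF'
os-aaa1 closed -
os-bbb2 open os-aaa1
os-ccc3 open os-bbb2
os-ddd4 open -
EOF
```

In this toy graph, os-ccc3 stays out of the queue until os-bbb2 closes, which is the property that lets a planned graph drive itself.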
If you want a graphical view on top of the issue graph — most contributors here do not, but tastes vary — there is a small ecosystem of community-built GUIs that read the same JSONL. The community tools list is the canonical index. The team’s working assumption is that you will mostly talk to beads through your agent, the way you mostly talk to git through commands rather than a GUI client.
The daily loop
For a typical work session, the loop is short — and you mostly do it by talking. You describe outcomes; the agent runs the mechanics. The CLI invocations below are real, but they are happening under the hood, not at your keyboard.
What it looks like in practice:
- “Let’s pick up the next ready bead.” The agent runs bd ready, summarizes the queue, claims one with bd update --claim, starts work.
- “This one’s done, close it and move on.” The agent runs bd close, picks the next item in the plan.
- “Close os-pvw9, os-mn3k, and os-7zfx. They all landed in this PR.” The agent runs bd close os-pvw9 os-mn3k os-7zfx in one shot.
You can absolutely run any of these commands yourself if you want to. Most contributors do not, most days. The natural-language interface is the one we live in; the CLI is the contract the agent implements against.
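For reference, the commands behind those three requests can be sketched as a pair of small shell functions. This is a dry-run sketch: the BD variable is a convention introduced here (not part of beads) so the block prints commands instead of running them, and placing the bead id before --claim is an assumption about argument order.

```shell
# Dry-run by default: prints each bd command instead of executing it.
# Set BD=bd to run against a real beads install.
BD="${BD:-echo bd}"

pick_up() {
  # $1: bead id chosen from the ready queue
  $BD ready
  # Assumption: the id is passed before --claim.
  $BD update "$1" --claim
}

close_landed() {
  # Close every bead id passed as an argument in a single call.
  $BD close "$@"
}

pick_up os-pvw9
close_landed os-pvw9 os-mn3k os-7zfx
```

Seeing the mechanics once makes it easier to notice when the agent skips the claim step or closes beads one at a time.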
For non-trivial features, write the bead description as if you are writing for a colleague who will pick it up tomorrow morning. Title and body should let the next reader figure out what was intended without needing the chat thread that produced it. The bead is the durable artifact, and the agent’s draft of it is something you should read before accepting.
The plan-then-claim pattern is the higher-leverage version of the same loop: at the start of a four-hour session, brief the agent on what you want to accomplish. It will create the beads with explicit dependencies; you confirm the breakdown is right. From that point, the graph self-drives — the agent works through it, claiming and closing as it goes, while you stay at the level of “is this the right work, in the right order?” That is outcome-driven dev. It is what most of the day looks like.
T-shirt sizing: how we estimate complexity
Once a spec is ready, the next step is breaking it into actionable beads — and estimating each bead’s complexity. The team uses t-shirt sizes (XS / S / M / L / XL) as the canonical scale. The agent does the breakdown; you confirm the shape.
The sizes are deliberately rough:
- XS — typo, one-line tweak, doc-only change. Often does not need its own bead.
- S — small, contained change with clear scope and few unknowns. A single function, a tightly-bounded refactor, a small test addition.
- M — meaningful work that spans a few files but is still well-understood. The agent can carry it end-to-end with minimal direction.
- L — multi-file change, real architectural decisions, or notable unknowns. Worth a quick spec pass before the agent dives in.
- XL — split it. An XL bead is almost always a signal that the work wants to be a small epic with sub-beads.
The value is in relative comparison, not absolute precision. “This bead is M, that one is S” tells you which to claim first; “this bead is six hours” implies a precision that does not survive contact with how the work actually unfolds.
Do not let the agent estimate in time. Claude in particular will fall back to human hours, days, or weeks if you do not redirect — not realizing it will be the one doing the work, often at an order-of-magnitude (or two) faster pace than the human reference frame it learned estimates from. Time estimates were unreliable in the old world; they are dramatically less reliable now. Stick to t-shirts.
Sizing also drives review cadence. Later cairns use this scale: XS and S beads typically lean on the gates plus the pre-merge review pass and skip the per-bead review; M+ beads earn a structured review when they close. Sizing the work wrong means either over-reviewing trivia or under-reviewing real work. The agent’s first sizing pass is usually correct; spend a beat confirming it.
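The size-to-review mapping above is small enough to write down as a lookup. The function name and the output strings are illustrative only; the point is that anything other than a t-shirt size, including a time estimate, is rejected.

```shell
# Map a t-shirt size to the review path described above.
review_for_size() {
  case "$1" in
    XS|S) echo "gates + pre-merge review" ;;
    M|L)  echo "structured review at close" ;;
    XL)   echo "split into an epic with sub-beads" ;;
    *)    echo "unknown size: $1 (use XS, S, M, L, or XL)" >&2; return 1 ;;
  esac
}

for size in XS M XL; do
  printf '%s: %s\n' "$size" "$(review_for_size "$size")"
done
```

A call such as review_for_size 6h fails, which mirrors the rule that estimates stay in t-shirts.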
Persistent memory: hard-won facts that should survive
Some learnings are not work-to-do; they are knowledge the next session will need. A library quirk the agent spent an hour debugging. A non-obvious environment requirement. A convention that bites every new contributor. For these, beads provides a separate facility — the agent stores them with bd remember <key> "<note>" and recalls them with bd memories <keyword> — that you reach for through conversation.
The natural-language surface is “remember that X” (the agent stores a memory) and “do we have a note about X?” (the agent recalls). Most of the team rarely types either command directly. The CLI is the implementation; the conversation is the interface.
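Under the conversation, the two calls look roughly like this. It is a dry-run sketch: BD is a variable introduced here so the block prints commands rather than running them, and the key and note text are made-up examples.

```shell
# Dry-run by default; set BD=bd to run against a real beads install.
# Note: the dry-run echo drops the quotes around the note.
BD="${BD:-echo bd}"

# Store a hard-won fact under a short key.
remember_fact() {
  # $1: key, $2: note
  $BD remember "$1" "$2"
}

# Look up stored memories by keyword.
recall_fact() {
  # $1: keyword
  $BD memories "$1"
}

remember_fact label-add "bd label add takes one label per call"
recall_fact label
```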
The boundary line: beads issues are for work; bd remember is for knowledge. Ephemeral session state belongs in a bead, not a memory. A one-off TODO belongs in a bead, not a memory. The team’s bd remember set is small and high-signal because we treat it as the durable wiki for hard-won facts.
We do not use TodoWrite, MEMORY.md files, or in-chat todo lists for cross-session work. Those exist; we have decided they do not survive resets reliably enough. Anything that matters past the current session belongs in a bead or in bd remember, and the agent knows to reach for the right one when you describe what you want preserved.
Strike-specific gotchas (so you can spot them when the agent slips)
A handful of conventions and quirks bite every new contributor in their first week. They are not in bd prime because they are project- or version-specific. Knowing them lets you redirect the agent when it falls into one — and most of these will show up at least once.
Issue prefix is os-. Issues are named os-<hash> — for example os-pvw9. If an agent invents OS-1234-style names, redirect it; the prefix is not configurable per session.
Priority is numeric (1–4), not high/medium/low. P1 is highest, P4 is lowest. The agent should call bd create --priority 1; bd create --priority high is rejected outright. If you ask for “high priority” and the agent argues with the tool instead of translating, redirect.
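A guard like the following shows the rule in miniature. It is a sketch, not part of beads: the function names and the example title are hypothetical, passing the title as a positional argument to bd create is an assumption, and BD defaults to a dry-run echo.

```shell
# Dry-run by default; set BD=bd to run against a real beads install.
BD="${BD:-echo bd}"

# Accept only the numeric priorities 1-4; reject words like "high".
valid_priority() {
  case "$1" in
    1|2|3|4) return 0 ;;
    *)       return 1 ;;
  esac
}

create_with_priority() {
  # $1: title (positional title is an assumption), $2: priority
  if ! valid_priority "$2"; then
    echo "priority must be 1-4 (1 is highest), got: $2" >&2
    return 1
  fi
  $BD create "$1" --priority "$2"
}

create_with_priority "Fix export lag" 1
```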
Labels: comma-separated only on create. bd create --label "api,feature" works. bd label add <id> "api,feature" does not — it creates a single literal label called "api,feature". Post-creation, the right form is bd label add <id> <name> called once per label. This footgun gets re-discovered roughly once per quarter, which is exactly why it lives in bd remember.
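The once-per-label form is easy to wrap. The helper below is a hypothetical sketch (add_labels is not a beads command) that splits a comma-separated list and issues one bd label add per name; BD defaults to a dry-run echo so the block prints the commands.

```shell
# Dry-run by default; set BD=bd to run against a real beads install.
BD="${BD:-echo bd}"

# Add each label in a comma-separated list with its own `bd label add` call.
# The body runs in a subshell so the IFS and `set` changes do not leak.
add_labels() (
  # $1: bead id, $2: comma-separated labels, e.g. "api,feature"
  id=$1
  set -f        # no glob expansion while splitting
  IFS=,
  set -- $2     # split the list on commas
  unset IFS     # back to default word splitting for $BD
  for label in "$@"; do
    $BD label add "$id" "$label"
  done
)

add_labels os-pvw9 "api,feature"
```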
bd close flush quirk on 1.0.3 / macOS. When the agent closes a bead, the in-memory state updates but .beads/issues.jsonl sometimes lags until the next commit triggers the pre-commit re-export. Committed state is always correct. Mid-session, if the on-disk JSONL surprises you (say, you grep it before committing and a closed bead still looks open), that is the quirk — the agent can run bd export -o .beads/issues.jsonl to force-flush.
The flush quirk is a known upstream bug on 1.0.3. The pre-commit hook covers it, so commits never carry stale JSONL. If a mid-session inspection of the JSONL surprises you, it is probably this — not your work being lost.
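A mid-session check can be sketched as two small helpers: one to look at the raw record, one to force the export. Both function names and the JSONL and BD variables are introduced here for illustration; the only beads command used is the bd export -o form quoted above, and BD defaults to a dry-run echo.

```shell
# Dry-run by default; set BD=bd to run against a real beads install.
BD="${BD:-echo bd}"
JSONL="${JSONL:-.beads/issues.jsonl}"

# Print the raw JSONL line(s) mentioning a bead id, if any.
# A plain substring match: it also matches records that merely reference the id.
show_record() {
  # $1: bead id, $2: path to the JSONL file
  grep -F -- "$1" "$2"
}

# Re-export the current state over the committed JSONL path.
force_flush() {
  $BD export -o "$JSONL"
}

force_flush
```

If show_record os-pvw9 .beads/issues.jsonl still shows a bead you just closed as open, run force_flush and look again before assuming work was lost.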
Do not let the agent run bd hooks install. The Strike repo shares hooks via core.hooksPath = .githooks/ (set by just hooks). bd hooks install writes to .git/hooks/, which git ignores when core.hooksPath is set. If you see the agent reach for it on a fresh clone, redirect — the right move is to edit .githooks/ directly and commit.
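The reason is visible from git's own configuration. The helper below is a hypothetical sketch that takes the value of core.hooksPath and reports the directory git will read hooks from in a normal clone.

```shell
# Report which hooks directory git will use, given the value of core.hooksPath.
effective_hooks_dir() {
  # $1: output of `git config --get core.hooksPath` (empty when unset)
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo ".git/hooks"
  fi
}

# With core.hooksPath set, files written to .git/hooks/ are not run by git.
effective_hooks_dir ".githooks/"
effective_hooks_dir ""
```

In a checkout, call it as effective_hooks_dir "$(git config --get core.hooksPath)". If the answer is .githooks/, anything a tool writes into .git/hooks/ is dead weight.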
Beads and the GitHub Project
Beads is one surface; the team’s GitHub Project — Osprey Product Development — is the other. Beads holds the active execution units: what someone is in the middle of building, with dependency graphs, claim semantics, and the design-field context that anchors each bead to its parent. The Project holds coordination: the discoveries someone noticed but won’t tackle today, the epics waiting for someone to start them, the auto-review queue. Both surfaces exist on purpose. The split between them is what lets a manager, a product partner, or a new contributor see what’s happening without running bd ready themselves, while keeping the engineer’s working set out of the rest of the company’s notifications.
A bd github sync/push/pull bridge does exist in the bd CLI, and it will round-trip records between the two surfaces. Strike’s methodology deliberately does not enable it by default — bidirectional auto-sync would either pollute bd ready with auto-review noise (the nightly routines file dozens of small findings every week) or flood the Project board with sub-day bead churn. Two specific edge cases call for invoking the bridge by hand: an epic that originated as a bead and now needs board visibility, or a GH issue whose sub-tasks specifically benefit from bead dependency graphs. Outside those, the seam is manual and intentional.
The full coordination story is in Where the Work Lives — the cairn directly downstream of this one in the trail. The short version: most discovered work files as a GitHub issue at discovery time (labeled discovered, or epic/<area> if epic-scale); the corresponding bead comes into existence at claim time when someone is about to start on it. The bead’s --design field carries the GitHub issue URL on the first line, so the seam is navigable from either direction. When a PR merges with Closes #N, GitHub closes the parent issue automatically.
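The "URL on the first line" convention can be sketched with two helpers. The function names and the example URL are hypothetical; only the convention itself comes from the workflow described above.

```shell
# Build design text whose first line is the GitHub issue URL.
design_text() {
  # $1: issue URL, $2: free-form design notes
  printf '%s\n%s\n' "$1" "$2"
}

# Recover the URL from design text by reading only the first line.
design_url() {
  # $1: design text
  printf '%s\n' "$1" | head -n 1
}

# Placeholder URL for illustration only.
text=$(design_text "https://github.com/example-org/example-repo/issues/123" "Notes go here.")
design_url "$text"
```

Keeping the URL on its own first line is what makes the lookup a one-liner in either direction.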
Summary
- Agents forget; beads remembers. Sessions compact, context windows fill, restarts happen. Beads gives the next session a stable, durable place to read what's in flight.
- Two layers: Dolt cache and JSONL source of truth. .beads/embeddeddolt/ is gitignored; .beads/issues.jsonl is committed. The JSONL wins.
- Required, minimum 1.0.3. Older versions miss the export defaults the team's hooks rely on.
- Agents prime themselves. The first thing a fresh session does is bd prime; you almost never type it. Same for bd ready and the rest: you describe outcomes, the agent runs the CLI.
- The daily loop is conversational. "Pick up the next ready bead." "Close these three." The CLI is the contract the agent implements against; natural language is the surface you work in.
- T-shirt sizing, not time. XS / S / M / L / XL on every bead. The agent will reach for human hours if you do not redirect, and it will get them wrong by an order of magnitude or two now that it is the one doing the work. Relative size, not absolute precision.
- Issues are for work; bd remember is for knowledge. Do not blur the line.
- Strike gotchas to internalize. os- prefix; numeric priority; comma-labels only on create; the 1.0.3 close-flush quirk; do not run bd hooks install.
- Some teams keep their issue tracker out of the repo (SaaS) on the theory that ops work shouldn't gate code review. We chose the opposite. Where would the SaaS-separated approach work better than ours, and what would have to be true for us to switch?
- The line between "this is a bead" and "this is a bd remember" is not always obvious, especially for things like "we decided to defer X." How would you describe the test you apply to choose? Is it the same test the rest of the team applies?
- If you had to pick the one Strike-specific gotcha that should move out of project memory and into a tool fix instead (in beads itself, in the hooks, somewhere), which would you pick and why?
- How We Build Here — The trail's opening cairn. The "agents need scaffolding" framing in this cairn is a direct application of the philosophy there.
- The Workshop — The trail's tool map. Beads sits in the required three; this cairn is the deep read on the first of them.
- From Intake Folder to Project Memory — A separate cairn on how raw incoming material becomes durable project memory. Useful background on the team's broader memory strategy beyond beads.
- Three Memories, One Q — On the layered memory model: short-term context, mid-term beads, long-term cairns. Beads is the middle layer; this cairn explains where the others fit.
- Beads on GitHub — Source code, install instructions, changelog. Required reading for any contributor adopting beads on a new repo or troubleshooting hook integration.
- Dolt — The SQL database with git-style branching that beads uses for its embedded local cache. Background context for understanding why the cache and the JSONL are layered the way they are.
Generated by Cairns · Agent-powered with Claude