Claude Code as Daily Driver
The agent harness most of us reach for first, what we run inside it, and the alternatives that are perfectly fine · ~19 min read · Suggested by Bob
Claude Code is the agent harness most of us reach for first. Codex is a strong alternative. Anything that takes Claude or Codex over OAuth is plausible; the farther you drift from precedent, the more you are maintaining your own setup. This cairn explains the recommended stack, the acceptable alternatives, and the daily-driver mechanics that turn a harness into a productive collaborator.
Why a harness, not a chat window
A chat window cannot read the rest of the codebase, cannot run the build, and cannot edit its own draft against a real failure. You end up narrating the work into chat and copy-pasting failures back. A harness closes both gaps: the agent reads the project as it needs to, runs the build itself, and edits based on what it sees. The collaboration shifts from “I will narrate this for you to comment on” to “here is the goal and the constraints — go.”
The harness is the difference between an agent that suggests and an agent that executes. We hired you to direct execution, not to copy and paste it.
The recommended stack
The author runs Claude Code with Anthropic’s Opus 4.7 on the 1M-context tier. Install Claude Code via Anthropic’s native installer — it is more featureful than the npm package and avoids the supply-chain incidents the npm path has had. Opus 4.7 trades latency and cost for capability per turn; for design, multi-file edits, and review that trade is worth it. 1M context is the difference between “I read most of the files I needed” and “I read all of them” on a codebase Strike’s size. Strongly recommended, not required — every step away should be a conscious choice. The desktop apps (Claude Desktop, Codex’s GUI, Cowork) earn a side seat for chat-mode thinking and reading; they complement the terminal harness rather than replace it.
Acceptable alternatives
Codex CLI with GPT-5.5 is plausible without serious DIY costs — different priors, same shape of collaboration, and several of us run it as daily driver or second-opinion lever. Other harnesses that wrap Claude or Codex over OAuth (Cursor, Cline, Aider, etc.) all run on the same underlying models and are fine if you already prefer one.
Each step away from “Claude Code, Opus 4.7, 1M, OAuth subscription” is a step toward maintaining your own setup. The team can help you debug Claude Code or Codex; we are less likely to be useful when your harness is the part that broke.
OAuth subscriptions, not API keys
This is the one structural decision the trail states plainly. We use OAuth subscriptions — Claude Pro / Max, the Codex equivalent — and not API keys for daily-driver work. A productive day at the context windows this trail assumes runs through tens of millions of tokens; at API rates that is a meaningful line item, at flat-rate it is a planned cost. If you get pulled into an automation, CI integration, or vendor sandbox that demands a key, that is a separate decision and a separate budget.
Skills and plugins, in moderation
Claude Code’s skills system turns the harness from “can read your code” into “knows how to act on it.” There is no shared team standard yet — the Constructured-specific plugin set is in progress. The author leans on dm-work:orchestrator and dm-work:subagent for delegation, dm-lang:go-pro and dm-lang:typescript-pro for language gotchas, dm-arch:solid-architecture and dm-arch:data-oriented-architecture for design review, dm-work:debugging for investigation, and dm-work:browser-qa for runtime QA. Treat these as a starting point, not a prescription.
Resist the urge to install every plugin you find. Plugins compete for space in the agent’s attention; a stack of marginal plugins dilutes the strong ones.
Session orientation
The first thirty seconds of every session set the tone. Strike’s AGENTS.md (and the CLAUDE.md symlink) loads automatically and points the agent at bd prime; the agent confirms branch, worktree, and the ready queue, then reports back. Your job is to be watching what it reports. The trick to making this passive is good ambient instrumentation — a shell prompt that surfaces branch and worktree at a glance (Starship is what the author uses) plus a custom Claude Code status line. Together, they turn the agent’s confirmation into verification rather than discovery. Never skip orientation by either path; working on the wrong branch wastes entire sessions silently.
Rotating: handoffs over compaction
Sessions don’t last forever. Two mechanisms exist: the harness’s automatic compaction, which summarizes older messages behind a curtain; and a deliberate handoff, where the agent writes a structured document, you /clear, and the next session reads the document first. The team’s strong preference is the handoff path. Transparency — you see exactly what carries forward. Context rot — summaries lose information the agent later cannot recover; past a certain density, accumulated context behaves like noise. Cache predictability — Anthropic’s prompt cache has a 5-minute TTL, and rotating at the natural break of a workstream produces clean cache behavior while mid-stream compaction thrashes it.
Context rot: the gradual degradation of an agent’s reasoning quality as accumulated context grows past the model’s effective working set. State of the art is meaningfully better than at 200k, but every model still has a curve.
The rule of thumb is 70% of the context window: past that, performance drops and cache misses bite. Treat it as a soft ceiling and rotate at the natural break of the current workstream — a closed bead, a merged PR, a finished investigation. A custom status line that surfaces context-budget percentage passively changes how often you actually rotate. A good handoff has three parts: what we were doing, the next concrete action (not “finish the feature”), and any open threads worth filing as beads. The /dm-work:handoff slash command drafts this where the plugin is installed; modern agents handle it well unprompted otherwise. Compaction is the right call only when a workstream genuinely cannot be paused, or when context is mostly throwaway exploration the next phase doesn’t need.
The 1M-context era makes the 70% rule and the status line more important, not less — at 200k the constraint pressed on you constantly, but at 1M you can run a long session without noticing you are 800k in until performance degrades.
Anything that needs to survive a /clear lives in beads, in timbers, or in committed code — never in the chat thread. When a session ends without a clean handoff, beads is the recovery surface (bd prime, bd show <id>).
Subagents, deliberately
Subagents let the orchestrator spawn a fresh agent with its own context for a scoped task — read-only research, parallel reviews of different angles on the same change, scope-isolated implementation with a clean definition of done. Do not use a subagent for a one-line edit (the briefing cost is the work) or for the main thread of an open-ended exploration (you’ll spend more time briefing than doing). The heuristic: spawn a subagent when the cost of bringing it up to speed is less than the cost of doing the work yourself in the parent context.
What about your IDE?
No IDE recommendation. Bring your favorite, use it as much or as little as you want, and avoid checking shared .vscode/ or .idea/ configs into the repo without team agreement. The trend for some of us is that the IDE has functionally become a markdown reader and a file tree with syntax highlighting — most editing, reviewing, and code navigation now happens in the agent’s chat. That is not a prediction every contributor has to share; it is one direction the work is drifting. Both extremes work, because the trail’s mechanics are agnostic — the agent runs just check, the gates fire on commit, the timbers entry happens whether you typed the commit yourself or the agent did.
What this cairn does not cover
Adjacent topics live in their own cairns: Your Box and Your Trust Model for trust-model and sandbox mechanics; Codex as Second Opinion for the second-opinion pattern; Working in Parallel (Mostly) for worktrees and the in-progress plugin set; Quality Gates for the deterministic constraints; From Plan to Pull Request for the integration cairn. The Quiet Teammate is the more reflective companion piece on long-form collaboration with an agent.
An agent harness is the program that gives a model access to your filesystem, your shell, your editor, your test runner, and your local sandbox. It is the difference between a chat window that can suggest a code change and a collaborator that can read the file, edit it, run the tests, see the failure, edit again, and report back when it has something working. The chat window is a tool. The harness is the seat the agent sits in.
You can ship production software with either. We picked the seat. The picks are not strict, and we have good reasons not to be strict, which is what most of this cairn is about.
Why a harness, not a chat window
The first thing a chat window cannot do is read the rest of the codebase. You can paste a file in. You cannot paste in the file the function calls into, because you do not know yet which one matters. You can describe the project structure. You cannot describe everything an experienced reader would notice when actually opening the directory.
The second thing the chat window cannot do is run the build. It can suggest a change that “should pass tests.” It cannot prove it. The cost of that gap is a loop you, the human, are inside of: paste in code, run the test yourself, paste back the failure, paste in the suggested fix, run again. The agent moves at the speed of how fast you copy and paste.
A harness closes both gaps. The agent reads the project as it needs to. It runs the build itself. It edits its own draft based on the failure it just saw. The collaboration becomes “I want this done; here are the constraints” instead of “I will narrate the work into the chat for you to comment on.” That difference is most of what this trail is about.
The harness is the difference between an agent that suggests and an agent that executes. We hired you to direct execution, not to copy and paste it.
The recommended stack
The author runs Claude Code with Anthropic’s Opus 4.7 model on the 1M-context tier. That is the recommended default, and it earns its keep on this codebase.
A few specifics matter:
- Claude Code is the harness — install via Anthropic’s native installer. The native install is what Anthropic recommends; it is more featureful than the npm package and has not been subject to the supply-chain incidents the npm path has had. On a fresh machine today, there is no good reason to choose the npm route. The harness carries native session continuity (rewind, compact, resume), a robust skills system, an MCP server interface, and a plugin marketplace.
- Opus 4.7 is the model. The trade-off relative to Sonnet is more capability per turn at higher latency and higher cost; for the kind of work this trail describes — design judgment, multi-file edits, careful review — that trade is worth it most days.
- 1M context window is the long-context tier. Strike is large enough that the difference between 200K and 1M context is the difference between “I read most of the files I needed” and “I read all of them.” Pin to 1M for that reason.
This stack is strongly recommended but not required. The author is not trying to flatten contributor preferences — different people work in different ways, and every step away from this default is a choice worth making consciously rather than by accident.
One nuance worth naming. The desktop apps earn a side seat for some of the team. Claude Desktop (the standalone GUI app) is useful for chat-mode work that does not need filesystem or shell access — thinking, planning, research, document analysis, iterative writing. The author runs it alongside terminal Claude Code most days; the terminal version is the implementation seat, the desktop version is the thinking-and-reading seat. Codex’s desktop app fills a similar slot for contributors running Codex; Claude Cowork is another GUI surface worth a look. The desktop apps are a complement to the terminal harness, not a replacement.
Acceptable alternatives
A few alternatives are plausible without serious DIY costs. (“A few” undersells how many viable harnesses exist now; the list grows monthly. The relevant question is not “which harnesses exist” but “which harnesses give you the affordances this trail assumes” — file access, shell access, sandbox controls, persistent session state, plugin support.)
Codex CLI with GPT-5.5. Anthropic and OpenAI’s flagships are different models with different priors; both are perfectly capable of doing the work this trail describes. The Codex CLI is a mature harness in its own right — different conventions, different defaults, but the same shape of collaboration. Some of the team uses Codex as the daily driver. Some use Claude as the daily driver and Codex as a second-opinion lever. Codex as Second Opinion is the deep read on the second-opinion pattern.
Other harnesses that wrap Claude or Codex over OAuth. Cursor, Cline, Aider, and a handful more all run on the same underlying models. If your harness is one of these and you are happy with it, that is fine. The team’s skills and plugins are easier to share inside Claude Code’s plugin marketplace, but they are not exclusive to it.
Each step away from “Claude Code, Opus 4.7, 1M, OAuth subscription” is a step toward maintaining your own setup. The team will help you debug Claude Code or Codex; we are less likely to be useful when your harness is the part that broke. That is a reasonable price to pay for a setup you prefer; just price it in.
OAuth subscriptions, not API keys
This is the one structural decision the trail will state plainly. We use OAuth subscriptions — Claude Pro / Max for Claude, the equivalent tier for Codex. We do not use API keys for daily-driver work. This is not a preference; it is a budget reality at the level we lean on these agents.
The arithmetic is unfavorable for keys. A productive day with an agent at the kind of context window this trail assumes runs through tens of millions of tokens — not because any single turn is large, but because hundreds of turns accumulate. At API-key rates, that day is a meaningful line item. At a flat-rate subscription, the same day is bundled into a cost the team can plan around. The math holds for occasional users; it gets worse the more you use the tools.
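To make that arithmetic concrete, here is a back-of-the-envelope sketch. The per-token rate below is a made-up stand-in, not a published price — substitute current API rates before drawing any conclusion:

```shell
# Illustrative only — rate_cents_per_mtok is a hypothetical number, not a real price.
tokens_per_day=30000000          # "tens of millions" of tokens on a heavy agent day
rate_cents_per_mtok=500          # assume $5.00 per million tokens, in cents
daily_cents=$(( tokens_per_day / 1000000 * rate_cents_per_mtok ))
echo "metered: \$$(( daily_cents / 100 ))/day vs a flat subscription fee"
```

Even at a modest assumed rate, a heavy day lands in the low hundreds of dollars on a key, repeated every working day; the flat-rate plan turns that into a fixed, plannable number.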
OAuth subscription here means: an account-billed plan (Claude Pro, Claude Max, ChatGPT Plus, ChatGPT Pro, equivalents) that the harness authenticates against using OAuth, with usage included in the flat fee subject to per-account caps. As distinct from API key billing, where each token is metered and charged.
If you are pulled into a setup that demands an API key — for an automation, for a CI integration, for a vendor sandbox — that is a separate decision and a separate budget. For the daily seat, OAuth.
Skills and plugins, in moderation
Claude Code’s skills system is one of the levers that turn the harness from “can read your code” into “knows how to act on your code.” A skill is a markdown document with frontmatter that the harness loads when its description matches the work you are doing. The skill teaches the agent the shape of a task — which tools to reach for, which conventions matter, what order to do things in.
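A concrete (and entirely hypothetical) example of the shape — a markdown file whose frontmatter carries the name and description the harness matches against. The frontmatter fields follow the Claude Code skills docs; the skill name and body content are invented for illustration:

```markdown
---
name: strike-conventions
description: Project conventions for Strike. Use when editing Go or TypeScript in this repo.
---

# Strike conventions

- Run `just check` before claiming a change is done.
- File follow-up work as beads rather than leaving inline TODOs.
```

The description line does the heavy lifting: it is what the harness reads when deciding whether the skill applies to the work in front of it.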
The honest picture today: there is no shared team standard for which skills to install. The author leans on a load-bearing handful of dm-work and adjacent skills; Noam touches a different (and smaller) subset; other contributors run different sets again. That variance is part of why the Constructured-specific plugin set is in progress — once it lands, the team will have a shared core that everyone can adopt and contribute to. Until then, the skills below are the ones the author finds load-bearing in his own loop, offered as a starting point rather than a team prescription:
- dm-work:orchestrator and dm-work:subagent — the delegation protocol. Activated at session start; sets the orchestrator-versus-implementer contract.
- dm-lang:go-pro, dm-lang:typescript-pro — language-specific style and gotcha skills. Auto-activate when a Go or TS file is in scope.
- dm-arch:solid-architecture, dm-arch:data-oriented-architecture — architectural review skills. Useful when you are reviewing or designing.
- dm-work:debugging, debugger — investigation patterns. Useful when something is broken in a non-obvious way.
- dm-work:browser-qa — runtime QA via Chrome DevTools MCP, when you have it installed.
Working in Parallel (Mostly) covers the broader plugin landscape, including the constructured-specific plugin set in progress.
Resist the urge to install every plugin you find. Plugins compete for space in the agent’s attention; a stack of marginal plugins can dilute the strong ones. Pick the few that earn their keep, and add new ones deliberately.
Session orientation
The first thirty seconds of every Claude Code session set the tone for the rest of it. Strike’s AGENTS.md (and the CLAUDE.md symlink) is loaded automatically; it points the agent at bd prime and the project’s prime directive. The agent does the orientation work — it confirms branch, worktree, and the ready queue, then reports back. Your job is to be watching what it reports.
Concretely, an agent in a fresh Strike session should land at something like: “On branch swe-trail-plan in the main checkout, bd ready shows os-pvw9 as next. Continuing with that?” You confirm or redirect. One minute, sometimes less.
The trick to making this passive is good ambient instrumentation. Two pieces are worth setting up once:
- A shell prompt that surfaces the current branch and worktree at a glance. Starship is what the author uses — it works across shells and across machines. With starship configured, you do not have to ask whether you are on the right branch; your prompt tells you whenever you look at the terminal.
- A custom Claude Code status line. Surfaces the same data inside the agent’s UI, plus context-budget and auto-mode state. Docs at code.claude.com/docs/en/statusline; the rotating section later in this cairn covers why the budget visibility part matters.
Together, these mean a glance at your terminal tells you the state of the world. The agent’s spoken confirmation then becomes verification rather than discovery.
The raw form — git branch --show-current, git worktree list, bd ready typed directly — still works and is what you reach for if you do not have the prompt and status line set up yet. A new contributor can absolutely run the trail without them; the ambient setup just makes orientation cheaper after a couple of sessions.
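Wrapped with guards so it runs in any directory, the raw path looks something like this (a sketch; `bd` is the project's bead tool and will not exist outside a Strike checkout):

```shell
# Manual orientation, for when the prompt and status line aren't set up yet.
branch=$(git branch --show-current 2>/dev/null || echo "(not a git checkout)")
echo "branch: $branch"
git worktree list 2>/dev/null || true        # which checkout is this?
if command -v bd >/dev/null 2>&1; then
  bd ready                                   # what's next in the queue?
else
  echo "bd: not installed here"
fi
```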
Never skip orientation, by either path. Working on the wrong branch wastes entire sessions silently — the diff lands somewhere unexpected, the merge conflict shows up later, and the agent has done several hours of work it cannot easily redo. Whether you confirm by reading your prompt or by asking the agent to report, the confirmation has to happen.
Rotating: handoffs over compaction
Sessions don’t last forever. Context windows fill, prompt caches expire, and even a 1M-context Opus 4.7 session has a point past which fresh context outperforms accumulated context. Knowing when to rotate and how to rotate well is the discipline this section is about.
Two mechanisms exist. The harness’s built-in compaction runs automatically as you approach the limit — older messages get summarized into a condensed form to make room for newer ones. The other path is a deliberate handoff — the agent (with your confirmation) writes a structured document that captures the state of the work, you /clear or open a fresh session, and the next session reads the document first. The team’s strong preference is the handoff path, for three reasons.
Transparency. A handoff is visible. You read the document; you see exactly what carries into the next session; you correct anything wrong before committing it. Compaction happens behind a curtain — the new context is whatever the harness summarized, and the summary is not always the part you needed.
Context rot. Compaction summarizes, and summaries lose information the agent later cannot recover. Past a certain density, accumulated context starts to behave like noise — the agent re-asks questions it answered earlier, drifts from constraints it had internalized, or spends turns reconstructing what the summary lost. The state of the art is meaningfully better than it was at 200k a year ago, but “better” is not “solved.” A clean handoff sidesteps the problem.
Cache predictability. Anthropic’s prompt cache has a 5-minute TTL on inactive content. Rotating via handoff at the natural break of a workstream produces predictable cache behavior. Mid-stream compaction thrashes the cache, and on a session 700k tokens deep, a cache miss is an expensive way to start the next turn.
Context rot: the gradual degradation of an agent’s reasoning quality as accumulated context grows past the model’s effective working set. Past a threshold (which varies by model and task), more context starts to hurt instead of help. State of the art is meaningfully better than at 200k, but every model still has a curve.
The 70% rule of thumb
Past 70% of the context window, performance starts to drop and cache misses start to bite. Treat 70% as a soft ceiling — when you cross it, plan a rotation. The practical heuristic: rotate at the natural break of the current workstream (a closed bead, a merged PR, a finished investigation). If you find yourself approaching 70% mid-workstream, that is itself a signal to ask whether the workstream wants to be subdivided into smaller beads.
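The arithmetic behind the soft ceiling, spelled out for both tiers:

```shell
# 70% soft ceiling per context tier, in tokens — plan a rotation once usage crosses it.
for window in 200000 1000000; do
  echo "window $window -> soft ceiling $(( window * 70 / 100 ))"
done
```

On the 1M tier that puts the rotation-planning point around 700k tokens — well before the window is actually full.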
Visualize your context budget
Most harnesses do not surface context-budget data prominently by default. Claude Code’s status line, configured well, can. A useful status line shows model and tier ([Opus 4.7 (1M context)]), working directory, git branch, a visual bar of context usage, the current percentage, session elapsed time, the rolling-5-hour usage budget, and the auto-mode indicator — all on a single line.
The status-line docs live at code.claude.com/docs/en/statusline. The author built a custom one by pointing the agent at those docs and describing the features wanted; the agent did the heavy lifting. Recommended for any Claude Code user — knowing your context-budget state passively, without having to ask, changes how often you remember to rotate. Similar customizations exist for Codex CLI and other harnesses.
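As a sketch of what such a command does: Claude Code pipes session JSON to the configured status-line command and prints whatever the command emits. The field names below follow the statusline docs, but treat the exact JSON shape as something to verify against code.claude.com/docs/en/statusline; a canned sample stands in for stdin here so the sketch is self-contained:

```shell
# Hypothetical sample of the session JSON a status-line command receives on stdin.
sample='{"model":{"display_name":"Opus 4.7"},"workspace":{"current_dir":"/tmp/strike"}}'
# Crude field extraction for illustration (a real script would use jq).
model=$(printf '%s' "$sample" | sed -n 's/.*"display_name":"\([^"]*\)".*/\1/p')
dir=$(printf '%s' "$sample"   | sed -n 's/.*"current_dir":"\([^"]*\)".*/\1/p')
printf '[%s (1M context)] %s\n' "$model" "${dir##*/}"
```

A real status line layers branch, context-usage bar, and budget data onto the same one-line output.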
Writing a good handoff
A handoff has three parts:
- What we were doing. The current workstream and where we are in it. Bead IDs in flight; PRs in flight; the spec being implemented; decisions already settled in this session that are not yet captured in beads, timbers, or the code itself.
- What is next. The next concrete action. “Run just check, push, wait for Q to review” is more useful than “finish the feature.”
- Open threads. Anything noticed during the work that is not the current workstream — a TODO, a related bead worth filing, a question to come back to. Bead it now if you can; otherwise capture it here so the next session can.
The team’s /dm-work:handoff slash command, where the dm-work plugin is installed, generates this in one invocation — capturing the structured form above and writing it to history/handoff-<date>.md (or the equivalent the project uses). You confirm the contents; you commit; you /clear. The next session reads the handoff first.
Without the plugin, ask the agent to draft one: “Write a handoff for this session covering the workstream we were on, the next concrete action, and any open threads worth filing.” Modern agents handle this well; read what was drafted and adjust the sections that miss something.
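For a fully manual fallback, the three-part structure can be scaffolded by hand. This is a sketch, not the plugin's output; the filename follows the history/handoff-&lt;date&gt; convention named above:

```shell
# Hand-rolled handoff scaffold for when the dm-work plugin isn't available.
out="handoff-$(date +%F).md"
cat > "$out" <<'EOF'
# Handoff
## What we were doing
(workstream, bead IDs and PRs in flight, decisions settled this session)
## What is next
(the next concrete action, not "finish the feature")
## Open threads
(TODOs noticed, beads worth filing, questions to return to)
EOF
echo "wrote $out"
```

Fill it in, commit it, /clear, and point the next session at the file first.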
A handoff is the moment to verify that beads, timbers, and the working tree are all clean. timbers pending reads 0; the bead being closed is closed; there are no uncommitted changes you forgot about. The handoff is not the place to defer cleanup; it is the place to confirm cleanup happened.
When compaction is the right call anyway
The working rule is handoff over compaction. Two cases make compaction the better choice:
- A workstream that genuinely cannot be paused. A debugging session that has finally cornered a bug, where stopping to write a handoff would lose state the handoff would not capture cleanly. Compact, accept the cost, finish.
- Context built mostly of throwaway exploration. Sessions sometimes accumulate 60% context on read-only investigation that was useful at the time but is not load-bearing going forward. The compaction summary is fine for that material — the next phase of work can continue with leaner context.
Outside those cases, the answer is rotate via handoff.
The 1M context era
A note on calibration. When the practical context was 200k (much of it consumed by initialization), handoffs were frequent — every couple of hours of focused work, sometimes more often. With 1M-context Opus 4.7, the cadence is dramatically less aggressive. A full workstream often fits in a single session, and the rotation point becomes the natural break between workstreams rather than a forced break inside one.
That actually makes the 70% rule and the status-line visibility more important, not less. At 200k the constraint pressed on you constantly; at 1M you can run a long session without noticing you are 800k in until performance degrades. The status line is the cheap, passive instrument that keeps you honest.
Cross-session recovery (when something does go wrong)
When a session ends without a clean handoff — a crash, a forced restart, a /clear you wish you hadn’t — beads is the recovery surface, not chat history. The agent runs bd prime to load project context, then bd show <id> for the work in flight, and reconstructs from the durable record. Claude Code’s native rewind and resume can also recover state from inside the same session if compaction collapsed something you needed. The broader rule, stated directly: anything that needs to survive a /clear lives in beads, in timbers, or in committed code — not in the chat thread.
Subagents, deliberately
Claude Code’s subagent system lets the orchestrator spawn a fresh agent with its own context window for a scoped task. The parent gets the result; the parent’s context never sees the raw research. Used well, subagents are how a senior engineer manages parallel work without losing their own thread.
The honest pattern, after a year of practice:
- Use a subagent for read-only research that would otherwise dump 50KB of file content into the parent’s context. (“Find every place we use the polling-completion checker and tell me whether they all share the same idempotency assumption.”)
- Use a subagent for parallel reviews — one reviewer reads the diff, another reads the design doc, a third checks the tests. The orchestrator integrates.
- Use a subagent for scope-isolated implementation, when the work has a clean handoff and a clear definition of done.
- Do not use a subagent for a one-line edit. The handoff cost is the work.
- Do not use a subagent for the main thread of an open-ended exploration. You will spend more time briefing it than doing the work.
The judgment is partly taste. Spawn a subagent when the cost of bringing it up to speed is less than the cost of doing the work yourself in the parent context. That sentence is not a rule; it is a heuristic that gets sharper with practice.
What about your IDE?
Notably absent from everything in this trail is a recommendation for a particular IDE, an extension list, or a shared .vscode/ config. That is deliberate. We are non-prescriptive about IDE choice — bring your favorite, and use it as much or as little as you want.
What is worth saying out loud is what the trend looks like for some of us. The author has functionally moved past the IDE as the primary surface for code work. VSCode is open most of the time, but it is open mostly to read — markdown specs, notes, cairn drafts, occasionally code the author wants to skim before briefing the agent on it. The actual editing happens in the agent’s chat. Reviews happen in the agent’s chat. Even most code navigation now happens there. What is left for the IDE in this setup is what an editor was thirty years ago: a file tree with syntax highlighting and a visual search bar. What’s old is new.
Honest disclosure: a lot of contributors will look sideways at the claim that the IDE is on a path to becoming a markdown reader and not much else. That is a reasonable reaction. Direct coding and the IDE may never go away. But every major IDE vendor is currently scrambling to redesign around an agent-first user experience, which is a signal that they share the underlying intuition even when they would rather the IDE remain central. That is not a coincidence; it is the industry telling on itself about where the wind is blowing.

“IDE-centric” and “agent-centric” are not symmetric framings — the agent can use the IDE; the IDE has a hard time using the agent. That asymmetry is most of why the IDE-as-host model strains under serious agent-assisted work, and most of why the harness sits where it does in the seating chart this cairn argues for.
For a new contributor: do not feel pressure to abandon your IDE, and do not feel pressure to lean on it harder than you already do. Both extremes work. The trail’s mechanics are agnostic — the agent runs just check, the gates fire on commit, the timbers entry happens whether you typed the commit yourself or the agent did. The IDE is your seat for the parts of the work the agent does not do, and you will discover for yourself, over time, how much of the work that turns out to be.
Bring whatever IDE you already love. Avoid checking a shared .vscode/ or .idea/ config into the repo without team agreement; one person’s “obvious defaults” are another person’s friction. The day you find yourself spending most of your IDE time reading rather than typing, you will know what changed.
What this cairn does not cover
Several adjacent topics are deferred to later cairns where they belong:
- The trust model — auto mode versus per-call permissions, sandbox semantics, BYOD. Your Box and Your Trust Model is the deep read on your machine and your trust posture.
- Codex specifically as a second-opinion lever. Codex as Second Opinion covers the solo-Codex-reviewer pattern and when to escalate to a council.
- Worktrees and the plugin ecosystem. Working in Parallel (Mostly) covers parallel work and the constructured-specific plugin set in progress.
- Quality gates. Quality Gates: The Contract That Lets You Move Fast covers the deterministic constraints the agent works inside.
- The full daily loop, end to end. From Plan to Pull Request ties everything together against a realistic feature.
Adjacent reading from outside this trail: The Quiet Teammate is a meditation on what extended collaboration with an agent in production actually feels like. It is more reflective than this cairn; useful as background. The Operating Q trail is the related-but-distinct read on running an agent (rather than working with one) in a production-adjacent role.
Summary
- The harness is the seat. It is the difference between an agent that suggests and an agent that executes. We hired you to direct execution.
- Recommended: Claude Code with Opus 4.7 on the 1M tier. Strongly recommended, not required. Conscious deviation is fine; accidental deviation costs you.
- Codex with GPT-5.5 is acceptable. Different priors, same shape of work. Some of us run both. Codex as Second Opinion is the second-opinion deep read.
- OAuth subscriptions, not API keys. Structural decision. The math is unfavorable for keys at our usage levels.
- The DIY tax increases with distance from precedent. Wrap-Claude-or-Codex harnesses are plausible; the team is less able to help when your harness is the part that broke.
- Orient at the start of every session. Branch, worktree, queue, agent confirmation. One minute. Saves hours.
- Rotate via handoff, not via compaction. 70% context is the soft ceiling; rotate at the natural break of a workstream. A custom status line makes the budget visible.
- /dm-work:handoff drafts the document where the plugin is installed; modern agents handle handoffs well unprompted otherwise. Compaction is the fallback.
- Subagents are a power tool. Spawn one when the cost of briefing it is less than the cost of doing the work in your parent context. Otherwise don't.
- If you came in with experience on a different harness, what do you find yourself missing — and is the gap worth bridging by switching, or by importing the pattern into Claude Code?
- The OAuth-vs-API-key call is structural for us. For a contributor whose usage is much lighter than the average, would API-key billing actually win on cost, and what would change about how they would have to work?
- The 70% rotation rule is empirical, not magical. What signal does your agent give you that says "context has gotten dense"? Is it different across models, and what would convince you to lower or raise that threshold for yourself?
- What is your own rule for spawning a subagent versus continuing in the parent context? Is it the same heuristic this cairn names, or do you apply something different?
- How We Build Here — The trail's opening cairn. The "shape of work shifted up the stack" framing is the philosophical reason a harness, rather than a chat window, is the seat we direct work from.
- The Workshop — The trail's tool map. Claude Code is the agent harness layer in that tour; this cairn is the deep read on it.
- The Quiet Teammate — A reflective companion piece on what working alongside an agent in production looks and feels like over months. Useful as background reading.
- Operating Q — The related-but-distinct trail on running an agent (Q) in a production-adjacent operating role. Different problem from "agent as collaborator," same underlying technology.
- Claude Code — Anthropic's official harness. Install instructions, docs, plugin marketplace. The canonical source for the recommended stack.
- Codex CLI — OpenAI's CLI harness. Source code and documentation; the alternative this cairn endorses for daily-driver work.
- Claude Code skills system — Documentation on how skills work in Claude Code. Required reading if you plan to author or evaluate skills the team should adopt.
Generated by Cairns · Agent-powered with Claude