Cairn · Jun 5, 2026 · OpenClaw Operations

GitHub Events Routines

How PR review webhooks leave GitHub, cross AWS, and wake OpenClaw routines on the local host · ~17 min read · Suggested by Noam · engineering · operations

tools devops ai

Claude Routines gave the team a useful automation surface, but its daily quota became the wrong place to spend every PR review. GitHub Events Routines move event-driven review work into OpenClaw while preserving the properties that matter: signed GitHub input, durable queueing, local sandbox execution, and an operator-readable trail of what happened.


Why this exists

The first problem was quota. Claude Routines has been doing automated PR review work, but fifteen daily routine slots are easy to spend once the company has more PRs, two or three review rounds per PR, and several weekday-night scheduled jobs. GitHub Events Routines move PR-review work to OpenClaw so the scheduled jobs can stay in Claude Routines.

The path from GitHub to OpenClaw

The pipeline is a bridge, not a tunnel. GitHub sends a signed webhook to an AWS REST API. A Lambda verifies the HMAC, stores the full body in S3, and sends a small SQS envelope. The Mac OpenClaw host polls SQS over an outbound connection, downloads the body, matches it to a routine, and posts to OpenClaw’s POST /hooks/agent endpoint.

The trust boundaries

There are two hard boundaries. GitHub proves the event with X-Hub-Signature-256; AWS holds the secret and queues the delivery. OpenClaw stays off the public internet and only needs outbound access to AWS plus its own local hook endpoint. The API Gateway IP allowlist is defense in depth, not the primary authentication mechanism.

What the consumer actually does

The consumer is intentionally small. It long-polls SQS, fetches the S3 payload, extracts only the repository/action/author/draft fields needed for matching, applies routine-level ignore filters, writes a dispatch directory, records the attempt in SQLite, then wakes a routine-<type>-<repo> OpenClaw agent with an idempotency key derived from the GitHub delivery.

How PR review is configured

A routine is repo opt-in plus a matcher. The current pr-review routine listens for pull_request events on opened, reopened, review_requested, and ready_for_review. It skips bot-authored PRs and drafts; ready_for_review is the paired event that reviews a PR once it leaves draft.

How operators change it

Most changes are file edits. Add or change GitHub App event subscriptions for producer-side scope. Add routine TOML, repo settings, and workspace prompts for consumer-side behavior. Matcher, prompt, and repo-setting edits can hot-reload with SIGHUP; adding or removing routine instances still needs the installer because OpenClaw’s allowed agent list changes.

How operators ask about it

The status surface is a deployed OpenClaw skill. The consumer bundle installs gh-events-status under OpenClaw’s skills directory, so operators can ask Q in Slack questions like “what GitHub events are we listening to?”, “which routines are configured?”, “is the SQS queue backed up?”, or “show me the transcript for that PR review run.” Q maps the question to the skill’s scripts and returns Slack-formatted output.

How it fails and recovers

The queue buys time. Receiver system failures alarm in CloudWatch. Consumer failures leave messages for redelivery and, after repeated failures, move them to the DLQ. If OpenClaw is down, SQS retains messages for up to fourteen days. Duplicate GitHub or SQS delivery is expected; the consumer dedupes by (github_delivery, type, repo). The reaper also closes the audit gap for crashed routines: it marks over-threshold pending records as failed when there is neither a report nor terminal session evidence.

What to remember

This is the event-driven lane. Use OpenClaw for PR review routines that scale with GitHub activity. Keep cron-shaped weekday and nightly routines in Claude Routines. The design works because each side owns the part it is good at: GitHub emits facts, AWS buffers and authenticates, OpenClaw does local agent work, and Slack receives the human-readable result.

Discussion Prompts

The useful next questions are which GitHub events deserve routines after PR review, how much force-rerun capability operators need, and what should graduate from status skill output into first-class dashboarding.

References

From Plan to Pull Request - The workflow context: PRs are where review, quality gates, and merge discipline meet.
Runbooks Are Interfaces - The operating principle behind the setup and status surfaces.
Operator's Guide to Q - The broader map of Q/OpenClaw operating lanes and boundaries.

The first problem was not architecture. It was quota. The team had been running automated PR review routines in Claude Routines, which worked well enough to become load-bearing. Then the daily limit became visible: fifteen routine slots disappear quickly when the company has more pull requests, some PRs go through two or three review rounds, and several weekday-night jobs already consume three or four slots before the workday starts.

That makes PR review the wrong thing to keep spending scarce scheduled-routine capacity on. PR review is event-driven. It should fire when GitHub reports that a PR was opened, reopened, marked ready, or had a review requested. The nightly work is cron-shaped. It should keep its scheduled lane. GitHub Events Routines exist to split those two workloads cleanly: leave cron-scheduled routines in Claude Routines, and move GitHub-triggered PR review routines into OpenClaw.

Key Takeaway

The system is a pressure valve for Claude Routines quota, not just a new webhook handler. The product boundary is: GitHub events wake OpenClaw; scheduled jobs stay where they already fit.

Why this exists

Claude Routines gave the team an automation surface before OpenClaw had this particular event bridge. The obvious use case was automated PR review: a PR changes state, a routine reads the diff and context, and the result lands where the team already discusses code. That is valuable precisely because it runs often.

Often is the problem. A fixed daily routine budget is a poor fit for activity that scales with the number of engineers, repositories, PR rounds, and review requests. If ten PRs open on the same day and three of them get a second pass, PR review alone can consume most of the quota. Add weekday-night routines and the team has converted a useful default into a contention point.
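The arithmetic behind that example day can be checked in a few lines. This is only a sketch: it assumes each review round and each scheduled job costs exactly one routine slot, which the text implies but does not state, and it takes the low end of the "three or four" weekday-night jobs.

```python
DAILY_SLOTS = 15      # fixed daily routine budget named in the text
prs_opened = 10       # the example day: ten PRs open
second_passes = 3     # three of them get a second review round
nightly_jobs = 3      # low end of the "three or four" weekday-night jobs

# Assumption: one slot per review round and one slot per scheduled job.
review_runs = prs_opened + second_passes
total_runs = review_runs + nightly_jobs
shortfall = max(0, total_runs - DAILY_SLOTS)

print(f"PR review alone: {review_runs} of {DAILY_SLOTS} slots")
print(f"with nightly jobs: {total_runs} of {DAILY_SLOTS} slots, short by {shortfall}")
```

Under those assumptions PR review alone takes 13 of the 15 slots, and the nightly jobs push the day over budget.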

GitHub Events Routines move the contention boundary. GitHub-triggered review work becomes an OpenClaw event lane. Claude Routines remains useful for recurring scheduled work. The split is more important than the implementation detail because it matches the work’s natural shape.

The path from GitHub to OpenClaw

The pipeline has two halves. The AWS half receives the public webhook and turns it into durable work. The OpenClaw half polls for that work and runs the local routine.

flowchart TD
  GitHub[GitHub App webhook] --> APIGW[API Gateway REST API<br/>POST /webhook]
  APIGW --> Receiver[receiver Lambda<br/>HMAC verify]
  Receiver --> S3[S3 payload object<br/>YYYY/MM/DD/delivery-id]
  Receiver --> SQS[SQS main queue<br/>small envelope]
  SQS --> Consumer[gh-events-consumer<br/>Mac OpenClaw host]
  SQS --> DLQ[SQS DLQ]
  Consumer --> Match[match routine type + repo]
  Match --> Hook[POST /hooks/agent]
  Hook --> Routine[OpenClaw routine sandbox]
  Routine --> Slack[Slack result]

GitHub sends events to gh-events.constructured.ai/webhook. API Gateway handles the public edge and applies a resource-policy IP allowlist based on GitHub’s published hook CIDRs. The receiver Lambda then verifies X-Hub-Signature-256 against the webhook secret from Secrets Manager. If the signature is valid, the receiver writes the full request body to S3 and sends an SQS message whose body is only a claim-check envelope: bucket, key, and size. It also preserves the GitHub event name and delivery UUID as SQS message attributes.
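A minimal sketch of the claim-check step might look like the following. The S3 key layout and the envelope fields (bucket, key, size) come from the description above; the function names and the SQS attribute names `github_event` and `github_delivery` are illustrative assumptions, since the real names are not given here. The returned dictionary is shaped like the keyword arguments of an SQS SendMessage call, minus the queue URL.

```python
import json
from datetime import datetime, timezone


def payload_key(delivery_id: str, received_at: datetime) -> str:
    """Build the S3 object key in the YYYY/MM/DD/delivery-id layout."""
    return f"{received_at:%Y/%m/%d}/{delivery_id}"


def build_sqs_message(bucket: str, key: str, body: bytes,
                      event: str, delivery_id: str) -> dict:
    """Return SendMessage-style arguments carrying only the envelope."""
    envelope = {"bucket": bucket, "key": key, "size": len(body)}
    return {
        "MessageBody": json.dumps(envelope),
        "MessageAttributes": {
            # Attribute names are hypothetical; only their purpose is sourced.
            "github_event": {"DataType": "String", "StringValue": event},
            "github_delivery": {"DataType": "String", "StringValue": delivery_id},
        },
    }


# Example with made-up values.
body = b'{"action": "opened"}'
when = datetime(2026, 6, 5, tzinfo=timezone.utc)
key = payload_key("example-delivery-id", when)
message = build_sqs_message("example-bucket", key, body,
                            "pull_request", "example-delivery-id")
```

The full body never enters the queue; the consumer follows the bucket and key to fetch it.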

The OpenClaw host does not receive inbound internet traffic. It long-polls SQS, reads the S3 object named by the envelope, parses enough JSON to know the repo, action, PR author, and draft state, then asks the matcher which routine instances should fire. A matching instance becomes an OpenClaw POST /hooks/agent call.
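The "parses enough JSON" step could be sketched as below. The payload paths (`repository.name`, `action`, `pull_request.user.login`, `pull_request.user.type`, `pull_request.draft`) are fields GitHub includes in `pull_request` events; the `MatchFields` type and function name are hypothetical, and treating `user.type == "Bot"` as the bot test is an assumption about how the consumer decides.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchFields:
    repo: str            # repository basename, for example "osprey-strike"
    action: str
    author: str
    author_is_bot: bool
    draft: bool


def extract_match_fields(raw_body: bytes) -> MatchFields:
    """Pull only the fields needed for matching out of a pull_request event."""
    event = json.loads(raw_body)
    pr = event.get("pull_request") or {}
    user = pr.get("user") or {}
    return MatchFields(
        repo=(event.get("repository") or {}).get("name", ""),
        action=event.get("action", ""),
        author=user.get("login", ""),
        # Assumption: GitHub App bot identities carry type "Bot".
        author_is_bot=user.get("type") == "Bot",
        draft=bool(pr.get("draft", False)),
    )
```

Everything else in the body is left untouched for the routine itself to read from its dispatch directory.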

Tip

The S3 envelope is doing real work. GitHub caps webhook payloads at 25 MB, far more than a single SQS message can carry, and GitHub only keeps failed deliveries for a limited window. Storing the body in S3 makes the queue message small and the payload recoverable.

The trust boundaries

There are three separate trust questions in this design: who may call the public endpoint, who may enqueue work, and who may run local routines.

The public endpoint is narrowed at the AWS edge by API Gateway’s resource policy. A daily refresh Lambda updates the allowlist from GitHub’s current hook CIDRs, which means most non-GitHub traffic is rejected before the receiver Lambda runs. That is useful, but it is not the security boundary. The security boundary is the HMAC signature. The receiver verifies the raw request body against the secret in Secrets Manager; caller errors like bad signatures return 401 without incrementing the Lambda Errors metric, while system failures like secret, S3, or SQS problems return errors that should alarm.
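The signature check itself is small. The sketch below follows GitHub's documented scheme (HMAC-SHA256 over the raw body, hex-encoded, prefixed with `sha256=`) and uses a constant-time comparison. The function names are hypothetical, the header dictionary is assumed to be already normalized to these exact names, and the 400 for a missing delivery header is an assumption: the article only says such requests get a 4xx.

```python
import hashlib
import hmac
from typing import Optional


def signature_is_valid(secret: bytes, raw_body: bytes, header: Optional[str]) -> bool:
    """Check X-Hub-Signature-256 against the raw, unparsed request body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), header.encode())


def caller_status(secret: bytes, raw_body: bytes, headers: dict) -> int:
    """Map caller mistakes to 4xx; system failures should raise elsewhere."""
    if not headers.get("X-GitHub-Delivery"):
        return 400  # assumed code; the text only specifies "4xx"
    if not signature_is_valid(secret, raw_body, headers.get("X-Hub-Signature-256")):
        return 401
    return 200
```

Returning a status for bad input, rather than raising, is what keeps caller errors out of the Lambda Errors metric in this sketch.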

The queue boundary is IAM plus SQS delivery semantics. The receiver can write to S3 and SQS. The Mac host uses AWS credentials to read the queue and fetch the payload object. The OpenClaw machine stays behind the NAT line: it reaches out to AWS, but AWS does not call into the host.

The routine boundary is OpenClaw’s hook configuration. The consumer dispatches to agent IDs with the convention routine-<type>-<repo>. The installer patches OpenClaw’s agents.list and hooks.allowedAgentIds so only the expected routine agents are callable. That means adding a new routine instance is not just a TOML change; it changes the local hook allowlist and needs the full install path.

What the consumer actually does

The consumer is not a general GitHub automation framework. It is a narrow bridge from queued GitHub deliveries to configured OpenClaw routine instances.

On each poll loop, it requests up to ten SQS messages with twenty-second long polling. For each message, it parses the envelope, fetches the S3 object, parses the event body, evaluates routine matches, and deletes the SQS message only after processing succeeds. If processing fails, the message is left alone. SQS visibility timeout and redrive policy handle the retry path.
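One iteration of that loop might be sketched as follows. The `sqs` argument is assumed to be any object exposing boto3-style `receive_message` and `delete_message` methods, and `handle` stands in for the envelope, S3, and matching work; both names are hypothetical. The point is the ordering: delete only after success, and otherwise do nothing.

```python
import logging

log = logging.getLogger("gh-events-consumer")


def drain_once(sqs, queue_url: str, handle) -> int:
    """Run one poll iteration; return how many messages were deleted."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,        # up to ten messages per call
        WaitTimeSeconds=20,            # twenty-second long poll
        MessageAttributeNames=["All"],
    )
    deleted = 0
    for msg in resp.get("Messages", []):
        try:
            handle(msg)
        except Exception:
            # Leave the message alone: the visibility timeout and redrive
            # policy own the retry path.
            log.exception("processing failed; leaving message for redelivery")
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        deleted += 1
    return deleted
```

A failed message simply becomes visible again after its timeout, so one bad delivery does not block the rest of the batch.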

The matcher is deliberately boring. It matches exact GitHub event name, optional action allowlist, and repo basename. Filters such as ignore_bots, ignore_authors, and ignore_drafts run after a candidate match so obvious no-review cases do not spend a sandbox run. Ignored matches are counted and logged per routine instance, including the repo, author, action, and draft state, but they do not create dispatch records or wake sandboxes. Rich policy like “only review PRs touching this subsystem” belongs in the routine prompt, not in the dispatch matcher.
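A matcher of that shape fits in one function. In this sketch the filter names `ignore_bots`, `ignore_authors`, and `ignore_drafts` are taken from the text, while the `Routine` type, its other field names, and the three result strings are hypothetical. An empty action set is assumed to mean "any action", matching the "optional action allowlist" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routine:
    type: str
    repo: str                               # repo basename
    event: str                              # exact GitHub event name
    actions: frozenset = frozenset()        # empty means any action
    ignore_bots: bool = False
    ignore_drafts: bool = False
    ignore_authors: frozenset = frozenset()


def decide(r: Routine, event: str, action: str, repo: str,
           author: str, author_is_bot: bool, draft: bool) -> str:
    """Return "no-match", "ignored", or "dispatch" for one routine instance."""
    if event != r.event or repo != r.repo:
        return "no-match"
    if r.actions and action not in r.actions:
        return "no-match"
    # Filters run only after a candidate match, so an ignored event is
    # counted and logged but never reaches a sandbox.
    if r.ignore_bots and author_is_bot:
        return "ignored"
    if r.ignore_drafts and draft:
        return "ignored"
    if author in r.ignore_authors:
        return "ignored"
    return "dispatch"


pr_review = Routine(
    type="pr-review", repo="osprey-strike", event="pull_request",
    actions=frozenset({"opened", "reopened", "review_requested", "ready_for_review"}),
    ignore_bots=True, ignore_drafts=True,
)
```

With this configuration a draft PR that is opened yields "ignored", and the later `ready_for_review` delivery for the same PR yields "dispatch".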

When a routine does fire, the dispatcher does four durable things before asking OpenClaw to run anything. It checks SQLite for an existing (github_delivery, type, repo) record, creates a dispatch ID and session key, writes a dispatch directory containing the event body and context, and inserts a pending record. Only then does it call POST /hooks/agent with Deliver: false, WakeMode: now, and an idempotency key derived from the same delivery/type/repo tuple.
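The dedupe step can be illustrated with an in-memory SQLite table. This sketch collapses the "check, then insert" sequence into a single insert guarded by a primary key on the (github_delivery, type, repo) tuple, which is one way to get the same effect; the real schema, the column names, and the hashing used for the idempotency key are not given in the text and are assumptions here.

```python
import hashlib
import sqlite3
import uuid
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    github_delivery TEXT NOT NULL,
    type            TEXT NOT NULL,
    repo            TEXT NOT NULL,
    dispatch_id     TEXT NOT NULL,
    status          TEXT NOT NULL,
    PRIMARY KEY (github_delivery, type, repo)
)
"""


def idempotency_key(delivery: str, rtype: str, repo: str) -> str:
    """Derive a stable key from the delivery/type/repo tuple (assumed scheme)."""
    return hashlib.sha256(f"{delivery}:{rtype}:{repo}".encode()).hexdigest()


def claim_dispatch(db: sqlite3.Connection, delivery: str,
                   rtype: str, repo: str) -> Optional[str]:
    """Insert a pending record; return its dispatch ID, or None on a duplicate."""
    dispatch_id = uuid.uuid4().hex
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO records VALUES (?, ?, ?, ?, 'pending')",
                (delivery, rtype, repo, dispatch_id),
            )
    except sqlite3.IntegrityError:
        return None  # this delivery/type/repo was already dispatched
    return dispatch_id
```

Because the key is derived from the same tuple every time, a redelivered event produces the same idempotency key and finds the existing record.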

Key Takeaway

At-least-once delivery is assumed. GitHub can redeliver, SQS can redeliver, and DeleteMessage can fail after a successful run. The consumer’s SQLite dedupe is what keeps duplicate delivery from becoming duplicate review.

How PR review is configured

The current routine type is pr-review. Its matcher listens for GitHub’s pull_request event and the actions opened, reopened, review_requested, and ready_for_review. The ready_for_review action matters because the routine skips drafts. A PR opened as a draft, or explicitly requested for review while still draft, is not asking the automation to run yet; when it leaves draft, GitHub fires the event that should trigger the review. If a PR goes ready, back to draft, and ready again, each ready transition is a distinct GitHub delivery and can receive a fresh review.

The routine also skips bot-authored PRs. That catches GitHub App bot identities such as Dependabot and Renovate. It does not automatically skip every machine-looking user account, which is why the matcher also has an explicit ignore_authors list for user accounts the team decides should not trigger review.

Repository opt-in is filesystem-shaped. A repo settings file under routines/_repos/ names Slack output channels and optional overrides. A workspace under routines/pr-review/<repo>/workspace/ supplies the prompt and local agent instructions. Together those files register a concrete routine instance, such as routine-pr-review-osprey-strike.
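To make the filesystem shape concrete, here is a sketch that walks a routines tree and lists the instances it registers. The `routines/_repos/` and `routines/<type>/<repo>/workspace/` layout and the `routine-<type>-<repo>` naming come from the text; the settings filename `<repo>.toml` and both function names are assumptions, and the real scanner may apply further checks.

```python
from pathlib import Path


def agent_id(rtype: str, repo: str) -> str:
    """Apply the routine-<type>-<repo> naming convention."""
    return f"routine-{rtype}-{repo}"


def discover_instances(root: Path) -> list:
    """List agent IDs for repos that have both a settings file and a workspace."""
    found = []
    for type_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if type_dir.name.startswith("_"):
            continue  # _repos and similar directories are not routine types
        for repo_dir in sorted(p for p in type_dir.iterdir() if p.is_dir()):
            # Assumed filename: routines/_repos/<repo>.toml
            has_settings = (root / "_repos" / f"{repo_dir.name}.toml").is_file()
            has_workspace = (repo_dir / "workspace").is_dir()
            if has_settings and has_workspace:
                found.append(agent_id(type_dir.name, repo_dir.name))
    return found
```

A list like this is also what an installer would need in order to regenerate the hook allowlist, which is why adding an instance is more than a file edit.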

The routine-base Docker image is pinned by digest in _defaults.toml. Install refuses the placeholder digest and refuses latest-style looseness. That is the right tradeoff for a routine that can run code-reading agents from webhook input: the event may be dynamic, but the sandbox toolbox should be a known artifact.
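A digest-pin check of that kind could look like the sketch below. The `name@sha256:<64 hex characters>` form is the standard way to reference an image by digest; the all-zero placeholder value and the function name are assumptions, because the text does not say what the placeholder actually is.

```python
import re

# An image reference pinned by digest: <name>@sha256:<64 lowercase hex chars>.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:([0-9a-f]{64})$")
PLACEHOLDER = "0" * 64  # assumed placeholder; the real value is not stated


def check_image_ref(ref: str) -> None:
    """Raise ValueError unless ref is pinned to a real-looking digest."""
    m = DIGEST_REF.match(ref)
    if m is None:
        raise ValueError(f"image must be pinned by sha256 digest: {ref!r}")
    if m.group(1) == PLACEHOLDER:
        raise ValueError("image digest is still the placeholder")
```

A tag such as `routine-base:latest` fails the first check because it carries no digest at all.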

How operators change it

There are four common changes, and they land in different layers.

To change what GitHub emits, update the GitHub App installation or subscribed events. The AWS receiver does not care which Constructured repo emitted the event as long as the signature is valid and the event reaches the endpoint.

To change AWS behavior, edit infrastructure/opentofu/gh-events-ingest: queue visibility timeout, max receive count, receiver latency alarm, oldest-message alarm, secret cache TTL, lifecycle settings, and the IP refresh machinery all live there. The receiver Lambda code is in the same module, with scripts/build.sh producing the deployment zip.

To change which local routines fire, edit the consumer’s routines tree. Matcher changes, prompt changes, and repo Slack routing can hot-reload with SIGHUP because the consumer re-scans the tree and keeps the last good snapshot if a reload fails. The poller reads routine instances through the matcher store instead of holding a stale startup slice, so a successful reload changes the next match decision without restarting the long-running loop. Secrets and plist environment variables are not reloaded; those still need a restart. Adding or removing a routine instance also needs the installer because OpenClaw’s allowed agent list must be regenerated.
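The "keep the last good snapshot" behavior is the important part of that reload path, and it can be sketched in a few lines. `MatcherStore` and `install_sighup` are hypothetical names, and `load` stands in for whatever re-scans the routines tree. The handler calls `reload()` directly and the store uses no locks, on the assumption that swapping a single reference is safe for a reader that fetches the snapshot once per match decision.

```python
import signal


class MatcherStore:
    """Holds the current routine snapshot; reload swaps it only on success."""

    def __init__(self, load):
        self._load = load
        self._snapshot = load()   # a failure at startup is allowed to be fatal

    def current(self):
        """Return the snapshot the next match decision should use."""
        return self._snapshot

    def reload(self) -> bool:
        """Re-scan; keep the last good snapshot if the scan fails."""
        try:
            fresh = self._load()
        except Exception:
            return False
        self._snapshot = fresh
        return True


def install_sighup(store: MatcherStore) -> None:
    """Reload on SIGHUP (available on POSIX hosts such as macOS)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: store.reload())
```

Because the poller asks the store for `current()` on each decision instead of keeping its own copy, a successful reload takes effect on the next event.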

The installer now distinguishes routine-only deploys from binary or launchd changes. If the staged binary and rendered plist are unchanged and the consumer is already running, install.sh leaves the process up and sends SIGHUP for step ten instead of stopping and restarting it. That makes matcher, prompt, and Slack-routing releases a no-downtime update path. The same install path also avoids unnecessary sudo prompts by comparing rendered newsyslog configuration before installing it. A gateway configuration change can still escalate the run to a full restart because a gateway bounce during dispatch would drop events.
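The installer's choice between a hot reload and a restart reduces to a comparison of artifacts. The sketch below expresses that decision in Python purely for illustration; install.sh is a shell script, and the function names, the use of SHA-256 for comparison, and the rule that any gateway configuration change forces a restart are simplifying assumptions rather than the installer's actual logic.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file's contents for change detection."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unchanged(staged: Path, installed: Path) -> bool:
    """True when an installed copy exists and matches the staged artifact."""
    return installed.is_file() and sha256_file(staged) == sha256_file(installed)


def deploy_action(staged_binary: Path, installed_binary: Path,
                  rendered_plist: Path, installed_plist: Path,
                  consumer_running: bool, gateway_config_changed: bool) -> str:
    """Return "sighup" for a routine-only deploy, otherwise "restart"."""
    if gateway_config_changed or not consumer_running:
        return "restart"
    if unchanged(staged_binary, installed_binary) and unchanged(rendered_plist, installed_plist):
        return "sighup"
    return "restart"
```

Only the case where nothing about the process itself changed qualifies for the no-downtime path.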

How operators ask about it

The installed gh-events-status skill is part of the product surface, not just a debugging convenience. The install path copies SKILL.md, command scripts, and rendered config into ~/.openclaw/skills/gh-events-status/, where OpenClaw can discover it. That lets the main Q agent answer operator questions from Slack in the channels where it listens, without requiring the operator to SSH into the host or remember the consumer’s file layout.

The skill has six commands: list recent dispatches, show a full run transcript, tail a running or recent transcript, inspect queue and DLQ depth, list configured routines, and re-publish a delivery envelope when appropriate. The routines view is especially important for “what are we listening to?” questions because it reports the configured routine instances, their GitHub event and action filters, the OpenClaw agent ID, the Slack destination, and the prompt file an operator would edit.

The rerun path is intentionally cautious. It is the skill’s mutating command and should be confirmed before use. The current dedupe model also means re-publishing a delivery that already recorded a run is a no-op, not a force-review button.

How it fails and recovers

Most of the design is about preserving events long enough for a human or agent to fix the broken layer.

If the receiver cannot load the secret, write S3, send SQS, or otherwise complete a system operation, the receiver returns a system error and CloudWatch should alarm. If the caller sends a bad signature or omits the delivery header, the receiver returns a 4xx response and does not page the operator through Lambda Errors; that is bad input, not an unhealthy receiver.

If the consumer is down, messages accumulate in SQS. With the configured retention window, OpenClaw can be down for a meaningful period and still drain the backlog when it returns. The oldest-message-age alarm is the signal that the queue is no longer flowing. If the consumer repeatedly fails a message, the redrive policy moves it to the DLQ after the receive limit, and the AWS-side replay script can inspect or move DLQ messages back to the main queue after the root cause is fixed.

If a routine is dispatched but never reports completion, the reaper handles the audit trail. Routines write a small report.json into their dispatch directory when they finish; the reaper sweeps pending records, picks up self-reports, and marks records terminal. If there is no report after the routine’s timeout plus grace period, the reaper inspects the OpenClaw session. A terminal session verdict is mapped into the records table. If there is no terminal evidence at all, then, because the row is already past its per-routine threshold, the reaper synthesizes a crashed verdict, marks the record failed, and can post a failure notification to the repo’s failure channel instead of leaving the dispatch pending forever.
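The reaper's decision order can be written as a small pure function. This is a sketch under assumptions: the function name, the parameter names, and the verdict strings are hypothetical, and reading report.json or querying the OpenClaw session is left to the caller, which passes in whatever it found.

```python
from typing import Optional


def reap(age_s: float, timeout_s: float, grace_s: float,
         report_verdict: Optional[str],
         session_verdict: Optional[str]) -> Optional[str]:
    """Return a terminal verdict for a pending record, or None to keep waiting."""
    if report_verdict is not None:
        return report_verdict      # the routine's own report.json comes first
    if age_s <= timeout_s + grace_s:
        return None                # still inside its timeout plus grace window
    if session_verdict is not None:
        return session_verdict     # terminal verdict from the OpenClaw session
    # No report and no terminal session evidence: synthesize a crash so the
    # caller can mark the record failed and notify the failure channel.
    return "crashed"
```

Keeping the decision separate from the I/O makes each branch easy to exercise with plain values.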

Warning

The recovery model is durable, not magic. SQS buys time, S3 preserves payloads briefly, GitHub retains recent deliveries briefly, and SQLite dedupes dispatches. Operators still need to watch the alarms and status surface when those buffers start filling.

What to remember

GitHub Events Routines are the event-driven lane for agent work. They exist because PR review is valuable, frequent, and tied to GitHub state, which makes it a poor fit for a scarce daily scheduled-routine budget.

The design keeps each system in its natural role. GitHub emits signed facts. AWS verifies, stores, queues, alarms, and buffers. The OpenClaw host polls outbound and runs local sandboxed agents. Slack gets the human result. The source repo remains the place to change infrastructure, matchers, prompts, and operator tooling.

Use OpenClaw for GitHub-event-triggered PR review work, especially when review volume grows with team activity.
Keep cron-shaped weekday and nightly routines in Claude Routines unless there is a separate reason to move them.
Treat HMAC verification as the main public-edge security boundary; treat the GitHub IP allowlist as defense in depth.
Expect duplicates and retries. The consumer's `(github_delivery, type, repo)` record is what makes the lane idempotent.
Change matchers and prompts in the routines tree; change queue, alarm, and receiver behavior in the OpenTofu module; use the installer when OpenClaw hook allowlists change.

Discussion Prompts

Which GitHub events besides PR review deserve this lane: issue triage, release notes, security alerts, failed CI, or something else?
When should operators be allowed to force-rerun a delivery that already has a dedupe record, and what audit entry should that leave?
Which status questions are common enough that `gh-events-status` should grow a dashboard or scheduled summary instead of staying purely on-demand?

References

From Plan to Pull Request - How PRs fit into Constructured's normal engineering loop.
Where the Work Lives - The companion operating model for deciding which system owns which work.
Runbooks Are Interfaces - Why the deployment README, status skill, and replay tools are part of the system surface.
Operator's Guide to Q - The broader OpenClaw/Q operations context around channels, authority, and status.

Generated by Cairns · Agent-powered with Claude
