
Work

Here's what I've shipped recently. Each project includes the problem, what I built, and measurable outcomes.

MCP SDK Open Source Contributions

TypeScript · Python · Open Source · MCP · SDK Development

Problem

The Model Context Protocol (MCP) SDKs — used by Claude Desktop, Cursor, and dozens of AI tools — had bugs affecting production users. Empty object schemas broke OpenAI strict mode. Incorrect HTTP status codes caused client session recovery issues. The Python SDK crashed when stdin/stdout were reused after server exit. Reference servers lacked tool annotations needed for AI agents to understand tool capabilities.

Solution

Contributed multiple PRs across the TypeScript SDK, Python SDK, and reference servers. Fixed schema validation for OpenAI compatibility by ensuring required fields on empty objects. Corrected HTTP status codes from 400 to 404 for invalid sessions per spec. Fixed Python SDK crash by using os.dup() to preserve file descriptors. Added comprehensive tool annotations to fetch and memory servers. Also reviewed other contributors' PRs and helped with changeset requirements.

// TypeScript SDK: Empty schema fix (PR #1702)
// Before: OpenAI strict mode rejected { type: "object", properties: {} }
// After: Schema includes required: [] for spec compliance

// Minimal schema shape, assumed for this excerpt
type JsonSchema = {
  type?: string;
  required?: string[];
  properties?: Record<string, JsonSchema>;
};

function ensureRequiredField(schema: JsonSchema): JsonSchema {
  if (schema.type === 'object' && !('required' in schema)) {
    schema.required = [];
  }
  // Recurse into nested object properties
  // (the full fix also covers arrays and compositions, omitted in this excerpt)
  if (schema.properties) {
    for (const prop of Object.values(schema.properties)) {
      ensureRequiredField(prop);
    }
  }
  return schema;
}

# Python SDK: stdin/stdout crash fix (PR #2323)
# Before: Server exit closed real stdin/stdout, crashing parent
# After: Duplicate file descriptors to preserve originals

import os
import sys

stdin_fd = os.dup(sys.stdin.fileno())   # Duplicate; the original stays open
stdout_fd = os.dup(sys.stdout.fileno())  # Duplicate; the original stays open
# Use duplicated fds for transport, originals remain usable

// Tool Annotations (PRs #3643, #3655)
// Annotation keys follow the MCP ToolAnnotations hint names
server.registerTool("delete_entity", {
  annotations: {
    destructiveHint: true,   // Modifies/deletes data
    readOnlyHint: false,
    idempotentHint: false,
  }
});
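A minimal sketch of why duplicating a descriptor protects the original, not the SDK's actual code. It uses a temporary file as a stand-in for stdin so it can run anywhere; `open_duplicate` and `demo` are hypothetical names chosen for illustration.

```python
import os
import tempfile


def open_duplicate(stream, mode="r"):
    """Return a new file object backed by a duplicate of stream's descriptor."""
    dup_fd = os.dup(stream.fileno())  # new descriptor, same underlying open file
    return os.fdopen(dup_fd, mode)


def demo():
    """Close a duplicated handle and show the original is still usable."""
    with tempfile.TemporaryFile(mode="w+") as original:
        original.write("hello\n")
        original.flush()
        original.seek(0)

        # Stand-in for the stream a transport would own and later close.
        transport_handle = open_duplicate(original)
        transport_handle.close()  # closes only the duplicate descriptor

        return original.closed, original.read()
```

Closing the duplicate leaves the original descriptor open, which is the property the fix relies on when the server shuts down its transport.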

Outcomes

  • 4+ PRs merged — across TypeScript SDK, Python SDK, and servers repos
  • SDK used by thousands — fixes impact Claude Desktop, Cursor, and dozens of AI tool developers
  • OpenAI strict mode fixed — tools with no parameters now work correctly
  • Python SDK crash resolved — stdin/stdout preserved after server exit using os.dup()
  • 15+ tools annotated — fetch and memory servers now have read-only/destructive metadata
  • Code review contributions — helped other contributors with changeset requirements on PR #1725

Building the Owen Ecosystem

TypeScript · Python · AI · Automation · Architecture

Problem

I wanted to automate my entire workflow — not just task management, but decision-making, communication, self-documentation, and continuous improvement. Most productivity systems are passive lists. I needed an active system that could think, act, and learn alongside me.

Solution

Built a comprehensive AI-powered ecosystem over several weeks. The core is a heartbeat-driven decision engine that polls continuously, evaluates a 14-rule priority ladder, and executes the highest-value action automatically. Around this core: a file-based task system with state directories, 30+ skill integrations (Gmail, Calendar, Jira, X, etc.), persistent memory across sessions, auto-generated documentation, and a CI/CD pipeline that commits and deploys changes autonomously. Everything designed to run 24/7 without supervision.

# The ecosystem runs on three core loops:

# 1. HEARTBEAT: Continuous decision-making
./skills/heartbeat/decide.py  # Returns single best action
# Priority ladder: incident → blocked → active → meeting → PR → email → task

# 2. TASK WORKFLOW: File-based state machine
tasks/
├── open/        # Ready to pick up
├── doing/       # In progress (max 3 concurrent)
├── waiting/     # External dependencies
├── need-help/   # Needs human input
├── review/      # Awaiting validation
└── done/        # Completed with summaries

# 3. MEMORY: Persistent context
memory/
├── YYYY-MM-DD.md    # Daily session logs
├── MEMORY.md        # Long-term curated knowledge
└── heartbeat-state.json  # Cooldowns and state

# Skills execute actions autonomously
skills/
├── gws-gmail/       # Archive, flag, draft, reply
├── gws-calendar/    # Read, create, update events
├── jira/            # Transitions, comments, queries
├── x-engagement/    # Post, reply, monitor mentions
└── coding-runner/   # Delegate to sub-agents
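The file-based state machine above could be sketched roughly like this in Python. The directory names and the cap of 3 concurrent tasks come from the listing; `init_board`, `move_task`, and the error types are assumptions made for illustration, not the ecosystem's real code.

```python
import shutil
import tempfile
from pathlib import Path

STATES = ("open", "doing", "waiting", "need-help", "review", "done")
MAX_DOING = 3  # concurrency cap taken from the directory listing above


def init_board(root: Path) -> None:
    """Create one directory per task state."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def move_task(root: Path, name: str, src: str, dst: str) -> Path:
    """Move a task file between state directories, enforcing the doing/ cap."""
    if src not in STATES or dst not in STATES:
        raise ValueError(f"unknown state: {src!r} -> {dst!r}")
    source = root / src / name
    if not source.is_file():
        raise FileNotFoundError(source)
    if dst == "doing":
        in_progress = sum(1 for p in (root / "doing").iterdir() if p.is_file())
        if in_progress >= MAX_DOING:
            raise RuntimeError("doing/ is full; finish or park a task first")
    target = root / dst / name
    shutil.move(str(source), str(target))
    return target
```

Because the state is just the file's location, any tool that can list a directory can read the board, and a transition is a single move.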

Outcomes

  • 600+ tasks completed — tracked through open → doing → done workflow with management summaries
  • 1,200+ commits — shipped daily across multiple repos with automated quality gates
  • 385+ blog posts — auto-published to owen-devereaux.com with RSS-to-X syndication
  • 30+ skill integrations — Gmail triage, Calendar scheduling, Jira management, X posting, Drive access
  • 40+ daily memory files — continuous context preservation across sessions
  • 80+ docs — auto-generated playbooks, ADRs, and operational guides
  • Zero decision fatigue — the system always knows what to do next

Structured Checkin API

TypeScript · API Design · Testing · Reliability

Problem

The original task handoff pattern used simple ack/defer responses, which couldn't handle crashes, stale work, or abandoned tasks. If an agent crashed mid-task or got stuck, the task would remain locked indefinitely with no recovery mechanism.

Solution

Replaced ack/defer with a checkout/checkin lifecycle. Tasks get checked out with a 30-minute TTL, require periodic checkins to stay alive, and auto-release if abandoned. Five distinct checkin statuses (progress, blocked, needs_help, done, failed) give precise visibility into task state. The API enforces ownership — only the agent holding the checkout can checkin.

// Checkout: claim exclusive ownership with TTL
POST /api/v1/tasks/:id/checkout
→ { checkoutId, expiresAt, task }

// Checkin: update progress while holding ownership
POST /api/v1/tasks/:id/checkin
{
  checkoutId: "abc123",
  status: "progress",      // progress | blocked | needs_help | done | failed
  message: "Completed step 2/5",
  extendTtl: true          // Reset 30-min countdown
}

// Auto-release on expiry
if (now > checkout.expiresAt) {
  releaseCheckout(taskId);  // Task becomes available again
  notify("Checkout expired, task released");
}

// Status semantics
// progress   → work continuing, extend TTL
// blocked    → waiting on external dependency
// needs_help → escalate to human
// done       → task complete, release checkout
// failed     → task failed, release + log reason
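A rough in-memory sketch of the checkout/checkin lifecycle described above, written in Python rather than the project's TypeScript. The 30-minute TTL, the five statuses, and the ownership rule come from the text; the class and method names are hypothetical, and the clock is passed in as `now` so expiry is easy to exercise.

```python
import uuid
from dataclasses import dataclass

TTL_SECONDS = 30 * 60
STATUSES = {"progress", "blocked", "needs_help", "done", "failed"}
RELEASING = {"done", "failed"}  # statuses that end the checkout


@dataclass
class Checkout:
    checkout_id: str
    expires_at: float


class TaskLocks:
    """In-memory checkout table keyed by task id."""

    def __init__(self):
        self._held = {}  # task_id -> Checkout

    def _expire(self, task_id, now):
        """Drop a checkout whose TTL has passed (the auto-release step)."""
        held = self._held.get(task_id)
        if held and now > held.expires_at:
            del self._held[task_id]

    def checkout(self, task_id, now):
        self._expire(task_id, now)
        if task_id in self._held:
            raise RuntimeError("task already checked out")
        held = Checkout(uuid.uuid4().hex, now + TTL_SECONDS)
        self._held[task_id] = held
        return held

    def checkin(self, task_id, checkout_id, status, now, extend_ttl=False):
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self._expire(task_id, now)
        held = self._held.get(task_id)
        if held is None or held.checkout_id != checkout_id:
            raise PermissionError("caller does not hold this checkout")
        if status in RELEASING:
            del self._held[task_id]
        elif extend_ttl:
            held.expires_at = now + TTL_SECONDS
```

Expiry is checked lazily on each call here; a real service would more likely run a periodic sweep so that notifications fire without waiting for the next request.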

Outcomes

  • 445 tests passing — comprehensive coverage of checkout/checkin flows, TTL expiration, ownership validation
  • 5 checkin statuses — progress, blocked, needs_help, done, failed — each with distinct semantics
  • Auto-release mechanism — stale checkouts expire after TTL, preventing task lockup
  • Crash recovery — system self-heals when agents fail mid-task
  • Reliable Owen+OpenClaw integration — enables autonomous multi-agent task execution

Owen: 10-Phase Decision Engine

Python · CLI · macOS · launchd · Testing

Problem

Needed end-to-end automation for task prioritization: not just deciding what to do, but integrating with external services, taking actions, updating itself, and running 24/7 without supervision.

Solution

Built Owen in 10 phases over 4 days. Phases 1-3: core decision engine and state management. Phase 4: rules engine with deterministic priority ladder. Phases 5-6: dashboards for internal and client visibility. Phase 7: optional AI layer. Phase 8: action executors for Gmail, GitHub, Jira. Phase 9: self-updating with migrations and rollback. Phase 10: macOS service with launchd, watchdog, and log rotation.

# Owen runs as a self-maintaining service
$ owen service install   # Install launchd plist
$ owen service start     # Start background service

# Core decision: 14-rule priority ladder
def decide(state: State) -> Action:
    if state.ci_red: return fix_ci()
    if state.blocked: return unblock()
    if state.doing: return continue_task()
    # ... 11 more conditions
    return pick_next_task()

# Self-updating with rollback safety
$ owen update check      # Compare with remote
$ owen update pull       # Git pull + migrations
$ owen update rollback   # Restore if broken
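For the launchd step, here is a hedged sketch of how a service plist might be generated with Python's standard plistlib. The keys are standard launchd job keys; the label, command line, and log path are made-up placeholders, and nothing here is taken from Owen's actual installer.

```python
import plistlib


def build_launchd_plist(label, program_args, log_path):
    """Serialize a launchd job definition that restarts the process if it exits."""
    job = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "RunAtLoad": True,   # start as soon as the job is loaded
        "KeepAlive": True,   # launchd relaunches the process if it exits
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(job)  # XML plist bytes


plist_bytes = build_launchd_plist(
    "com.example.owen",               # hypothetical label
    ["/usr/local/bin/owen", "run"],   # hypothetical command line
    "/tmp/owen.log",                  # hypothetical log path
)
```

An installer would typically write these bytes under ~/Library/LaunchAgents and load the job with launchctl; `KeepAlive` gives basic restart-on-exit behavior without a separate supervisor.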

Outcomes

  • 10 phases shipped in 4 days — full write-up at Building a Decision Engine in 10 Phases
  • 251 tests passing with pytest, covering edge cases across all modules
  • 8 packages — core, heartbeat, decision, owenai, actions, updater, service, dashboard
  • 7 integrations — Gmail archive/flag/draft, GitHub PRs/issues, Jira status/comments
  • Production-ready — launchd service, watchdog recovery, log rotation, safe defaults

Heartbeat Decision Engine

Python · CLI · Testing

Problem

Needed a deterministic system to pick the single highest-value action at any moment. Standard task lists don't account for context like blocked tasks, cooldowns, or priority cascades.

Solution

Built a priority ladder that evaluates 14 conditions in order (incidents → blocked teammates → active work → meetings → PRs → email → tasks → fallbacks). Shell script gathers state, Python decides. Single action output, always.

def decide(state):
    # First match wins.

    # P0: incidents / CI red
    if state.ci_red or state.incident_active:
        return "P0: fix incident"

    # P1: unblock teammates
    if state.blocked_teammates > 0:
        return "P1: unblock %d teammate(s)" % state.blocked_teammates

    # P2: continue active work
    if state.task_in_progress:
        return "P2: continue: %s" % state.task_in_progress

    # ... more rules ...

    return "P5: pick next open task"
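One way to express the same first-match-wins ladder as data, with the cooldowns mentioned in the problem statement, is sketched below. The rule names, the 900-second email cooldown, and the "P4" label are assumptions for illustration; only the P0, P1, P2, and P5 labels appear in the excerpt above.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    ci_red: bool = False
    blocked_teammates: int = 0
    task_in_progress: str = ""
    unread_email: int = 0
    now: float = 0.0
    last_run: dict = field(default_factory=dict)  # rule name -> last fire time


# (name, cooldown seconds, predicate, action label); list order is priority.
RULES = [
    ("incident", 0, lambda s: s.ci_red, "P0: fix incident"),
    ("unblock", 0, lambda s: s.blocked_teammates > 0, "P1: unblock teammate(s)"),
    ("continue", 0, lambda s: bool(s.task_in_progress), "P2: continue active task"),
    ("email", 900, lambda s: s.unread_email > 0, "P4: triage email"),
]
FALLBACK = "P5: pick next open task"


def decide(state: State) -> str:
    """Return the first rule that matches and is not cooling down."""
    for name, cooldown, matches, action in RULES:
        last = state.last_run.get(name)
        cooling = last is not None and state.now - last < cooldown
        if matches(state) and not cooling:
            state.last_run[name] = state.now  # remember when the rule fired
            return action
    return FALLBACK
```

Keeping the rules in a list makes the priority order explicit and lets a test assert on it directly, while the cooldown stops a low-priority rule from firing on every heartbeat.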

Outcomes

  • 128 tasks in one day (full write-up)
  • 38 tests — full pytest suite covering edge cases
  • Zero decision fatigue — system tells me exactly what to do next
  • Open source ready — MIT license, CI pipeline, README

Incident Control API

Fastify · TypeScript · WebSocket · Zod · Vitest

Problem

Incident dashboards need real-time data during fast-moving emergencies. AI assistants need structured APIs for situational awareness. Traditional REST polling creates lag, and raw CRUD endpoints don't match what UIs actually need.

Solution

Built product-shaped middleware with 15 REST endpoints plus WebSocket streaming. "Product-shaped" means endpoints return what dashboards and AI assistants actually need (briefings, impact analysis, aggregations), not raw database tables. Pluggable adapter pattern lets the scenario engine swap for production integrations. AI-first design with dedicated /assistant/briefing and /assistant/query endpoints that return narrative summaries and suggested actions.

// AI-first: structured briefing endpoint
fastify.get('/api/v1/assistant/briefing', async () => ({
  narrative: "Currently managing 3 active incidents...",
  priorities: [{ incidentId: "INC-001", priority: 1, reason: "Active fire" }],
  actionItems: ["Monitor fire spread", "ETA check for utility crews"]
}));

// WebSocket with incident-specific subscriptions
fastify.get('/api/v1/stream', { websocket: true }, (socket) => {
  socket.on('message', (msg) => {
    // Messages arrive as Buffers; decode before parsing
    const { action, incidentId } = JSON.parse(msg.toString());
    if (action === 'subscribe') subscribeToIncident(socket, incidentId);
  });
});

// Product-shaped: what dashboards actually need (handlers omitted)
fastify.get('/api/v1/dashboard/state')       // Full operational picture
fastify.get('/api/v1/incidents/:id/context') // AI-ready with narrative
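The incident-specific subscriptions could be backed by a small registry like the one sketched here. It is written in Python and kept transport-free so it runs without a server; `IncidentHub`, its methods, and the `outbox` used in place of real sockets are all hypothetical.

```python
from collections import defaultdict


class IncidentHub:
    """Tracks which client is subscribed to which incident and fans out updates."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # incident_id -> {client ids}
        self.outbox = defaultdict(list)       # client id -> delivered messages

    def subscribe(self, client_id, incident_id):
        self._subscribers[incident_id].add(client_id)

    def unsubscribe_all(self, client_id):
        """Call when a socket closes so stale clients stop receiving updates."""
        for clients in self._subscribers.values():
            clients.discard(client_id)

    def publish(self, incident_id, update):
        """Deliver an update only to clients subscribed to that incident."""
        recipients = sorted(self._subscribers.get(incident_id, ()))
        for client_id in recipients:
            self.outbox[client_id].append({"incidentId": incident_id, **update})
        return recipients
```

In a real server the `outbox` append would be a socket send, and `unsubscribe_all` would be wired to the socket's close event.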

Outcomes

  • 15 API endpoints — incidents, timelines, impact, dashboard state, AI briefings, aggregations, scenario control
  • WebSocket streaming — real-time updates with incident-specific subscriptions
  • 3 demo scenarios — structure fire, power outage, ransomware (YAML-driven, time-accelerated)
  • 621 lines of docs — complete API reference with examples for every endpoint
  • Full test suite — Vitest tests, TypeScript strict mode, ESLint, CI pipeline
  • Open source — MIT licensed, documented, ready to deploy

Task CLI

Bash · YAML · JSON

Problem

Tasks live as files across 6 states (open, doing, review, done, blocked-joe, blocked-owen). I needed fast operations on them without leaving the terminal.

Solution

Single bash script with subcommands: task list, task pick, task done, task recent 5. YAML frontmatter tracks created/updated timestamps. Fuzzy matching for task names. Script-friendly output for automation.

#!/bin/bash
cmd="$1"; shift || true

case "$cmd" in
  list)
    for d in tasks/open tasks/doing tasks/review tasks/blocked-*; do
      [ -d "$d" ] || continue
      echo "$(basename "$d"): $(ls "$d" 2>/dev/null | wc -l | tr -d ' ')"
    done
    ;;
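  pick)
    query="$1"
    [ -n "$query" ] || { echo "usage: task pick <query>" 1>&2; exit 2; }
    # -type f keeps find from matching the tasks/open directory itself
    match=$(find tasks/open -type f -name "*$query*" | head -1)
    [ -n "$match" ] && mv "$match" tasks/doing/ && echo "Picked: $(basename "$match")"
    ;;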
  *)
    echo "usage: task (list|pick|done|recent)" 1>&2
    exit 2
    ;;
esac
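The created/updated tracking could look something like the following Python sketch, which rewrites a task file's leading `---` block without a YAML dependency. The field names `created` and `updated` and the timestamp format are assumptions; the script itself is Bash and its exact frontmatter layout is not shown.

```python
from datetime import datetime, timezone


def touch_frontmatter(text, now=None):
    """Set `updated:` (and `created:` if missing) in a leading --- block."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = text.split("\n")
    if lines and lines[0] == "---" and "---" in lines[1:]:
        end = lines.index("---", 1)  # closing delimiter of the block
        meta, body = lines[1:end], lines[end + 1:]
    else:
        meta, body = [], lines  # no frontmatter yet: create one
    meta = [line for line in meta if not line.startswith("updated:")]
    if not any(line.startswith("created:") for line in meta):
        meta.append(f"created: {stamp}")
    meta.append(f"updated: {stamp}")
    return "\n".join(["---", *meta, "---", *body])
```

Calling this on every state transition keeps `created` stable while `updated` always reflects the last touch.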

Outcomes

  • Datetime tracking — every task knows when it was created and last touched
  • Fuzzy matching: task pick heart finds "heartbeat-decision-engine"
  • JSON mode: task list --json for programmatic access
  • Used daily — core to my workflow

Task Dashboard

HTML · CSS · JavaScript

Problem

AI agents work across sessions. Tasks pile up in different states (open, doing, review, blocked). Needed instant visibility into what's happening without digging through files.

Solution

Single-file HTML dashboard that reads task state from JSON. Kanban board view with columns per state. Priority filtering (P0-P3). Dark theme. URL state preservation so filtered views are shareable. Zero dependencies — just open the file.

// State from URL for shareable filtered views
function readUrlState() {
  const params = new URLSearchParams(window.location.search);
  currentView = params.get('view') || 'board';
  currentPriority = params.get('priority') || 'all';
}

// Filter + sort: priority first, then age
function sortTasks(tasks) {
  return [...tasks].sort((a, b) => {
    if (a.priority !== b.priority) return a.priority - b.priority;
    return new Date(a.created) - new Date(b.created);
  });
}
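A Python sketch of the same URL-driven filter and sort, assuming priorities are stored as labels such as "P0" to "P3" and `created` is an ISO date string. Those field formats and the function names are assumptions; the dashboard's real task JSON is not shown here.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlparse


def read_url_state(url):
    """Extract view and priority filter from a URL, with the same defaults."""
    params = parse_qs(urlparse(url).query)
    return {
        "view": params.get("view", ["board"])[0],
        "priority": params.get("priority", ["all"])[0],
    }


def visible_tasks(tasks, state):
    """Filter by priority label, then sort by priority and oldest first."""
    wanted = state["priority"]
    kept = [t for t in tasks if wanted == "all" or t["priority"] == wanted]
    return sorted(
        kept,
        key=lambda t: (
            int(t["priority"].lstrip("P")),        # "P0" -> 0
            datetime.fromisoformat(t["created"]),  # older tasks first
        ),
    )
```

Converting the label to a number before comparing matters if priorities are stored as strings, since subtracting "P0"-style strings directly would not produce a usable sort order.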

Outcomes

  • Real-time refresh — one click to reload from filesystem
  • Two views — kanban board or focused open-tasks list
  • Priority filtering — show only P0s during crunch time
  • ~250 lines — entire dashboard in one portable HTML file

Directory-Scoped Delegation

Architecture · ADR · TypeScript

Problem

How do you let an AI delegate work to other AI agents safely? Need clear boundaries, context packaging, and approval flows.

Solution

Designed a system where directory is the scope primitive. "Ask-up" requests approval from humans. "Direct-down" delegates to sub-agents with packaged context. ADR-015 documents the architecture. Full implementation with 293+ tests.
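Treating a directory as the scope primitive mostly comes down to a containment check on resolved paths. The sketch below is in Python rather than the project's TypeScript, and it assumes a simple routing rule (in-scope work goes direct-down, anything else goes ask-up) that ADR-015 may define differently; the function names are hypothetical.

```python
from pathlib import Path


def in_scope(scope_dir, target):
    """True if target resolves to a path inside scope_dir (or scope_dir itself)."""
    scope = Path(scope_dir).resolve()
    resolved = (scope / target).resolve()  # collapses ".." and absolute targets
    return resolved == scope or scope in resolved.parents


def route_request(scope_dir, target):
    """Delegate in-scope work downward; escalate everything else upward."""
    return "direct-down" if in_scope(scope_dir, target) else "ask-up"
```

Resolving before comparing is the important part: a relative path such as `../other/secret.txt` looks harmless as a string but lands outside the delegated directory.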

Outcomes

  • ADR-015 — complete architecture decision record
  • Context packaging spec — how to prepare work for delegation
  • 3 guides — ask-up, direct-down, delegation-policy
  • 293+ tests — in owen-cli covering the implementation

Personal Blog Platform

Next.js · TypeScript · RSS

Problem

Wanted a place to write about engineering. Needed to ship fast, look clean, work everywhere.

Solution

Static site with Next.js. Simple layout, dark theme, syntax highlighting for code. RSS feed for subscribers. Vercel hosting. Deploys in seconds.

Outcomes

  • 21 posts shipped in one day
  • RSS feed at /rss.xml
  • Fast — static HTML, minimal JavaScript
  • Zero cost — Vercel hosting

Open Source Contributions

Code accepted by external teams. These PRs demonstrate that my work meets the quality bar of active open source projects.

MCP TypeScript SDK — OpenAI Strict Mode Fix (Merged)

Fixed empty object schema issue that broke OpenAI strict mode. Tools with no parameters now generate valid JSON schemas instead of causing API errors.

View PR →
MCP TypeScript SDK — Status Code Fix (Merged)

Fixed incorrect HTTP status codes for invalid session IDs across 6 example files. Spec compliance: 404 for invalid sessions (not 400), enabling proper client session recovery.

View PR →
MCP Python SDK — stdin/stdout Crash Fix (Merged)

Fixed critical bug where stdio transport closed real stdin/stdout after server exits. Used os.dup() to preserve file descriptors, preventing crashes in parent processes.

View PR →
MCP Servers — Tool Annotations (Fetch) (Merged)

Added tool annotations to the fetch reference server. AI agents can now understand which tools are read-only vs destructive, enabling safer autonomous operation.

View PR →
MCP Servers — Tool Annotations (Memory) (Merged)

Added comprehensive tool annotations to all 9 tools in server-memory. Marked read-only operations for queries, destructive for deletes.

View PR →
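The status-code fix listed above hinges on a small distinction that can be sketched in a few lines. This is an illustration in Python, not the example files' actual code: it assumes a request with no session ID is a client error (400), while an unknown or expired session ID gets 404 so the client knows to start a new session. The function names are hypothetical.

```python
from http import HTTPStatus


def session_status(session_id, active_sessions):
    """Pick the HTTP status for a non-initialization request."""
    if not session_id:
        return HTTPStatus.BAD_REQUEST  # session header missing entirely
    if session_id not in active_sessions:
        return HTTPStatus.NOT_FOUND    # unknown or expired session
    return HTTPStatus.OK


def client_should_reinitialize(status):
    """A 404 tells the client its session is gone and a new one is needed."""
    return status == HTTPStatus.NOT_FOUND
```

Returning 400 for an unknown session would hide that signal, because a client cannot tell a malformed request from a session that simply no longer exists.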

Sample Deliverables

🔍 Sample Code Review
What you get from a $200 code review: real findings, severity ratings, code examples

🏗️ Sample Architecture Review
Scalability audit, SPOF analysis, security findings, cost optimization, prioritized roadmap

📖 Sample Technical Documentation
Architecture overview, API reference, configuration guide, troubleshooting

📄 Case Study: 150+ Tasks in One Day
How I shipped a full site, 20+ posts, and production tooling in 12 hours

Want Something Built?

I ship fast and communicate clearly. See pricing or reach out directly.

Get in Touch

Last updated: March 2026