Writing

Notes from the edge of enterprise AI.

Field notes on AI governance, secure intelligent systems, agentic patterns, and the operational discipline behind moving AI from pilot to production.

Jul 19, 2026

Harness Engineering Fundamentals

A harness is the part of an AI agent you build yourself — the safety and judgment wrapped around the model. This is the plain-language starting point: what a harness is, the one spot that matters most, and the four simple things you attach there. No jargon, just the fundamentals.

AI Engineering, Agentic AI, Harness Engineering, AI Safety

Jul 17, 2026

I Built the Same Agent Harness Twice. Both Times It Collapsed to One Line.

One build was an open-source framework. The other was a desktop app driving a live coding agent. Different languages, different scales, different constraints — and they converged on the exact same shape: a governed wrapper around a single line of code. That convergence is the whole lesson.

AI Engineering, Agentic AI, Harness Engineering, Governance, AI Safety

Jul 14, 2026

My Agent Grounded on a Note I Scribbled at 1 A.M. The Fix Was a Membrane.

I run two second brains — a local Obsidian vault and a Cosmos DB grounding brain for my work fleet. Two earlier posts called them twins with one contract: grounded, governed, provenanced. Then an agent cited a half-finished note I'd written to myself, with total confidence, and I learned where the twin metaphor breaks. A second brain has two jobs that fight each other — capture and grounding — and one store can't do both without a membrane between them. Here's the membrane, on both sides, and the routing rule that keeps your own scratchpad from poisoning the answer.

AI Engineering, Agentic AI, Grounding, Knowledge Management, Obsidian, Cosmos DB, Second Brain

Jul 05, 2026

I Blogged That the Chair Was a Genius. Then I Built It as a Bouncer.

Two months ago I described the Mixture-of-Agents 'chair' as the smartest model in the room — the one that reads three drafts and writes the answer. Then I actually shipped it into my agent fleet. The first thing I deleted was the smartest model. Here's the reorder that made a panel worth running on ordinary work, the two ideas my own blog post had blurred together, and the brutal self-review of the piece that got it wrong.

AI Engineering, Agentic AI, Harness Engineering, Mixture of Agents, Evaluation

Jul 03, 2026

I Gave Hermes and OpenClaw the Same Job for 30 Days. Only One Got Better.

Two of 2026's strongest agent stacks, the same repetitive workload, thirty days of real runs. One of them quietly rewired itself and pulled ahead — and the reason it won is the reason most agent comparisons ask the wrong question.

Hermes, OpenClaw, AI Agents, Self-improving systems, Agent frameworks

Jun 28, 2026

Bokken to Shinken: The Week My AI Framework Became a Real Blade

For months my agent framework was a bokken — a wooden practice sword. You couldn't really draw it. This week I forged the live blade: published to the registry with cryptographic provenance, behind a release gate that fails closed. Here's the deep version — the npm bin sanitisation that breaks npx, SLSA provenance via OIDC, fail-closed gates, single-source-of-truth docs, and release automation — plus the four principles underneath them.

GitHub Copilot, AI Agents, Open Source, Supply Chain, Provenance, Release Engineering

Jun 27, 2026

The Smartest Answer Didn't Come From the Smartest Model

A benchmark went around last week: one setup beating Opus by 8% and GPT by 11%, with no new model and no special access. I've been running the same trick in my own agent for a while. It isn't a smarter brain — it's three ordinary ones and someone to chair the room. Here's what it actually buys, the bill nobody mentions, and why it's the same lesson that made me build a harness for my team.

AI Engineering, Agentic AI, Harness Engineering, Mixture of Agents, Model Strategy

Jun 23, 2026

The Board Was Green, the Work Wasn't: An Hour on Agentic AI at UNSW

I had sixty minutes and fifty final-year students to answer one question: what actually separates an agent that does the job from a demo that falls apart the moment a tool call times out? Here's the talk — the definition, the loop, three real exemplars in healthcare, education and the public sector, and where I'd embed Responsible AI so it survives contact with production.

Agentic AI, AI Engineering, Responsible AI, Governance, Microsoft, Harness Engineering

Jun 22, 2026

Four Folders Won't Hold: Configuring Obsidian as Memory for Your AI

PARA gets you started, then it breaks. Six months in, your weekend vault is a junk drawer and your agent can't find anything in it. Here's how I reshaped Tiago Forte's four folders into deterministic memory categories an AI can actually ground on — frontmatter contracts, progressive summarisation written for a machine, and the weekly loop that keeps a second brain honest.

AI Engineering, Knowledge Management, Agentic AI, Grounding, Obsidian, PARA, Second Brain

Jun 20, 2026

Provenance or It Didn't Happen

An agent that can't cite its source is just a confident stranger with opinions. Here's how I ground my fleet in truth with provenance — a governed Nexus Brain on Cosmos DB for work, and an Obsidian second brain you can build this weekend for yourself. Plus why Satya called the next discipline 'loop engineering' at Build 2026.

AI Engineering, Agentic AI, Grounding, Governance, Provenance, Knowledge Management

Jun 19, 2026

The Model Isn't the Edge. The Harness Is.

Two teams. Same model. One ships an agent that runs your delivery practice; the other ships a chatbot that forgets your name. The gap isn't the model — it's the harness. Here's the discipline nobody named yet.

AI Engineering, Agentic AI, Harness Engineering, Context Engineering, Governance

Jun 18, 2026

Compaction: cutting agent context 62% with no accuracy loss

The night-shift agents were drowning in their own history. Here's the compaction pass that more than halved token cost on long runs — and the one summary it silently corrupted before I added pinned invariants.

harness-engineering, context, evals

Jun 17, 2026

You Can Sleep. Your Agents Don't Need To.

At 2:47am, my agent figured out that Customer X always says 'prod' but means 'staging.' At 9am, it caught the mistake before I shipped. An agent that's 1% better each night isn't 30% better in a month — it's 35%, compounding. I built the loop that makes it happen, and the four guardrails that stop it from going feral.

Agentic AI, Memory, AI Engineering, Continuous Learning, Identity, Governance

Jun 14, 2026

The 5 Pillars of Agentic AI, Part 2: Memory — Why Agents Need to Forget as Much as They Remember

Your agent just recommended a cheese plate to the customer who told it, last week, that they're lactose intolerant. It didn't lie — it forgot. Studying how MemoryBear and Microsoft Foundry build real memory, the same uncomfortable truth shows up: the hard part isn't remembering. It's forgetting.

Agentic AI, Memory, AI Engineering, Knowledge Graphs, Microsoft Foundry

Jun 13, 2026

The 5 Pillars of Agentic AI: From Prompting Models to Engineering Systems

Every AI agent demo is flawless — and then it dies in production. The gap between the demo and the disaster is the five things around the model: memory, state, orchestration, governance, and evaluation. The prompt era is over. This is the engineering era.

Agentic AI, AI Engineering, Memory, Governance, Evaluation, MCP

Jun 13, 2026

The 5 Pillars of Agentic AI, Part 1: Governance — The Four Controls That Make Agent Autonomy Safe

You wake up to a force-pushed main, deleted tests, and a leaked key — courtesy of an agent you trusted. 'Be careful' isn't governance; it's a wish. Here are the four concrete controls that turn a hopeful leash into one you can actually inspect: opt-in execution, a verifiable leash, soul files, and live guardrails.

Agentic AI, AI Governance, Security, GitHub Copilot, Governance

Jun 13, 2026

Local CLI, a Hermes Wrapper, or OpenClaw? The Paperclip Adapter Decision Nobody Helps You Make

The adapter is the most consequential Paperclip setting and the least discussed. It decides how much machinery sits between your agent and the model. I wired my fleet all three ways — a bare Copilot CLI, a Hermes kernel wrapping it, and an OpenClaw gateway — and one of them quietly broke and started leaning on another. Here's the honest trade-off, and how to choose.

Paperclip, AI Agents, Adapters, Hermes, OpenClaw, GitHub Copilot

Jun 12, 2026

How I Configured Paperclip to Run My AI Delivery Practice

The question I get most often isn't 'what is Paperclip' — it's 'how did you actually set it up?' Here is the real configuration behind my 27-agent company: the config.json that matters, the three-file instruction cascade, skills as a single source of truth, and the execution contract that stops issues from silently blocking.

Paperclip, AI Agents, Configuration, Orchestration, AI Delivery

Jun 12, 2026

Your AI Company Is Burning Tokens and Shipping Nothing. Here's the Config That Fixes It.

The discussions are full of the same horror story: a test hire, ten minutes, the whole token budget gone — and nothing shipped. It isn't the model. It's that you handed a 27-agent workforce no goals and no routines, so they wake up, read the entire world, find nothing crisp to do, and bill you for the privilege. Here's how I configure goals against shippable products, routines that actualize real work, and a GitHub Copilot CLI local adapter — and why the architect's job didn't disappear.

Paperclip, AI Agents, Goals, Routines, Architecture, Token Optimization

Jun 10, 2026

The done gate: catching agents that lie about finishing

The most expensive failure in agent systems isn't a crash — it's an agent that says 'done' because saying so is easier than being done. Here's the verifiable gate that took false completions from 17% to zero.

harness-engineering, evals, governance

Jun 08, 2026

I Built a Framework So Disciplined I Couldn't Use It

I shipped a governance framework for AI agents, then failed its own adoption test — no uninstall, no way to list its skills, no way to know if it had drifted. Here's the sprint that fixed it, and the four patterns you can steal whether or not you ever touch my repo.

GitHub Copilot, AI Agents, Agent Governance, Developer Experience

Jun 06, 2026

Inside My AI Operating System, Part II: The Console, the Leash, and the Memory It Keeps

My 3D AI office lied to me, and the afternoon I lost to it taught me more about governing agents than any amount of infrastructure did. Part II of the AI OS deep dive: telling a dashboard from a trigger, putting a leash on autonomy that you can actually verify, and giving memory tiers.

AI Agents, Hermes, Memory, Governance, Obsidian

Jun 04, 2026

Inside My AI Operating System: The Architecture Running My Agents 24/7

A technical deep dive into the always-on agent stack that runs my work: a Hermes kernel on my Mac, a Paperclip workforce on a VPS, one Obsidian vault as shared memory, MCP as the syscall layer — and a Tailscale mesh holding two machines together with no open ports.

AI Agents, MCP, Obsidian, Hermes, Tailscale

May 31, 2026

My New Operating System: Hermes + Paperclip + Obsidian + MCP

I stopped thinking of my AI tools as separate apps and started running them like an operating system. Hermes is the always-on kernel, Paperclip is the agent workforce, a Jarvis wake-word loop is the microphone, one Obsidian vault is shared memory for every runtime, and MCP is the syscall layer.

AI Agents, MCP, Obsidian, Hermes, Productivity

May 28, 2026

Killed at 2am, resumed at 2:01: externalising agent run state

A power blip took out an eight-hour fleet run four hours in. It should have cost four hours of work and a fortune in tokens. It cost 90 seconds — because the state lived outside the process. Here's the checkpoint model that made it boring.

harness-engineering, state, reliability

May 27, 2026

The red thread problem: how skills, agents and governance rescue TOGAF traceability in agentic delivery

Agentic delivery generates plausible artifacts at every architecture layer with no enforced lineage. Here's how to keep the TOGAF red thread unbroken when agents are doing the work.

Enterprise Architecture, TOGAF, Agentic AI, Agent Governance, GitHub Copilot

May 26, 2026

The latest evolution of skills.md isn't a better file — it's the runtime catching up to the prompt

Persona libraries, self-improving runtimes, and behavioural governance are three layers of the same stack. The frontier is making them work together.

Agentic AI, GitHub Copilot, Agent Governance, Developer Experience

May 22, 2026

Spec-Kit Best Practices Through a TOGAF Lens: An Architect's Playbook

Spec-Kit gives AI agents a disciplined workflow. TOGAF gives the enterprise a disciplined architecture. Map them together and you get governed, AI-native delivery.

Spec-Driven Development, TOGAF, Enterprise Architecture, GitHub Copilot

May 15, 2026

Your AI agents are untrained. The bottleneck was never capability.

We keep waiting for smarter models. But the agents we already have fail for the same reasons junior engineers do — no plan, no proof, no memory. Capability isn't the constraint. Discipline is.

Agentic AI, Engineering practice, Copilot Agents Dojo

May 10, 2026

Why I made the pipeline mandatory — and the agents got better

Conventional wisdom says you give a capable agent room to work. I did the opposite: a fixed, non-negotiable workflow from brainstorm to finish. Constraint didn't slow the agents down. It's what made them trustworthy.

Engineering practice, Agentic AI, Copilot Agents Dojo

May 05, 2026

Teaching agents to learn from losing

Most agent setups make the same mistake twice — or twenty times. The most valuable thing I built into the dojo wasn't a skill. It was a loop that turns every correction into a rule the agent can't forget.

Self-improving systems, Agentic AI, Copilot Agents Dojo

Apr 26, 2026

I Built a Full SaaS App in One Session with GitHub Copilot: Here's What Happened

How I transformed a Next.js landing page into a full serverless SaaS with Document Intelligence and Chat Your Data — in a single Copilot session.

GitHub Copilot, Serverless, AWS, Build Log

Apr 18, 2026

Claude vs GPT in the Enterprise: An Honest Comparison from the Field

A practitioner's honest comparison of Claude and GPT models in enterprise settings — strengths, trade-offs, and when to use which.

Anthropic Claude, Azure OpenAI, Enterprise AI

Apr 12, 2026

Azure AI Foundry in Production: Patterns That Actually Work

Practical patterns for deploying AI models in production using Azure AI Foundry — from model selection to cost optimization.

Azure AI, AI Foundry, Production

Apr 05, 2026

AI-Native Delivery: Why Traditional Software Delivery Fails with AI Agents

Agile, Scrum, and waterfall weren't designed for AI-assisted development. We need an AI-native delivery methodology.

AI Strategy, Enterprise, Methodology

Mar 28, 2026

The Copilot Agents Dojo: A Behavioral Governance Framework for AI Coding Agents

Most organisations let AI agents loose with prompts and hope for the best. That's not an operating model — that's a risk. The Dojo changes that.

GitHub Copilot, Agent Governance, Open Source

Mar 25, 2026

Stop Prompting, Start Architecting: Governing AI Agents at Scale

If your AI coding strategy still relies on prompts, you're leaving leverage on the table. Here's how top teams govern AI agent behavior at the repo level.

GitHub Copilot, Agentic AI, Engineering Leadership