Pete

The Stack


Ithaca, NY

Three days deep on mcp-unifi. Started Wednesday with the new UCG-Fiber going live and the server flipping out of stub mode against real hardware for the first time. Shipped two release candidates, then v0.5.0, then v0.5.1. Network module split into 10 files, Protect module added (12 tools), audit log plus replay CLI, composite rollback on partial failure, Helm chart, .dxt one-click for Claude Desktop, cosign-signed images with SBOM and build provenance.

Spent today fixing the docs site, which had been silently building one HTML page instead of nineteen ever since the move to Astro 5. Two causes: a missing content collection config, and a Starlight bug where the draft filter dropped every entry because the schema default wasn't being applied. Found it by writing a debug page that printed what getCollection returned. Guides and reference now live at pete-builds.github.io/mcp-unifi.

Then the honest moment. Compared against the dominant UniFi MCP server out there. 343 stars, 19 contributors, four times the tool count, dedicated domain, plugin marketplace install. Not going to out-feature that in six weeks. So I leaned in on what's actually different: dry-run plus audit log plus composite rollback plus supply-chain hardening plus single-container with Helm plus API-key-only auth. Depth, not breadth.
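Here is roughly what "dry-run plus audit log plus composite rollback" means in code. This is a minimal Python sketch of the general pattern, not mcp-unifi's implementation: Step, run_composite, and the JSON-lines audit format are hypothetical names chosen for illustration.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One reversible change: a label, how to apply it, and how to undo it."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def run_composite(steps: list[Step], audit_path: str, dry_run: bool = True) -> str:
    """Apply steps in order; on any failure, undo the completed ones in reverse.

    Every event is appended to a JSON-lines audit log. With dry_run=True the
    plan is logged but nothing is applied.
    """
    def audit(event: str, step: str, **extra) -> None:
        record = {"ts": time.time(), "event": event, "step": step, **extra}
        with open(audit_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    if dry_run:
        for step in steps:
            audit("planned", step.name)
        return "dry_run"

    completed: list[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception as exc:
            audit("failed", step.name, error=str(exc))
            # Partial failure: unwind what already succeeded, newest first.
            for done in reversed(completed):
                done.undo()
                audit("rolled_back", done.name)
            return "rolled_back"
        completed.append(step)
        audit("applied", step.name)
    return "applied"
```

A replay tool would read the same JSONL file back and re-issue the applied steps; that part is left out of the sketch.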

This was always a portfolio piece more than a product. The point isn't users. It's proving I can architect a safety substrate for LLM-driven infra ops and ship it end-to-end with provenance.

pete-builds.github.io/mcp-unifi/

Had Forge audit itself today. Forge is the agent I use to build MCP servers (part of the larger system). Designs the architecture, writes the code, hardens the container, ships to the registry. It has been running for months.

Asked it to grade its own playbook against best practices. Came back with seven specific gaps. No anti-hallucination rule for external claims. No token budget enforcement. No multi-client smoke test, only Claude Code. No FastMCP version pinning policy. Reflection check was one line. Lessons file path undocumented. No quarterly re-audit cadence on public repos.

Forge proposed a v2 with each gap closed as a discrete edit, marked with explicit ADD or REPLACE blocks so the diffs apply cleanly. I approved. It applied them to its own definition file. Playbook went from 258 to 304 lines.
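A discrete ADD/REPLACE edit only applies cleanly if its anchor is unambiguous. A minimal sketch of that rule, assuming a hypothetical tuple format for the edits (Forge's actual block syntax isn't shown in the post):

```python
def apply_edits(text: str, edits: list[tuple[str, str, str]]) -> str:
    """Apply discrete ADD/REPLACE edits to a playbook file's text.

    Each edit is a hypothetical tuple:
      ("ADD", anchor, new_text)        -> insert new_text right after anchor
      ("REPLACE", old_text, new_text)  -> swap old_text for new_text
    The target must appear exactly once, so an edit either lands where it
    was meant to or fails loudly.
    """
    for kind, target, new_text in edits:
        count = text.count(target)
        if count != 1:
            raise ValueError(f"{kind} target must match exactly once, found {count}: {target!r}")
        if kind == "ADD":
            text = text.replace(target, target + new_text, 1)
        elif kind == "REPLACE":
            text = text.replace(target, new_text, 1)
        else:
            raise ValueError(f"unknown edit kind: {kind}")
    return text
```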

The interesting part: every gap was something I had been manually fixing in spawn prompts every time I called the agent. The audit just made the patches permanent so I stop typing them.

Agents that audit themselves and apply the fix are the real move. Tools that build tools.

Built 20+ named agents on Claude Code over the past year. Each one has a domain, a risk tier, structured output contracts, and lane discipline. Forge builds MCP servers. Tank runs the homelab. Coach commands editorial for The 53 Report. Keeper handles production servers. Radar audits client sites. Outreach manages prospect email. Etc.

The trick isn't more agents. It's mandatory routing in CLAUDE.md. When a request matches an agent's domain, you route to it. No 'I have context, I'll just handle it myself.' That's the rule that keeps the system from collapsing into one bloated assistant.

Risk tiers separate read-only from production-write. Forge can push container images but won't deploy to a server without my call. Keeper requires double-confirmation for the WordPress sites with revenue on them. PreToolUse hooks block exfiltration patterns at the tool level, before any agent gets a chance to run a bad curl.
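A PreToolUse hook in Claude Code is a command that receives the pending tool call as JSON on stdin; exiting with code 2 blocks the call and feeds stderr back to the model. A minimal sketch of an exfiltration guard for Bash calls, with illustrative patterns (the real blocklist isn't in the post):

```python
import json
import re
import sys

# Illustrative patterns only.
EXFIL_PATTERNS = [
    r"\bcurl\b.*\s(-d|-F|-T|--data|--form|--upload-file)",  # curl sending data out
    r"\b(nc|ncat|netcat)\s",                                 # raw sockets
    r"(\.env|id_rsa|\.aws/credentials).*\|\s*(curl|wget|nc)\b",  # secrets piped to network
]


def is_exfil(command: str) -> bool:
    return any(re.search(p, command) for p in EXFIL_PATTERNS)


def main() -> int:
    event = json.load(sys.stdin)
    if event.get("tool_name") != "Bash":
        return 0
    command = event.get("tool_input", {}).get("command", "")
    if is_exfil(command):
        # Exit code 2 blocks the tool call; stderr goes back to the model.
        print("Blocked: command matches an exfiltration pattern.", file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

It would be registered under PreToolUse with a Bash matcher in Claude Code's settings. Regex blocklists are easy to sidestep, so this works as one layer alongside the risk tiers rather than on its own.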

Each agent has a skill file with full instructions, a registry entry with metadata (risk tier, MCP tool access, file write scopes, SSH targets), and a coordination map for cross-agent handoff. It reads more like an org chart than a prompt library.
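A registry entry might look something like this. The fields mirror the metadata listed above (risk tier, MCP tool access, write scopes, SSH targets); the class, the tool name, and the paths are hypothetical:

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class AgentEntry:
    """Hypothetical registry record mirroring the metadata described above."""
    name: str
    risk_tier: str                        # "read-only" or "production-write"
    mcp_tools: frozenset = frozenset()    # MCP tools the agent may call
    write_scopes: tuple = ()              # glob patterns it may write to
    ssh_targets: frozenset = frozenset()  # hosts it may SSH into

    def may_write(self, path: str) -> bool:
        # A read-only tier wins over any write scope.
        if self.risk_tier == "read-only":
            return False
        return any(fnmatch(path, pattern) for pattern in self.write_scopes)

    def may_call(self, tool: str) -> bool:
        return tool in self.mcp_tools

    def may_ssh(self, host: str) -> bool:
        return host in self.ssh_targets
```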

Most people use Claude Code as a coding assistant. This is something different.

The 53 Report is live. Full tech stack: SQLite, MCP server, Claude Code agentic workflow for the editorial pipeline, Astro 5, Docker on a Hetzner VPS. Here is how it all connects.

The data layer is the SQLite database from post 045: every draft pick since 1980, weekly rosters since 2002, and per-game snap counts since 2012. About 1.3 million rows. A pick counts as a hit if the player logged 500 or more snaps in any single regular season, roughly the point at which he spent at least one year as a real rotational contributor. (We started at 100 snaps and raised the bar after publishing the first three articles.)
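The hit definition translates directly into SQL. A sketch against SQLite, assuming hypothetical draft_picks and game_snaps tables (the real schema from post 045 isn't shown here):

```python
import sqlite3

# Hypothetical schema: draft_picks(player_id, team, draft_year, draft_round)
# and game_snaps(player_id, season, game_type, snaps), one row per game.
HIT_RATE_SQL = """
WITH season_totals AS (
    SELECT player_id, season, SUM(snaps) AS total
    FROM game_snaps
    WHERE game_type = 'REG'              -- regular season only
    GROUP BY player_id, season
),
best_season AS (
    SELECT player_id, MAX(total) AS best
    FROM season_totals
    GROUP BY player_id
)
SELECT p.team,
       COUNT(*) AS picks,
       SUM(CASE WHEN COALESCE(b.best, 0) >= :threshold THEN 1 ELSE 0 END) AS hits
FROM draft_picks AS p
LEFT JOIN best_season AS b ON b.player_id = p.player_id
WHERE p.draft_year BETWEEN :first_year AND :last_year
GROUP BY p.team
"""


def team_hit_rates(conn: sqlite3.Connection, first_year: int, last_year: int,
                   threshold: int = 500) -> dict[str, tuple[int, int, float]]:
    """Map team -> (hits, picks, hit rate) for draft classes in the window."""
    params = {"threshold": threshold, "first_year": first_year, "last_year": last_year}
    return {team: (hits, picks, hits / picks)
            for team, picks, hits in conn.execute(HIT_RATE_SQL, params)}
```

Because snap counts only start in 2012, earlier draft classes would likely be undercounted under this definition, so the window choice matters as much as the threshold.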

On top of that sits an MCP server running in Docker on nix1, reachable over Tailscale. Eight tools: team draft hit rate, round hit rate, round trends heatmap, roster composition, pick outcome for a single selection, player career arc, player search, and a database health check. The server uses the SSE transport on port 3711, and Claude Code queries it during every editorial run.

The editorial pipeline is where it gets interesting. Four stages: Scout, Beat, Editor, Coach. All running inside Claude Code as custom skill agents.

Scout is read-only. It hits the MCP and returns a structured evidence pack with three to five ranked angles. No prose, no opinions, just numbers and angle proposals, ranked by anomaly vs. league, anomaly within team, regime shift signals, single-pick stories, and counter-narratives.

Beat takes the evidence pack plus the approved angle and writes the article. Every number has to be traceable to Scout's pack or a clean derivation from it. No new numbers, no player names Scout didn't surface. Targets 1,800-2,600 words depending on shape, with narrative and data woven together in every section.

Editor is the stat-fidelity gate. It reads Beat's draft against Scout's pack and returns PASS, REVISE, or BLOCK. A hallucinated stat is an automatic BLOCK. No league rank gets through without the raw value, population denominator, and era window in the same sentence.
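The real Editor is an LLM agent, but its two hard rules are mechanical enough to pre-check in code. A minimal sketch with crude, hypothetical heuristics: any number not in Scout's pack is a BLOCK, and a sentence mentioning a rank without a percentage, an "of N" denominator, and a YYYY-YYYY window is a REVISE (the post doesn't say which verdict that case gets, so that mapping is an assumption):

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
RAW_VALUE = re.compile(r"\d+(?:\.\d+)?%")    # assume raw values are percentages
DENOMINATOR = re.compile(r"\bof \d+\b")      # e.g. "of 32"
ERA_WINDOW = re.compile(r"\b\d{4}-\d{4}\b")  # e.g. "2012-2024"


def editor_gate(draft: str, pack_numbers: set[str]) -> str:
    """Return PASS, REVISE, or BLOCK for a draft checked against Scout's pack."""
    verdict = "PASS"
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        # Any number Scout never reported is treated as hallucinated.
        if any(n not in pack_numbers for n in NUMBER.findall(sentence)):
            return "BLOCK"
        # A league rank needs raw value, denominator, and era in one sentence.
        if re.search(r"\brank", sentence, re.IGNORECASE):
            if not (RAW_VALUE.search(sentence) and DENOMINATOR.search(sentence)
                    and ERA_WINDOW.search(sentence)):
                verdict = "REVISE"
    return verdict
```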

Coach orchestrates the whole run. It reads the publishing calendar, picks the next queued team, spawns Scout, presents angles, hands the approved one to Beat, runs Editor, and calls the deploy script only after explicit approval. Never ships without that sign-off.
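The approval gates are the important part of Coach, and they can be sketched as plain control flow. Scout, Beat, Editor, the approval prompt, and the deploy step are passed in as hypothetical callables here; in the real system they are agents and a deploy script:

```python
from typing import Callable


def coach_run(calendar: list[dict], scout: Callable, beat: Callable, editor: Callable,
              approve: Callable, deploy: Callable) -> str:
    """One editorial run. Human approval gates both the angle and the deploy.

    approve(question, options) returns the chosen option, or None to decline.
    """
    entry = next((e for e in calendar if e["status"] == "queued"), None)
    if entry is None:
        return "idle: nothing queued"
    pack = scout(entry["team"])                 # read-only evidence pack
    angle = approve("Which angle?", pack["angles"])
    if angle is None:
        return "held: no angle approved"
    draft = beat(pack, angle)                   # writes only from the pack
    verdict = editor(draft, pack)               # PASS, REVISE, or BLOCK
    if verdict != "PASS":
        return f"held: editor returned {verdict}"
    if approve("Deploy this article?", [draft]) is None:
        return "held: awaiting sign-off"        # never ships without it
    deploy(draft)
    entry["status"] = "published"
    return "published"
```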

The product is GM Performance Grading: how well NFL general managers draft and retain talent. Three article shapes: scorecard (tenured GM, four graded columns, final letter grade), narrative (paradox or anomaly, no grade), and methodology (league-wide framing, no team focus). Three published pieces so far, twenty-nine teams queued.

The site is Astro 5, static build, deployed via rsync to Zion (Hetzner VPS, Plesk-managed). DNS through Cloudflare, proxied, Full Strict SSL. Build is clean in under ten seconds.

Long-term target is a paper for SSAC 2027 (abstract due around October 2026) and a staff or contributor role at an NFL team analytics group or a shop like SumerSports or The 33rd Team. The dataset edge is the window: Dubow's AP piece used a 2021-2024 window with binary roster data. SIS used first-round picks only with a four-year endpoint. This stack goes multi-year, snap-weighted, and position-weighted across every round.

Next up: interactive analysis with logins and custom date range filters. After that, a longer story-driven piece on the BNM blog, less technical, more about how this came together.

the53report.com

Spent the evening auditing and overhauling my entire agent ecosystem against Anthropic's published best practices. 20 agents, all getting smarter.

The process: kicked off a Research agent to pull every relevant Anthropic engineering post (Building Effective Agents, Context Engineering, Writing Tools for Agents, Effective Harnesses). Then ran parallel Explore agents to map every skill file and context doc in the system. Fed all of that into a Plan agent to design the changes.

Four phases shipped:

  1. Progressive disclosure for the four largest context docs. Tank, Keeper, Link, and Forge were loading 400-900 lines of context on every invocation. Now they load a slim core (infrastructure overview, key IPs, safety rules) and pull in reference files only when the task actually needs them. Troubleshooting docs, runbooks, and architecture patterns stay out of context until relevant.

  2. Standard error handling across all 20 agents. Three rules: retry once then report, never claim success if something errored, and if blocked, say what worked and what didn't. Simple, but none of them had it before.

  3. Pipeline handoff contract for the prospect workflow. Radar, Signal, Ghost, and Morpheus now share a formal data contract defining exactly what fields pass between each stage. No more informal handoffs.

  4. State persistence for infrastructure agents. Tank and Keeper now write checkpoints during multi-step work so they can resume if a session dies mid-task instead of starting over.
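Rules 2 and 4 combine naturally: retry each step once, checkpoint after every success, and on failure report what worked and what didn't. A sketch assuming a hypothetical JSON checkpoint file; it is not Tank's or Keeper's actual mechanism:

```python
import json
import os
from typing import Callable


def run_with_checkpoints(steps: list[tuple[str, Callable[[], None]]], checkpoint_path: str) -> dict:
    """Run named steps in order, resuming after the last completed one.

    Retry once, then report; never claim success after an error; when
    blocked, list what worked and what never ran.
    """
    done: list[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            done = json.load(fh)["completed"]
    for i, (name, action) in enumerate(steps):
        if name in done:
            continue  # finished in an earlier session
        for _ in range(2):  # first try plus one retry
            try:
                action()
                break
            except Exception as exc:
                error = f"{name}: {exc}"
        else:
            return {"ok": False, "worked": list(done), "blocked": error,
                    "not_run": [n for n, _ in steps[i + 1:]]}
        done.append(name)
        with open(checkpoint_path, "w", encoding="utf-8") as fh:
            json.dump({"completed": done}, fh)
    return {"ok": True, "worked": done}
```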

The whole thing ran in about an hour. Plan mode kept the scope tight. Three background agents split the context docs in parallel while I handled the other phases. No content lost, just reorganized for efficiency.

The key insight from Anthropic's docs: context is a finite resource with diminishing returns. Every line your agent loads that it doesn't need is stealing attention from the lines it does.
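Progressive disclosure in its simplest form: always load the slim core, and pull a reference file only when the task mentions its topic. The file names and trigger words below are hypothetical:

```python
import re
from pathlib import Path

# Hypothetical layout: core.md is always loaded; the rest load on demand.
REFERENCE_TRIGGERS = {
    "troubleshooting.md": {"error", "failing", "down", "debug"},
    "runbooks.md": {"restore", "migrate", "upgrade"},
    "architecture.md": {"design", "architecture", "topology"},
}


def build_context(context_dir: str, task: str) -> str:
    """Return the slim core plus only the reference files the task triggers."""
    root = Path(context_dir)
    parts = [(root / "core.md").read_text(encoding="utf-8")]
    words = set(re.findall(r"[a-z]+", task.lower()))
    for filename, triggers in REFERENCE_TRIGGERS.items():
        if words & triggers:
            parts.append((root / filename).read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```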

Ran the new pipeline on 8 Finger Lakes wineries and breweries tonight. Every one of them has a live audit report, a drafted outreach email, and a tracking file in the pipeline. Highlights: one winery's meta description says 'dance studio,' another is the only cask ale brewery in New York but Google doesn't know it, and a third has 74 fonts loaded on a single page. The Finger Lakes wine trail deserves better websites.

Spent the afternoon hardening the new production server. SSH lockdown, automated offsite backups with encryption, OS-level security updates on autopilot, WAF verified active, HSTS on all domains, database tuned, and external uptime monitoring watching everything at 5-minute intervals. Also wired up server-side resource alerts. Went from 'Plesk is installed' to 'production-ready' in one session. The old CentOS box is looking more disposable by the day.

Built a one-command prospect audit system using Claude Code. I type /radar and a domain, and about 5 minutes later there's a branded, client-ready report page live on my website. Here's what happens under the hood.

4 agents, each with a Matrix codename:

  1. Radar (the orchestrator) coordinates everything. It receives the target domain, dispatches the two scanning agents in parallel, collects their results, generates the report, builds the site, deploys it, and verifies it's live.

  2. Niobe (SEO audit) runs niobe-scan.py, a Python scanner I wrote that checks 7 categories: technical SEO, on-page SEO, performance, structured data, security headers, content freshness, and local SEO. It crawls the site, pulls headers, checks SSL certs, parses meta tags, tests page speed, looks for schema markup. Outputs structured JSON with scores and findings.

  3. Seer (brand positioning audit) runs seer-scan.py, another Python scanner that evaluates 4 categories: first impression, messaging and voice, digital footprint, and brand cohesion. It analyzes the homepage hero, CTA placement, color palette, navigation complexity, and does web searches for competitive context. Also outputs structured JSON.

  4. Keeper (production deployment) handles the GoDaddy VPS. After the report page is generated, the Astro site gets built and rsync'd to the production server where brooksnewmedia.com is hosted.
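Radar's fan-out step can be sketched with a thread pool. The scanners here are stand-in functions returning dicts shaped like the JSON described above; the score/findings keys are assumptions, and the build, deploy, and verify steps are left out:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def radar(domain: str, scanners: dict[str, Callable[[str], dict]]) -> dict:
    """Run every scanner against the domain in parallel and collect results.

    A scanner that raises is recorded as an error section, so one broken
    scan doesn't sink the whole report.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(scanners))) as pool:
        futures = {name: pool.submit(scan, domain) for name, scan in scanners.items()}
        sections = {}
        for name, future in futures.items():
            try:
                sections[name] = future.result()
            except Exception as exc:
                sections[name] = {"error": str(exc)}
    return {"domain": domain, "sections": sections}
```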

The custom scripts:

  • niobe-scan.py: Python script with UA rotation, rate limiting, and Cloudflare detection. Checks SSL, headers, meta tags, structured data, sitemap, robots.txt, page speed, content dates, NAP consistency, Google Business Profile.
  • seer-scan.py: Similar architecture but focused on brand signals. Analyzes hero content, CTA presence, color count, font consistency, nav complexity, social presence, review platforms.
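One slice of what a Niobe-style headers check could look like, given response headers already fetched as a dict. The function and its scoring are hypothetical; the header names are standard, and a CF-Ray header or Server: cloudflare is the usual sign a site sits behind Cloudflare:

```python
SECURITY_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
)


def check_headers(headers: dict[str, str]) -> dict:
    """Score security headers and flag Cloudflare from a response header dict."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    missing = [h for h in SECURITY_HEADERS if h not in lowered]
    behind_cloudflare = "cf-ray" in lowered or lowered.get("server", "").lower() == "cloudflare"
    score = round(100 * (len(SECURITY_HEADERS) - len(missing)) / len(SECURITY_HEADERS))
    return {"score": score, "missing": missing, "cloudflare": behind_cloudflare}
```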

Both scripts output JSON that Claude Code's agents parse and interpret. The agents add the qualitative analysis: what the numbers mean, what to prioritize, how to pitch the fix.

The report itself is an Astro page generated from a template, styled to match my Brooks New Media site. It's a hidden page (not in navigation), so I can share the direct URL with the prospect as a leave-behind. Letter grades, color-coded score cards, a merged top-10 priority list, and a custom 3-month "here's how we'd fix this" pitch at the bottom.
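The grades and the merged priority list are simple to compute once both scanners return findings. A sketch assuming a standard 10-point scale (under which a 76.8 is a C) and hypothetical impact and effort fields on each finding:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter on an assumed 10-point scale."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


def top_priorities(*finding_lists: list[dict], limit: int = 10) -> list[dict]:
    """Merge findings from several scanners into one ranked, de-duplicated list.

    Duplicates (same title, any case) keep the higher-impact copy; ranking is
    highest impact first, then lowest effort.
    """
    best: dict[str, dict] = {}
    for findings in finding_lists:
        for f in findings:
            key = f["title"].lower()
            if key not in best or f["impact"] > best[key]["impact"]:
                best[key] = f
    ranked = sorted(best.values(), key=lambda f: (-f["impact"], f["effort"], f["title"]))
    return ranked[:limit]
```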

The whole thing runs from Claude Code in my terminal. One command, four agents, two Python scanners, one Astro template, one production deploy.

Also tonight: spun up a new production server on Hetzner. AlmaLinux 10, Plesk installed, 4 vCPU, 8GB RAM, 160GB NVMe out of Hillsboro, OR. Replacing a 10-year-old CentOS 7 box on GoDaddy that was running on vibes and an expired OS. Going from $86/mo to about $28/mo with better hardware and a supported OS through 2035. Named it Zion.

Audited 25 Finger Lakes wineries and breweries across Seneca and Cayuga Lake. Eight scored a D or worse. The best find: a well-known Seneca Lake winery whose Google search description says 'A professional dance studio website.' Someone installed a WordPress theme and never changed the default template text. Google has been telling visitors this winery is a dance studio. That is the kind of thing nobody notices until someone actually checks.

brooksnewmedia.com/blog/finger-lakes-winery-brewery-website-audit/

Built a second audit tool. Niobe checks if search engines can find you. Seer answers the harder question: when people find you, does your site make them pick you over the competition? Grades brand positioning across five categories, then stacks you against competitors and finds the gaps.

Ran it on myself first. brooksnewmedia.com: C (76.8). Strong messaging, weak digital footprint. Then ran five Ithaca competitors. Two are ghosts. One makes ROI claims with zero proof. I don't need to outspend them. I need to out-present them.

Two audit tools now. Two data-backed conversation starters for every discovery call.

Ran a full backup audit today. Found gaps: missing Docker volumes, no retention policy on one backup set, a dead service in the health check. Fixed all three, then wrote a disaster recovery runbook. Six phases, every command copy-pasteable. If the server dies, we rebuild from the NAS in under four hours. Next step: off-site backups to a remote server so a single point of failure does not take out everything.
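A retention policy just decides which backups to keep. A sketch of a simple daily-plus-weekly rule; the 7 and 4 counts are placeholders, not the actual policy:

```python
from datetime import date


def backups_to_keep(backup_dates: list[date], daily: int = 7, weekly: int = 4) -> set[date]:
    """Keep the newest `daily` backups plus the newest backup in each of the
    `weekly` most recent ISO weeks. Assumes at most one backup per day."""
    ordered = sorted(set(backup_dates), reverse=True)
    keep = set(ordered[:daily])
    weeks: list[tuple[int, int]] = []
    for d in ordered:
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week)
        if week not in weeks:
            weeks.append(week)
            if len(weeks) > weekly:
                break
            keep.add(d)  # first seen in this week = newest in this week
    return keep
```

Anything not returned can be pruned. Tools like restic (forget --keep-daily/--keep-weekly) and borg (prune --keep-daily/--keep-weekly) implement the same idea natively.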

Had a realization tonight. We were pricing our monthly plan way too low for what we deliver: dedicated hosting, 2 blog posts, weekly strategy calls, SEO reporting, maintenance, and priority support. We are delivering agency-level output. Pulled the pricing off the site. Moving to custom plans and targeting wineries, hotels, and tourism businesses in the Finger Lakes. Aim higher.

Rewrote the entire brooksnewmedia.com homepage and services page. New origin story about starting with BandsThatJam.com in 2007, the Buffalo music scene, GrassRoots Festival giving us our first photo passes, and the move to Ithaca in 2018. Services restructured around what we actually deliver: web design, automation, SEO, and a monthly growth plan with weekly strategy calls. Killed the old generic copy. This is our story now.

brooksnewmedia.com

Wrote the entire Brooks New Media business playbook today. Service model, sales process, pitch templates, agent workflow documentation. The free SEO checkup form on the site is now the front door. Someone submits their URL, we run the audit, send them a graded report, and start a conversation. No cold calls, no spam. Just data that speaks for itself.

Why now? Because the old site was a brochure. It listed services and had a contact form. That is it. No story, no personality, no reason for someone to care. We started Brooks New Media because we loved music and wanted to help our community grow online. The website should reflect that. Not just what we do, but why we do it and where we came from. Buffalo summers, GrassRoots Festival, BandsThatJam.com, Giant Panda Guerilla Dub Squad, the move to Ithaca. That is the real story. Time to tell it.