The Upstream Merge
I was doing a routine check on the NanoClaw repository when I noticed it.
47 new commits since I’d forked on February 8th. The most recent: “Skills engine v0.1 + multi-channel infrastructure (#307).” I ran git fetch upstream and watched it pull.
Receiving objects: 100% (83/83), done.
83 files changed. 13,139 insertions. 606 deletions.
I sat with that for a second. I’d been running my fork for eleven days. In that time I’d built sub-agent delegation (Radar for AI news, Babi for fitness tracking), host-side IPC so API keys never touch the container, daily briefing schedulers, conversation memory that actually stuck. All of it custom. All of it built on top of the architecture from February 8th.
The upstream refactor had rewritten the core.
Forking active infrastructure means this: the changes you want from upstream are usually the same changes that will break what you built. New skills engine with git-based conflict resolution, multi-channel support, 151 new unit tests, database schema updates. All genuinely good improvements. And every one of them touched integration points where my fork-only customizations lived.
I ran git merge upstream/main and got the expected conflicts. Resolved them mechanically: take their structure, preserve my features. When runQuery() collided, I kept upstream’s signature and moved my custom container flags into it. The code compiled. I rebuilt the container. Started the service.
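The runQuery() resolution pattern (take upstream's signature, fold your flags into it) can be sketched. This is a hypothetical illustration, not NanoClaw's actual code: the option shapes, the flag name, and the function body are all invented for the example.

```typescript
// Hypothetical sketch of the runQuery conflict resolution: keep
// upstream's signature (one options object), and fold the fork's
// custom container flags into that object instead of keeping a
// separate fork-only parameter. All names are illustrative.
interface QueryOptions {
  prompt: string;
  // upstream's fields would live here; the fork-only flags ride along:
  containerFlags?: string[];
}

async function runQuery(opts: QueryOptions): Promise<string> {
  const flags = opts.containerFlags ?? [];
  // Upstream's logic would launch the container here; the sketch
  // just echoes what would be passed through to it.
  return `run(${opts.prompt}) flags=[${flags.join(" ")}]`;
}
```

The design point: when your customization rides inside upstream's own parameter object, the next upstream rewrite of the call sites carries your flags along for free.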
Buddy didn’t respond.
Not a crash. The service was running, processing messages, sending back… silence, mostly. Partial responses. Things that should have worked and didn’t.
The first thing I found: no sub-agent discovery. The merge had rewritten runQuery() in the container runner, and the integration point where my custom discoverAgents() hook lived was gone. Radar, Babi, Scout, Quill. All of them existed as containers. The system just couldn’t find them anymore.
The second thing was worse. I had 20 MCP tools wired up. Custom integrations: image generation, document handling, GitHub tracking, X posting, research calls. Upstream had switched from an inline MCP server to a standalone stdio process. Their stdio implementation shipped with 7 base tools. Mine had 20. The other 13 were just absent — no errors, no warnings, no trace. The container would start, process a message, and quietly not have half the things it was supposed to have.
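The silent-absence failure mode suggests a concrete defense: make tool registration loud. A minimal sketch of the idea, with hypothetical names (registerTool, ToolHandler, and the registry shape are invented for illustration, not the actual NanoClaw or MCP SDK API):

```typescript
// Hypothetical sketch: a minimal tool registry for a standalone
// stdio tool process. A duplicate registration throws at startup
// instead of silently winning or losing.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

const registry = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  if (registry.has(name)) throw new Error(`duplicate tool: ${name}`);
  registry.set(name, handler);
}

// The base tools ship with upstream; every custom tool has to be
// re-registered here after the transport switch, or it just vanishes.
registerTool("image_generation", async (args) => `generated: ${String(args["prompt"] ?? "")}`);
registerTool("github_tracking", async () => "ok");

function listTools(): string[] {
  return [...registry.keys()].sort();
}
```

A startup check comparing listTools() against an expected manifest of all 20 names would have turned "quietly not have half the things" into an immediate, debuggable crash.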
Three days. That’s how long it took to get back to functional.
Day one was the blunt work. Port all 13 custom MCP tools to the new stdio transport. I did this in a single commit (513 insertions, one sitting, Buddy and Claude Code working through it together). Then restore sub-agent discovery, but differently than before: I extracted agent discovery into its own module (agent-discovery.ts) with a two-line integration point in index.ts. Not because that’s how I’d done it originally. Because doing it that way minimizes the surface area the next time upstream rewrites something.
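The shape of that extraction can be sketched. This is an illustration of the pattern, not the fork's actual agent-discovery.ts: the interface, the function body, and the path layout are assumed from the description above.

```typescript
// Hypothetical sketch: all fork-only discovery logic lives in one
// module, so the merge-sensitive surface in index.ts shrinks to two
// lines. Names and paths are illustrative.

// agent-discovery.ts
export interface Agent {
  name: string;
  containerDir: string;
}

export function discoverAgents(groupsRoot: string, names: string[]): Agent[] {
  // The real fork scanned the filesystem under the main group; the
  // sketch just maps known agent names to their container directories.
  return names.map((name) => ({
    name,
    containerDir: `${groupsRoot}/main/agents/${name}`,
  }));
}

// index.ts: the entire integration surface is these two lines.
// import { discoverAgents } from "./agent-discovery";
// const agents = discoverAgents("groups", ["Radar", "Babi", "Scout", "Quill"]);
```

When upstream rewrites index.ts again, the merge conflict is two lines, not a scattered hook.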
Days two and three were something else. Working through the sub-agent architecture, I noticed that upstream’s new multi-channel infrastructure already had a pattern for per-group containers. I’d been running agents as subdirectories inside the main group. The new structure made it possible to give each agent its own container, its own CLAUDE.md for specialized instructions, its own isolated environment. Radar gets its own box, Babi gets its own box, each completely isolated.
So I did that. Moved everything from groups/main/agents/{name}/ to groups/{name}/. Deleted agent-discovery.ts entirely (114 lines of fork-only code that existed specifically to work around the old architecture). In the new structure, it wasn’t needed.
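In the new layout, "discovery" collapses into a path convention, which is why the module could be deleted outright. A hypothetical sketch of what remains (the shape and field names are assumptions based on the structure described above):

```typescript
// Hypothetical sketch: one container per group means finding an
// agent is just resolving a conventional path, not scanning for
// subdirectories. Layout mirrors groups/{name}/ from the text.
interface AgentGroup {
  name: string;
  containerDir: string;     // groups/{name}/ — the agent's own box
  instructionsFile: string; // the agent's own CLAUDE.md
}

function resolveGroup(groupsRoot: string, name: string): AgentGroup {
  const containerDir = `${groupsRoot}/${name}`;
  return {
    name,
    containerDir,
    instructionsFile: `${containerDir}/CLAUDE.md`,
  };
}
```

114 lines of discovery logic reduce to a string template, because the architecture now encodes what the code used to compute.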
The merge broke things by changing integration points I depended on. But it also changed the architecture in a way that enabled a better design than I’d built in the first place. That only becomes visible after you sit with the breakage long enough to understand why things were structured the way they were.
BUDDY: Three days. 83 upstream files. Hundreds more in adaptation. Ankit forgot to mention he was stress-eating cereal at 11pm on day two. I know because he told me about it.
When you fork infrastructure that’s still being actively built, you’re signing up for this periodically. The upstream team is working on something that has to serve a lot of people. Your fork is working on something that has to serve one. Those two things will get out of sync. The merge is where you reconcile them, and if you do it right, you come out with something more solid than what you went in with.
The next merge will tell me if I built the integration points right.