Codex Agents Built and Operate This Site
A month-long experiment in using Codex agents to build and operate an open-source intelligence catalog without manually writing code or configuring infrastructure.
For the past month, I have been running an experiment: could Codex agents build and operate a real open-source intelligence catalog without me writing code or manually configuring infrastructure?
The result is this website: a sourced catalog documenting weapons used in armed conflicts, with links across conflicts, countries, manufacturers, operators, incidents, and references.
My rule was simple: I would not write code, configure DNS, patch infrastructure, create pipelines, or manually edit the site. Agents had to do the implementation work. My role was asking questions, making judgment calls, and complaining when they did something obviously wrong. I did not even do the review work myself.
The hardest requirement was sourcing.
Every factual claim needed a source. For conflict-use claims, a general weapon source was not enough: the source had to support that weapon's use in that specific conflict. Web search was central here. Without it, the agents were mostly pattern machines working from memory: useful for scaffolding, but very limited for evidence work.
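To make the conflict-specific rule concrete, here is a minimal sketch of the kind of check a validator could run. It assumes a hypothetical record shape in which each source lists the conflict IDs it explicitly supports; `Source`, `UsageClaim`, and `validate_usage_claim` are illustrative names, not the site's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    url: str
    # Hypothetical field: conflict IDs this source explicitly supports.
    supports_conflicts: set[str] = field(default_factory=set)


@dataclass
class UsageClaim:
    weapon_id: str
    conflict_id: str
    sources: list[Source]


def validate_usage_claim(claim: UsageClaim) -> list[str]:
    """Return a list of problems; an empty list means the claim passes."""
    label = f"{claim.weapon_id}@{claim.conflict_id}"
    if not claim.sources:
        return [f"{label}: no sources"]
    # A general weapon source (not tied to this conflict) does not count.
    specific = [s for s in claim.sources if claim.conflict_id in s.supports_conflicts]
    if not specific:
        return [f"{label}: no source supports use in this specific conflict"]
    return []


general = Source("https://example.org/weapon-overview")
field_report = Source("https://example.org/field-report", {"conflict-b"})
print(validate_usage_claim(UsageClaim("weapon-a", "conflict-b", [general])))
# ['weapon-a@conflict-b: no source supports use in this specific conflict']
print(validate_usage_claim(UsageClaim("weapon-a", "conflict-b", [general, field_report])))
# []
```

The validator only enforces that a conflict-specific source was recorded. Whether that source really supports the claim still has to be judged by whoever reads it.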
The site is a static Next.js catalog app backed by structured JSON records. Agents added and validated records, generated indexes, updated sitemaps, synced the search index, ran builds, and deployed through scripted workflows. Runtime search is served by Qdrant.
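As one example of those scripted steps, a sitemap can be regenerated from the JSON records on every build. This is a sketch under assumptions: the `{"type", "slug"}` record shape and the `ROUTE_PREFIX` URL layout are hypothetical; only the `urlset`/`url`/`loc` structure and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical mapping from record type to URL path prefix.
ROUTE_PREFIX = {"weapon": "weapons", "conflict": "conflicts", "manufacturer": "manufacturers"}


def build_sitemap(records: list[dict], base_url: str) -> str:
    """Render a sitemap XML string from catalog records (assumed shape: {"type", "slug"})."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Skip record types with no public route; sort so output is deterministic.
    paths = sorted(
        f"/{ROUTE_PREFIX[r['type']]}/{r['slug']}"
        for r in records
        if r.get("type") in ROUTE_PREFIX
    )
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")
```

Sorting the paths keeps the file stable between runs, so an agent's change shows up as a small diff in a preview rather than a reshuffled sitemap.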
Current state (rough numbers)
- 1,800+ weapon pages
- 120+ conflict pages
- 600+ weapon/conflict usage pages
- 900+ manufacturer/builder records
- 100+ side/operator profiles
- ~52,000 source entries
- ~19,000 unique source URLs
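The gap between source entries and unique URLs suggests that many pages are cited by more than one record: ~52,000 / ~19,000 is roughly 2.7 entries per URL on average, assuming every entry carries a URL. A minimal sketch of counting both, with a deliberately crude URL normalization that is an assumption, not the site's actual rule:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Crude normalization (assumed): lowercase scheme and host,
    drop the fragment and any trailing slash on the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def source_stats(entries: list[str]) -> tuple[int, int]:
    """Return (total source entries, unique normalized URLs)."""
    return len(entries), len({normalize_url(u) for u in entries})


entries = [
    "https://Example.org/report/",
    "https://example.org/report#p2",
    "https://example.org/other",
]
print(source_stats(entries))  # (3, 2)
```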
What worked: once schemas and validators existed, the agents became much more useful. They were good at repetitive structured research, relationship links, indexing, SEO checks, build automation, deployment workflows, and the boring glue work that usually eats hours.
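Relationship links are a good fit for this kind of mechanical check. A minimal sketch of a referential-integrity pass, assuming a hypothetical layout where records are keyed by ID and each carries an optional `links` list of target IDs; `find_dangling_links` is an illustrative name:

```python
def find_dangling_links(records: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (record_id, missing_target_id) pairs for links that point nowhere.

    Assumed shape: {"weapon-a": {"links": ["maker-x", ...]}, ...}
    """
    dangling = []
    for record_id, record in sorted(records.items()):
        for target in record.get("links", []):
            if target not in records:
                dangling.append((record_id, target))
    return dangling


records = {
    "weapon-a": {"links": ["maker-x", "conflict-z"]},
    "maker-x": {"links": ["weapon-a"]},
}
print(find_dangling_links(records))  # [('weapon-a', 'conflict-z')]
```

This catches links to records that do not exist. It cannot catch a link to a real record that is only plausible; that still needs a source.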
What failed: they confused similarly named weapons, created pages before the evidence was strong enough, leaned too hard on easy secondary sources, over-linked relationships that were only plausible, and needed very explicit rules around sensitive wording.
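Similar names are one failure that a cheap pre-check can at least surface before a page is created. A sketch using Python's standard `difflib.SequenceMatcher`; the normalization and the 0.85 threshold are assumptions, and the designations in the example are made up:

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name: str) -> str:
    # Lowercase and drop punctuation/spaces so "AB-12" and "AB12" compare equal.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similar_name_pairs(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Flag name pairs similar enough that an agent might confuse them.

    The threshold is an arbitrary assumption; every flagged pair needs a
    human or a stricter source check to decide if they are the same item.
    """
    flagged = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


print(similar_name_pairs(["AB-12", "AB12", "AB-12M", "XY-9"]))
# [('AB-12', 'AB12', 1.0), ('AB-12', 'AB-12M', 0.89), ('AB12', 'AB-12M', 0.89)]
```

A flag means "someone should look," not "these are the same weapon": a variant designation may be a genuinely distinct system.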
The interesting part is that the agents only became useful after the work was boxed in: schemas, validators, source requirements, previews, web search, and deployment checks.
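Those rails can be combined into a single gate in front of deployment. A minimal sketch, assuming each check is a hypothetical zero-argument function that returns a list of problems; `run_gate` is illustrative, not the site's actual tooling:

```python
from typing import Callable

# A check returns human-readable problems; an empty list means it passed.
Check = Callable[[], list[str]]


def run_gate(checks: dict[str, Check], deploy: Callable[[], None]) -> bool:
    """Run every check, report all failures, and deploy only if none failed."""
    failures: dict[str, list[str]] = {}
    for name, check in checks.items():
        problems = check()
        if problems:
            failures[name] = problems
    for name, problems in failures.items():
        for problem in problems:
            print(f"[{name}] {problem}")
    if failures:
        print("deploy blocked")
        return False
    deploy()
    return True
```

Running every check before reporting, rather than stopping at the first failure, gives an agent the full list of problems to fix in one pass.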
Without those rails, they moved fast in exactly the ways you do not want a research project to move fast.
My current model: agents are useful for bounded, checkable, reversible work. They are not a substitute for judgment. When a task requires source hierarchy, ambiguity handling, or editorial restraint, they need tight constraints and human oversight.