How removing 3000+ lines of content made our docs more accurate
Forgive the slightly clickbaity title, but it feels like you need a hook to pull people into the boring truisms of documentation.
Truisms like:
- As a writer, you’re often the first (and only) person to walk through a process end-to-end.
- Frequently asked questions frequently become “asked questions”, expanding far beyond their original scope.
- More content isn’t always better.
This story is about the last point here, with a particular focus on how docs relate to other sources of truth.
One massive set of CLI docs
In the Cloudflare docs, we had a problem with a specific piece of our documentation: the Wrangler command-line interface (CLI).
You can use the Wrangler CLI to interact with every piece of the Cloudflare Developer Platform, such as creating a new key-value store in Workers KV.
To do this in the Cloudflare dashboard, you would:
- Log in to the Cloudflare dashboard.
- Select your account.
- Go to Storage & Databases > Workers KV.
- Select Create Instance.
- Enter a name.
- Select Create.
Or, if you were using Wrangler, you’d type this command into your terminal:
npx wrangler kv namespace create [NAMESPACE]
Developers are often already in the terminal, so a short command like this can be easier than navigating through a UI path. This preference means that Wrangler (and its associated documentation) is an incredibly important part of our developer experience.
And that importance has only grown over the years, leading to:
- Over 300 Wrangler commands
- Supporting 15+ products
- Over 4000 lines in one MDX file
Maintained without automation
Pretty sure the tech writers in the room knew this was coming.
We maintained this important, massive set of documentation almost entirely by hand.
It was painful on several levels, including:
- No testing, meaning we depended on users reporting issues to fix them.
- Frequent merge conflicts, because this single file was one of the busiest areas of our repo.
- Coordinating doc updates with not just another repo, but specific releases of a new Wrangler package.
Those are standard problems with manually maintained documentation, but CLIs often have a unique problem: fragmentation.
Within Wrangler, you can use the --help flag to get more information about a specific command.
npx wrangler kv namespace create --help
Output
Create a new namespace

POSITIONALS
  namespace  The name of the new namespace  [string] [required]

GLOBAL FLAGS
  -c, --config    Path to Wrangler configuration file  [string]
      --cwd       Run as if Wrangler was started in the specified directory instead of the current working directory  [string]
  -e, --env       Environment to use for operations, and for selecting .env and .dev.vars files  [string]
      --env-file  Path to an .env file to load - can be specified multiple times - values from earlier files are overridden by values in later files  [array]
  -h, --help      Show help  [boolean]
  -v, --version   Show version number  [boolean]

OPTIONS
      --preview        Interact with a preview namespace  [boolean]
      --use-remote     Use a remote binding when adding the newly created resource to your config  [boolean]
      --update-config  Automatically update your config file with the newly added resource  [boolean]
      --binding        The binding name of this resource in your Worker  [string]

Because it was defined in a different place, this CLI help output often differed from the manually-written documentation on our developer documentation site. This difference created ambiguity: users didn’t always know which one to trust.
Instead of a single source of truth, we had multiple, conflicting ones.
One definition to rule them all
We solved this problem by collapsing the “docs only” source of truth.
And we did it using a solution that was sitting in front of us the whole time: the version of Wrangler already installed in our docs site.
- Wrangler already had specific conventions around defining commands.
- The Wrangler team then made these commands available via an API, experimental_getWranglerCommands().
- Using the version of Wrangler installed in our docs, we hit that API to grab the command definitions.
- Using this data, we can build two different components:
  - WranglerCommand gets and displays information about a specific command, such as npx wrangler d1 execute.
  - WranglerNamespace takes this implementation a step further, getting all the commands for a given namespace and displaying them in a list.
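To make the namespace idea concrete, here is a minimal sketch of walking a tree of command definitions and listing every command under one namespace. The `CommandNode` shape, the `listCommands` helper, and the sample tree are all assumptions made for illustration; the data actually returned by experimental_getWranglerCommands() and the real component code may look quite different.

```typescript
// Hypothetical shape of a command definition. The structure returned by
// experimental_getWranglerCommands() may differ from this.
interface CommandNode {
  name: string; // one segment of the command path, e.g. "kv"
  description?: string;
  subcommands?: CommandNode[];
}

// Return the full command path of every leaf command under `namespace`.
function listCommands(root: CommandNode[], namespace: string): string[] {
  const start = root.find((node) => node.name === namespace);
  if (!start) return [];

  const result: string[] = [];
  const visit = (node: CommandNode, path: string[]): void => {
    const current = [...path, node.name];
    if (!node.subcommands || node.subcommands.length === 0) {
      // Leaf node: record the full path as a runnable command string.
      result.push(current.join(" "));
      return;
    }
    for (const child of node.subcommands) visit(child, current);
  };

  visit(start, ["wrangler"]);
  return result;
}

// Small illustrative tree, not the real Wrangler registry.
const sampleTree: CommandNode[] = [
  {
    name: "kv",
    subcommands: [
      {
        name: "namespace",
        subcommands: [
          { name: "create", description: "Create a new namespace" },
          { name: "list" },
        ],
      },
    ],
  },
  { name: "d1", subcommands: [{ name: "execute" }] },
];

console.log(listCommands(sampleTree, "kv"));
```

Because the list is derived from the definitions rather than typed by hand, a command added to the tree would show up in the rendered page without a separate docs edit.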
The implementation of this solution (thanks for leading, Jun Lee) took quite a few PRs and some back and forth with teams to make sure the CLI arguments had the latest information, but it was an unqualified success.
With this new pattern, we:
- Removed 3000+ lines of code from our repo.
- Automated testing, because we pulled from a programmatic source of truth.
- Simplified the update workflow (just update the Wrangler version in our docs site).
- Saved a metric ton of engineering time.
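As a rough illustration of why a programmatic source of truth makes testing possible, the sketch below checks that every command a docs page references still exists in the CLI's definitions. The `findUnknownCommands` helper and both sample lists are hypothetical; this is not the check our docs build actually runs.

```typescript
// Return every referenced command that is missing from the defined set.
function findUnknownCommands(referenced: string[], defined: Set<string>): string[] {
  return referenced.filter((command) => !defined.has(command));
}

// Sample data for illustration only. In practice, `defined` would be built
// from the command definitions and `referenced` from the docs pages.
const defined = new Set<string>([
  "kv namespace create",
  "kv namespace list",
  "d1 execute",
]);
const referenced: string[] = [
  "kv namespace create",
  "d1 execute",
  "kv namespace remove", // deliberately stale, made-up entry
];

const unknown = findUnknownCommands(referenced, defined);
if (unknown.length > 0) {
  // A docs build could fail here instead of waiting for a user report.
  console.warn(`Docs reference commands missing from the CLI: ${unknown.join(", ")}`);
}
```

With hand-written docs there is nothing to compare against, so a stale command like this only surfaces when a reader runs it.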
Appendix - Why not an agent?
During this phase, we also thought about implementing this as an AI agent that worked to keep our docs in sync with the Wrangler source of truth.
The workflow isn’t that difficult to imagine:
- Change hits the Wrangler SDK.
- Agent “sees” the change, processes it, and then submits a change request to our docs repository.
- Writer or engineer reviews and approves.
- Merge changes.
What this workflow still includes, however, is a human in the loop. Someone’s going to need to audit those AI changes for potential hallucinations. That introduces work (and cost) that just doesn’t need to happen, given how our docs relate to this specific source of truth.
When you have a 1:1 relationship between two destinations (often a reference topic in Diataxis terms), you can often rely on auto sync, that is, programmatically pulling that information from one location to another.
It’s only when that relationship gets more abstract (or you don’t have control of the source of truth) that you’d want to start thinking about an AI sync approach.