What If the Docs Wrote Themselves?
TL;DR
Updating documentation is hard to get right. Code changes quickly, documentation lives somewhere else, and keeping the two aligned usually depends on somebody remembering to update both systems.
Imagine changing code and having the user documentation update itself from the same source of truth. That means less drift, more consistent coverage, and less time spent maintaining duplicate descriptions.
We are moving toward that model in Opteryx by extracting API metadata from the source repository into JSON, then using the documentation repository to turn that structured data into user-facing documentation. It is still a work in progress, but it is already improving the quality of the metadata in the codebase.
If you are maintaining technical docs by hand, this is your cue to start moving them closer to the code.
The Problem
Documentation drifts.
That is especially true for product areas that change frequently, and it is particularly visible in systems like Opteryx where the user-facing surface includes:
- data types
- functions
- operators
- APIs
These areas already exist as structured concepts in the engine, but their documentation often lives elsewhere, is maintained separately, and is updated later.
That creates a familiar set of problems:
- docs and implementation fall out of sync
- small API changes are easy to miss
- coverage becomes uneven
- quality depends on whether someone remembered to update a second system
None of this is unusual. But it is a signal that the documentation process is too detached from the thing being documented.
The Shift in Approach
The direction we are moving toward is to document the thing at the thing.
Instead of treating the documentation site as the primary source of truth, we are treating the source code and its metadata as the starting point.
The current shape of the pipeline is:
source repo
-> extract API metadata
-> generate JSON manifest
-> hand off to docs repo
-> render user documentation
The point is not just automation for its own sake.
The point is that the engine already knows a great deal about its user surface. If we can extract that information in a structured form, we can generate documentation more consistently and with less manual duplication.
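As a rough illustration, the extraction step can be very small: introspect whatever registry the engine already holds and serialize it to JSON. Everything here is a hypothetical sketch; the `FUNCTIONS` registry and field names are illustrative, not Opteryx's actual internals.

```python
import inspect
import json

# Hypothetical registry standing in for the structured model the engine
# already holds. The names here are illustrative, not Opteryx internals.
FUNCTIONS = {
    "UPPER": str.upper,
    "ABS": abs,
}

def extract_metadata(registry):
    """Build a JSON-serializable manifest entry for each registered function."""
    manifest = []
    for name, func in sorted(registry.items()):
        try:
            signature = str(inspect.signature(func))
        except (TypeError, ValueError):
            signature = None  # some builtins expose no introspectable signature
        doc = inspect.getdoc(func) or ""
        manifest.append(
            {
                "name": name,
                "signature": signature,
                # first docstring line doubles as a short description
                "description": doc.splitlines()[0] if doc else "",
            }
        )
    return manifest

print(json.dumps(extract_metadata(FUNCTIONS), indent=2))
```

The useful property is that the manifest is derived, not written: when the registry changes, the JSON changes with it.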
Why This Matters
There are a few immediate benefits to this approach.
First, it improves consistency.
If data types, functions, and operators are described through a common metadata structure, the resulting documentation becomes more uniform. The same kinds of information can appear in the same places, in the same format, across the site.
Second, it improves coverage.
Manual documentation tends to accumulate around the most visible or most recently changed features. Generated documentation makes it easier to see what is missing because gaps in metadata become explicit.
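That "gaps become explicit" property can even be mechanized. A sketch of a coverage check over a hypothetical manifest, where any entry missing a required field is reported rather than silently shipping undocumented:

```python
# Fields every documented element is expected to carry (illustrative list).
REQUIRED_FIELDS = ("name", "signature", "description", "examples")

def find_gaps(manifest):
    """Return {name: [missing fields]} for incomplete manifest entries."""
    gaps = {}
    for entry in manifest:
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        if missing:
            gaps[entry.get("name", "<unnamed>")] = missing
    return gaps

manifest = [
    {
        "name": "UPPER",
        "signature": "UPPER(str)",
        "description": "Uppercase a string.",
        "examples": ["SELECT UPPER('a')"],
    },
    {"name": "ABS", "signature": "ABS(num)"},  # incomplete on purpose
]

print(find_gaps(manifest))  # → {'ABS': ['description', 'examples']}
```

A check like this can run in CI, turning uneven coverage from something a reader discovers into something a build reports.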
Third, it improves hygiene in the codebase.
Once annotations and inline descriptions are surfaced directly to users, poor metadata becomes much more obvious. That creates healthy pressure to improve the implementation details that describe the API surface.
This is one of the most useful side effects of the work. The tooling is not just producing docs; it is also encouraging higher-quality API definitions.
What We Are Generating
Today, the focus is on structured user-facing elements such as:
- data types
- functions
- operators
Many of these are already created programmatically inside Opteryx.
That is useful because it means we are not starting from an unstructured codebase. There is already a model of these concepts in the system. The work now is to improve the metadata around them so that the model is rich enough to support good documentation.
In practice, that means making sure each documented element can expose things like:
- name
- signature
- description
- supported argument patterns
- return behavior
- examples
- notes on semantics or edge cases
Not every part of that is complete yet, and some of the existing definitions have issues. But the important point is that these problems become easier to identify and more straightforward to fix once the documentation pipeline depends on structured metadata.
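Put together, one populated entry in the manifest might look something like the following. The shape and the `ROUND` entry are purely illustrative, not Opteryx's actual schema:

```python
import json

# A sketch of what one fully populated manifest entry could look like.
# Every field name and value here is illustrative, not the real schema.
entry = {
    "name": "ROUND",
    "signature": "ROUND(value [, places])",
    "description": "Round a numeric value to a given number of decimal places.",
    "argument_patterns": [["NUMERIC"], ["NUMERIC", "INTEGER"]],
    "returns": "NUMERIC",
    "examples": ["SELECT ROUND(3.14159, 2)"],
    "notes": "Behavior for ties follows the engine's numeric semantics.",
}

# The entry must survive a JSON round trip to be handed to the docs repo.
print(json.dumps(entry, indent=2))
```

Because the entry is plain JSON, the docs repository can consume it without importing the engine at all.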
The Trade-Offs
There is no real shortcut here.
Generated documentation only works well if the underlying metadata is good. If the source descriptions are weak, incomplete, or inconsistent, then the generated output will be weak, incomplete, or inconsistent too.
So the trade-off is clear:
- less manual duplication later
- more discipline required in the source now
That is a trade worth making.
It shifts effort away from maintaining duplicate descriptions in separate systems and toward improving the definitions closest to the implementation. Over time that should make both the code and the documentation better.
What This Changes Culturally
This is not just a tooling change.
It changes where documentation work happens.
Instead of thinking of docs as a separate writing task that happens after engineering is done, this pushes us toward treating API metadata as part of the engineering work itself.
That is a better fit for technical surfaces that evolve quickly. It means changes can be described where they are introduced, reviewed where they are implemented, and surfaced to users through a repeatable pipeline.
For a project like Opteryx, that is a more scalable model than relying on manual synchronization.
What’s Next
The immediate next step is to keep improving the extraction pipeline and the metadata behind it.
The generated JSON is only useful if it captures enough meaning for the docs repository to produce documentation that is genuinely helpful to users.
That means we still need to refine:
- the completeness of API annotations
- the consistency of field definitions
- the shape of the JSON manifest
- how examples and semantic notes are represented
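Rendering is the other half of the handoff: once the manifest shape settles, the docs repository can turn entries into pages with a small templating step. A hypothetical sketch, using the same illustrative entry shape as above:

```python
FENCE = "`" * 3  # markdown code fence, built up to keep this sketch readable

def render_markdown(entry):
    """Render one manifest entry (hypothetical shape) as a markdown section."""
    lines = [
        f"## {entry['name']}",
        "",
        f"`{entry['signature']}`",
        "",
        entry["description"],
    ]
    for example in entry.get("examples", []):
        lines += ["", f"{FENCE}sql", example, FENCE]
    return "\n".join(lines)

entry = {
    "name": "UPPER",
    "signature": "UPPER(str)",
    "description": "Convert a string to uppercase.",
    "examples": ["SELECT UPPER('hello')"],
}

print(render_markdown(entry))
```

The renderer stays deliberately dumb: all meaning lives in the manifest, so improving the docs means improving the metadata, not editing generated pages.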
This is still a work in progress, but the direction already looks right.
If documenting an API programmatically exposes weak metadata, that is not a failure of the approach.
That is the approach doing its job.
