# Public Audit Note: `karpathy/autoresearch` From the Greyforge Systems Lens

Date: 2026-04-08

## Public Redaction Note

This version is intentionally scrubbed for public release.

It omits:

- internal machine names
- local file paths
- service names
- runtime lease details
- private implementation notes
- unpublished operational counts and state surfaces

The goal is to preserve the reasoning while removing proprietary or environment-specific detail.

## Executive Verdict

Greyforge does not plan to integrate `karpathy/autoresearch` and does not currently plan to borrow patterns from it.

That is not because the repo is bad. It is because the repo solves a much smaller problem than the one Greyforge is already working on.

`autoresearch` is a compact self-editing benchmark loop. Greyforge is focused on the broader architecture around autonomous research systems: supervision, routing, bounded execution, review surfaces, durable artifacts, memory discipline, and operator control.

Measured against that broader scope, `autoresearch` is interesting as a reference specimen but not compelling as an adoption target.

## What `autoresearch` Does Well

The repo is unusually clear about its core loop:

- the human writes `program.md`
- the agent edits `train.py`
- the run gets a fixed time budget
- the metric decides whether the change is kept or discarded

That clarity is real, and it is one reason the project has attracted so much visible attention.
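The four-step loop described above fits in a few lines. The following is a minimal Python sketch for illustration only, not code from the repo. `propose` is a hypothetical stand-in for the agent editing `train.py`, `evaluate` is a hypothetical stand-in for one fixed-budget run, and treating a lower metric as better is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopStats:
    best_metric: float
    kept: int
    discarded: int


def keep_or_discard_loop(
    baseline: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], float],
    iterations: int,
) -> tuple[str, LoopStats]:
    """Greedy keep-or-discard loop over one editable artifact.

    `propose` is a hypothetical stand-in for the agent editing train.py;
    `evaluate` is a hypothetical stand-in for one fixed-budget run.
    A lower metric is treated as better (an assumption of this sketch).
    """
    current = baseline
    best = evaluate(current)
    kept = discarded = 0
    for _ in range(iterations):
        candidate = propose(current)
        score = evaluate(candidate)
        if score < best:
            # Keep: the candidate becomes the new baseline.
            current, best = candidate, score
            kept += 1
        else:
            # Discard: the next proposal starts from the last kept version.
            discarded += 1
    return current, LoopStats(best, kept, discarded)


if __name__ == "__main__":
    # Toy run: the "artifact" is a one-line config and the metric is its
    # distance from an arbitrary target value. Purely illustrative.
    edits = iter(["lr=0.5", "lr=0.8", "lr=0.2", "lr=0.35"])

    def toy_metric(src: str) -> float:
        return abs(float(src.split("=")[1]) - 0.25)

    final, stats = keep_or_discard_loop("lr=1.0", lambda _: next(edits), toy_metric, 4)
    print(final, stats)
```

The strict `<` comparison means a tie counts as a discard, so the incumbent version survives when a candidate is merely as good as it.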

## Why Greyforge Is Not Adopting It

Greyforge applies a strict adoption bar to outside patterns and systems.

A candidate should:

1. solve a live bottleneck
2. generalize beyond one narrow setup
3. add a capability Greyforge cannot already assemble from existing primitives

`autoresearch` does not currently clear those bars for Greyforge.

The reasons are straightforward:

- the loop is tightly scoped to one narrow form of self-editing research
- the published setup is hardware-specific enough that it does not generalize cleanly
- the pattern assumes a simpler authority and control model than Greyforge uses
- the repo does not address the broader governance layer Greyforge treats as central

## What Greyforge Already Prioritizes

Greyforge is already focused on the systems layer above the loop:

- research supervision
- provider and capability routing
- bounded execution
- review and promotion discipline
- durable notes and memory surfaces
- operator-governed control planes
- artifact preservation

This is the harder problem once a system moves beyond a narrow overnight benchmark experiment.

## Strict Pattern Review

### Fixed-budget experiment loop

This is the strongest idea in `autoresearch`.

It is attractive because it:

- keeps runs comparable
- limits agent thrash
- creates a clean keep-or-discard rule

Greyforge still does not consider it a current borrow case. The pattern pays off best when there is one stable target, one stable metric, and a large number of cheap, repeatable trials, and that profile does not describe the main Greyforge bottleneck today.
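The comparability benefit comes from bounding wall-clock time rather than work. A minimal Python sketch, assuming a hypothetical `step` callable that performs one unit of training, shows the mechanism. Because the budget is time-based, trials are comparable on the same machine, while the number of completed steps will differ across hardware.

```python
import time
from typing import Callable


def run_with_time_budget(step: Callable[[], None], budget_s: float) -> int:
    """Call `step` repeatedly until a wall-clock budget expires.

    Returns the number of completed steps. The budget bounds elapsed time,
    not work, so a faster machine completes more steps within the same
    budget. A step that starts before the deadline is allowed to finish.
    """
    if budget_s < 0:
        raise ValueError("budget_s must be non-negative")
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + budget_s
    steps = 0
    while time.monotonic() < deadline:
        step()
        steps += 1
    return steps
```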

### Keep-or-revert branch discipline

This is tidy, but it is not enough by itself to justify adoption.

Greyforge already operates with stronger doctrine around bounded changes, authority separation, and deployment discipline. A naive automatic revert loop would not fit cleanly into that environment outside a purpose-built sandbox.
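The difference between a naive revert loop and a sandboxed one can be made concrete. The following is a minimal Python sketch, assuming hypothetical `edit` and `evaluate` callables and a lower-is-better score. Each trial runs against a throwaway copy of the file, so the original is never modified and "revert" reduces to deleting a temporary directory. It illustrates the general idea only; it is neither Greyforge's implementation nor the branch-based mechanism `autoresearch` uses.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def sandboxed_trial(
    source: Path,
    edit: Callable[[str], str],
    evaluate: Callable[[Path], float],
    baseline_score: float,
) -> tuple[Optional[str], float]:
    """Try one edit on a throwaway copy of `source`.

    Returns (edited_text, score) if the edit beats the baseline, else
    (None, score). `source` itself is never written, so a rejected edit
    needs no revert in the real tree. Lower score is better (assumption).
    """
    with tempfile.TemporaryDirectory() as tmp:
        candidate = Path(tmp) / source.name
        shutil.copy2(source, candidate)  # work on a copy, never the original
        candidate.write_text(edit(candidate.read_text()))
        score = evaluate(candidate)
        kept_text = candidate.read_text() if score < baseline_score else None
    # Leaving the `with` block deletes the sandbox, kept or not.
    return kept_text, score
```

Promotion of a kept edit back into the real tree is deliberately left to the caller, which keeps the trial loop itself from holding write authority over the original file.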

### Markdown control surface

The `program.md` pattern is elegant, but it is not an upgrade for Greyforge.

Greyforge already relies on a more explicit authority stack for instructions, doctrine, and operational state. Adding another control surface would increase fragmentation rather than reduce it.

## Public Conclusion

`autoresearch` is a clean public demonstration of a narrow autonomous research loop.

Greyforge is already building further up the stack.

That is why the internal recommendation, after strict review, was:

- do not integrate the repo
- do not port the workflow
- do not borrow patterns from it right now
- revisit only if a dedicated isolated optimization lab becomes a first-class Greyforge need

## Public Sources

External:

- https://github.com/karpathy/autoresearch
- https://github.com/karpathy/autoresearch/issues
- https://raw.githubusercontent.com/karpathy/autoresearch/master/program.md
- https://raw.githubusercontent.com/karpathy/autoresearch/master/pyproject.toml

Greyforge public context:

- https://greyforge.tech/
- https://greyforge.tech/chronicles/harness-engineering-agent-reliability
- https://greyforge.tech/chronicles/retiring-the-fork
- https://greyforge.tech/openforge
- https://github.com/GreyforgeLabs/devcap
- https://github.com/GreyforgeLabs/memory-quality-gate
- https://github.com/GreyforgeLabs/sqlite-checkpoint