Independent research lab. Currently shipping.

Can AI be 100x more efficient?

The thesis: the next jump in capability comes from architecture, not scale. I run small experiments on a single GPU to find out where today's models waste compute, and write down everything I learn.

Most of the public conversation about AI capability is about scaling. The work here is about the other axis. I probe pre-trained models for the places they waste compute (KV cache, attention patterns, embedding usage, routing) and test small architectural modifications that change the efficiency math without retraining from scratch. Each experiment is one writeup. Successes and failures both ship.
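To give a concrete sense of the efficiency math: a back-of-the-envelope sketch of KV cache memory for a hypothetical 7B-class decoder. The config numbers below are illustrative assumptions, not measurements from any experiment here.

```python
# Back-of-the-envelope KV cache sizing: how much memory a decoder-only
# transformer spends just remembering past tokens. Illustrative config,
# loosely shaped like a 7B-class model; not a measurement.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, dtype_bytes=2):
    # Each layer stores a K and a V tensor of shape
    # (batch, n_kv_heads, seq_len, head_dim).
    per_token_per_layer = 2 * n_kv_heads * head_dim * dtype_bytes
    return n_layers * seq_len * batch * per_token_per_layer

if __name__ == "__main__":
    # Hypothetical config: 32 layers, 32 KV heads, head_dim 128, fp16, 8k context.
    full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)
    # Same model with grouped-query attention (8 KV heads): 4x smaller cache.
    gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
    print(f"full attention KV cache: {full / 2**30:.2f} GiB")
    print(f"GQA (8 KV heads):        {gqa / 2**30:.2f} GiB")
```

Roughly 4 GiB of cache for one 8k-token sequence versus 1 GiB with grouped-query attention; that gap is the kind of architectural lever these experiments look for.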

Latest Experiment

Writing

Research notes, experiment logs, and honest results. Failures included.

All posts