Independent research lab. Currently shipping.

Can AI be 100x more efficient?

The thesis: the next jump in capability comes from architecture, not scale. I run small experiments on a single GPU to find out where today's models waste compute, and write down everything I learn.

Most of the public conversation about AI capability is about scaling. The work here is about the other axis. I probe pre-trained models for the places they waste compute (KV cache, attention patterns, embedding usage, routing) and test small architectural modifications that change the efficiency math without retraining from scratch. Each experiment is one writeup. Successes and failures both ship.
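To give a concrete sense of the efficiency math: a back-of-the-envelope sketch of KV cache memory for a hypothetical 7B-class decoder. The config numbers below are illustrative assumptions, not measurements from any experiment here.

```python
# Back-of-the-envelope KV cache sizing: how much memory a decoder-only
# transformer spends just remembering past tokens. Illustrative config,
# loosely shaped like a 7B-class model; not a measurement.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, dtype_bytes=2):
    # Each layer stores a K and a V tensor of shape
    # (batch, n_kv_heads, seq_len, head_dim).
    per_token_per_layer = 2 * n_kv_heads * head_dim * dtype_bytes
    return n_layers * seq_len * batch * per_token_per_layer

if __name__ == "__main__":
    # Hypothetical config: 32 layers, 32 KV heads, head_dim 128, fp16, 8k context.
    full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)
    # Same model with grouped-query attention (8 KV heads): 4x smaller cache.
    gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
    print(f"full attention KV cache: {full / 2**30:.2f} GiB")
    print(f"GQA (8 KV heads):        {gqa / 2**30:.2f} GiB")
```

Roughly 4 GiB of cache for one 8k-token sequence versus 1 GiB with grouped-query attention; that gap is the kind of architectural lever these experiments look for.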

Latest Experiment

Writing

Research notes, experiment logs, and honest results. Failures included.

All posts