VAC Anabasis

Quantifying information for discovery.

Information theory operationalised for AI — measurement tools for what learning systems know, transfer, and lose.


What VAC is

VAC is an information company. Information theory is the right accounting language for AI, in the same sense that coding theory is the right accounting language for communication: it tells you what is actually being moved, measured, and lost across a learning system, independent of the architecture or domain that happens to be in fashion.

The premise is that a sound program of measurement is a precondition for a sound program of generation. Most of the headline progress in AI over the past decade has been on the generation side; the measurement side has lagged, with the consequence that we have very capable systems and very few principled tools for saying what those systems actually know, transfer, or lose under intervention.

VAC builds those tools. They are local where the data is small, anytime-valid where the data is sequential, graph-aware where the structure matters. They are unified under one ledger — bits, rates, channels — applied at the resolution where measurement is decision-useful.

VAC builds an information-theoretic substrate for AI that measures, not just generates — quantifying information transfer across learning systems with the same rigor that coding theory brings to communication.

The four toolkits

The work spans four toolkits — each operationalising a different layer of the same information-theoretic ledger. They were developed independently before they consolidated; the unification is the substance of the program.

Nonparametric

Local estimators of entropy, intrinsic dimension, and divergence on embedding neighborhoods.

Tools that measure what an embedding actually carries — locally, at the resolution of a query point and its neighbors. Quantization-dimension and high-rate theory give the analytical backbone; the practical output is a family of estimators that work where the sample is small, the dimension is high, and the modality is new.
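
A minimal sketch of the flavor of estimator this describes, built from standard kNN machinery; the function names and defaults are mine, chosen for illustration rather than taken from the toolkit itself. One function is a Kozachenko-Leonenko entropy estimate reported in bits, the other a Levina-Bickel maximum-likelihood estimate of intrinsic dimension local to a single query point.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

# Illustrative sketch only: standard kNN estimators, not VAC's own.

def knn_entropy_bits(X, k=5):
    # Kozachenko-Leonenko differential-entropy estimate, in bits.
    # X: (n, d) array of embedding points; assumes no duplicate rows.
    n, d = X.shape
    r_k = cKDTree(X).query(X, k=k + 1)[0][:, -1]   # distance to each point's k-th neighbor
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    h_nats = digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(r_k))
    return h_nats / np.log(2)

def local_intrinsic_dim(X, query, k=10):
    # Levina-Bickel MLE of intrinsic dimension in the neighborhood of one query point.
    # Assumes `query` is not itself a row of X (otherwise drop the zero distance).
    dists = cKDTree(X).query(query, k=k)[0]        # distances to the k nearest neighbors, sorted
    return (k - 1) / np.sum(np.log(dists[-1] / dists[:-1]))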

Sequential

Anytime-valid sequential testing built from martingales, e-values, and trajectory-aware estimators.

Testing under continual monitoring, where the data arrives over time and the decision rule must remain valid at every stopping time. The mathematical engines are martingales and e-values; the deployment surface is real-world decision pipelines where peeking, early stopping, and post-hoc inference are the norm rather than the exception.
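
A minimal sketch of the guarantee in question, assuming the simplest possible null (a Bernoulli(0.5) stream); the bet-sizing rule and names are illustrative choices, not the project's methods. The wealth process is a nonnegative martingale under the null, so Ville's inequality bounds the probability that it ever exceeds 1/alpha by alpha, which is what makes the rejection rule valid at every stopping time, peeking included.

import numpy as np

# Illustrative sketch only: a betting-style test martingale for H0: stream is Bernoulli(0.5).

def betting_e_process(xs, alpha=0.05):
    # Returns the wealth trajectory and the first time (if any) it crosses 1/alpha.
    wealth, wealth_path = 1.0, []
    count, total = 0, 0                               # running statistics seen so far
    for x in xs:                                      # xs: stream of 0/1 observations
        p_hat = (count + 0.5) / (total + 1)           # shrunk estimate from past data only
        lam = np.clip(2 * (2 * p_hat - 1), -1.8, 1.8) # predictable bet, kept off the boundary
        wealth *= 1 + lam * (x - 0.5)                 # mean-1 factor under H0: wealth is a martingale
        wealth_path.append(wealth)
        count += x
        total += 1
        if wealth >= 1 / alpha:                       # anytime-valid rejection threshold
            return wealth_path, total
    return wealth_path, None

# Example: a stream with true mean 0.7 is flagged without fixing a sample size in advance.
rng = np.random.default_rng(0)
path, stop = betting_e_process(rng.binomial(1, 0.7, size=2000))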

Genomic

Exponential-family and multi-omic information theory for sequence and cell-state measurements.

Information-theoretic measurements for sequence data across the genome, for chemical and single-cell embeddings, and for multi-omic data fusion. Built on exponential-family structure where it exists and on nonparametric machinery where it does not. The applied target is the per-stage information yield of a discovery pipeline — what each filter actually reveals, in bits.
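
A minimal sketch of a per-stage yield in the simplest discrete case, with names chosen for illustration: the plug-in mutual information, in bits, between a stage's pass/fail call and the downstream outcome. A serious estimator would correct for finite-sample bias, which the plug-in quantity does not.

import numpy as np

# Illustrative sketch only: plug-in I(decision; label) in bits from empirical counts.

def plugin_mutual_information_bits(decisions, labels):
    # decisions, labels: equal-length sequences of small nonnegative integers.
    decisions, labels = np.asarray(decisions), np.asarray(labels)
    joint = np.zeros((decisions.max() + 1, labels.max() + 1))
    for d, y in zip(decisions, labels):
        joint[d, y] += 1
    joint /= joint.sum()                                  # empirical joint distribution
    px = joint.sum(axis=1, keepdims=True)                 # marginal over decisions
    py = joint.sum(axis=0, keepdims=True)                 # marginal over labels
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

# Example: bits a hypothetical activity filter reveals about the eventual assay outcome.
# yield_bits = plugin_mutual_information_bits(filter_pass, assay_hit)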

Graph

Graph Schrödinger bridges and optimal transport, applied to proteins and spatial omics.

Information geometry of stochastic dynamics on graphs. Schrödinger bridges, MERW, and optimal transport give a common language for protein conformational ensembles, spatial transcriptomic flows, and reinforcement-learning trajectories. The tools translate between domains that have historically each invented their own vocabulary.
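
A minimal sketch of one piece of that common language, with all names and parameters chosen for illustration: a static Schrödinger bridge on a graph, obtained by Sinkhorn-style iterative proportional fitting of a heat-kernel reference so that the coupling matches prescribed start and end distributions over the nodes.

import numpy as np
from scipy.linalg import expm

# Illustrative sketch only: dense, unoptimised static Schrödinger bridge on a small graph.

def graph_schrodinger_bridge(A, mu, nu, t=1.0, iters=500):
    # Coupling pi = diag(u) K diag(v) with marginals (approximately) mu and nu,
    # where K is the graph heat kernel exp(-t L) used as the reference dynamics.
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                       # combinatorial graph Laplacian
    K = expm(-t * L)                           # strictly positive for a connected graph
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(iters):                     # Sinkhorn / iterative proportional fitting
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Example on a 3-node path graph: mass starts at node 0 and must end at node 2.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
pi = graph_schrodinger_bridge(A, mu=np.array([1., 0., 0.]), nu=np.array([0., 0., 1.]))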

Why information theory

I(X;Y) = H(X) − H(X∣Y)

Information theory operationalises a small set of universal accounting quantities — entropy, mutual information, rate, divergence — that work the same way across statistics, signal processing, learning theory, and decision theory. Coding theory took the same accounting and made it the language of communication. The case for doing the same with AI is that the universal-accounting property holds: reinforcement learning, p-hacking, graph learning, sequential testing, model editing — all of these are stated more crisply in information-theoretic terms than in their native vocabularies.
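
A worked instance of the identity above, with numbers chosen purely for illustration: send a uniform bit X through a channel that flips it with probability 0.1. Then H(X) = 1 bit, H(X∣Y) = h(0.1) ≈ 0.469 bits (h is the binary entropy function), so I(X;Y) ≈ 0.531 bits. The same ledger entry records both what the channel delivers and what the noise erases.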

The reason this is the right banner now is the feedback loop. Operationalised measurement of information transfer feeds the design of better information-processing agents, which generate more measurable transfer, which sharpens the measurement, and so on. The loop is the program.

A second reason, more cultural than technical: information theory carries a lineage. The engineers building modern AI grew up on Shannon — read Cover and Thomas in graduate school, cite MacKay in undergraduate ML courses. Returning measurement to the center of the program is a return to a frame the audience already knows.

Papers

Released papers from the Anabasis project. Filter by topic.

No papers yet.

Timeline

Released papers, ordered by date.

This section populates as papers release.

Dependency graph

Released papers cite earlier work in the program. This graph shows the citations we have published so far.

This section populates as papers release.

About

I'm Akshay Balsubramani. A year and a half ago I was a big-pharma researcher striking out on my own as a biotech consultant focused on RNA vaccines; a policy shock to that industry made the plan harder, several AI/ML side projects I'd been keeping warm started working, and I committed to AI. VAC is what that consolidation became — an information company that operationalises information theory for AI the way coding theory was operationalised for communication: as the accounting language for what a learning system knows, transfers, and loses. Anabasis is the project name for the public release.

What I'm interested in talking about

  • Information accounting on real pipelines
  • Biotech and drug-discovery applications
  • Sequential decision-making under continual monitoring
  • Nonparametric estimation in finite-sample regimes
  • How a small research operation stays competitive in an area the field assumes requires a large one

akshay@vac.bio