Research · Company Memo · AI Infrastructure

Lattice AI: why the inference layer is a better bet than the model layer.

Lattice is pricing its Series C at a $4.2B post on 16.9× forward revenue. The model layer is burning capital at a rate that can't be justified by current unit economics. The inference layer — where Lattice operates — is where the cost curve actually pays back. Here's the math, the moat, and the two things we think the consensus is missing.

Alphaneo Research
Published Apr 14, 2026
14 min read
Coverage: Initiating

Our bottom line up front: we underwrite Lattice at a composite score of 0.84 (A−) and view the Series C as fairly priced given the contracted revenue base, 74% gross margin profile, and a defensible position at the compiler level of the stack. The round is not cheap. But this is one of the few AI names where revenue quality already supports the entry multiple, and where we can articulate what has to be true in three years without invoking a new category.

The thesis in one paragraph

The capital markets have spent two years pricing AI as though the model is the product. It isn't. The model is the commodity. Every generation of frontier model compresses in price within 18 months of release, and the delta gets absorbed downstream — by whatever software layer sits between the model and the customer. Lattice built that layer for enterprise inference workloads: a compiler that takes an arbitrary transformer graph, rewrites it against the customer's silicon (H100, B200, MI350, Trainium, Gaudi 3), and serves it with deterministic latency at roughly 42% lower cost per token than the hyperscaler-hosted equivalent. That is not a prompt wrapper. That is the kind of systems work that compounds.

Why we're initiating now

Three things changed in Q1 2026 that we think reset the risk/reward: (1) a multi-year contract with a top-3 US bank priced on committed throughput, not seat count; (2) the Graphene 3 compiler release closed the last meaningful latency gap versus bare-metal TensorRT; and (3) the secondary market has finally produced comparable pricing on inference-layer peers that lets us triangulate the Series C.

What Lattice actually does

Strip the marketing away and Lattice is three things stacked together. First, a graph-level compiler that ingests models in ONNX or PyTorch and produces hardware-specific kernels optimized for the customer's accelerator. Second, a serving runtime that handles batching, KV-cache management, and speculative decoding across mixed-precision paths. Third, an observability layer — p99 latency, token-level cost attribution, drift monitoring — that plugs into the customer's existing SRE stack.

The compiler is the moat. The runtime is table stakes. The observability layer is the reason enterprise procurement signs the contract. None of these three, on its own, wins the deal. Stacked, they do.
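
Lattice's interfaces are not public, so the sketch below is purely illustrative: every name in it (compile_graph, ServingRuntime, ServingMetrics) is our invention, meant to show the shape of the three-layer stack described above, not Lattice's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class CompiledArtifact:
    """Layer 1 output (hypothetical): hardware-specific kernels for one target."""
    target: str                                # e.g. "H100", "MI350", "Gaudi 3"
    kernels: list[str] = field(default_factory=list)

def compile_graph(model_path: str, target: str) -> CompiledArtifact:
    """Layer 1 (hypothetical): ingest an ONNX/PyTorch graph, emit fused kernels."""
    # A real compiler would rewrite the transformer graph here; we stub the output.
    return CompiledArtifact(target=target, kernels=[f"fused_attention::{target}"])

@dataclass
class ServingMetrics:
    """Layer 3 (hypothetical): what the observability plane exports to the SRE stack."""
    p99_latency_ms: float
    cost_per_token_usd: float

class ServingRuntime:
    """Layer 2 (hypothetical): batching, KV-cache management, speculative decoding."""
    def __init__(self, artifact: CompiledArtifact, max_batch: int = 32):
        self.artifact = artifact
        self.max_batch = max_batch

    def serve(self, prompts: list[str]) -> ServingMetrics:
        # Stubbed figures; a real runtime would measure these per request.
        return ServingMetrics(p99_latency_ms=85.0, cost_per_token_usd=2.1e-6)

artifact = compile_graph("model.onnx", target="H100")
print(ServingRuntime(artifact).serve(["hello"]))
```

The detail that matters in this shape: the compiled artifact is per-silicon, which is what the portability score in Table 2 reflects.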

Who is actually buying

We spoke to nine current Lattice customers (five under NDA, four on background) across financial services, healthcare, and industrial. The procurement pattern is consistent: the customer already spent 2024–2025 on a hyperscaler-hosted inference bill that ran 3–6× their initial budget. The CFO applied pressure. The platform team ran a bake-off. Lattice won on a combination of throughput and data residency — most of these customers cannot send inference traffic to a shared multi-tenant endpoint under their regulatory posture, and self-hosting on Lattice gives them a defensible answer.

"We weren't trying to replace the model. We were trying to stop burning $1.8M a month serving it." — VP Platform Engineering, top-5 US insurer (under NDA)

The numbers

We rebuilt the operating model from primary inputs rather than the company deck. Management's figures and ours reconcile to within 3% on every line. The picture is cleaner than the headline multiple suggests.

Table 1 · Operating metrics, FY24A–FY27E (Alphaneo estimates)
Metric                   FY24A     FY25A     FY26E      FY27E
Contracted ARR           $18.4M    $61.2M    $143.0M    $248.0M
YoY growth               n/a       +232%     +134%      +73%
Gross margin             58%       68%       74%        76%
Net revenue retention    n/a       161%      148%       138%
Burn multiple            2.4×      1.1×      0.7×       0.5×
Cash runway (months)     n/a       14        22         31

Two numbers do the work here. The first is the burn multiple — net cash burned per dollar of net new ARR — which crossed below 1.0 in the back half of FY25 and continues to compress. Under 1.0 is the line at which a company is creating value at the margin rather than consuming it. The second is net revenue retention at 161%, which is not a cohort quirk: it's the mechanical result of inference workloads scaling with customer usage on a committed-throughput contract. When a customer rolls out a new LLM-powered product, the Lattice bill moves with it without renegotiation.
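
As a sanity check on Table 1, the burn-multiple definition lets us back-solve the implied net burn each year. The dollar figures below are our arithmetic from the table, not disclosed numbers; a minimal sketch:

```python
# Back-solve implied net burn from Table 1: burn multiple = net burn / net new ARR.
arr = {"FY24A": 18.4, "FY25A": 61.2, "FY26E": 143.0, "FY27E": 248.0}  # $M contracted ARR
burn_multiple = {"FY25A": 1.1, "FY26E": 0.7, "FY27E": 0.5}            # from Table 1

prev = "FY24A"
for year, bm in burn_multiple.items():
    net_new = arr[year] - arr[prev]   # net new ARR added during the year
    implied_burn = bm * net_new       # our inference, not a company-disclosed figure
    print(f"{year}: +${net_new:.1f}M ARR -> implied net burn ${implied_burn:.1f}M")
    prev = year
```

On these numbers, absolute burn stays roughly flat around $50M even as net new ARR more than doubles, which is the compression the paragraph above describes.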

Where we disagree with the deck

Two places. First, management is modeling FY27 gross margin at 79%. We think 76% is the honest number — the last three points of margin require the company to get customers onto its own managed-capacity offering rather than bring-your-own-silicon, and adoption of that product is still pre-traction. Second, the deck frames a $12B serviceable market by 2028. We get to $7–8B. The larger number assumes Lattice wins outside the regulated-enterprise wedge it currently dominates; the smaller number is what the wedge itself supports, and it's enough to justify the round.
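
The margin disagreement is really a mix question, and a back-of-envelope blend shows why. Both segment margins below are our assumptions (neither the bring-your-own-silicon margin nor the managed-capacity margin is disclosed); the point is the sensitivity, not the levels:

```python
# Blended gross margin vs. managed-capacity revenue mix.
# Segment margins are ASSUMED for illustration; neither is disclosed by Lattice.
byo_margin = 0.74       # assume BYO-silicon holds near today's blended margin
managed_margin = 0.85   # assume managed capacity carries hosting-style margins

def required_managed_mix(blended_target: float) -> float:
    """Solve blended = byo*(1 - m) + managed*m for the managed-capacity mix m."""
    return (blended_target - byo_margin) / (managed_margin - byo_margin)

for target in (0.76, 0.79):  # our FY27 estimate vs. management's
    print(f"{target:.0%} blended margin -> ~{required_managed_mix(target):.0%} managed mix")
```

On these assumed segment margins, our 76% implies roughly an 18% managed-capacity mix while management's 79% needs roughly 45%, which is a lot to ask of a product that is still pre-traction.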

Why the model layer is not the trade

We want to be precise here because this is where we think consensus pricing is wrong. The frontier model companies are extraordinary businesses in the research sense. They are not yet extraordinary businesses in the financial sense. Three reasons:

  1. Gross margin is structurally compressed. Training and inference costs scale roughly with model capability. Every time a frontier lab ships a new model, the prior generation re-prices downward by 60–80% within two quarters. The customer captures that surplus, not the lab.
  2. The product has no switching cost at the API layer. A well-architected enterprise keeps two models behind a router. Swapping the default model takes an afternoon. The data that would create lock-in — fine-tuning artifacts, evaluation suites, retrieval indexes — lives with the customer or with the inference-layer vendor. It does not live with the model lab.
  3. The capital intensity is self-reinforcing. To stay at the frontier, labs must raise every 12–18 months at multiples that assume the prior round's thesis is still intact. The math works until it doesn't.

None of this means the model layer loses. It means the model layer is priced as though it wins all the surplus, when the evidence says most of the surplus flows one layer down. Lattice sits one layer down.
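
The re-pricing dynamic in point 1 compounds quickly. As a purely illustrative example, take a hypothetical $10 per million tokens launch price and apply the midpoint of the 60–80% band per generation:

```python
# Illustrative only: how the 60-80% per-generation re-pricing compounds.
# The $10/M-token launch price is hypothetical, not any lab's actual list price.
price = 10.0                        # $ per 1M tokens at launch
for generation in range(1, 4):
    price *= 1 - 0.70               # midpoint of the 60-80% re-pricing band
    print(f"after generation {generation}: ${price:.2f} per 1M tokens")
```

Three generations in, the launch price has surrendered about 97% of its value. The serving spread, the layer Lattice monetizes, does not re-price on that schedule.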

The model is the product the customer buys. The inference layer is the product the customer budgets around.

Competitive position

The serious competitors are Together, Fireworks, and the hyperscaler first-party offerings (AWS Bedrock, Azure AI Foundry, GCP Vertex). We ran the comparison across five dimensions that procurement teams actually weigh.

Table 2 · Competitive position, enterprise inference (Alphaneo scoring, 1–5)
Dimension                              Lattice   Together   Fireworks   Hyperscalers
Cost per token (regulated workload)    5         4          4           2
Latency determinism (p99)              5         4          3           3
Silicon portability                    5         3          3           1
Data residency & deploy topology       5         3          3           4
Model catalog breadth                  3         5          4           4

Lattice loses on catalog breadth and it does not matter for the wedge it sells into. Regulated enterprise buyers do not want a menu of 200 models. They want two or three, deterministic latency on their silicon, and a SOC 2 Type II report that survives their GRC team's read. That is the shape of the contract, and Lattice has built for that shape specifically.

Round structure and what we'd watch

The Series C is $280M at a $4.2B post-money, led by an existing crossover with two strategic co-leads. Preference is 1× non-participating, standard weighted-average anti-dilution, no ratchet, no pay-to-play. The cap table cleaned up in Q4 2025 through a tender at a 15% discount to the current round — removing roughly $90M of early employee and angel stock that had been clogging secondary interest. This is the kind of housekeeping that matters more than it reads.

The two things we're watching

One: managed-capacity adoption. If Lattice cannot get 30%+ of FY27 revenue onto its own hosted offering, the gross margin story pauses at 74%, and the multiple compresses. Two: the next-gen accelerator transition. When Blackwell Ultra and MI400 ship volume in H2 2026, the compiler has to retarget within one quarter. Lattice has done this twice before on schedule. The third time is the one that proves it's repeatable.

Valuation

At a $4.2B post on our FY26E $143M ARR, the round prices at 29.4× current and 16.9× forward. That is above the public infrastructure median of 11× forward but below the AI-native peer set at 22× forward. Our fair-value range, triangulating DCF, comp-set multiple, and optionality on the managed-capacity product, is $3.8B–$5.1B. The round sits inside the range, closer to the midpoint than the top. We are comfortable taking allocation at the Series C price.
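
The entry multiples reproduce directly from Table 1; the quick check below also shows where a top-quartile secondary mark lands on forward ARR, which frames the next paragraph:

```python
# Reproduce the entry multiples from Table 1 ARR and the $4.2B post-money.
post_money = 4200.0                       # $M
arr = {"FY26E": 143.0, "FY27E": 248.0}    # $M, from Table 1

print(f"current (FY26E): {post_money / arr['FY26E']:.1f}x")  # ~29.4x
print(f"forward (FY27E): {post_money / arr['FY27E']:.1f}x")  # ~16.9x

# A $4.8B secondary mark on the same forward ARR:
print(f"$4.8B mark, forward: {4800.0 / arr['FY27E']:.1f}x")  # ~19.4x
```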

What we are not comfortable with is chasing the secondary into the top quartile of the range. Above a $4.8B mark, the risk/reward skews toward waiting for an FY27 data point.

Scorecard — how we got to 0.84

Published under the Alphaneo six-dimension framework.

  • Team & Execution — 0.90. Founding team out of the Google TPU compiler group and two Nvidia CUDA alumni. Three prior shipped products. Recruiting density above category average.
  • Market & TAM — 0.82. $7–8B wedge on our numbers, $12B on management's. We score the wedge, not the ambition.
  • Product & Moat — 0.86. Compiler depth is real. Observability layer is stickier than it looks. Catalog gap is the one weakness.
  • Unit Economics — 0.78. Burn multiple under 1.0, NRR at 161%, 74% gross margin. Held back from higher only by concentration in top-10 customers.
  • Round & Valuation — 0.80. Inside our fair-value range. Clean structure, no ratchets, recent tender.
  • Risk & Disclosure — 0.88. Audit quality is strong. Key-person risk flagged on CTO. Regulatory posture defensible.

What would change our mind

We would downgrade to hold if any of the following occur in the next two reporting cycles: (1) net revenue retention breaks below 130% on a trailing basis, signaling that the committed-throughput contracts are not expanding as usage grows; (2) managed-capacity revenue remains below 15% of total by Q4 2026; or (3) a hyperscaler ships a first-party compiler with comparable cross-silicon portability. We assign roughly 10%, 25%, and 15% probability to each, respectively.
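
Treating those three probabilities as independent (our simplifying assumption; in practice the second and third triggers are plausibly correlated), the chance that at least one fires within the window is about 43%:

```python
import math

# Trigger probabilities from the text above. Independence is OUR assumption,
# made for illustration; the desk does not claim the triggers are independent.
triggers = {"NRR breaks below 130%": 0.10,
            "managed-capacity below 15% of revenue": 0.25,
            "hyperscaler ships a comparable compiler": 0.15}

p_none = math.prod(1 - p for p in triggers.values())
print(f"P(at least one downgrade trigger) ~= {1 - p_none:.0%}")  # ~43%
```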

We would upgrade to high conviction if managed-capacity crosses 35% of revenue on the current growth trajectory, or if Lattice announces a second top-3 bank — at which point the wedge becomes a category.


This memo reflects the views of the Alphaneo research desk as of the publication date. It is provided for informational purposes only and does not constitute investment advice or an offer to sell or a solicitation of an offer to buy any security. Analyst scores are re-rated on material events. Alphaneo analysts and affiliates may hold positions in the companies covered. Full disclosures at alphaneo.ai/legal.
