Capital Markets · Report

Engram raises $98 million to develop token-cost-reduction infrastructure for AI inference optimization.

Validates inference efficiency as critical infrastructure differentiator; positions token economics as key competitive lever.

Trade pressSlicast · June 23, 2026 · US · Source: Google News

importance 86

AI memory startup Engram has raised $98 million from investors including General Catalyst, Kleiner Perkins, Sequoia, and OpenAI co-founder Andrej Karpathy. Founded in October with approximately 13 employees, the company claims its models can match or outperform frontier labs while consuming up to 100 times fewer tokens. Engram counts Microsoft, Notion, and legal AI startup Harvey among its customers. The company plans to deploy the funding toward compute resources and talent acquisition.

Kleiner Perkins partner Leigh Marie Braswell characterized the market opportunity, noting the industry faces an "explosion of data, explosion of cost." She described Engram's approach as mapping organizational requirements and delivering "orders of magnitude cheaper output."

Memory systems and retrieval techniques have emerged as a common industry response to rising inference costs, as they reduce the amount of model-context processed per query. Companies building memory layers typically employ selective retrieval, compression, or learned indexing to lower token consumption per request. For practitioners, these approaches trade engineering complexity and storage or compute resources against lower per-request token consumption—a particularly valuable tradeoff for organizations where models charge by token or context window size.

The broader market context reflects a shift toward newer, more sophisticated models with larger context windows and correspondingly higher per-token pricing. Startups marketing memory and retrieval systems are capturing enterprise demand for cost-efficient deployments as teams limit developer access or throttle usage to manage infrastructure bills.

For practitioners evaluating such solutions, measurable cost and latency outcomes from vendors claiming large token reductions warrant careful scrutiny, along with customer case studies reporting end-to-end savings. The integration approach matters as well—whether solutions integrate with major model providers or require custom model retraining—as do pricing models distinguishing between memory and model usage.

Read the original