OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.
It’s the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta’s Reality Labs and Roblox.

Through their work at the forefront of AI and system design, Bastani and Nourai witnessed firsthand how deep system architecture enables continuous, large-scale AI inference. But today’s AI inference remains locked behind cloud APIs and hosted systems, a barrier to low-latency, private, and cost-efficient edge applications. OpenInfer changes that. It wants to be agnostic to the types of devices at the edge, Bastani said in an interview with GamesBeat.

By enabling large AI models to run directly on devices, from SoCs to the cloud, OpenInfer removes these barriers, enabling inference of AI models without compromising performance.

The implication? Imagine a world where your phone anticipates your needs in real time, translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. With AI inference running directly on your device, users can expect faster performance, greater privacy, and uninterrupted functionality no matter where they are. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.
Building the OpenInfer Engine: AI Agent Inference Engine
Since founding the company six months ago, Bastani and Nourai have assembled a team of seven, including former colleagues from their time at Meta. While at Meta, they built Oculus Link together, showcasing their expertise in low-latency, high-performance system design.

Bastani previously served as director of architecture at Meta’s Reality Labs and led teams at Google focused on mobile rendering, VR, and display systems. Most recently, he was senior director of engineering for Engine AI at Roblox. Nourai has held senior engineering roles in graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.
OpenInfer is building the OpenInfer Engine, what they call an “AI agent inference engine” designed for unmatched performance and seamless integration.
To accomplish the first goal of unmatched performance, the first release of the OpenInfer Engine delivers 2-3x faster inference compared to Llama.cpp and Ollama for distilled DeepSeek models. This boost comes from targeted optimizations, including streamlined handling of quantized values, improved memory access through enhanced caching, and model-specific tuning, all without requiring changes to the models.
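To make the “quantized values” point concrete: inference engines commonly store weights as low-bit integers plus a per-block scale, and how efficiently that representation is unpacked affects speed. The sketch below is a generic illustration of blockwise 8-bit quantization, not OpenInfer’s actual scheme.

```python
# Generic blockwise 8-bit quantization: weights are stored as int8 values
# plus one float scale per block. This illustrates the kind of compressed
# representation an inference engine must handle; it is NOT OpenInfer's
# proprietary format.

def quantize_block(weights: list[float]) -> tuple[list[int], float]:
    """Map a block of float weights to int8 values and a single scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    return [round(w / scale) for w in weights], scale

def dequantize_block(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

block = [0.12, -0.5, 0.33, 1.0]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)

# Each restored weight lands within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(block, restored))
```

At inference time the engine performs millions of such unpack-and-multiply steps, which is why streamlining them, as the article describes, translates directly into throughput.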
To accomplish the second goal of seamless integration with easy deployment, the OpenInfer Engine is designed as a drop-in replacement, allowing users to switch endpoints simply by updating a URL. Existing agents and frameworks continue to function seamlessly, without any changes.
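A minimal sketch of what that drop-in swap looks like from the caller’s side: an agent keeps all of its settings and only the inference endpoint URL changes. The URLs and config keys below are hypothetical illustrations, not OpenInfer’s actual API.

```python
# Sketch of the "switch endpoints by updating a URL" idea: every agent
# setting stays the same, only the base URL of the inference server moves.
# All URLs and keys here are hypothetical placeholders.

def switch_endpoint(agent_config: dict, new_base_url: str) -> dict:
    """Return a copy of the agent config pointed at a new inference server."""
    updated = dict(agent_config)
    updated["base_url"] = new_base_url
    return updated

# An existing agent configured against a hosted cloud endpoint...
cloud_config = {
    "base_url": "https://api.example-cloud.com/v1",  # hypothetical hosted API
    "model": "distilled-model",
    "temperature": 0.7,
}

# ...moves to a local engine by changing only the URL; the agent's code
# and every other parameter are untouched.
local_config = switch_endpoint(cloud_config, "http://localhost:8080/v1")

assert local_config["base_url"] == "http://localhost:8080/v1"
assert local_config["model"] == cloud_config["model"]
```

The design choice this reflects is API compatibility: if the local engine speaks the same request format as the hosted one, migration is a one-line configuration change rather than a rewrite.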
“OpenInfer’s advancements mark a major leap for AI developers. By significantly boosting
inference speeds, Behnam and his team are making real-time AI applications more responsive,
accelerating development cycles, and enabling powerful models to run efficiently on edge
devices. This opens new possibilities for on-device intelligence and expands what’s possible in
AI-driven innovation,” said Ernestine Fu Mak, managing partner at Brave Capital and an
investor in OpenInfer.
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference on large models, outperforming industry leaders on edge devices. By designing inference from the ground up, the company is unlocking higher throughput, lower memory usage, and seamless execution on local hardware.
Future roadmap: Seamless AI inference across all devices
OpenInfer’s launch is well-timed, especially in light of recent DeepSeek news. As AI adoption accelerates, inference has overtaken training as the primary driver of compute demand. While innovations like DeepSeek reduce computational requirements for both training and inference, edge-based applications still struggle with performance and efficiency due to limited processing power. Running large AI models on consumer devices demands new inference methods that enable low-latency, high-throughput performance without relying on cloud infrastructure, creating significant opportunities for companies optimizing AI for local hardware.
“Without OpenInfer, AI inference on edge devices is inefficient due to the absence of a clear
hardware abstraction layer. This challenge makes deploying large models on
compute-constrained platforms incredibly difficult, pushing AI workloads back to the
cloud—where they become costly, slow, and dependent on network conditions. OpenInfer
revolutionizes inference on the edge,” said Gokul Rajaram, an investor in OpenInfer. Rajaram is an angel investor and currently a board member of Coinbase and Pinterest.
Specifically, OpenInfer is uniquely positioned to help silicon and hardware vendors enhance AI inference performance on devices. Enterprises that need on-device AI for privacy, cost, or reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and model development.
In mobile gaming, OpenInfer’s technology enables ultra-responsive gameplay with real-time adaptive AI. On-device inference allows for reduced latency and smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and a more immersive experience that evolves with every move.
“At OpenInfer, our vision is to seamlessly integrate AI into every surface,” said Bastani. “We aim to establish OpenInfer as the default inference engine across all devices—powering AI in self-driving cars, laptops, mobile devices, robots, and more.”
OpenInfer has raised an $8 million seed round as its first financing. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR co-founder and former CEO Brendan Iribe, Google DeepMind chief scientist Jeff Dean, Microsoft Experiences and Devices chief product officer Aparna Chennapragada, angel investor Gokul Rajaram, and others.
“The current AI ecosystem is dominated by a few centralized players who control access to
inference through cloud APIs and hosted services. At OpenInfer, we are changing that,” said
Bastani. “Our name reflects our mission: we are ‘opening’ access to AI inference—giving
everyone the ability to run powerful AI models locally, without being locked into expensive cloud
services. We believe in a future where AI is accessible, decentralized, and truly in the hands of
its users.”