ElevenLabs’ new speech-to-text model Scribe is here with highest accuracy rate to date (96.7% for English)

ElevenLabs, the highly valued AI voice cloning and generation startup from Palantir alumni, today launched Scribe v1, a new speech-to-text model that reportedly achieves the highest accuracy across multiple languages. Users can try it here.

According to the company’s benchmarks, it outperforms Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3 and Deepgram Nova-3 in accurately converting spoken speech into text on the web, achieving new record-low error rates.

The company claims that Scribe delivers state-of-the-art transcription accuracy in 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese and Malayalam.

As Flavio Schneider, ElevenLabs lead researcher, wrote on X, Scribe is the “smartest audio understanding model” released by ElevenLabs yet.

“Scribe doesn’t just transcribe — it understands audio,” Schneider continued in a thread. “It can detect non-verbal events (like laughter, sound effects, music and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments.”

“Diarization” is the name given to the process of separating speakers by their vocal qualities on a recording.

In fact, ElevenLabs’ documentation states Scribe can distinguish and isolate up to 32 different speakers in the same audio file.

While ElevenLabs cautions that Scribe is “best used when high-accuracy transcription is required rather than real-time transcription,” the company also plans to introduce a low-latency version soon, expanding its use for real-time applications.

Lowest word error rates (WER)

Scribe is designed to handle real-world audio challenges with precision. According to benchmark results from FLEURS and Common Voice, it records the lowest word error rates (WER) for many languages, with reported accuracy of 98.7% for Italian and 96.7% for English.
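
For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition. It is not ElevenLabs’ benchmark code, the function name is illustrative, and reading the accuracy percentages as 1 minus WER is an assumption the company’s figures do not spell out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six: "the" was transcribed as "a".
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
accuracy = 1.0 - wer  # assumed relationship between accuracy and WER
print(f"WER: {wer:.1%}, accuracy: {accuracy:.1%}")
```

Under that assumption, a 96.7% accuracy figure would correspond to roughly 3.3 errors per 100 reference words.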

Key features include:

Speaker diarization to differentiate speakers in multi-speaker recordings.
Word-level timestamps for detailed transcription accuracy.
Detection of non-speech events, such as laughter and background noises.
Structured transcript output for seamless integration via API.
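
To give a sense of how word-level timestamps and speaker labels might be used downstream, the Python sketch below merges consecutive words from the same speaker into turns. The record layout (the `text`, `start`, `end` and `speaker` keys) and the sample values are hypothetical and chosen for illustration; they are not taken from ElevenLabs’ API schema.

```python
from typing import Dict, List


def group_into_turns(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns: List[Dict] = []
    for word in words:
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": word["speaker"],
                "start": word["start"],
                "end": word["end"],
                "text": word["text"],
            })
    return turns


# Hypothetical word-level records; times are in seconds.
sample_words = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "speaker": "speaker_0"},
    {"text": "there.", "start": 0.4, "end": 0.8, "speaker": "speaker_0"},
    {"text": "Hi!", "start": 1.0, "end": 1.3, "speaker": "speaker_1"},
]

for turn in group_into_turns(sample_words):
    print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] {turn["speaker"]}: {turn["text"]}')
```

A real integration would map the fields of the actual API response onto this shape before grouping.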

Pricing and availability

Scribe is available now through the ElevenLabs website and API.

Pricing is set at $0.40 per hour of input audio, with a 50% discount for the next six weeks. A low-latency version for real-time applications is also in development.
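
As a rough illustration of what those figures mean at volume, the Python sketch below estimates a bill from the stated $0.40 hourly rate and the 50% launch discount. The function name and the 1,000-hour workload are hypothetical, and the calculation ignores any plan tiers, minimums or taxes that may apply.

```python
def transcription_cost(hours: float, rate_per_hour: float = 0.40,
                       discount: float = 0.0) -> float:
    """Estimated cost in dollars for a given number of audio hours."""
    if hours < 0 or not 0.0 <= discount <= 1.0:
        raise ValueError("hours must be >= 0 and discount between 0 and 1")
    return hours * rate_per_hour * (1.0 - discount)


hours = 1_000  # hypothetical monthly workload
print(f"List price:   ${transcription_cost(hours):.2f}")
print(f"Launch offer: ${transcription_cost(hours, discount=0.5):.2f}")
```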

What it means for enterprises

For enterprise decision-makers, Scribe presents a tool for scalable, high-accuracy transcription, making it useful for industries relying on automated documentation, meeting transcription and content accessibility.

The model’s ability to handle diverse languages with high precision also benefits multinational businesses, media companies and customer support applications.

Scribe’s pricing structure makes it competitive for businesses that require high-volume transcription services, and its API-based integration allows for seamless adoption in enterprise workflows.

Additionally, the upcoming low-latency version could position Scribe as a viable option for real-time communication tools.

Coming the same day as rival Hume’s reverse text-to-speech model Octave

Timing is everything, and ElevenLabs chose to launch Scribe the same day as rival Hume AI unveiled Octave, an LLM-powered text-to-speech model that allows users to customize AI-generated voices with adjustable emotions.

Octave is designed for content creation, including audiobooks, podcasts and video game voiceovers. Unlike standard TTS systems, Octave considers context beyond individual sentences, adjusting tone, rhythm and cadence dynamically to sound more natural.

Hume AI positions Octave as a direct competitor to ElevenLabs’ text-to-speech offerings, highlighting that Octave’s pricing is about half the cost of ElevenLabs’ current AI voice services.

While Scribe and Octave serve different functions, their development reflects the growing competition in AI-driven audio models.

ElevenLabs is prioritizing precise, multi-language speech recognition, while Hume AI is advancing expressive AI-generated speech.

For enterprises, this means more specialized solutions for both transcription and synthetic voice applications, enabling more efficient content production, customer engagement and accessibility tools.

Scribe is now live, and ElevenLabs is hosting a virtual event next week with the team behind its development. More details, benchmarks and API documentation are available in the official blog post.
