Hume AI, the startup specializing in emotionally intelligent voice interfaces, has launched Voice Control, an experimental feature that lets developers and users create custom AI voices through precise modulation of vocal characteristics, with no coding, AI prompt engineering, or sound design skills required.
This release builds on the foundation laid by the company’s earlier Empathic Voice Interface 2 (EVI 2), which introduced advanced capabilities in naturalness, emotional responsiveness, and customization.
Both EVI 2 and Voice Control avoid the risks of voice cloning, a practice that Hume co-founder Alan Cowen has said carries ethical and practical challenges.
Instead, Hume focuses on providing tools for creating unique, expressive voices that fit user needs, such as customer service chatbots, digital assistants, tutors, guides, or accessibility features.
Moving beyond preset AI voices toward custom bespoke solutions
Voice Control lets developers adjust voices along 10 distinct dimensions:
“Masculine/Feminine: The vocalization of gender, ranging between more masculine and more feminine.
Assertiveness: The firmness of the voice, ranging between timid and bold.
Buoyancy: The density of the voice, ranging between deflated and buoyant.
Confidence: The assuredness of the voice, ranging between shy and confident.
Enthusiasm: The excitement within the voice, ranging between calm and enthusiastic.
Nasality: The openness of the voice, ranging between clear and nasal.
Relaxedness: The stress within the voice, ranging between tense and relaxed.
Smoothness: The texture of the voice, ranging between smooth and staccato.
Tepidity: The liveliness behind the voice, ranging between tepid and vigorous.
Tightness: The containment of the voice, ranging between tight and breathy.”
This no-code tool allows users to fine-tune voice attributes in real time through virtual onscreen sliders. It’s currently available in Hume’s virtual playground, which requires a free user sign-up to access.
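For developers who want to picture what a set of slider positions might look like as data, here is a minimal Python sketch. It is not Hume’s API: the `VoiceSettings` class, the dimension keys, the base voice name, and the slider range of -1.0 to 1.0 (with 0.0 meaning the base voice is unchanged) are all assumptions made for illustration. Only the ten dimensions and their pole labels are adapted from the list above.

```python
from dataclasses import dataclass, field

# The ten dimensions named in the article, each mapped to its (low, high) pole.
# Keys and label spellings are illustrative, not identifiers from Hume's platform.
DIMENSIONS = {
    "masculine_feminine": ("masculine", "feminine"),
    "assertiveness": ("timid", "bold"),
    "buoyancy": ("deflated", "buoyant"),
    "confidence": ("shy", "confident"),
    "enthusiasm": ("calm", "enthusiastic"),
    "nasality": ("clear", "nasal"),
    "relaxedness": ("tense", "relaxed"),
    "smoothness": ("smooth", "staccato"),
    "tepidity": ("tepid", "vigorous"),
    "tightness": ("tight", "breathy"),
}

# Assumption: each slider ranges over [-1.0, 1.0]; 0.0 leaves the base voice as-is.
SLIDER_MIN, SLIDER_MAX = -1.0, 1.0


@dataclass
class VoiceSettings:
    """Hypothetical container for a base voice plus ten slider positions."""

    base_voice: str
    sliders: dict = field(
        default_factory=lambda: {name: 0.0 for name in DIMENSIONS}
    )

    def set_slider(self, name: str, value: float) -> None:
        """Set one slider, clamping the value into the allowed range."""
        if name not in DIMENSIONS:
            raise KeyError(f"unknown dimension: {name}")
        self.sliders[name] = max(SLIDER_MIN, min(SLIDER_MAX, value))

    def describe(self) -> list:
        """Summarize every slider that has been moved away from neutral."""
        summary = []
        for name, value in self.sliders.items():
            if value == 0.0:
                continue
            low, high = DIMENSIONS[name]
            pole = high if value > 0 else low
            summary.append(f"{name}: {abs(value):.0%} toward {pole}")
        return summary


settings = VoiceSettings(base_voice="example-base-voice")  # hypothetical name
settings.set_slider("assertiveness", 0.5)
settings.set_slider("tightness", -2.0)  # out of range, so clamped to -1.0
print(settings.describe())
```

Storing the settings as plain numbers keyed by dimension is one way to make a custom voice reproducible: the same base voice and the same ten values can be saved and reloaded unchanged.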
The release addresses a key pain point in the AI industry: the reliance on preset voices, which often fail to meet the specific needs of brands or applications, or the risks associated with voice cloning.
This focus on customization aligns with Hume’s broader goal of developing emotionally nuanced voice AI.
The company’s efforts to advance voice AI were highlighted in September 2024 with the launch of EVI 2, which the company described as a significant upgrade over its predecessor.
EVI 2 improved latency by 40%, reduced costs by 30%, and expanded voice modulation features, offering developers a safer alternative to voice cloning.
Sliders > text prompts
Hume’s research-driven approach plays a central role in its product development. The company, co-founded by former Google DeepMinder Alan Cowen, uses a proprietary model based on cross-cultural voice recordings paired with emotional survey data.
This methodology, rooted in emotion science, forms the backbone of both EVI 2 and the newly launched Voice Control.
Voice Control extends these principles by addressing the granular, often ineffable ways humans perceive voices.
The tool’s slider-based interface reflects common perceptual qualities of voice, such as buoyancy or assertiveness, without attempting to oversimplify these attributes through text-based prompts.
Voice Control is immediately available in beta and integrates with Hume’s Empathic Voice Interface (EVI), making it accessible for a wide range of applications.
Developers can select a base voice, adjust its characteristics, and preview the results in real time. This process ensures reproducibility and stability across sessions, key features for real-time applications like customer service bots or virtual assistants.
EVI 2’s influence is evident in Voice Control’s capabilities. The earlier model introduced features like in-conversation prompts and multilingual capabilities, which have broadened the scope of voice AI applications.
For example, EVI 2 supports sub-second response times, enabling natural and immediate conversations. It also allows dynamic adjustments to speaking style during interactions, making it a versatile tool for businesses.
Differentiating in a competitive market
Hume’s focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space, even against well-funded rivals such as OpenAI with its Advanced Voice Mode and ElevenLabs, both of which offer libraries of pre-set voices.
Hume continues to build on its innovative approach to voice AI. Plans for expanding Voice Control include introducing additional modifiable dimensions, refining voice quality under extreme adjustments, and increasing the range of base voices available.
With the launch of Voice Control, Hume reinforces its position as a leader in voice AI innovation, offering tools that prioritize customization, emotional intelligence, and real-time adaptability. Developers can access Voice Control immediately via Hume’s platform, marking another step forward in the evolution of AI-driven voice solutions.