Midjourney is best known as one of the leading AI image generators, with nearly 20 million users on its Discord channel according to third-party trackers, and presumably more on top of that on its website, but its ambitions are beginning to expand.
Following the news in late summer 2024 that it was building its own computing and AI hardware, the company this week released a new research paper alongside machine learning experts at New York University (NYU) on training text-based large language models (LLMs), such as Meta's open source Llama and Mistral's eponymous source models, to write more creatively.
The collaboration, documented in a new research paper published on the AI code community Hugging Face, introduces two new techniques, Diversified Direct Preference Optimization (DDPO) and Diversified Odds Ratio Preference Optimization (DORPO), designed to expand the range of possible outputs while maintaining coherence and readability.
For a company best known for its diffusion AI image generation models, Midjourney's new approach to rethinking creativity in text-based LLMs shows that it is not limiting its ambitions to visuals, and that a picture may not actually be worth a thousand words.
Could a Midjourney-native LLM or a fine-tuned version of an existing LLM be in the cards from the small, bootstrapped startup? I reached out to Midjourney founder David Holz but have yet to hear back.
Regardless of whether a first-party Midjourney LLM offering emerges, the implications of its new research go beyond academic exercises and could help fuel a new wave of LLM training among enterprise AI teams, product developers, and content creators looking to improve AI-generated text.
It also shows that despite recent interest in and investment among AI model providers in new multimodal and reasoning language models, there is still plenty of juice left to be squeezed, cognitively and performance-wise, from classic Transformer-based, text-focused LLMs.
The problem: AI-generated writing collapses around homogeneous outputs
In domains like fact-based Q&A or coding assistance, LLMs are expected to generate a single best response.
However, creative writing is inherently open-ended, meaning there are many valid responses to a single prompt.
In an example provided by the Midjourney researchers, given a prompt like "Write a story about a dog on the moon," the LLM could explore multiple distinct paths, such as:
- An astronaut's pet dog accidentally left behind after a lunar mission.
- A dog who finds itself in a futuristic canine space colony.
- A stranded dog that befriends an alien species.
Despite this range of possibilities, instruction-tuned LLMs often converge on similar storylines and themes. This happens because:
- Post-training techniques prioritize user preference over originality, reinforcing popular but repetitive responses.
- Instruction tuning often smooths out variation, making models favor "safe" responses over unique ones.
- Existing diversity-promoting techniques (like temperature tuning) operate only at inference time, rather than being baked into the model's learning process.
This leads to homogenized storytelling, where AI-generated creative writing feels repetitive and lacks surprise or depth.
The solution: modifying post-training methods to prioritize diversity
To overcome these limitations, the researchers introduced DDPO and DORPO, two extensions of existing preference optimization methods. The core innovation in these approaches is the use of deviation, a measure of how much a response differs from others, to guide training.
Here's how it works:
- During training, the model is given a writing prompt and multiple possible responses.
- Each response is compared to the others for the same prompt, and a deviation score is calculated.
- Rare but high-quality responses are weighted more heavily in training, encouraging the model to learn from diverse examples.
By incorporating deviation into Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the model learns to produce high-quality but more varied responses.
This method ensures that AI-generated stories don't converge on a single predictable structure, but instead explore a wider range of characters, settings, and themes, just as a human writer might.
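To make the mechanics concrete, below is a minimal PyTorch sketch of the deviation-weighting idea. It is an illustration under stated assumptions, not the paper's exact formulation: the function names are our own, the deviation score here is one minus a response's mean cosine similarity to the other responses sampled for the same prompt, and it simply scales the standard DPO loss term.

```python
import torch
import torch.nn.functional as F

def deviation_scores(response_embeddings: torch.Tensor) -> torch.Tensor:
    """Score each response by how much it differs from its peers.

    `response_embeddings` is (n_responses, dim) for a single prompt.
    A response's score is 1 minus its mean cosine similarity to the
    other responses, so rarer responses score higher. (Illustrative;
    the paper's exact deviation measure may differ.)
    """
    sims = F.cosine_similarity(response_embeddings.unsqueeze(1),
                               response_embeddings.unsqueeze(0), dim=-1)
    n = sims.size(0)
    mean_sim_to_others = (sims.sum(dim=1) - 1.0) / (n - 1)  # drop self-sim
    return 1.0 - mean_sim_to_others

def ddpo_style_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    deviation, beta=0.1):
    """Deviation-weighted DPO loss: a sketch of the DDPO idea."""
    # Standard DPO margin between policy and reference log-ratios.
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    per_example_loss = -F.logsigmoid(logits)
    # Upweight examples whose chosen response deviates more from its
    # peers, nudging the model toward rare but high-quality outputs.
    return (deviation * per_example_loss).mean()
```

Where vanilla DPO averages the per-example losses uniformly, this weighting lets responses that stand out from their peers pull harder on the gradient.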
What Midjourney's researchers did to achieve this
The study involved training LLMs on creative writing tasks using a dataset from the subreddit r/writingPrompts, a Reddit community where users post prompts and respond with short stories.
The researchers used two base models for their training:
- Meta's Llama-3.1-8B (an 8-billion-parameter model from the Llama 3 series).
- Mistral-7B-v0.3 (a 7-billion-parameter model from Mistral AI).
Then they took these models through the following processes:
- Supervised Fine-Tuning (SFT): The models were first fine-tuned using LoRA (Low-Rank Adaptation) to adjust parameters efficiently (a minimal setup sketch follows this list).
- Preference optimization:
  - DPO and ORPO were used as baselines; these standard methods focus on improving response quality based on user preference signals.
  - DDPO and DORPO were then applied, introducing deviation-based weighting to encourage more unique responses.
- Evaluation:
  - Automatic evaluation: measured semantic and stylistic diversity using embedding-based techniques.
  - Human evaluation: judges assessed whether outputs were diverse and engaging compared to those of GPT-4o and Claude 3.5.
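As a rough picture of how the first stage might be set up, here is a minimal sketch of LoRA fine-tuning using Hugging Face's transformers and peft libraries. The checkpoint names are the public Hugging Face identifiers for the two base models; the LoRA hyperparameters are illustrative choices on our part, not values reported in the paper.

```python
# A minimal LoRA fine-tuning setup (illustrative hyperparameters).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B"  # or "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")

# Low-Rank Adaptation: freeze the base weights and train small rank-r
# adapter matrices injected into the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 8B weights
```

Because only the adapter matrices are trained, the same recipe fits on far more modest hardware than full fine-tuning of an 8-billion-parameter model would require.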
Key training findings:
- DDPO significantly outperformed standard DPO in terms of output diversity while maintaining quality.
- Llama-3.1-8B with DDPO achieved the best balance of quality and diversity, producing responses that were more varied than GPT-4o's while maintaining coherence.
- When dataset size was reduced, DDPO models still maintained diversity, though they required a certain number of diverse training samples to be fully effective.
Enterprise implications: what does it mean for those using AI to produce creative responses, such as in marketing copywriting, corporate storytelling, and film/TV/video game scripting?
For AI teams managing LLM deployment, enhancing output diversity while maintaining quality is a critical challenge. These findings have significant implications for organizations that rely on AI-generated content in applications such as:
- Conversational AI and chatbots (ensuring varied and engaging responses).
- Content marketing and storytelling tools (preventing repetitive AI-generated copy).
- Game development and narrative design (creating diverse dialogue and branching storylines).
For professionals responsible for fine-tuning and deploying models in an enterprise setting, this research provides:
- A new approach to LLM post-training that enhances creativity without sacrificing quality.
- A practical alternative to inference-time diversity tuning, such as the temperature adjustments sketched after this list, by integrating diversity into the learning process itself.
- The potential to develop more engaging AI applications, from AI-assisted writing tools to virtual assistants that can adapt their responses dynamically.
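For contrast, here is the inference-time knob that the training-time approach is meant to replace: a hedged sketch using the Hugging Face transformers generate API, with an assumed checkpoint and illustrative sampling settings. Raising the temperature flattens the next-token distribution for surface-level variety, but it does not change what the model has learned to prefer.

```python
# Inference-time diversity tuning: the approach the paper moves beyond.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "mistralai/Mistral-7B-v0.3"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Write a story about a dog on the moon.",
                   return_tensors="pt")
# Higher temperature and nucleus sampling add randomness at decode time,
# but the underlying preferences learned in training stay the same.
output = model.generate(**inputs, do_sample=True, temperature=1.2,
                        top_p=0.95, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```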
For those handling AI model orchestration and automation, this research highlights:
- The importance of tuning models at the training stage, reducing the need for post-processing adjustments at deployment.
- A way to introduce adaptive storytelling into AI-driven applications, ensuring variability while keeping content quality high.
- A method for making LLM outputs more human-like, which is crucial for applications requiring interactive storytelling, customer engagement, or dynamic content creation.
The future of AI-generated creative projects looks bright
The success of DDPO and DORPO demonstrates that training LLMs with diversity-focused objectives can yield significant improvements in creative writing. Some ideas include:
- Integrating deviation-based learning into enterprise AI models to enhance response diversity in customer-facing applications.
- Exploring how these methods apply to other generative tasks, such as AI-powered poetry, screenwriting, or game storytelling.
- Developing hybrid training approaches that balance diversity and instruction-following capabilities for AI assistants.
For those interested in applying these techniques, the researchers plan to make their code publicly available on this GitHub repository.
Whether you're fine-tuning LLMs for enterprise applications or optimizing large-scale AI orchestration, this study provides actionable insights into how models can become more dynamic, engaging, and responsive to creative tasks.
By adopting these techniques, AI teams can move beyond rigid, formulaic outputs, building AI systems that are not only smart but also truly imaginative.