If you haven’t heard of “Qwen2,” that’s understandable, but it should all change starting today, with a striking new release that takes the crown from all comers on a subject critical to software development, engineering, and STEM fields the world over: math.
What’s Qwen2?
With so many new AI models emerging from startups and tech companies, it can be hard even for those paying close attention to the space to keep up.
Qwen2 is an open-source large language model (LLM) rival to OpenAI’s GPT models, Meta’s Llama models, and Anthropic’s Claude family, fielded by Alibaba Cloud, the cloud computing division of Chinese e-commerce giant Alibaba.
Alibaba Cloud began releasing its own LLMs under the sub-brand “Tongyi Qianwen,” or Qwen for short, in August 2023, including the open-source models Qwen-7B, Qwen-72B, and Qwen-1.8B, with 7 billion, 72 billion, and 1.8 billion parameters respectively (a rough proxy for each model’s capability), followed by multimodal variants including Qwen-Audio and Qwen-VL (for vision inputs), and finally Qwen2 in early June 2024, in five sizes: 0.5B, 1.5B, 7B, 14B, and 72B. Altogether, Alibaba has released more than 100 AI models of varying sizes and capabilities in the Qwen family over this period.
And customers, particularly in China, have taken note: more than 90,000 enterprises are reported to have adopted Qwen models in their operations in the first year of availability.
While many of these models boasted state-of-the-art (or close to it) performance on their release dates, the LLM race moves so fast around the world that they were quickly eclipsed by other open- and closed-source rivals. Until now.
What’s Qwen2-Math?
Today, Alibaba Cloud’s Qwen team took the wraps off Qwen2-Math, a new “series of math-specific large language models” designed for the English language. The most powerful of them outperform all others in the world, including the vaunted OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, and even Google’s Math-Gemini Specialized 1.5 Pro.
Specifically, the 72-billion-parameter Qwen2-Math-72B-Instruct variant clocks in at 84% on the MATH benchmark for LLMs, which poses 12,500 “challenging competition mathematics problems,” word problems at that, which LLMs are notoriously prone to fumble (see the classic test of which is larger: 9.9 or 9.11).
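The 9.9-vs-9.11 stumble comes down to how the two numbers are read. A minimal Python sketch of the two competing readings:

```python
# "Which is larger: 9.9 or 9.11?" As decimal numbers, 9.9 is larger
# (9.90 vs 9.11), but models sometimes read them like software version
# numbers, where the part after the dot is a separate integer and 11 > 9.
as_decimals = 9.9 > 9.11          # True: comparing real numbers
as_versions = (9, 11) > (9, 9)    # True: comparing (major, minor) versions
print(as_decimals, as_versions)   # True True
```

Both statements are true on their own terms; the benchmark question only makes sense under the decimal reading, which is exactly where a pattern-matching model can slip.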
Here’s an example of a problem included in the MATH dataset:
Candidly, it’s not one I could answer on my own, and certainly not within seconds, but Qwen2-Math apparently can, most of the time.
Perhaps unsurprisingly, then, Qwen2-Math-72B-Instruct also excels and outperforms the competition on the grade-school math benchmark GSM8K (8,500 questions), scoring 96.7%, as well as on collegiate-level math (the College Math benchmark), scoring 47.8%.
Notably, however, Alibaba did not include Microsoft’s Orca-Math model, released in February 2024, in its benchmark charts, and that 7-billion-parameter model (a fine-tune of Mistral-7B) comes close to Qwen2-Math-7B-Instruct on GSM8K: 86.81% for Orca-Math vs. 89.9% for Qwen2-Math-7B-Instruct.
Yet even the smallest version of Qwen2-Math, the 1.5-billion-parameter model, performs admirably, coming close to a model more than four times its size with 84.2% on GSM8K and 44.2% on College Math.
What are math AI models good for?
While early use of LLMs has centered on chatbots and, in the enterprise, on answering employee or customer questions or drafting documents and parsing information more quickly, math-focused LLMs aim to offer more reliable tools for those who regularly need to solve equations and work with numbers.
Ironically, given that all code rests on mathematical foundations, LLMs have so far been less reliable than earlier generations of AI and machine learning, or even plain old software, at solving math problems.
The Alibaba researchers behind Qwen2-Math say they “hope that Qwen2-Math can contribute to the community for solving complex mathematical problems.”
The custom licensing terms for enterprises and individuals seeking to use Qwen2-Math fall short of purely open source, requiring that any commercial deployment serving more than 100 million monthly active users obtain an additional permission and license from the creators. But that is still an extremely permissive ceiling, one that lets many startups, SMBs, and even some large enterprises use Qwen2-Math commercially (to make money) essentially for free.