By Tapojit Debnath Tapu, Co-founder & CTO, Obviously AI.
This article was co-written with my colleague and fellow YEC member, Nirman Dave, CEO at Obviously AI.
Back in March of this year, MIT Sloan Management Review made a sobering discovery: The majority of data science projects in businesses are deemed failures. A staggering proportion of companies are failing to obtain meaningful ROI from their data science projects. A failure rate of 85% was reported by a Gartner Inc. analyst back in 2017, 87% was reported by VentureBeat in 2019 and 85.4% was reported by Forbes in 2020. Despite the breakthroughs in data science and machine learning (ML), despite the development of numerous data management tools and despite hundreds of articles and videos online, why is it that production-ready ML models are just not hitting the mark?
People often attribute this to a lack of appropriate data science talent and to disorganized data. However, my business partner and co-founder, Nirman Dave, and I were discussing this recently, and we believe there is something more intricate at play here. There are three key factors that hinder ML models from being production-ready:
1. Volume: The rate at which raw data is created
2. Scrubbing: The ability to make an ML-ready dataset from raw input
3. Explainability: The ability to explain how decisions are derived from complex ML models to everyday non-technical business users
Let’s start by looking at volume, one of the first key bottlenecks in making production-ready ML models. We know that the rate at which data is collected is growing exponentially. Given this increasing volume, it becomes essential to deliver insights in real time. However, by the time insights are derived, new raw data has already been collected, which may make the existing insights obsolete.
On top of this comes data scrubbing, the process of organizing, cleaning and manipulating data to make it ML-ready. Given that data is distributed across multiple storage solutions in different formats (e.g., spreadsheets, databases, CRMs), this step can be a herculean task. A change as small as a new column in a spreadsheet might require changes throughout the entire pipeline to account for it.
Moreover, once the models are built, explainability becomes a challenge. Nobody likes to take orders from a computer unless they are well explained. This is why it becomes critical that analysts can explain how models make decisions to their business users without being sucked into the technical details.
Solving even one of these problems can take an army, and many businesses either don’t have a data science team or cannot scale one. However, it doesn’t need to be this way. Imagine if all these problems were solved by simply changing the way ML models are chosen. This is what I call the Tiny Model Theory.
Tiny Model Theory is the idea that you don’t need to use heavy-duty ML models to carry out simple, repetitive, everyday business predictions. In fact, by using more lightweight models (e.g., random forests or logistic regression), you can cut down on the time you’d need for the aforementioned bottlenecks, decreasing your time to value.
Often, it’s easy for engineers to pick complicated deep neural networks to solve problems. However, in my experience as a CTO at one of the leading AI startups in the Bay Area, most problems don’t need complicated deep neural networks. They can do very well with tiny models instead — unlocking speed, reducing complexity and increasing explainability.
Let’s start with speed. Since a significant portion of the project timeline gets consumed by data preprocessing, data scientists have less time to experiment with different types of models. As a result, they’ll gravitate toward large models with complex architecture, hoping they’ll be the silver bullet to their problems. However, in most business use cases (like predicting churn, forecasting revenue or predicting loan defaults), these large models only end up increasing time to value, giving a diminishing return on time invested versus performance.
I find that it’s akin to using a sledgehammer to crack a nut. This is exactly where tiny models can shine. Tiny models, like logistic regression, can be trained concurrently using distributed ML, which trains models in parallel across different cloud servers. Because their architecture is so simple, tiny models require significantly less computational power to train and less storage space, and that same simplicity makes them ideal candidates for distributed ML. Some of the top companies prefer simple models for their distributed ML pipelines involving edge devices, like IoT devices and smartphones. Federated machine learning, which is based on edge-distributed ML, is quickly becoming popular today.
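To make the idea concrete, here is a minimal Python sketch, assuming scikit-learn, NumPy and joblib are installed. It is an illustration under stated assumptions, not a description of any particular company's pipeline: local worker threads stand in for cloud servers, the data is synthetic, and the helper name train_on_shard is made up for this example. Each worker trains its own logistic regression on one slice of the data, and the resulting models are combined by averaging their predicted probabilities.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def train_on_shard(X_shard, y_shard):
    """Hypothetical helper: fit one tiny model on one shard of the data."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_shard, y_shard)
    return model


# Synthetic stand-in for a business dataset (e.g., churn: 1 = churned).
X, y = make_classification(
    n_samples=4000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, y_train = X[:3000], y[:3000]
X_test, y_test = X[3000:], y[3000:]

# Split the training data into 4 shards, one per (simulated) server.
shards = list(zip(np.array_split(X_train, 4), np.array_split(y_train, 4)))

# Train the four models concurrently. Threads are an assumption made for
# portability; a real deployment would use separate processes or machines.
models = Parallel(n_jobs=4, prefer="threads")(
    delayed(train_on_shard)(X_s, y_s) for X_s, y_s in shards
)

# Combine the shard models by averaging their predicted probabilities.
avg_proba = np.mean([m.predict_proba(X_test) for m in models], axis=0)
accuracy = float(np.mean(avg_proba.argmax(axis=1) == y_test))
print(f"Accuracy of the averaged shard models: {accuracy:.3f}")
```

Because each logistic regression is small and quick to fit, adding more shards or workers is cheap. The same pattern would be far more expensive with a large neural network, which is the trade-off this article is describing.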
An average data scientist can easily identify how a simple model like a decision tree is making a prediction. A trained decision tree can be plotted to show how individual features contribute to a prediction. This makes simple models more explainable. Data scientists can also use an ensemble of trained simple models, which takes an average of their predictions. This ensemble is more likely to be accurate than a single, complex model. Instead of putting all your eggs in one basket, using an ensemble of simple models spreads the risk of ending up with an ML model that performs badly.
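Both ideas can be sketched in a few lines of Python, assuming scikit-learn is installed. The bundled iris dataset is only a stand-in for business data, and the choice of three ensemble members is an assumption made for illustration. The first half prints a small decision tree as readable if/else rules, and the second half builds an ensemble that averages the predicted probabilities of three simple models.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier, export_text

# Small bundled dataset used purely as a stand-in for business data.
data = load_iris()
X, y = data.data, data.target

# 1) Explainability: a shallow tree can be read as plain if/else rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)

# How much each feature contributed to the tree's splits (sums to 1).
importances = dict(zip(data.feature_names, tree.feature_importances_))
for name, weight in importances.items():
    print(f"{name}: {weight:.2f}")

# 2) Ensemble: "soft" voting averages the members' predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X[:5])
print(proba.round(2))
```

The printed rules are the kind of output an analyst could walk a business user through line by line. For a graphical version, scikit-learn also provides plot_tree in the same sklearn.tree module.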
Simple models are much easier to implement today since they’re more accessible. Models like logistic regression and random forests have been staples of applied ML for longer than modern deep neural nets, so they’re better understood today. Popular low-code ML libraries, like scikit-learn, have also helped lower the barrier to entry into ML, allowing one to instantiate an ML model using one line of code.
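As a rough illustration of that one-line claim, here is a short Python sketch, assuming scikit-learn is installed. The dataset is synthetic and the hyperparameters are arbitrary choices for this example. Each model really is created in a single line; the rest of the code only prepares data and reports a score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a tabular business dataset.
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    n_clusters_per_class=1,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each tiny model is instantiated in one line.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record its accuracy on the held-out data.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Swapping one simple model for another is a one-line change, which makes it cheap to try several of them before reaching for anything heavier.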
Given how crucial AI is becoming in business strategy, the number of companies experimenting with AI will only go up. However, if businesses want to gain a tangible competitive edge over others, I believe that simple ML models are the only way to go. This doesn’t mean complex models like neural nets will go away — they’ll still be used for niche projects like face recognition and cancer detection — but all businesses require decision-making, and simple models are a better choice than complex ones.