COVID has accelerated us ten years into the future. During these unprecedented times, businesses have become more vigilant about how their business functions and IT systems are performing. Moreover, we see a mass movement towards the cloud and towards becoming data-driven through machine learning models.
We also see a trend where companies are scaling their data science practice prematurely. It is now easier to quickly build a model, deploy it and start making decisions, but it is not wise to deploy models in production without ensuring that the underlying logical assumptions made by the model align with the business logic.
We have also found that relying only on the available metrics that summarize the health of the model is not enough. Instead, a deep dive into the model's behaviour and a validation of its underlying logic are required, especially if the model's decisions carry monetary or tangible value for decision-makers.
There are three pre-production steps that data scientists need to take to ensure their model follows the business logic and has the trust of the decision-makers. Otherwise, if the model fails, the blame usually lands on the shoulders of the data scientists.
Let’s talk about each one by one.
Business-focused feature selection & engineering
Building good models requires extensive collaboration with business stakeholders. A crucial part of that collaboration is figuring out which features to feed the model and what new features to create that will reflect the causal behavior our model should capture. So data scientists should not do feature engineering in isolation, but with business leaders who have expertise in the business domain. Our research shows that the choice of features can make or break an AI system (well, everyone knows that, but people still tend to fit more and more variables into their AI models unnecessarily).
If you can’t explain it, you don’t understand it well. This aphorism carries a lot of weight, especially in high-risk industries. While performing our customer discovery, we found a clear pattern: companies are more likely to settle for simpler, explainable models than to use complicated but more accurate blackbox models, especially in high-risk use cases. Luckily, there are tools out there, like explainX.ai, that level the playing field and allow companies to use blackbox models without taking on the downside, i.e. the absence of interpretability and trust. By using tools like explainX, data scientists can explain how complicated models like neural networks arrive at their decisions, whether or not these models capture the underlying business logic, and whether or not these models are biased towards certain features in the data. These insights are crucial to extract because if the data scientists and the business users deeply understand how their models work, the quality of their decisions will improve drastically. More than that, deploying such models will be more fruitful and less risky.
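To make the idea of checking which features a blackbox model actually relies on more concrete, here is a minimal sketch of permutation importance, a generic model-agnostic technique. It is not explainX's API or method; the toy model, the feature names (income, customer_id) and the data are all hypothetical stand-ins.

```python
import random

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return, per feature, the increase in mean absolute error after shuffling it."""
    rng = random.Random(seed)

    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(data)

    baseline = mae(rows)
    importances = []
    for j in range(n_features):
        # Shuffle one column to break its link with the target
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mae(shuffled) - baseline)
    return importances

# Hypothetical stand-in for a trained model: uses income, ignores customer_id
def toy_model(row):
    income, customer_id = row
    return 0.5 * income

rng = random.Random(42)
rows = [[rng.uniform(20, 200), float(i)] for i in range(200)]
targets = [0.5 * r[0] for r in rows]

scores = permutation_importance(toy_model, rows, targets, n_features=2)
print(dict(zip(["income", "customer_id"], scores)))
```

A feature the business considers irrelevant (here, customer_id) should score close to zero. If it does not, that is a signal to revisit the model with the domain experts before deployment.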
Model behavior stress-test
I can’t stress enough (pun intended) how important it is to mock test your model before deployment. We have found the shadow model approach to be very fruitful and widely accepted in data science teams. This approach requires companies to deploy their models online, generate random combinations of feature values (aka simulating real data) and evaluate the model's predictions. Although this model validation approach is efficient, it comes with cloud computing costs that might get a little expensive for some companies. A much cheaper, equally effective, and more collaborative method is what-if analysis. It is not the most scalable method, but its function is to test out the edge cases and figure out whether the model will comply with the business logic in production.
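As a rough illustration of generating random combinations of feature values and evaluating the predictions, the sketch below samples inputs and flags any prediction that breaks a business rule. The feature ranges, the rule and both toy models are hypothetical; in a real setup the rule would come from the business stakeholders and the model would be your deployed one.

```python
import random

# Hypothetical feature ranges agreed with business stakeholders
FEATURE_RANGES = {"income": (0.0, 300.0), "debt": (0.0, 300.0)}

def toy_credit_model(features):
    """Hypothetical stand-in for a trained model: returns an approval score."""
    return 0.01 * features["income"] - 0.02 * features["debt"]

def flawed_credit_model(features):
    """Hypothetical model that ignores debt entirely."""
    return 0.01 * features["income"]

def violates_business_rule(features, score):
    """Illustrative rule: if debt exceeds income, the score must not be positive."""
    return features["debt"] > features["income"] and score > 0

def stress_test(model, n_samples=1000, seed=0):
    """Sample random feature combinations and collect rule violations."""
    rng = random.Random(seed)
    violations = []
    for _ in range(n_samples):
        features = {name: rng.uniform(lo, hi)
                    for name, (lo, hi) in FEATURE_RANGES.items()}
        score = model(features)
        if violates_business_rule(features, score):
            violations.append((features, score))
    return violations

good = stress_test(toy_credit_model)
bad = stress_test(flawed_credit_model)
print(len(good), "violations for the first model;", len(bad), "for the flawed one")
```

The violations list doubles as a set of concrete cases to walk through with the business side, which is usually more persuasive than an aggregate metric.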
In practice, the ideal way is for the data scientists to sit down with the business stakeholders and spend time simulating multiple scenarios. The role of the domain expert is crucial here because they can force the model to behave abnormally by entering values outside of what the model has seen. The domain expert will also provide insights into how the model should behave, which will allow the model developer to protect the model against such cases.
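A what-if session like this can be sketched in a few lines: hold a base case fixed, sweep one feature (including values outside the training range) and check the predictions against an expectation supplied by the domain expert. Everything here is assumed for illustration: the price model, the age_years and mileage_km features, the training range of 0 to 20 years, and the expectation that price should never rise with age.

```python
def what_if(model, base_case, feature, values):
    """Return (value, prediction) pairs, changing one feature at a time."""
    results = []
    for v in values:
        scenario = dict(base_case, **{feature: v})
        results.append((v, model(scenario)))
    return results

def is_non_increasing(results):
    """Domain expectation: predictions should never go up as the feature grows."""
    preds = [p for _, p in results]
    return all(a >= b for a, b in zip(preds, preds[1:]))

def toy_price_model(features):
    """Hypothetical model assumed to be trained on ages 0 to 20 only."""
    age = features["age_years"]
    return 30000 - 2000 * age + 50 * age ** 2

base_case = {"age_years": 5, "mileage_km": 60000}

# Values the model has seen, then values the domain expert pushes beyond them
seen = what_if(toy_price_model, base_case, "age_years", [0, 5, 10, 20])
unseen = what_if(toy_price_model, base_case, "age_years", [0, 5, 10, 20, 30, 40])

print("within training range:", is_non_increasing(seen))
print("with out-of-range ages:", is_non_increasing(unseen))
```

In this toy case the model looks sensible on ages it has seen but starts predicting higher prices for older items once the age leaves that range, which is exactly the kind of edge case a domain expert tends to surface.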
Apart from feature engineering, explainX can help data scientists interpret and stress-test their models at scale and speed.
While building and deploying hundreds of models in production, we struggled with finding the right framework for model interpretability, customizing it to fit our frameworks and then optimizing it for speed and scale. That struggle inspired us to launch explainX and solve this problem by building a model-agnostic interpretability framework that we believe will shorten the time it takes to go from raw data to model deployment without risking quality and performance. Try explainX for yourself and let us know how we can further improve.