Technical Debts of Machine Learning Systems

Common problems of machine learning systems, with some personal experience

Chen Yanhui
The Startup

--

With the democratization of machine learning (ML) technologies and the advancement of ML tooling and frameworks, developing and deploying ML systems has become fast and cheap. However, maintaining such systems over time proves difficult and expensive, due to the unique characteristics of ML systems.

I am going to briefly summarize some of the technical debts of ML systems from three perspectives: Data Dependency, ML System Peculiarity, and System Architecture and Design. After that, I will relate some of them to an ML project that I worked on previously.

ML System Technical Debts

Due to Data Dependency

Data dependency is the root cause of why ML systems are so much more difficult to maintain. On top of the usual problems of software systems, data dependency introduces a class of more complicated problems.

The best practices of software engineering use encapsulation and modular design to create maintainable code. The strict boundaries among software components express logical, consistent, and deterministic relationships between inputs and outputs.

In an ML system, however, any change in the input data changes the system’s behavior and parameter values. It is a Changing Anything Changes Everything (CACE) system. In other words, the software and the data dependencies are entangled (Entanglement). Even worse, the data dependencies are much more unstable than code dependencies (Unstable Data Dependencies).

Addressing the data dependency problem requires a mindset change: we need to treat data as a first-class citizen in an ML system, on par with source code and infrastructure. In other words, we need tooling that continuously monitors and validates data throughout the whole lifecycle of the ML system.

The most important thing is to have an effective and flexible Monitoring and Testing mechanism that checks for all kinds of data problems, such as Training-Serving Skew and Data Drift. This mechanism should also allow users to plug in their own monitoring metrics, which are then visualized on a dashboard.
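As an illustration, here is a minimal sketch of such a check in Python, assuming serving-time feature values are logged and compared against a training baseline with a two-sample Kolmogorov–Smirnov test. The feature names and the p-value threshold are purely illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

def check_drift(train, serving, feature_names, p_threshold=0.01):
    """Flag features whose serving distribution drifted away from training."""
    drifted = {}
    for i, name in enumerate(feature_names):
        stat, p_value = ks_2samp(train[:, i], serving[:, i])
        if p_value < p_threshold:  # the two distributions differ significantly
            drifted[name] = {"ks_stat": round(float(stat), 3), "p_value": float(p_value)}
    return drifted

rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=(10_000, 2))
serving = np.column_stack([
    rng.normal(0.0, 1.0, 5_000),  # stable feature
    rng.normal(0.8, 1.0, 5_000),  # drifted feature: mean shifted by 0.8
])
print(check_drift(train, serving, ["voltage", "consumption_kwh"]))
# only "consumption_kwh" should be reported
```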

Another specific approach is Static Analysis of Data Dependencies, which builds data dependency graphs and enables data sources and features to be annotated. Automated checks can then be run to ensure that all dependencies have the appropriate annotations, and dependency trees can be fully resolved.
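To make this concrete, here is a toy sketch of such a check, assuming each data source and feature is registered as a node with an owner annotation and a list of upstream dependencies (all names are hypothetical):

```python
# Each node must carry an owner annotation, and every declared
# dependency must resolve to a known node in the graph.
GRAPH = {
    "meter_readings_raw": {"owner": "sap-etl-team", "deps": []},
    "daily_consumption":  {"owner": "feature-team", "deps": ["meter_readings_raw"]},
    "alarm_model_input":  {"owner": None,           "deps": ["daily_consumption", "weather"]},
}

def validate(graph):
    errors = []
    for node, meta in graph.items():
        if not meta["owner"]:
            errors.append(f"{node}: missing owner annotation")
        for dep in meta["deps"]:
            if dep not in graph:
                errors.append(f"{node}: unresolved dependency '{dep}'")
    return errors

print(validate(GRAPH))
# ['alarm_model_input: missing owner annotation',
#  "alarm_model_input: unresolved dependency 'weather'"]
```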

One more approach is Model Unit Testing, which checks for errors in the training code using synthetic data generated from the data schema. The schema is not fixed: it has its own lifecycle, maintained by a Data Analyzer and a Data Validator with a certain degree of human intervention and domain knowledge.
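A minimal sketch of such a test, assuming the schema is a simple mapping of feature names to valid ranges (the real Data Analyzer and Data Validator machinery is far richer), could look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical schema of the kind a Data Analyzer would infer:
# one entry per feature with its valid range.
SCHEMA = {
    "consumption_kwh": {"min": 0.0, "max": 500.0},
    "meter_age_years": {"min": 0.0, "max": 40.0},
}

def synthetic_batch(schema, n=200, seed=0):
    """Generate random rows that conform to the schema's ranges."""
    rng = np.random.default_rng(seed)
    cols = [rng.uniform(s["min"], s["max"], n) for s in schema.values()]
    return np.column_stack(cols)

def test_training_runs():
    """Model unit test: one training pass on synthetic data must not crash."""
    X = synthetic_batch(SCHEMA)
    y = np.random.default_rng(1).integers(0, 2, len(X))
    LogisticRegression().fit(X, y)  # raises if the training code is broken

test_training_runs()
print("training code survived a synthetic batch")
```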

Due to ML System Peculiarity

An ML system differs from a normal software system in that it is experimental in nature. The ML model needs to be re-trained continuously as new training data and labels arrive. However, the new data might be the result of the old system, which means that any error in the old model will be amplified in the new system (Feedback Loop).

A classification model needs a fixed threshold for its decision boundary. Most of the time this threshold is set manually, which becomes a problem in a dynamic environment (Fixed Threshold).

The solution to the Feedback Loop problem comes back to Monitoring and Testing again, which can detect changes in the data as well as deterioration of model performance. The Model Unit Testing framework, together with the Data Analyzer and Data Validator mentioned above, can also give you an early warning.
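For performance deterioration specifically, one simple pattern is a rolling-window monitor fed with delayed ground truth. This is only a sketch; the baseline and window size below are illustrative:

```python
from collections import deque

class PrecisionMonitor:
    """Alert when rolling precision over the last N positive predictions
    sinks below a baseline established at deployment time."""

    def __init__(self, baseline=0.85, window=500):
        self.baseline = baseline
        self.outcomes = deque(maxlen=window)

    def record(self, predicted_positive, actually_positive):
        if predicted_positive:  # precision only looks at predicted positives
            self.outcomes.append(actually_positive)

    def alert(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        precision = sum(self.outcomes) / len(self.outcomes)
        return precision < self.baseline
```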

The other approach is to have an Experimentation Platform. After all, everything you have done before deploying is hypothetical. An Experimentation Platform for A/B Testing not only helps you release your model with a high degree of confidence and gather effective feedback, but also minimizes the damage from erroneous models. Specific to the Fixed Threshold problem, you can dynamically adjust your threshold by running experiments through the platform.
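For instance, with labeled outcomes flowing back from an experiment arm, picking a threshold can become a periodic optimization instead of a one-off manual decision. The sketch below assumes illustrative unit costs for a false alarm versus a missed true alarm:

```python
import numpy as np

COST_FALSE_ALARM = 1.0    # cost of inspecting a healthy meter
COST_MISSED_ALARM = 20.0  # cost of not inspecting a faulty meter

def best_threshold(scores, labels):
    """Pick the decision threshold with the lowest total cost
    on recently labeled scores from the experiment."""
    def cost(t):
        flagged = scores >= t
        false_alarms = np.sum(flagged & (labels == 0))
        missed = np.sum(~flagged & (labels == 1))
        return false_alarms * COST_FALSE_ALARM + missed * COST_MISSED_ALARM
    return min(np.linspace(0.05, 0.95, 19), key=cost)

rng = np.random.default_rng(7)
labels = rng.integers(0, 2, 2_000)
scores = np.clip(labels * 0.4 + rng.normal(0.3, 0.2, 2_000), 0.0, 1.0)
print(f"chosen threshold: {best_threshold(scores, labels):.2f}")
```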

Due to System Architecture and Design

ML systems make use of data from many sources to construct features for the model, often incrementally as new sources and features are identified. Without careful consideration and design, the ETL and pre-processing steps grow into a jungle, a spider web of intermediate files and outputs. Managing, testing, and maintaining these pipelines becomes a costly nightmare (Pipeline Jungle).

Data needs to be pre-processed before being passed to the ML model. In some ML packages, and in models developed in-house, a large chunk of supporting code is bundled together with the ML model source code (Glue Code). This makes the model much harder to maintain and reduces its extensibility. We should consciously distinguish supporting code from model code during development and separate them into different packages.
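As a rough sketch of what that separation looks like, assume a preprocessing package with a pure records-in, matrix-out function and a model package that never touches raw-data parsing (all names below are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# --- preprocessing package (e.g. alarm_features/) -------------------------
def build_features(raw_rows):
    """Pure function: raw records in, numeric feature matrix out."""
    return np.array(
        [[r["consumption_kwh"], r["meter_age_years"]] for r in raw_rows],
        dtype=float,
    )

# --- model package (e.g. alarm_model/) ------------------------------------
class AlarmClassifier:
    """Depends only on the feature matrix, never on raw-data parsing."""

    def __init__(self):
        self._model = LogisticRegression()

    def fit(self, X, y):
        self._model.fit(X, y)
        return self

    def predict(self, X):
        return self._model.predict(X)
```

Because the two sides meet only at the feature matrix, the model can be unit-tested with synthetic matrices while the preprocessing evolves independently.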

The solution to the Pipeline Jungle is to think holistically about data collection and feature extraction. First, organize the pipelines into logical groups and standardize the flow of steps within each group. Second, constantly examine the pipelines to extract common logic into reusable components shared across pipelines. Third, as new sources and features are added, re-organize the pipelines from time to time into more manageable logical groups.
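A minimal sketch of this idea in Python: every step shares a uniform records-in, records-out signature, so a step extracted from one pipeline can be reused in another (the group and step names are illustrative):

```python
from functools import reduce

def drop_nulls(rows):
    """Shared cleaning step, reused across pipelines."""
    return [r for r in rows if all(v is not None for v in r.values())]

def to_kwh(rows):
    """Shared unit-conversion step, extracted once instead of copy-pasted."""
    return [{**r, "consumption_kwh": r["consumption_wh"] / 1000} for r in rows]

# Logical groups: each pipeline is just a named, ordered list of steps.
PIPELINES = {
    "metering": [drop_nulls, to_kwh],
    "billing":  [drop_nulls],  # reuses the same cleaning component
}

def run(group, rows):
    return reduce(lambda acc, step: step(acc), PIPELINES[group], rows)

print(run("metering", [{"consumption_wh": 1500}, {"consumption_wh": None}]))
# [{'consumption_wh': 1500, 'consumption_kwh': 1.5}]
```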

Personal Experience

Previously I led a team of data scientists and data engineers at a utility company. My team’s responsibility was to apply machine learning to solve energy and operational problems. We took care of the whole lifecycle of machine learning, all the way from the design and development of the ML models and pipelines to their deployment and maintenance.

In the following, I will share one of those solutions: False Alarm Reduction for metering alarms.

The Problem

The company’s meter technical support unit sends personnel to inspect the electricity meter of every household bi-monthly. The metering system generates an alarm ticket whenever it suspects something is wrong with a meter reading, and the unit has to send personnel to inspect the meter again for every ticket.

It turned out that more than 90% of the 3,000 monthly tickets were false alarms. This not only incurred unnecessary financial costs but also put a big strain on human resources.

The Challenges

We had to come up with an ML solution that reduced the false alarms while missing as few true alarms as possible.

All of the data sat in SAP transactional systems, and it is very difficult to build ML pipelines and models that interface directly with SAP.

Our newly set-up team was the only ML team in the company, and there was no machine learning infrastructure at all. We had to build the ML infrastructure from scratch for effective training, testing, and deployment of the models.

The Solution

Prerequisite

Before investing a lot of effort into an ML project, you have to ask yourself many questions from both business and technical/operational perspectives.

Some business questions: Is there a more straightforward, heuristic approach? Is there enough room for improvement? Does the P&L make sense? Will the estimated benefits outweigh the costs of development, maintenance, and operation?

Some technical and operational questions: Does enough labelled data exist? Where does the data come from? How easy is it to get the data? How do you evaluate the model? What are the success and failure metrics? How long does it take to know that the model has failed?

Develop

We then started gathering training data from various stakeholders for exploratory data analysis and model development. Once the model proved to work on the test data, we started discussing and cooperating with various SAP operation teams to build the ML pipeline for data ETL as well as model scoring and output.

Anticipating that the data would be needed by many other use cases, we built a data warehouse and feature marts from scratch for the model to consume. We also packaged the model as a RESTful service, invoked as a step in the pipeline.
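As a sketch of what that packaging might look like (Flask here, with a stubbed-out model; the endpoint name and feature keys are illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def score(features):
    """Stand-in for the real model: returns a false-alarm probability."""
    return 0.93 if features.get("consumption_kwh", 0) < 100 else 0.12

@app.route("/score", methods=["POST"])
def score_endpoint():
    features = request.get_json(force=True)
    return jsonify({"false_alarm_probability": score(features)})

if __name__ == "__main__":
    app.run(port=8080)  # the pipeline step POSTs feature JSON to /score
```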

As the model was developed and improved iteratively, new sources and features came into the picture incrementally. The data pipelines became increasingly complex and hard to manage: we had run into the Pipeline Jungle.

With Apache NiFi, however, we managed to separate the pipelines into logical groups and standardize the flow of steps in each group. At the same time, we extracted the common processing logic into custom NiFi processors, reusable components across the pipelines. Meanwhile, we scrutinized and refactored the model, separating out the pre-processing and supporting logic to avoid the Glue Code problem.

Deploy

Deploying the model affected the new training data and labels: once the model was deployed, we could no longer get untainted training data. This is the typical Feedback Loop problem.

We had given this some consideration at the time. But because there were other mechanisms for users to report the status of their metering devices, and there was no proper Experimentation Platform, we went ahead with the deployment.

However, I think we should have done A/B testing to validate the model in production despite all the difficulties. This was a lesson learnt. Luckily, the overall result was very good: we managed to eliminate a significant number of false alarms every month while keeping almost all of the true alarms, and our effort was recognized by both the stakeholders and the company.

Conclusion

In this article I have briefly discussed some common problems of ML systems and related a few of them to one of my previous projects. Maintainability is one of the most important aspects of any software system, and even more so for ML systems because of their data dependencies and experimental nature.

Fast and cheap development and deployment of ML projects is a worrying trend when it comes with accumulated technical debt. A good, comprehensive ML infrastructure, plus careful consideration of the entire lifecycle before starting an ML project, will improve the maintainability of an ML system tremendously.

References

  1. D. Sculley et al., “Hidden Technical Debt in Machine Learning Systems,” NeurIPS 2015.
  2. E. Breck et al., “Data Validation for Machine Learning,” SysML 2019.

