
Microkernel Architecture for Machine Learning Libraries

An Example of Microkernel Architecture with Python Metaclass

Chen Yanhui
Towards Data Science
4 min read · Feb 10, 2021


What is Microkernel Architecture

The Microkernel Architecture is sometimes referred to as the Plug-in Architecture. It consists of a Core System and Plug-in components.

The Core System contains the minimal functionality required to make the system operational. The Plug-in components are standalone and independent of one another, and each one enhances or extends the Core System with additional capabilities.

The Core System also maintains a plug-in registry, which records each Plug-in component along with the contract between it and the Core System, such as input/output signatures and formats, communication protocols, etc.

Image from O’Reilly: Software Architecture Patterns
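To make the registry and its contracts concrete, here is a minimal, library-independent sketch. The names `CoreSystem`, `Plugin` and `Tagger` are hypothetical, and the contract is reduced to a single assumed rule: a plug-in must provide an `apply` method that takes a payload dict and returns one.

```python
from typing import Any, Dict, Protocol


class Plugin(Protocol):
    """Hypothetical contract: a plug-in transforms a payload dict and returns it."""

    def apply(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        ...


class CoreSystem:
    """Minimal core: it owns the plug-in registry and nothing else."""

    def __init__(self) -> None:
        self._registry: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # Check the contract when the plug-in is registered, not when it is first used.
        if not callable(getattr(plugin, "apply", None)):
            raise TypeError(f"plug-in {name!r} must define an apply() method")
        if name in self._registry:
            raise ValueError(f"plug-in {name!r} is already registered")
        self._registry[name] = plugin

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Plug-ins never call each other; each one receives the payload produced so far.
        for plugin in self._registry.values():
            payload = plugin.apply(payload)
        return payload


class Tagger:
    """Example plug-in that satisfies the contract by appending a tag."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def apply(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {**payload, "tags": payload.get("tags", []) + [self.tag]}


core = CoreSystem()
core.register("tagger", Tagger("validated"))
print(core.run({"value": 1}))  # {'value': 1, 'tags': ['validated']}
```

Registering something without an `apply` method, such as `object()`, raises `TypeError` immediately, so a broken plug-in is caught at the registry rather than deep inside the Core System.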

The pros and cons of the Microkernel Architecture are fairly clear. On one hand, the separation of concerns makes it flexible and extensible, as well as easy to maintain, test and evolve. It is especially suitable for product-based applications, where you can ship an MVP to customers and then add releases and features along the way with few changes to the Core System.

On the other hand, this architecture is not suitable for systems that require frequent communication and many dependencies among components. You can certainly add dependencies between Plug-in modules, but the maintenance cost grows rapidly with the number of dependencies and soon outweighs the benefits the architecture brings.

Why Microkernel Architecture for Machine Learning Libraries

Besides training and prediction, a typical machine learning library has many other concerns, such as configurable data sources and sinks, and cross-cutting concerns like standardization, monitoring and security. These concerns are independent of the modelling techniques and algorithms that data scientists work on, and they are best handled by other engineers.

The Microkernel Architecture is a natural fit for this purpose. It allows the data science work and the engineering Plug-in components to be developed in parallel, while maintaining a high degree of flexibility and maintainability.

An Example of Microkernel Architecture with Python Metaclass

What is Python Metaclass

In Python, a metaclass is the class of a class. Just as a class defines how its instances behave, a metaclass defines how a class is created and how it behaves, including what happens when the class is called to create an instance. You can think of a metaclass as a class factory that lets you do something extra whenever a class is created. In our case, we create a metaclass that registers Plug-in components with the main model class, which separates the modelling code from the engineering modules. Data scientists can focus all their attention on the modelling, while engineers work on the engineering components in parallel.
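The two ideas above, a class being an instance of a metaclass and a metaclass acting as a class factory with an extra hook, can be sketched in a few lines. `PluginMeta`, its `plugins` list and `add_describe` are hypothetical names for illustration, not part of mkml.

```python
class Plain:
    pass


# Classes are objects too: by default their class (metaclass) is `type`.
print(type(Plain))  # <class 'type'>

# `type` is also a class factory: type(name, bases, attributes) builds a class.
Dynamic = type("Dynamic", (object,), {"greet": lambda self: "hello"})
print(Dynamic().greet())  # hello


def add_describe(attrs):
    """Hypothetical plug-in hook that injects a describe() method into a class body."""
    attrs["describe"] = lambda self: f"instance of {type(self).__name__}"


class PluginMeta(type):
    # Hooks run on the attribute dict of every class created with this metaclass.
    plugins = [add_describe]

    def __new__(mcls, name, bases, attrs):
        for plugin in mcls.plugins:
            plugin(attrs)
        return super().__new__(mcls, name, bases, attrs)


class Model(metaclass=PluginMeta):
    def fit(self):
        return "fitted"


print(Model().describe())  # instance of Model
```

Because a metaclass is inherited, any subclass of `Model` is processed the same way. This is presumably why extending a base class that is linked to the metaclass is enough to attach the plug-ins to a user's model.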

The Example

I have created a PyPI package, mkml, for this Microkernel Architecture. Please refer to the GitHub repository for how to use it.

ML Library with Microkernel Architecture

For demonstration purposes, I created three Plug-in components, for Standardization, Monitoring and Data Source respectively:

  • StandardizationPlugin enforces the method signatures of the model class. In our case, the model class must have fit, predict and score methods.
  • MonitoringPlugin monitors all the functions of the model class. In our case, it logs the input parameters, any exception raised, and the duration of each function call in the model class.
  • LocalDataSourcePlugin loads data locally. It dynamically injects data-loading functions into the model class, so data scientists can retrieve data without worrying about how it is retrieved.
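The Standardization and Monitoring behaviours can be sketched as plain hooks over a class's attribute dictionary, in the same spirit as the `apply(self, attrs, **kwargs)` method used for custom plug-ins later on. This is an illustrative stand-in, not mkml's code: the function names, the choice to wrap only public methods, and the use of the standard `logging` and `time` modules are assumptions.

```python
import functools
import logging
import time

logger = logging.getLogger("monitoring")

# Contract taken from the StandardizationPlugin description above.
REQUIRED_METHODS = ("fit", "predict", "score")


def standardize(name, attrs):
    """Reject a model class whose body lacks fit, predict or score."""
    missing = [m for m in REQUIRED_METHODS if not callable(attrs.get(m))]
    if missing:
        raise TypeError(f"{name} is missing required methods: {', '.join(missing)}")


def _monitored(key, func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # args[0] is the instance (self), so only the remaining arguments are logged.
        logger.info("%s called with args=%r kwargs=%r", key, args[1:], kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("%s raised an exception", key)
            raise
        finally:
            logger.info("%s took %.6f s", key, time.perf_counter() - start)

    return wrapper


def monitor(attrs):
    """Wrap every public method in the class body with logging and timing."""
    for key, value in list(attrs.items()):
        if callable(value) and not key.startswith("_"):
            attrs[key] = _monitored(key, value)


def make_model_class(name, attrs):
    # Apply both hooks to the class body, then use `type` as a class factory.
    standardize(name, attrs)
    monitor(attrs)
    return type(name, (object,), attrs)


logging.basicConfig(level=logging.INFO)
Toy = make_model_class("Toy", {
    "fit": lambda self, x, y: self,
    "predict": lambda self, x: [0 for _ in x],
    "score": lambda self, x, y: 1.0,
})
print(Toy().predict([1, 2, 3]))  # [0, 0, 0]; the call and its duration are also logged
```

Running the checks while the class is being built means a model that breaks the contract fails at import time, before any training job starts.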

Install from PyPI

pip install mkml

Create your own model by extending BaseModel, which is linked to the custom metaclass

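from sklearn.linear_model import LinearRegression
from mkml import BaseModel

class UserModel(BaseModel):
    def __init__(self):
        # The wrapped scikit-learn estimator
        self._model = LinearRegression()

    def fit(self, features, labels):
        # Train the wrapped estimator on the given features and labels
        self._model.fit(features, labels)
        return self

    def predict(self, features):
        # Return the predictions so that callers can use them
        return self._model.predict(features)

    def score(self, features, labels):
        # R^2 of the predictions, as computed by scikit-learn
        return self._model.score(features, labels)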

Instantiate the model class, then load features and labels for training and prediction

um = UserModel()

features = um.get_local_data(feature_mart_location='data', group_id='train_features')
labels = um.get_local_data(feature_mart_location='data', group_id='train_labels')

um.fit(features, labels)

test_features = um.get_local_data(feature_mart_location='data', group_id='test_features')
test_labels = um.predict(test_features)
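How `get_local_data` locates the data is not shown here. As a hypothetical sketch, assuming each group is stored as a CSV file named `<group_id>.csv` under the `feature_mart_location` directory, a local data-source plug-in could look like the following. `get_local_data` below is a stand-in and `LocalDataSourceHook` is a hypothetical name; mkml's actual storage format may differ.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def get_local_data(feature_mart_location: str, group_id: str) -> List[List[float]]:
    """Hypothetical loader: reads <feature_mart_location>/<group_id>.csv as rows of floats."""
    path = Path(feature_mart_location) / f"{group_id}.csv"
    with path.open(newline="") as f:
        return [[float(cell) for cell in row] for row in csv.reader(f) if row]


class LocalDataSourceHook:
    """Hypothetical plug-in that injects get_local_data into a class body."""

    def apply(self, attrs, **kwargs):
        # staticmethod, so calling it on an instance does not pass `self` as the first argument.
        attrs["get_local_data"] = staticmethod(get_local_data)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "train_features.csv").write_text("1,2\n3,4\n")
    attrs = {}
    LocalDataSourceHook().apply(attrs)
    Demo = type("Demo", (object,), attrs)
    print(Demo().get_local_data(feature_mart_location=tmp, group_id="train_features"))
    # [[1.0, 2.0], [3.0, 4.0]]
```

The `staticmethod` wrapper matters: a plain function stored in a class body becomes a method, and the model instance would then be passed in as `feature_mart_location`.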

Create your own custom Plug-in module (e.g. a Plug-in for a remote data source)

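import logging

from mkml import BasePlugin

logger = logging.getLogger(__name__)

class RemoteDataSourcePlugin(BasePlugin):

    def __init__(self, name):
        self._name = name

    def apply(self, attrs, **kwargs):
        logger.debug('Entering data source plugin')
        # Inject a data-loading method into the model class's attributes
        attrs['get_remote_data'] = self._get_remote_data

    def _get_remote_data(self, feature_mart_location, group_id):
        # To be implemented: fetch the data for group_id from feature_mart_location
        pass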
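The stub can be filled in many ways. As one hypothetical example, assuming the feature mart serves each group as JSON at `<feature_mart_location>/<group_id>`, `_get_remote_data` could fetch it with the standard library. The URL layout, the JSON format and the injectable `fetch` parameter are assumptions, and this stand-in does not inherit from mkml's `BasePlugin` so that it runs on its own.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Optional


def _http_get(url: str) -> bytes:
    # Default transport: a plain HTTP GET with a timeout.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


class RemoteDataSourceSketch:
    """Stand-in for RemoteDataSourcePlugin; does not inherit from mkml's BasePlugin."""

    def __init__(self, name: str, fetch: Optional[Callable[[str], bytes]] = None) -> None:
        self._name = name
        self._fetch = fetch or _http_get  # injectable, e.g. for tests without a network

    def apply(self, attrs: Dict[str, Any], **kwargs: Any) -> None:
        # As in RemoteDataSourcePlugin.apply: a bound method, so the model instance is not passed in.
        attrs["get_remote_data"] = self._get_remote_data

    def _get_remote_data(self, feature_mart_location: str, group_id: str) -> Any:
        # Hypothetical URL layout <feature_mart_location>/<group_id>, serving JSON.
        url = f"{feature_mart_location.rstrip('/')}/{urllib.parse.quote(group_id)}"
        return json.loads(self._fetch(url))


# Offline usage with a fake transport in place of HTTP.
fake_store = {"http://fm/train_features": b"[[1, 2], [3, 4]]"}
plugin = RemoteDataSourceSketch("remote_ds_plugin", fetch=fake_store.__getitem__)
attrs: Dict[str, Any] = {}
plugin.apply(attrs)
Demo = type("Demo", (object,), attrs)
print(Demo().get_remote_data(feature_mart_location="http://fm", group_id="train_features"))
# [[1, 2], [3, 4]]
```

Injecting the transport keeps the plug-in testable offline; in production the default `_http_get` would be used.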

Register the custom plug-in module with the metaclass

# Any arguments after the Plug-in class are instantiation parameters for it;
# here 'remote_ds_plugin' is presumably passed as `name`
MKMLMetaclass.register('remote_datasource', RemoteDataSourcePlugin, 'remote_ds_plugin')

The metaclass MKMLMetaclass controls how the UserModel class is instantiated. Therefore, once RemoteDataSourcePlugin is registered with the metaclass, UserModel instances gain the capability that the plug-in provides, in this case the get_remote_data method.
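One subtlety: a metaclass's `__new__` runs only once, when the class statement executes, so a hook placed there would miss a plug-in registered afterwards. Running the registered plug-ins in the metaclass's `__call__`, which executes on every instantiation, avoids that. The following is a self-contained sketch of the idea; `RegistryMeta`, `Greeter` and `Model` are hypothetical, `register` only mirrors the shape of the call above (key, Plug-in class, constructor arguments), and this is not necessarily how MKMLMetaclass is implemented.

```python
from typing import Any, Dict, Tuple


class RegistryMeta(type):
    # Shared registry: key -> (plug-in class, constructor arguments).
    _registry: Dict[str, Tuple[type, Tuple[Any, ...]]] = {}

    @classmethod
    def register(mcls, key: str, plugin_cls: type, *init_args: Any) -> None:
        mcls._registry[key] = (plugin_cls, init_args)

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        # Runs on every instantiation, so plug-ins registered late are still applied.
        attrs: Dict[str, Any] = {}
        for plugin_cls, init_args in type(cls)._registry.values():
            plugin_cls(*init_args).apply(attrs)
        for name, value in attrs.items():
            setattr(cls, name, value)
        return super().__call__(*args, **kwargs)


class Greeter:
    """Hypothetical plug-in following the apply(attrs) contract."""

    def __init__(self, greeting: str) -> None:
        self._greeting = greeting

    def apply(self, attrs: Dict[str, Any], **kwargs: Any) -> None:
        attrs["greet"] = self._greet

    def _greet(self, who: str) -> str:
        return f"{self._greeting}, {who}"


class Model(metaclass=RegistryMeta):
    pass


# Registered after Model was defined, yet new instances still get greet().
RegistryMeta.register("greeter", Greeter, "Hello")
print(Model().greet("world"))  # Hello, world
```

Re-applying every plug-in on each instantiation keeps the sketch short; a production version would probably remember which plug-ins a class has already received.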

Use the new Remote DataSource Plug-in to retrieve features and labels

um = UserModel()

features = um.get_remote_data(feature_mart_location='http://fm', group_id='train_features')
labels = um.get_remote_data(feature_mart_location='http://fm', group_id='train_labels')

Please refer to this notebook for sample usage.

Conclusion

In this article, I have introduced what the Microkernel Architecture is, and I have also created a PyPI package, mkml, to demonstrate how such an architecture can benefit ML library design.

In my opinion, the biggest advantage of the Microkernel Architecture is the decoupling between Data Science and Engineering workloads, while maintaining a very high degree of extensibility and maintainability. A good architectural design always strives for a clear separation of Control, Logic and Data, and this architecture clearly demonstrates that characteristic.

One final note: although Python’s metaclass is a very powerful tool and is used for this implementation, use it with great care, as it can cause many problems when used inappropriately.
