MLOps Frameworks & Custom YAML

A typical ML workflow involves running lots of experiments. Looking at results in the context of other results is much more meaningful than looking at a single experiment alone. Looking across lots of experiments at once gets messy quickly. There are lots of inputs changing and lots of different possible outputs. Some runs inevitably fail early. Different experimentation styles lead to different workflows, but logging every metric that seems significant and tagging experiments with a few consistent tags can keep things much more organized later.

Experiment tracking has several aspects: data-based tracking, model-based tracking, and output and comparison visualizations across metrics and factors for each experiment.

There are two possible approaches. Using a platform that tracks all of the important details mentioned above is the most viable option, and it also saves time. If no platform can log every detail, then we can build a custom solution and use it alongside one of them. This blog gives a general overview of the available platforms, explains why a custom solution may be needed, and shows how it can be implemented.

External MLOps Platforms

Many MLOps platforms offer simple plug-and-play experiment tracking during model training. Some of the best available are MLflow, Weights & Biases, and TensorFlow Extended, among other great choices.

Of these, MLflow is open source, whereas Weights & Biases is paid beyond single-user use and for enterprise usage. Both integrate easily with almost all of the significant libraries used in ML experiments and provide a detailed view for model tracking. As for changes in source data, Weights & Biases automatically stores the source data alongside each experiment's ID, whereas in MLflow you can add a line of code to save the data in the default directory structure. However, this is the same as saving your input data locally for the record: there are no statistics or analysis on the input features, and nothing that actually logs how the input data changed between experiments.
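The MLflow side of this can be sketched roughly as follows. This is a minimal sketch, not a recommended setup: the function name `log_run_with_data`, the `tracker` parameter, and the artifact folder `input_data` are assumptions made for illustration. The MLflow calls themselves (`start_run`, `set_tags`, `log_params`, `log_metrics`, `log_artifact`) are part of its tracking API.

```python
from pathlib import Path


def log_run_with_data(data_path, params, metrics, tags, run_name=None, tracker=None):
    """Log one experiment and attach its input data file as an artifact.

    `tracker` defaults to the `mlflow` module. It is a parameter only so the
    function can be exercised without MLflow installed (an assumption of this
    sketch, not an MLflow convention).
    """
    if tracker is None:
        import mlflow  # imported lazily so the sketch loads without MLflow
        tracker = mlflow

    data_path = Path(data_path)
    with tracker.start_run(run_name=run_name):
        tracker.set_tags(tags)        # a few consistent tags per experiment
        tracker.log_params(params)    # model-side inputs
        tracker.log_metrics(metrics)  # model-side outputs
        # The "one line" for data: copies the file into the run's artifacts.
        tracker.log_artifact(str(data_path), artifact_path="input_data")
```

A call such as `log_run_with_data("train.csv", {"learning_rate": 0.01}, {"rmse": 0.42}, {"stage": "baseline"})` (hypothetical file name and values) stores a copy of the file with the run, but nothing in it describes what changed in the data between runs. That gap is what the custom solution is meant to fill.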

TensorFlow Extended (TFX) is an open source end-to-end platform for deploying production ML pipelines. TFX is very well suited to logging everything, including input data changes, but it performs well and is easy to use only if we use it to its full extent, with all of its modules. It’s a great way to maintain your pipeline on one complete platform that takes care of everything, be it model monitoring, data validation, data tracking, or feature significance. Like the two platforms above, TFX integrates easily with almost all of the significant libraries used in ML experiments. However, migrating every aspect of your current architecture to the TFX modules can take a lot of effort and time.

If we exclude data-based tracking and input feature analysis, each of these platforms can easily be used for the other logging aspects.

Therefore, developing a custom solution for data-based logging and pairing it with one of these platforms as the centralized solution for model experiment tracking can be a great approach, bringing ease and feasibility for an enterprise or an individual.

Custom Solution for Data Tracking

For a custom solution we could rely on any language, but since YAML is one of the most popular and powerful human-readable data serialization languages for configuration, we will use it to build this simple custom solution. It also helps that YAML can easily be used in conjunction with programming languages and other interfaces.

We could certainly build a great UI, but this solution is meant to save time, so its end result will simply be a log of all parameters and simple feature changes, along with more complicated statistics.

Sample YAML output –
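As an illustration of the kind of file meant here, the following sketch computes a few per-feature statistics and renders them as YAML. Everything in it is an assumption made for the example: the key names (`experiment_id`, `data`, `parameters`, `feature_statistics`), the file name `train.csv`, and the values. The small `to_yaml` helper only handles nested dictionaries, lists of simple values, numbers, and plain strings; in practice a YAML library such as PyYAML (`yaml.safe_dump`) would be the safer choice.

```python
import statistics


def summarize_features(rows):
    """Return per-feature statistics for a list of {feature: number} rows."""
    summary = {}
    for name in sorted(rows[0]):
        values = [row[name] for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": round(statistics.fmean(values), 4),
            "stdev": round(statistics.pstdev(values), 4),  # population stdev
            "min": min(values),
            "max": max(values),
        }
    return summary


def to_yaml(node, indent=0):
    """Render nested dicts as block-style YAML text.

    Deliberately tiny: no quoting or escaping, so keys and strings must be
    plain words without special characters.
    """
    pad = "  " * indent
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}: [{', '.join(str(v) for v in value)}]")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# Hypothetical input data and parameters, for illustration only.
rows = [
    {"age": 31, "income": 52000},
    {"age": 45, "income": 61000},
    {"age": 38, "income": 58000},
]
record = {
    "experiment_id": "exp_001",
    "data": {
        "source": "train.csv",
        "n_rows": len(rows),
        "features": sorted(rows[0]),
    },
    "parameters": {"learning_rate": 0.01, "n_estimators": 200},
    "feature_statistics": summarize_features(rows),
}
yaml_text = to_yaml(record)
print(yaml_text)
```

Writing `yaml_text` to one file per experiment keeps the data-side record next to whatever the tracking platform stores for the model side.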

Any number of statistics and parameters can be stored in this configuration file. Following a standardized structure adds value, because records from different experiments can then be compared key by key.
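One way a standardized structure pays off is sketched below: two experiment records are compared to report which features were added, removed, or changed. The sketch assumes each record has already been loaded from its YAML file into a dictionary and has a `feature_statistics` block keyed by feature name; that key, the function name `diff_feature_stats`, and the sample values are assumptions made for illustration.

```python
def diff_feature_stats(previous, current, tolerance=0.0):
    """Compare the `feature_statistics` blocks of two experiment records.

    Returns added and removed feature names, plus every statistic whose
    absolute change exceeds `tolerance`.
    """
    old, new = previous["feature_statistics"], current["feature_statistics"]
    changes = {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {},
    }
    for name in sorted(set(old) & set(new)):
        deltas = {
            stat: {"from": old[name][stat], "to": new[name][stat]}
            for stat in old[name]
            if stat in new[name]
            and abs(new[name][stat] - old[name][stat]) > tolerance
        }
        if deltas:
            changes["changed"][name] = deltas
    return changes


# Hypothetical records from two experiments.
exp_001 = {"feature_statistics": {
    "age": {"mean": 38.0, "max": 45},
    "income": {"mean": 57000.0, "max": 61000},
}}
exp_002 = {"feature_statistics": {
    "age": {"mean": 39.5, "max": 45},
    "tenure": {"mean": 4.2, "max": 11},
}}
changes = diff_feature_stats(exp_001, exp_002)
print(changes)
```

Here `tenure` is reported as added, `income` as removed, and only the mean of `age` as changed, which is the kind of input data change the platforms discussed earlier do not record on their own.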