Most systems today are static

Many modern-day systems are static. To make them more appealing to users, machine learning algorithms are increasingly being plugged in.

In most cases, data scientists develop complex algorithms based on data and get data engineers to deploy these models to a production environment. Once the models are in production, the data scientists move on to their next project and leave the data engineers to monitor and scale the models as needed.

We are developing even more static systems

These machine learning models in production, however, tend to also be static. They were created by data scientists on historical data. New data trends will not get picked up or incorporated into the existing models. In order to incorporate these new trends, data scientists would need to retrain their models on a fairly regular basis.

Ideally, this would be done daily or even more frequently, depending on the data being used. This is, however, not very efficient, and the small gains in model performance don’t outweigh the time data scientists spend continually training new models. So systems are left as they are.

How can we move beyond static systems?

Learning how to implement self-learning algorithms is essential. Before even creating a model, we should come up with a strategy for building a repeatable process, so that future data can be used to update or retrain the current model. Several strategies are worth considering:

1. Create a new model on a regular basis, incorporating the new data, and swap it in for the old model in production. The disadvantage is that retraining can take considerable time and resources, and by the time a new model has been trained, it might no longer be up to date. How much this matters depends on the size and complexity of the model and the time needed to train it.

2. Implement a self-learning algorithm that ingests batches of new data. New data can then be added to the existing model on a regular basis. The disadvantage is that there aren’t many out-of-the-box algorithms that support this type of retraining.

3. Implement a self-learning algorithm that ingests new data as it becomes available. Ready-to-use options for this are also limited, but you could always develop your own custom solution.
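
Strategies 2 and 3 can be sketched in a few lines. The example below is a minimal illustration rather than a production implementation. It assumes a binary classification problem and uses a hand-written logistic regression trained with stochastic gradient descent, so the same model object can be updated with a batch (learn_batch) or with a single example (learn_one). The class name, method names and synthetic data generator are all made up for this illustration.

```python
import math
import random


class OnlineLogisticRegression:
    """Tiny logistic regression that keeps learning from new data."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Strategy 3: update as soon as one labelled example arrives."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error

    def learn_batch(self, batch):
        """Strategy 2: fold a batch of new examples into the existing model."""
        for x, y in batch:
            self.learn_one(x, y)


def make_batch(rng, size):
    """Hypothetical data source: the label is 1 when x0 + x1 > 1."""
    batch = []
    for _ in range(size):
        x = [rng.random(), rng.random()]
        batch.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return batch


def accuracy(model, data):
    correct = sum((model.predict_proba(x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


rng = random.Random(0)
model = OnlineLogisticRegression(n_features=2, learning_rate=0.5)
holdout = make_batch(rng, 500)

accuracy_before = accuracy(model, holdout)
for _ in range(50):  # 50 batches arriving over time
    model.learn_batch(make_batch(rng, 100))
accuracy_after = accuracy(model, holdout)

print(f"before: {accuracy_before:.2f}, after: {accuracy_after:.2f}")
```

The model is never rebuilt from scratch; each batch only nudges the existing weights. Some libraries offer the same idea ready-made. For example, several scikit-learn estimators expose a partial_fit method for incremental updates.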

Problems with self-learning systems

Automatically trained algorithms are more difficult to fine-tune, over-fitting can be a serious concern, and model stability is a major issue. Your model shouldn’t give you drastically different results every time it is retrained. If it does, your algorithm is not stable enough and, as a result, is not learning the larger trends in your underlying data. These problems can be harder to debug and fix with automatically retrained models.
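
One simple way to put a number on stability is to run the old and the retrained model over the same fixed set of reference inputs and count how often their predictions differ. The sketch below assumes models are plain functions that return a label. The function names, the toy threshold models and the 5% limit are illustrative assumptions, not recommended values.

```python
def disagreement_rate(old_predict, new_predict, reference_inputs):
    """Fraction of reference inputs on which the two models disagree."""
    if not reference_inputs:
        raise ValueError("reference_inputs must not be empty")
    changed = sum(old_predict(x) != new_predict(x) for x in reference_inputs)
    return changed / len(reference_inputs)


def is_stable(old_predict, new_predict, reference_inputs, max_disagreement=0.05):
    """True when retraining changed at most max_disagreement of predictions."""
    rate = disagreement_rate(old_predict, new_predict, reference_inputs)
    return rate <= max_disagreement


# Toy models: each one classifies an integer by comparing it to a threshold.
def old_model(x):
    return int(x >= 50)


def retrained_model(x):
    return int(x >= 52)  # small shift after retraining


def unstable_model(x):
    return int(x >= 80)  # drastic change after retraining


reference_inputs = list(range(100))

small_shift = disagreement_rate(old_model, retrained_model, reference_inputs)
large_shift = disagreement_rate(old_model, unstable_model, reference_inputs)

print(small_shift, is_stable(old_model, retrained_model, reference_inputs))
print(large_shift, is_stable(old_model, unstable_model, reference_inputs))
```

A check like this does not say which model is right. It only flags retraining cycles that changed behaviour more than expected, so that someone can take a closer look.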

Is it worth implementing a self-learning system?

The answer is yes. It is worth implementing a self-learning system every time. It will take you more effort to develop the system and put it into production but in the long run, it will save you time and energy. Revising a system is time-consuming. Having a system in place that updates machine learning models automatically gives you peace of mind and allows systems to be accurate and reliable in production for much longer periods of time.

Lessons learnt from implementing self-learning systems

These are some personal tips for those wanting to implement their first self-learning models in a production environment.

1. Have a comprehensive data processing pipeline in place so that new data can easily be added to your model.

2. Set up a separate system for model training that cannot affect your production models in case training fails.

3. Use a solid metric to test model performance after every training cycle.

4. Have a fallback process in place in case your model no longer performs favourably on your metric.

5. Always test the stability of your models. New data should make your system more accurate, not drastically change how it behaves.

6. Set up alerting for your system. You want to be notified of any abnormal behaviour. Make sure the alerts don’t trigger too often, or you will start ignoring them and miss early warning signs.

7. Review detailed statistics of your model performance regularly, at least once a month.

8. Go on creating other models with confidence knowing that the ones you created already are being updated regularly.
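
Tips 2, 3, 4 and 6 fit together into a single retraining cycle: train a candidate away from production, score it on a fixed metric, promote it only if it holds up, and otherwise keep the current model and raise an alert. The sketch below shows one way this could look. It assumes models are plain functions, uses accuracy on a held-out set as the metric, and takes the alert channel as a callback. All names and the tolerance value are hypothetical.

```python
def accuracy(model, holdout):
    """Fraction of held-out examples the model labels correctly."""
    return sum(model(x) == y for x, y in holdout) / len(holdout)


def retrain_cycle(current_model, train, holdout, tolerance=0.01, alert=print):
    """Run one training cycle and return (model_to_serve, its_score).

    train is a zero-argument function that returns a candidate model.
    The candidate is promoted only if its score is not more than
    tolerance below the current model's score on the same holdout.
    """
    current_score = accuracy(current_model, holdout)

    try:
        candidate = train()  # runs apart from the serving model
    except Exception as exc:
        alert(f"training failed, keeping current model: {exc}")
        return current_model, current_score

    candidate_score = accuracy(candidate, holdout)
    if candidate_score + tolerance < current_score:
        alert(
            f"candidate scored {candidate_score:.2f} vs "
            f"{current_score:.2f}, keeping current model"
        )
        return current_model, current_score

    return candidate, candidate_score


# Toy setup: the true label is 1 when x >= 50.
holdout = [(x, int(x >= 50)) for x in range(100)]


def current_model(x):
    return int(x >= 60)


def good_candidate(x):
    return int(x >= 52)


def bad_candidate(x):
    return int(x >= 90)


def failing_training():
    raise RuntimeError("out of memory")


alerts = []
promoted, promoted_score = retrain_cycle(
    current_model, lambda: good_candidate, holdout, alert=alerts.append
)
kept, kept_score = retrain_cycle(
    current_model, lambda: bad_candidate, holdout, alert=alerts.append
)
survived, survived_score = retrain_cycle(
    current_model, failing_training, holdout, alert=alerts.append
)
```

Only the second and third cycles produce an alert, which keeps the alert volume low enough that each message still means something.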

How does Spot Intelligence use self-learning systems?

Spot Intelligence is a deep learning company that helps other companies extract useful information from computer-generated or scanned documents. We build custom models for our clients based on their own proprietary data.

For our clients, accuracy is very important. They need to be able to trust that our system produces accurate and consistent results. We achieve this by flagging low-confidence extractions and making them available for manual review. Once this content has been reviewed, it is sent back to our self-learning models, and the system improves itself automatically over time.
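
The general pattern of flagging low-confidence output for review can be sketched as follows. This is an illustrative example only, not the Spot Intelligence implementation. The Extraction type, the 0.8 threshold, the function names and the sample values are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str
    value: str
    confidence: float  # model's confidence in the value, between 0 and 1


def route(extractions, threshold=0.8):
    """Split extractions into auto-accepted and flagged-for-review lists."""
    accepted, needs_review = [], []
    for extraction in extractions:
        if extraction.confidence >= threshold:
            accepted.append(extraction)
        else:
            needs_review.append(extraction)
    return accepted, needs_review


def to_training_example(extraction, reviewed_value):
    """Turn a human-reviewed extraction into a labelled example for retraining."""
    return {"field": extraction.field, "label": reviewed_value}


# Made-up sample output from a document extraction model.
extractions = [
    Extraction("invoice_number", "INV-001", 0.97),
    Extraction("total", "1O0.00", 0.55),  # low confidence: letter O instead of 0
]

accepted, needs_review = route(extractions)

# A reviewer corrects the flagged value; the result is queued for retraining.
training_queue = [to_training_example(needs_review[0], "100.00")]
```

The threshold controls the trade-off: a higher value sends more work to reviewers but lets fewer mistakes through unchecked.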

How important are self-learning systems to you?

As a startup, creating the best possible product for our clients is important to us. Are self-learning systems important to you? Have you implemented your own? Let us know how your systems work.
