Professor Neil Lawrence has proposed the concept of data readiness levels. The highest level of data readiness describes data that is ready to answer a specific question or make a specific prediction, e.g. “Can we use this data to prove the efficacy of a drug?”
In many cases, start-ups do not have data that is useful for making predictions. This applies very much to AI start-ups.
Much of today's AI is based on Deep Learning algorithms. Deep Learning involves automatic feature detection from data, and learning good features this way typically requires a lot of data. More specifically, supervised Deep Learning needs a lot of labelled data to train the layers of the network.
Many start-ups and companies do not have this data, and hence may not be able to solve the problem they set out to solve. One could therefore argue that most AI start-ups are not actually data ready.
I believe that there are various ways to address this problem:
Data readiness strategies
- Unsupervised learning, e.g. autoencoders, which can learn a compressed representation similar to PCA (for example, autoencoders applied to image processing)
- Semi-supervised learning: using unlabelled data together with small amounts of labelled data, as explained in a good paper by Yoshua Bengio
- Newer solutions like Nanonets
- Synthetic data strategies
- Free or openly available data to train the model initially
- Model zoos, i.e. collections of pre-trained models that can be fine-tuned
- With less data, one would run a mix of Deep Learning and classical machine learning algorithms, so feature selection and transformation strategies would apply
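
To make the semi-supervised idea above a little more concrete, here is a minimal sketch of self-training (pseudo-labelling) in plain Python. It is only an illustration, not the method from the Bengio paper: the nearest-centroid classifier, the function names (`self_train`, `sample_blob` and so on), the `min_margin` threshold and the synthetic two-cluster data are all assumptions I have made for the example. A handful of labelled points seed the model, and confidently classified unlabelled points are then added to the training set round by round.

```python
import random


def centroid(points):
    """Return the mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit_centroids(labelled):
    """Fit a nearest-centroid model from (point, label) pairs."""
    by_label = {}
    for point, label in labelled:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict(centroids, point):
    """Return (label, margin). Assumes at least two classes.

    margin is the gap in squared distance between the runner-up
    centroid and the closest one; larger means more confident.
    """
    ranked = sorted(centroids, key=lambda lab: dist2(centroids[lab], point))
    best, runner_up = ranked[0], ranked[1]
    margin = dist2(centroids[runner_up], point) - dist2(centroids[best], point)
    return best, margin


def self_train(labelled, unlabelled, rounds=5, min_margin=10.0):
    """Self-training: pseudo-label confident points, then refit."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        model = fit_centroids(labelled)
        confident, rest = [], []
        for point in pool:
            label, margin = predict(model, point)
            if margin >= min_margin:
                confident.append((point, label))  # trust this pseudo-label
            else:
                rest.append(point)  # too close to call, keep for later
        if not confident:
            break
        labelled.extend(confident)
        pool = rest
    return fit_centroids(labelled)


def sample_blob(rng, centre, n):
    """Draw n points from a unit-variance Gaussian around centre."""
    return [(rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    centres = {"a": (0.0, 0.0), "b": (5.0, 5.0)}
    # Only 3 labelled points per class, but 200 unlabelled ones.
    labelled = [(p, lab) for lab, c in centres.items()
                for p in sample_blob(rng, c, 3)]
    unlabelled = [p for c in centres.values()
                  for p in sample_blob(rng, c, 200)]
    held_out = [(p, lab) for lab, c in centres.items()
                for p in sample_blob(rng, c, 100)]
    model = self_train(labelled, unlabelled)
    correct = sum(predict(model, p)[0] == lab for p, lab in held_out)
    print(f"held-out accuracy: {correct / len(held_out):.2f}")
```

In a real project the nearest-centroid model would be replaced by whatever classifier the start-up actually uses. The point is only that the unlabelled pool does useful work once a small labelled seed exists.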
My overall impression is:
Commercial AI is still a young field and there is a competitive advantage for first movers. Thus, many companies are adopting variants of the above strategies and will move forward even when they have limited data initially. By the same token, companies must have a clear data strategy in place when they address investors.
Izaskun Larrea Manzarbeitia