Artificial intelligence and machine learning require huge amounts of data: after all, more data beats better algorithms. One of the major competitive advantages of players such as Google in the machine learning space is the massive amount of data they have.

But traditional companies, such as manufacturers, simply do not gather enough data on their own to train algorithms effectively, and they lack the necessary in-house skills. Industrial plants have hundreds of machines, each equipped with hundreds of sensors (the so-called Industrial Internet of Things). They produce a LOT of data, but the insight generated would grow exponentially if it were possible to cross-analyse and compare the data from MANY DIFFERENT plants and companies.

One solution would be for these traditional companies to allow third-party big data companies to access, aggregate and analyse their data, and to develop algorithms for them. However, big data companies tell us that their clients do not allow them to reuse the data for developing new algorithms and products, but only for performing one-off, customised analysis. Some even say that this lack of data reuse is the main barrier to achieving AI-led industrial plants (the industrial equivalent of self-driving cars). There are pilots, such as data innovation spaces or industrial platforms, but they haven’t yet reached critical mass.

Why so? Companies do not allow third parties to access and reuse their data mostly because they perceive the potential risks as higher than the advantages. In particular, the main perceived risks are twofold:

  • that the third-party data company builds products that enable their competitors to learn from their best practices, and hence erodes their competitive advantage;
  • that the third-party data company enters the business of running industrial plants thanks to the algorithms developed, and becomes a direct competitor.

In the context of servitisation and increased cross-sector competition, these risks are not without foundation. And big data solutions are still in the “promising” area: they have not yet delivered a breakthrough. Yet the reluctance to share data can itself prevent the development of these innovative AI solutions.

What do you think? Are traditional companies right in not allowing data access and reuse by third parties? How do we break the vicious circle of no data sharing – no AI progress?