One of the refrains in the big data narrative is the “end of theory”, a phrase from an influential article by Chris Anderson:
Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
Anderson manages to convey and popularize a highly scientific notion – and, as with any popularization, it is prone to oversimplification. But there is plenty of serious science pointing in the same direction, such as the article “Here is the evidence, now what is the hypothesis?”.
As usual, this is not a revolution: it is not totally new, and theory remains very much necessary. As some scholars point out (h/t Giuseppe Veltri), the use of data correlation as a discovery tool is not new, but the (petabyte) scale of it is. More importantly, while one can say that Google PageRank (the best-known example of a data-driven approach) “simply” uses link data to assess the value of a webpage, in reality the idea of treating a link as an expression of preference is itself a theory.
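To make that concrete, here is a minimal power-iteration sketch of the PageRank idea in Python – a toy illustration on a made-up four-page web, not Google’s actual implementation; the damping factor is the standard value from the original paper:

```python
# Toy PageRank via power iteration: ranks emerge from link data alone,
# but the premise that "a link is a vote of preference" is the theory.
import numpy as np

# Hypothetical 4-page web: links[i] = pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)
damping = 0.85  # standard damping factor from the original PageRank paper

# Column-stochastic transition matrix built from the link structure.
M = np.zeros((n, n))
for src, outs in links.items():
    for dst in outs:
        M[dst, src] = 1.0 / len(outs)

rank = np.full(n, 1.0 / n)
for _ in range(100):  # iterate until the rank vector stabilizes
    rank = (1 - damping) / n + damping * M @ rank

print(rank)  # pages with more (and better-sourced) inbound links score higher
```

The ranks come out of the data, but the whole construction rests on the theoretical premise that a link is a vote.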
In any case, data-intensive science calls for a greater role for inductive, rather than deductive, methods.
Thinking about this, it seemed obvious to me that there should be some kind of software that helps build hypotheses from the data in an automated way: “hypothesis as a service”. I had the idea, and then, as usual, turned to Google, because surely someone in the world had already built it. I searched for “hypothesis formulation tools” and came across this:
Introduction: what is hypothesis formulation technology?
The DMax Assistant™ product family is a collection of software tools that help researchers to extract hypotheses from scientific data and domain specific background knowledge.
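I have no insight into how DMax Assistant actually works, but a naive sketch of the underlying idea – scan the data for strong associations and phrase them as candidate hypotheses for a human to vet – might look like this (the column names and the threshold are made up for illustration):

```python
# Naive "hypothesis as a service" sketch: scan a table for strong pairwise
# correlations and phrase each one as a candidate hypothesis for a human
# scientist to vet. (Correlation is the start, not the end, of the theory.)
import itertools
import pandas as pd

def candidate_hypotheses(df: pd.DataFrame, threshold: float = 0.7):
    """Yield (statement, r) for every pair of numeric columns with |r| >= threshold."""
    numeric = df.select_dtypes("number")
    for a, b in itertools.combinations(numeric.columns, 2):
        r = numeric[a].corr(numeric[b])
        if abs(r) >= threshold:
            direction = "rises" if r > 0 else "falls"
            yield f"Hypothesis: as {a} increases, {b} {direction} (r = {r:.2f})", r

# Hypothetical dataset: the column names are invented for illustration.
df = pd.DataFrame({
    "hours_online": [1, 3, 5, 7, 9, 11],
    "civic_participation": [8, 7, 6, 4, 3, 2],
    "shoe_size": [40, 42, 39, 44, 41, 43],
})
for statement, _ in candidate_hypotheses(df):
    print(statement)
```

A real tool would of course have to guard against spurious correlations and multiple comparisons, which is exactly where the scientist comes back in.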
Let me now wander off track a bit. To me, this is one of those cases where you see the Hegelian Weltgeist (spirit of the world) made real. One can imagine such a tool for the social sciences as well. It’s a kind of “augmented scientific process”. Instead of a Google Glass, imagine a Google Microscope: you see the image, and it proposes related images, relevant theories and articles, and emerging hypotheses.
Finally, an easy prediction. Automated hypothesis building is just another tool: it augments, but does not substitute for, the human brain. Scientists are needed to make the best of it, for instance to decide which datasets to merge. But technology is reducing even the need to “choose the datasets”…
Will we need fewer scientists in the future? Because, you know, we’re also seeing “randomized trial as a service” tools (1, 2)…