Policy and technology: a "longue durée" view

Random thoughts on policy for technology and on technology for policy



The determinants of science 2.0 adoption #openscience #futurescience20

During the discussion I had at the European Astronomical Society, I realised how much the implications of science 2.0 vary across disciplines.
This made me sketch out a set of factors that deeply shape how science 2.0 deployment plays out in a specific scientific field:

  • the involvement of industry in funding research: where industry involvement is strong, IPR regimes are stricter and there is less willingness to share. Moreover, in disciplines with little industry involvement, such as astronomy, scientists have fewer opportunities to get rich and are more likely to be motivated purely by curiosity and passion. Hence, science 2.0 can be expected to have a greater impact where industry involvement is smaller.
  • the kind of data sources: if data are mainly collected through large observatories, as in the case of astronomy, it is often those observatories that decide on data sharing, and it is certainly easier to have highly structured, high-quality, curated data shared through common repositories. In other fields, where data gathering is fragmented, there are fewer central repositories, and data sharing is more costly and difficult.
  • the public appeal: astronomy is fascinating for everyone, and it is easier to have citizen science initiatives such as GalaxyZoo.
  • big vs small science: it is certainly more common for publications in big science to be reproducible than for those in small science.
  • applied vs basic research: related to the previous point, I am not sure how this plays out, but it is possible that basic research is more curiosity-driven and therefore more inclined to openness.

This is obviously just an initial list. What do you think?

Indicators for data reuse: it’s not how many, it’s who #opendata

I am in Rolle, Switzerland, on beautiful Lake Geneva, getting ready for a speech on Science 2.0 to the European Astronomical Society. As usual, travelling makes me read, and think. In this case, about a great paper on the reuse of scientific data: If We Share Data, Will Anyone Use Them?

One of the topics I am interested in is the reuse of open data. In the domain of open government, one of the key actions of the current EU eGov action plan concerns indicators for the reuse of public sector information (PSI). This is critical: after many years fighting to have open government data, we now need to show that they are actually being used and reused. Just as for online public services, there is a sense of disappointment with the low rate of open data reuse, typically measured by the number of dataset downloads or the number of users downloading datasets. Somehow there was the expectation that citizens would rush to play with government data once they became available.

In my opinion, this is a mistaken expectation. The vast majority of citizens are not interested in government data, and certainly not in manipulating them directly. What matters is not how many people download the data, but what the few people who care do with them. It does not matter if spending data are downloaded by only a few people: what matters is that among those few, someone is building great apps and services, used by millions, generating social and economic benefits.

Based on the literature on eGovernment, we somehow expect that UPTAKE indicators anticipate IMPACT indicators. If you have few users downloading, you expect the impact to be low, and vice versa. But the reality is that the success stories of open data happen when “data meet people”, when the right people come across the right data. When it comes to innovation, uptake is not a proxy for impact. What matters is not how many, but who. Number of downloads and number of users should not be taken as headline indicators to measure the impact of open government.

The same is true in science. Publishing scientific data will not lead to thousands of scientists replicating the findings of other scientists. But we know from the Reinhart-Rogoff case that it takes only one student reusing the data to achieve a huge impact, in this case uncovering the mistaken evidence behind some of the most important economic decisions of our time.

An Open Strategy, in any domain, should not aim at generating massive participation, but at enabling and facilitating the job of those few who actually care about the data. That’s design for serendipity.

Findability of the data is key, and this is why metadata and standards are crucial to grasping the benefits of open data: they facilitate the serendipitous encounter of the right people with the right data.

2030: disappointing rates of scientific data reuse and open reviews in #futurescience20

In the context of the brainstorming #futurescience20, let’s continue with some more negative scenarios.

The argument about science 2.0 and open science is based on the assumption that if you open up scientific data:

– many researchers will reuse it, thereby accelerating the rate of new discoveries and improving the “return on data collection”;

– researchers will be able to quickly replicate and check whether the findings of the analysis are robust, thereby uncovering scientific errors and discouraging publication of false claims

Similar reasoning can be applied to open reviews and reputation management: rather than having professional peer reviewers (a slow and ineffective process), it is better to have open contributions from anyone interested, so that the real experts (rather than the “appointed” ones) can properly judge the value of a paper.

Or, as in the PLOS ONE approach, perhaps the basic assessment of publishability can be done by editorial committees, while the judgement of value and importance is left to the “open market”, so that the number of downloads and citations will be a sufficient assessment of a paper’s value. It’s the new “publish then filter” approach against the traditional “filter then publish”.

However, we learnt from the history of open government that reuse and participation are hard to achieve.

After all the effort to open up government data, reuse of such data is still disappointing. From the transparency point of view, it certainly did not transform government. Citizens certainly did not wait eagerly to examine open government data in order to scrutinise government. Even the fears of potential misuse of open data have not been

In terms of creating jobs and growth through open data reuse, results have not lived up to the promises either. As I perceive it, there is a general feeling of disappointment with the economic impact of open data, perhaps because we raised expectations too high. Certainly, the results are not to be expected in the short term.

When it comes to participation and e-democracy, we know very well how difficult it is to engage citizens in policy debate. Participation remains hard to reach. High quantity and high quality of participation remain the exception rather than the rule, and certainly can’t be considered as a sustainable basis for sound policy-making. High quantity typically occurs when dealing with inflammatory debates or NIMBY-like themes. When ideas are crowdsourced, the most innovative ideas are not the most voted.

If we transpose this reality to the future of science 2.0, it is therefore to be expected:

  • that researchers will not rush to analyze and replicate other researchers’ studies. Replication will mainly be driven by antagonistic spirit. Most datasets will simply be ignored, either because they are partial or because they are not curated enough.
  • that researchers will certainly not provide reviews (especially public reviews). The Nature experiment with open peer review clarified that:

A small majority of those authors who did participate received comments, but typically very few, despite significant web traffic. Most comments were not technically substantive. Feedback suggests that there is a marked reluctance among researchers to offer open comments.

  • that an assessment of “importance” based only on downloads and citations, rather than on peer review, is likely to lead to greater homogeneity of science and to reduce the rate of disruptive innovation. Attention will focus disproportionately on the most read articles; because reputation is based on this, scientists will focus on “popular” topics rather than on uncomfortable and less popular disruptive discoveries.

In summary, the full deployment of science 2.0 could lead to a reduction in the quality and quantity of scientific discoveries. Scientists will not spend their time evaluating other researchers’ work, and when they do, it will be in an antagonistic spirit, thereby making the open assessment model conflictual and unsustainable. They will focus on being read, tweeted and downloaded, in other words on being popular, thereby reducing the incentives for disruptive, uncomfortable innovation.

From unbundling to rebundling: the walled gardens of #futurescience20

In the course of this open brainstorming on the future of science 2.0, there is the clear risk of techno-optimism (or technological solutionism).

In this post, we look at the dark side. What could go wrong? To do so, we extrapolate from what happened in other related domains.

For instance, our forecast of the unbundling of science (data production and publication separated from articles and from reputation measurement services; individuals vs institutions; articles vs journals) was envisaged as a liberation from current lock-ins: data would flow freely from researcher to researcher and from papers to data repositories through interoperable open formats.

What is happening on the web today tells a different story. Surely, unbundling weakened the incumbent gatekeepers, such as telecom providers, newspapers and music labels. But rather than a fully anarchical, interoperable economy based on open standards and open APIs, new gatekeepers and walled gardens emerged under the name of “platforms”: Apple, Facebook, Google, Amazon. Some even said that “the web is dead”. Google's recent move away from RSS in order to favour Google+ is a reflection of this trend. As Wolff puts it in the same feature, “chaos isn’t a business model”.

Even when interoperability is ensured technologically, lock-in is ensured by network effects, preferential attachment or ownership of personal data. The Internet, the world wide web, citation networks and many social networks are all scale-free networks, in which the number of links per node follows a power-law distribution. In other words, the rich get richer.
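To make the "rich get richer" mechanism concrete, here is a minimal sketch of preferential attachment, under simplifying assumptions of my own: the network grows one node at a time, and each newcomer creates exactly one link to an existing node chosen with probability proportional to the links that node already has. The function name, the network size and the "top 1%" measure are illustrative choices, not taken from any study.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes: int, seed: int = 0) -> Counter:
    """Grow a network node by node; each newcomer links to one existing
    node picked with probability proportional to that node's degree."""
    rng = random.Random(seed)
    # Start from two linked nodes. `endpoints` stores both ends of every
    # link, so a node with k links appears k times in the list and is
    # k times more likely to be drawn by rng.choice.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
    return Counter(endpoints)  # maps node -> number of links (degree)


degrees = preferential_attachment(10_000)
# Share of all link endpoints held by the 100 best-connected nodes (top 1%).
top_1_percent = sorted(degrees.values(), reverse=True)[:100]
share = sum(top_1_percent) / sum(degrees.values())
print(f"Top 1% of nodes hold {share:.0%} of all link endpoints")
```

Nobody in this toy model is excluded by design, yet early, well-connected nodes end up with far more than a proportional share of the links. That is the sense in which lock-in can emerge even when everything is technically interoperable.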

So it is possible, and likely, that either through natural development or through the invisible hand of managers, future science 2.0 will not be totally unbundled and fully interoperable. Instead, it will be divided into walled gardens. Already now, we see platforms such as Mendeley, Google Scholar, ResearchGate and Figshare extending their services in what could be considered a kind of vertical integration. For instance, they all try to gather your publications in one place and act as your academic identity.

Just as the free web is damaging newspapers, so openness will weaken existing publishing powerhouses, which, by the way, are one of Europe's strengths. New players will outcompete the “European Champions” of scientific publishing.

Future Science 2.0 will then be platform-based. New players will integrate the value chain and build walled platforms. It could be, for example, that Amazon will build a platform for scientific publishing around the Kindle, including reputation management. Based on the unique data they have on what people read and highlight, they will be able to lock in researchers and provide fine-grained, real-time reputation measures based on what people download, read and highlight. They will enable direct publishing (they already do) and even provide a scientific crowdsourcing platform for citizen science based on the Mechanical Turk.

Can the Kindle do to scientific publishing what the iPod did to music?

Or it could be data publishers such as Figshare, or reference management systems such as Mendeley. In any case, the lock-in will be based on ownership of the personal data of researchers: what they read, cite and highlight, what data they gather, what they analyse and publish. Different platforms will provide different, competing reputation measures and identities. Imagine a data publication service telling you: “Researchers who analysed this dataset also analysed these others”.

Researchers will have been emancipated from publishers and institutions only to fall into the slavery of future science platforms.

As a result, scientific reputation will become less reliable; existing publishers will disappear or be bought (imagine, in the future Mendeley will buy Elsevier); data interoperability will be reduced because of different standards.

What do you think? Do you see a future of scientific walled gardens? What will be the future Science 2.0 platforms?

Reproducibility by design in 2030 #futurescience20

Continuing the brainstorming on the future of science 2.0, we can envisage that by 2030 reproducibility will be considered a fundamental requirement for scientific publication. Research papers will be expected to contain all the necessary information to reproduce the experiment and validate the results.

Reproducibility will be formalised in a set of rules. A minimum standard of reproducibility will be required for publication (just as PLOS ONE does); or maybe publications will have a “reproducibility label” assigned by third-party services. Reproducible findings will be considered of higher quality, and the label will also enable non-experts (e.g. policy makers) to appreciate it.

Making research reproducible is costly, in terms of documenting the experiments and curating the data, especially when it needs to be retro-fitted after the research. This is why, increasingly, research protocols and methods will be formalised as templates and through tools that facilitate gathering the necessary information to enable reproducibility.

The very fact of making research reproducible will automatically reduce the number of false findings (made in both bad and good faith), by reducing the incentive to cheat and by introducing more formalised analytical methods.

Integrating scientific evidence with policy analysis in #futurescience20

In this post of the brainstorming future science 2.0 series, we expand the idea of the integrated article of the future to the actual usage of scientific evidence in policy-making. In 2030, good governance practice will require politicians and civil servants to make explicit reference to scientific evidence in any policy decision. Not only scientific articles, but also strategic documents and politicians' speeches will give readers direct access to the evidence (articles and data) that justifies a certain claim, including critical pieces of evidence. It will be possible to browse seamlessly between policy documents and the underlying data through open standards, which will further enable the flourishing of fact-checking apps for discussing the validity of statements. Even more, this “seamless” process will make it possible to directly show the stakeholders' positions and discussions on the specific topic, expanding towards the policy debate.


scenarios 2030: microtasking for scientists #futurescience20

In this 5th post of the series on brainstorming future science 2.0, I spell out some ideas on how scientists will work in the future.

The unbundling of the relation between data, code, workflows and analysis will lead to greater collaboration. Researchers will be activated on specific tasks, even micro-tasks, to work on some parts of the problem to be addressed. Large, collaborative efforts (genome-like) will be commonplace. This micro-tasking can already be seen in the solution of the Polymath problem, where individual scientists contributed ideas to solve a mathematical problem. Or in Innocentive, where anyone (not only researchers) can bring their own solution to challenging real-world problems. The Open Innovation paradigm will be applied to science, and scientists will outsource part of their work to specialised services such as Science Exchange. There will also be greater involvement of micro-expertise from amateurs.

Scientists will carry out a large part of their work by contributing to other people’s research projects. And they will be rewarded for it. The researchers’ activity on these collaborative efforts will be tracked and measured, adding to their reputation. It is possible to envisage that platforms such as ImpactStory will also include the reputation of the scientist on Innocentive.

Reputation management systems will provide the fundamental incentive for collaboration between scientists, not only by rewarding the best but also by indicating the micro-expertise needed. Because reputation will include non-scientific work, such as contributions on Innocentive-like platforms, there will be greater scope for non-career scientists to develop an academic career by being appointed for specific micro-courses in universities, for example using Massive Open Online Courses.

from altmetrics to instant impact factor #futurescience20

Here comes post 4 in the #futurescience20 brainstorming series. It’s about the future of the impact factor.

Today, the impact of research is fundamentally measured by citations in prestigious journals. There are a number of problems with this: mainly that it implies long delays (up to 5 years), that “Citations are only a small fraction of how a paper is reused”, and that articles are only one kind of scientific output [Buschman, M., & Michalek, A. (2010). Are Alternative Metrics Still Alternative?].

In 2030, impact factor will be multidimensional, granular and real time.

It will consider not just article citations, but other measures such as actual downloads, views and book holdings; number of likes, favourites and “read-later” clicks; mentions on social media and Wikipedia. These measures are already starting to become available through dedicated services. It will go even further. It will measure to what extent people are reading and highlighting sections of articles through electronic readers, something that is already being made visible today.

This will enable everyone to map not only article metrics, but actual sub-sections of the article as well as of the dataset (granularity).

And all this will be made available in real time. Scientists will be informed in real time about discussion happening on the web about their work. They will be able to connect to other scientists that “think alike” and discover serendipitous connections. The impact factor metrics will become not only a management and reputation tool, but an actual service to scientists.
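As a rough illustration of what "multidimensional, granular and real time" could mean in practice, here is a small sketch. Everything in it is hypothetical: the `Event` and `ImpactTracker` names, the identifiers such as "paper-1#methods" and the kinds of interaction are made up for this example and do not describe any existing service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observed interaction with a research output (hypothetical)."""
    target: str  # made-up ids: "paper-1", "paper-1#methods", "dataset-7"
    kind: str    # e.g. "citation", "download", "highlight", "tweet"


class ImpactTracker:
    """Keeps running counts per target and per kind of interaction."""

    def __init__(self) -> None:
        self._counts = defaultdict(lambda: defaultdict(int))

    def record(self, event: Event) -> None:
        # Counts are updated as each event arrives ("real time").
        self._counts[event.target][event.kind] += 1

    def profile(self, target: str) -> dict:
        """Counts by kind for a target, including its "#" sub-sections."""
        merged = defaultdict(int)
        for key, kinds in self._counts.items():
            if key == target or key.startswith(target + "#"):
                for kind, n in kinds.items():
                    merged[kind] += n
        return dict(merged)


tracker = ImpactTracker()
for event in [
    Event("paper-1", "citation"),
    Event("paper-1#methods", "highlight"),
    Event("paper-1#methods", "highlight"),
    Event("paper-10", "download"),
]:
    tracker.record(event)

print(tracker.profile("paper-1"))          # whole article, all dimensions
print(tracker.profile("paper-1#methods"))  # a single sub-section
```

The point of the design is that impact stays a profile with several dimensions rather than being collapsed into one number, and that the same profile can be read at the level of the article or of one of its sections.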

#futurescience20: the great unbundling

This is post #3 of the brainstorming on the future of science 2.0. Previous posts are here and here. Remember this is a creative brainstorming, not a rigorous analysis.

We previously argued that science 2.0 is more than open access: it affects the full research cycle, from literature review to hypothesis creation, data collection, etc. Moreover, tools and standards are available today for most of these activities.

One of the implications of this emerging ecosystem is the decoupling and unbundling of these services.

Today, services ranging from data repository to data curation, paper publication and access metrics are all managed by the publisher. Data are published (if ever) alongside the article; metrics are provided on the same website through a proprietary system.

This is not an accident, but part of the fundamental business model of publishers. It is telling that one of the justifications for Elsevier to take down articles posted by academics is that:

One key reason is to ensure that the final published version of an article is readily discoverable and citable via the journal itself in order to maximise the usage metrics and credit for our authors, and to protect the quality and integrity of the scientific record. The formal publications on our platforms also give researchers better tools and links, for example to data

Findability, reputation and metrics, as well as access to data, are mentioned as key services provided by publishers. It will not be like this in 2050: there will be different providers for different services, which will interoperate through standards (possibly open standards).

There will be a vertical disintegration of the value chain, and new players will enter the market. However, this openness will not last forever. The new players will try to lock in customers in a similar way, and services will be re-aggregated around them. For instance, it could be that data publishers will become the new gatekeepers, also providing access to publications and metrics.

UPDATE 30/12/2013: the unbundling should not be seen simply from the perspective of the publishers, but also from that of the individual scientist. By 2030, it will not normally be the same researcher who creates the datasets, builds the programme and publishes the results. Scientists will reuse and build on the datasets and code (as well as other intermediate information) of other scientists. The gatekeeping role of researchers will also be reduced, and this could be huge. According to a study, 90% of the data still reside on the researchers' own computers.
