
Policy and technology: a "longue durée" view

Random thoughts on policy for technology and on technology for policy

Month: December 2013

Reproducibility by design in 2030 #futurescience20

Continuing the brainstorming on the future of science 2.0, we can envisage that by 2030 reproducibility will be considered a fundamental requirement for scientific publication. Research papers will be expected to contain all the necessary information to reproduce the experiment and validate the results.

Reproducibility will be formalised in a set of rules. A minimum standard of reproducibility will be required for publication (as PLOS ONE already does); or maybe publications will have a “reproducibility label” assigned by third-party services. Reproducible findings will be considered of higher quality, and the label will also enable non-experts (e.g. policy makers) to appreciate it.

Making research reproducible is costly, in terms of documenting the experiments and curating the data, especially when it needs to be retro-fitted after the research. This is why, increasingly, research protocols and methods will be formalised as templates and through tools that facilitate gathering the necessary information to enable reproducibility.

The very fact of making research reproducible will automatically reduce the number of false findings (whether made in bad or good faith), by reducing the incentive to cheat and by introducing more formalised analytical methods.

Integrating scientific evidence with policy analysis in #futurescience20

In this post of the brainstorming future science 2.0 series, we expand the idea of the integrated article of the future to the actual usage of scientific evidence in policy-making. In 2030, good governance practice will require politicians and civil servants to make explicit reference to the scientific evidence behind any policy decision. Not only scientific articles, but also strategic documents and politicians' speeches will give readers direct access to the evidence (articles and data) that justifies a certain claim, including critical pieces of evidence. It will be possible to browse seamlessly between the policy documents and the underlying data through open standards, which will further enable the flourishing of fact-checking apps for discussing the validity of the statements. Moreover, this “seamless” process will make it possible to show directly the stakeholders' positions and discussions on the specific topic, expanding towards the policy debate.


scenarios 2030: microtasking for scientists #futurescience20

In this 5th post of the series on brainstorming future science 2.0, I spell out some ideas on how scientists will work in the future.

The unbundling of the relation between data, code, workflows and analysis will lead to greater collaboration. Researchers will be activated on specific tasks, even micro-tasks, to work on some parts of the problem to be addressed. Large, collaborative efforts (genome-like) will be commonplace. This micro-tasking can already be seen in the solution to the Polymath problem, where individual scientists contributed ideas to solve a mathematical problem. Or in Innocentive, where anyone (not only researchers) can bring their own solution to challenging real-world problems. The Open Innovation paradigm will be applied to science, and scientists will outsource part of their work to specialised services such as Science-Exchange. There will also be greater involvement of micro-expertise from amateurs.

Scientists will carry out a large part of their work by contributing to other people’s research projects. And they will be rewarded for it. The researchers’ activity on these collaborative efforts will be tracked and measured, adding to their reputation. It’s possible to envisage that platforms such as ImpactStory will also include the reputation of the scientist on Innocentive.

Reputation management systems will provide the fundamental incentive for collaboration between scientists, not only by rewarding the best but also by indicating the micro-expertise needed. Because reputation will include non-scientific work such as that done on Innocentive-like platforms, there will be greater scope for non-career scientists to develop an academic career by being appointed to teach specific micro-courses in universities, for example using Massive Open Online Courses.

from altmetrics to instant impact factor #futurescience20

Here comes post 4 in the #futurescience20 brainstorming series. It’s about the future of the impact factor.

Today, the impact of research is fundamentally measured by citations in prestigious journals. There are a number of problems with this: mainly that it implies long delays (up to 5 years), that “Citations are only a small fraction of how a paper is reused”, and that articles are only one kind of scientific output [Buschman, M., & Michalek, A. (2010). Are Alternative Metrics Still Alternative?].

In 2030, impact factor will be multidimensional, granular and real time.

It will consider not just article citations, but other measures such as actual downloads, views, book holdings; number of likes, favourites, “read-later” clicks; mentions on social media and Wikipedia. These measures are already starting to become available through services such as altmetric.com and http://www.plumanalytics.com/metrics.html . It will go even further. It will measure to what extent people are reading and highlighting sections of the articles through electronic readers, as already made visible by kindle.amazon.com.

This will enable everyone to map metrics not only for whole articles, but also for individual sub-sections of an article or of a dataset (granularity).

And all this will be made available in real time. Scientists will be informed in real time about discussions happening on the web about their work. They will be able to connect to other scientists who “think alike” and discover serendipitous connections. The impact factor metrics will become not only a management and reputation tool, but an actual service to scientists.

#futurescience20: the great unbundling

This is post #3 of the brainstorming on the future of science 2.0. Previous posts are here and here. Remember this is a creative brainstorming, not a rigorous analysis.

We previously argued that science 2.0 is more than open access: it affects the full research cycle, from literature review, to hypothesis creation, data collection etc. Moreover, tools and standards are already available today for most of these activities.

One of the implications of this emerging ecosystem is the decoupling and unbundling of these services.

Today, services ranging from data repository, to data curation, to paper publication, to access metrics, are all managed by the publisher. Data are published (if ever) alongside the article; metrics are provided on the same website through a proprietary system.

This is not an accident, but part of the fundamental business model of publishers. It is telling that one of the justifications for Elsevier to take down articles posted by academics is that:

One key reason is to ensure that the final published version of an article is readily discoverable and citable via the journal itself in order to maximise the usage metrics and credit for our authors, and to protect the quality and integrity of the scientific record. The formal publications on our platforms also give researchers better tools and links, for example to data

Findability, reputation and metrics, as well as access to data, are mentioned as key services provided by publishers. It will not be like this in 2050: there will be different providers for different services, which will interoperate through standards (possibly open standards).

There will be a vertical disintegration of the value chain, and new players will enter the market. However, this openness will not last forever. The new players will try to lock in customers in a similar way, and services will be re-aggregated around them. For instance, it could be that data publishers will be the new gatekeepers, which will also provide access to publications and metrics.

UPDATE 30/12/2013: the unbundling should not be seen simply from the perspective of the publishers, but also of the individual scientist. By 2030, it will not normally be the same researcher who creates the datasets, builds the programme and publishes the results. Scientists will reuse and build on the datasets and code (as well as other intermediate information) of other scientists. The gatekeeping role of the researchers will also be reduced, and this could be huge. According to a study, 90% of the data still reside on the researchers' own computers.

#Futurescience20: integrating articles with data by default

This is a new post in the context of the brainstorming of future scenarios of science 2.0. The first post was here.

Here we analyze the in-depth integration of the scientific paper with data and code. We envisage that by 2050, each article will by default be published with access to the underlying data and code. Researchers will expect to be able to dig deeper into articles and to elaborate directly on the available data, just as changes in the music industry were driven by a change in consumers' expectations of accessing content without limitations.

Data and code sharing will become the norm, rather than the exception. Articles that do not provide datasets and code will automatically be considered less reliable and robust.

There will be consistent standardized formats for publishing data, code and workflows, just as there are today for articles. Most importantly, there will be a standardized format for integrated publishing. It will not matter whether those materials are published alongside the paper in the same repository. Linked data will make this content discoverable and trackable regardless of the actual hosting. Data, code and articles, as well as access metrics, will be decoupled but linked.

There are already examples of this approach, from the OECD Statlink tool, which provides a DOI for every chart in a report, to the “article of the future” of Elsevier, to the examples presented via the Beyond the PDF conference and hashtag.

my new favourite sport: @asana tennis

As I wrote before, I am a big fan of Asana, mainly as a great first example of a post-email application and social app for productivity.

Sometimes, in using it with co-workers, a kind of asana-tennis starts. I give a task to you, you comment and give it back to me, I come back to you etc.

Brainstorming the future of science: will we have dataset-index for scientists?

One of my latest assignments is a forward-looking study on science 2.0: what will science look like in 2050 based on the 2.0 paradigm?

I’d like now to start a collective brainstorming exercise. Let’s start thinking together about a fully deployed science 2.0 world. What will it look like? What will be the main differences? What are the risks and opportunities?

To kick-off the discussion, here’s a first “scenario snippet”.

In the future, the reputation of scientists will not only be based on their papers. Reuse of scientific data, for purposes such as replication and meta-review, will be a normal part of a scientist's work. By default, scientists will publish their datasets, duly curated, in common repositories for other scientists to access.

These datasets will be published as linked data, which will facilitate reuse, but will also facilitate tracking of this reuse through data citation mechanisms. You will know who has reused the datasets, and what conclusions they have drawn from them.

Tracking of reuse and data citation will enable building new reputation mechanisms beyond impact factor and H-index. These new reputation mechanisms will reward scientists who produce datasets that are reused by many other scientists. Data citations will be worth as much as article citations. This will encourage further data sharing, as scientists who share data will actually gain from it in terms of career.

Just as, in the music business, streaming services such as Spotify provide a far more accurate measurement of popularity than the old “top of the pops” by actually tracking the act of listening to a song down to the level of the individual, so, in science, data citations will allow for more accurate measures of the reputation of scientists.
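To make the “dataset-index” idea a bit more concrete, here is a minimal sketch of one way such a score could be computed: the usual H-index rule, applied to data citations per dataset instead of citations per article. This is only an illustration. The dataset names and citation counts are made up, and nothing here refers to an existing service or standard.

```python
def h_index(citation_counts):
    """Largest h such that at least h items have h or more citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical example: dataset identifier -> number of data citations.
dataset_citations = {
    "survey-2011": 12,
    "sensor-logs": 5,
    "interviews": 3,
    "pilot-study": 1,
}

# The same rule as the article H-index, applied to datasets.
dataset_index = h_index(dataset_citations.values())
print(dataset_index)
```

With these invented numbers, three datasets have at least three data citations each, so the score is 3. Other designs are possible, for instance weighting a reuse by what was done with the data.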

Now, it’s your turn. Provide your ideas, wild as they can be. It’s not a time to criticize and get it right; it’s the time to think aloud. Share your snippets of the science 2.0 of the future!

From task management to template process building: an idea for #bigdata application

In managing my company, I have to rely more and more on template processes, checklists and so on. We need to structure our work better, and I make up these “process templates” that build on what we learn from projects. After the third online engagement website you build, you learn what you should always check. Hence the importance of checklists.

I recently read Big Data, the book by Cukier. Good way to make you “think data”. Here are my “data thoughts” on task management.

I think companies such as Asana (which I love) will be the future of business consultancy. They will replace the big consultancies by leveraging big data, just as Coursera will displace traditional course providers by leveraging information on how people use training and using this information to design more effective training. Business consultancy could become a data product, delivered “as a service” at much lower cost. Asana will outcompete Accenture, Deloitte etc. because it will not have to collect data on the processes: it will have them already. Ok, let me put it in a better context: Accenture will most likely remain for large companies, but Asana will be able to open up business consultancy to small companies.

If I were the lead of Asana, here’s what I’d develop.

  1. BUILD PROCESS TEMPLATES: The most basic level is allowing you to create “templates” of processes composed of different tasks, which you can reuse on different projects.
  2. SUGGEST PROCESS TEMPLATES FROM YOUR DATA: A service that analyses tasks created by individuals/groups/organisations. It detects patterns in task design: which tasks are created consistently within an organisation, and who they are assigned to. From this, it proposes to the client possible templates/checklists for projects: “we noticed that when you build a website, you tend to structure the work as follows: prototype, meeting with client, revision etc. So, here’s a template process for you, a checklist that people in your organisation should follow.” Of course, anyone can build templates and checklists: there are dedicated websites just for this. But using big data, you are not creating them in an abstract way: they emerge directly from your behaviour.
  3. SUGGEST PROCESS TEMPLATES FROM ALL DATA: Asana could build these process templates by leveraging millions of tasks from all its customers, not just from you.
  4. INTELLIGENCE: Which kinds of tasks get delayed more, and why. And propose changes, again based on what other companies are doing.
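Point 2 above can be sketched in a few lines. The following is a toy illustration, not how Asana works or exposes its data: it assumes each past project is simply an ordered list of task names, keeps the tasks that recur in most projects, and orders them by where they usually appear. The function name `suggest_template`, the `min_support` threshold and the sample projects are all hypothetical.

```python
from collections import defaultdict


def suggest_template(projects, min_support=0.6):
    """Return task names found in at least `min_support` of the projects,
    ordered by their average relative position within a project."""
    seen_in = defaultdict(int)     # task name -> number of projects containing it
    positions = defaultdict(list)  # task name -> relative positions (0 = first, 1 = last)
    for tasks in projects:
        last = max(len(tasks) - 1, 1)
        for i, name in enumerate(tasks):
            positions[name].append(i / last)
        for name in set(tasks):
            seen_in[name] += 1
    threshold = min_support * len(projects)
    frequent = [name for name, count in seen_in.items() if count >= threshold]
    return sorted(frequent, key=lambda n: sum(positions[n]) / len(positions[n]))


# Hypothetical history of three website projects.
past_projects = [
    ["prototype", "meeting with client", "revision", "launch"],
    ["prototype", "meeting with client", "revision"],
    ["kickoff call", "prototype", "meeting with client", "revision", "launch"],
]

template = suggest_template(past_projects)
print(template)
```

On this sample, “kickoff call” appears in only one project out of three and is dropped, while the other four tasks come back in their usual order. A real service would also need to match tasks whose names differ slightly, which this sketch ignores.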

Probably, once again, this kind of intelligence is not new: it is already standard practice for large companies, which have internal “Asanas” and more sophisticated ERP systems. The web is just opening this up by lowering the barriers to entry. Yet I expect that a change in quantity will lead to a change in quality.
