This is a new post in our brainstorming series on future scenarios for science 2.0. The first post was here.
Here we analyze the in-depth integration of the scientific paper with data and code. We envisage that by 2050, each article will by default be published with access to the underlying data and code. Researchers will expect to be able to dig deeper into articles and to work directly with the available data, just as changes in the music industry were driven by shifting consumer expectations about accessing content without limitations.
Data and code sharing will become the norm rather than the exception. Articles that do not provide their datasets and code will automatically be considered less reliable and less robust.
There will be consistent, standardized formats for publishing data, code, and workflows, just as there are today for articles. Most importantly, there will be a standardized format for integrated publishing. It will not matter whether these materials are published alongside the paper in the same repository: linked data will make the content discoverable and trackable regardless of where it is actually hosted. Data, code, and articles, as well as access metrics, will be decoupled but linked.
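As a rough sketch of what decoupled-but-linked publishing could look like, here is a minimal record built in Python using the schema.org vocabulary in JSON-LD form. The DOIs and the exact property choices are illustrative assumptions, not an existing standard for integrated publishing:

```python
import json

# Hypothetical DOIs, for illustration only.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "@id": "https://doi.org/10.9999/example-article",
    "isBasedOn": [
        # The dataset and code each have their own persistent identifier
        # and can live in entirely different repositories.
        {"@type": "Dataset", "@id": "https://doi.org/10.9999/example-dataset"},
        {"@type": "SoftwareSourceCode", "@id": "https://doi.org/10.9999/example-code"},
    ],
}

# Serializing as JSON-LD keeps the article, its data, and its code
# decoupled but machine-linkable, regardless of where each is hosted.
record = json.dumps(article, indent=2)
print(record)
```

Because each component carries its own identifier, access metrics and citations can be tracked separately for the article, the dataset, and the code, while the links keep the whole bundle discoverable.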
There are already examples of this approach, from the OECD StatLink tool, which provides a DOI for every chart in a report, to Elsevier's "Article of the Future", to the examples presented at the Beyond the PDF conference and under its hashtag.