The image that makes the case for #openscience better than a thousand words

This is the coding mistake in the 2010 Reinhart-Rogoff paper, uncovered in 2013 by a student who finally obtained the original datasets. It’s easy to see the missing cells, which include Belgium, a country with high debt and high growth.

Three years passed between the publication of the report and the uncovering of the mistakes. During this time, the report had been quoted extensively by scholars and politicians to justify austerity.

If releasing data were the default for papers like this, such a mistake would have been detected easily.

Podemos in power / 2

Just noticed today in the press that the former mayor envisages a possible alternative to Ada Colau, through a “holy alliance” of basically all the other parties.

What is interesting here is that the single point of substance mentioned by the mayor (“what most worries me”) is precisely the Mobile World Congress in Barcelona. He says that many cities “would fight for such an event and that we need to handle it with care”.

UPDATE: on May 29th, Colau validated the previous mayor's offer for the MWC. Apparently her party “has always been in favour of MWC”. However, in her first press conference, she said she was in favour “but that its benefits should spread all over the city, the jobs should not be temporary, and all the social benefits of the apps should be exploited”. A bit of bipartisanship and diplomacy emerging here.

Correlation is not causation. Is non-correlation non-causation? #offthetopofmyhead

Caveat: this is another “off the top of my head” post.

“Correlation is not causation” is a concept familiar to all analysts – and to policy analysts in particular. It is so mainstream that it has its own Wikipedia entry. The web is full of examples of weird correlations.

However, can we say that the absence of correlation indicates the absence of causation?

Certainly not: there might be other factors in place that affect a phenomenon and thereby “hide” the correlation.
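A tiny worked example of such hiding, as a sketch with made-up numbers: below, `y` is built so that `x` raises it by one unit, while a second factor `m` (hypothetical, and more often present when `x` is) lowers it by two. The names `x`, `m`, `y`, `pearson` and `within` are invented for the illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: x is the cause of interest, m is a masking factor
# that tends to be present when x is.
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [0, 0, 0, 1, 0, 1, 1, 1]
# By construction, x raises y by 1 while m lowers it by 2.
y = [xi - 2 * mi for xi, mi in zip(x, m)]

# Correlation between x and y over the whole sample.
overall = pearson(x, y)

def within(level):
    """Correlation between x and y among rows where m equals `level`."""
    xs = [xi for xi, mi in zip(x, m) if mi == level]
    ys = [yi for yi, mi in zip(y, m) if mi == level]
    return pearson(xs, ys)

print(overall, within(0), within(1))
```

With these numbers the overall correlation between `x` and `y` is zero, even though `x` affects `y` by construction; once `m` is held fixed, the correlation is perfect in each group.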

At the same time, the absence of correlation is a much more reliable sign of the absence of causation than the presence of correlation is of causation. After more in-depth analysis, a non-correlation is much more likely to be confirmed as non-causation than a correlation is to be confirmed as causation.

I would even say (as a rule of thumb) that most non-correlations turn out to be non-causation, while only a minority of correlations turn out to be causation.

Podemos takes the power: notes from Barcelona

In the elections this Sunday, Ada Colau’s party (affiliated to Podemos) won the most votes.

It is likely that she will be the new mayor of Barcelona, my current city.

This will be VERY interesting. Today, Barcelona is very much startup and business oriented, focussing on design and innovation, and the Mobile World Congress is probably its foremost symbol. Just as a sign of this culture, our previous building was recently renamed “Barcelona Growth Center”.

Obviously, we expect this to change quickly. It is a unique opportunity to see to what extent a city can change after a power shift from the right to the radical left.

From today onwards, I will start taking notes to document this change.

The first sign appeared today in El País: it appears that the Mobile World Congress is less sure to remain in Barcelona, as the new “alcaldesa” (mayor) warns that it should “provide benefits to the whole city”.

Quantifying the quantified self

I finally bought a smart wristband, a device that measures how much you sleep and how much you walk.

After one month, I like it more than ever. It’s not very accurate, but it gives me a rough idea of how much I walk and rest. It nudges me into doing the right thing: if I see that I did not walk much, I’m more likely to skip the bus and walk back instead.

What matters is not the actual number of steps, but the capacity to be aware of myself. I can easily see if I am more or less active compared to my normal rate. I have trend data, and that’s all I care about. I don’t care much for comparing against other people. But it’s great to be able to detect whether I’m having a lazy or active day.

And that’s probably true for big data in general. Despite the hype, it’s very difficult to make sense of many large datasets. Cross-analyzing data and identifying patterns and correlations is harder than we expected [1,2]. Basically, the problem is to make sense of big data.

But big data also means that, as more and more things get measured, you will soon start having trend data for everything. You will uncover anomalies much earlier because everything will be measured.

In other words, while it remains difficult to cross-analyze a large number of big datasets to uncover correlations, it will become much easier to simply uncover anomalies by comparing new data with old data. This suggests that big data will become much more important in the future: today we’re deploying sensors and struggling to make sense of the data, but in the future trend data from these sensors will become available, and simply detecting an anomaly will draw attention to potential problems.
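To make "comparing new data with old data" concrete, here is a minimal sketch, not a description of how any wristband actually works. It flags a new reading when it sits far from the average of one's own past readings. The function name `is_anomaly`, the threshold of two standard deviations and the step counts are all assumptions made up for the illustration.

```python
from statistics import mean, stdev

def is_anomaly(history, new_value, threshold=2.0):
    """Flag new_value if it lies more than `threshold` standard
    deviations away from the mean of the past readings."""
    baseline = mean(history)
    spread = stdev(history)  # needs at least two past readings
    if spread == 0:
        # Perfectly flat history: any different value stands out.
        return new_value != baseline
    return abs(new_value - baseline) / spread > threshold

# Hypothetical daily step counts from one wristband over one week.
past_steps = [8000, 8200, 7900, 8100, 8000, 7800, 8000]

print(is_anomaly(past_steps, 8100))  # an ordinary day
print(is_anomaly(past_steps, 3000))  # a very lazy day
```

No cross-analysis with other datasets is needed here: the only input is the sensor's own history, which is why this kind of check gets easier as trend data accumulates.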

Speaking of which, I am currently looking for data on the market for “quantified self”. How many people have smart wristbands? How many have fitness or health apps? Where can I find key data points?

Despite the rhetoric, planning is more important than ever

This is an “off the top of my head” post. Don’t expect anything useful.

There’s lots of talk about the need for emergent structures as opposed to ex-ante planning. About the death of traditional planning, the illusion of control, unpredictability and the role of black swans.

But in my daily worklife, I see planning becoming more, not less important.

Professional project management practice is becoming more, not less, important. It is percolating into more and more areas. Project management is becoming a more important skill in the workforce, across sectors.

Great, modern startups and organisations are super professional in the way they manage projects. They always use templates and structured processes.

True, management is changing, becoming more flexible and less imposing (read: from MS Project to Asana). But scale matters more than ever, and you can’t scale without structured processes.

To sum up: it’s true that we moved from MS Project to Asana. Despite or because of that, project management is more pervasive and mainstream than ever.

Open Data in real life: some reasons to keep it closed

I have just registered my kid for school in Barcelona, and it was an exhausting experience. Basically, if you have a two-year-old and live in Barcelona, chances are you’ve devoted most of the last three months to this.

By the end of the process, I learnt which data are most used in the decision:

  • number of teachers per pupil
  • number of external activities per term
  • no use of textbooks in primary school (indicator of openness/modernity)
  • socio-economic standing of the families
  • facilities (garden, sports, etc.) but even basic things such as windows in rooms (some schools don’t have them)
  • own kitchen
  • additional services (e.g. speech therapy)
  • average results of kids in secondary school tests

On top of this basic data, the most used input is the feedback of other parents, provided they think like you.

But because of the procedure used for enrolment, the fundamental datum is the oversubscription rate: you should choose a school that is not oversubscribed.

Obviously, these data are often not available and you have to chase them. I expect an app gathering such data to become available soon.

But the fundamental point is that these data are deliberately not published in an easily readable format. The argument is that if the data, and in particular the average results of pupils, were made available, parents would flock to the best performing schools and deepen the inequalities of the system.

In other words, transparency would create a “rich get richer” effect. The difficulty in getting the data ensures that only those who are really interested get this information.

Is this argument sound? Is it applicable to other domains? What are the main counterarguments? It could certainly be applied to hospital mortality rates.

It reminds me of another apparently sound argument for limiting openness, i.e. the embargo period for scientific publications based on specific datasets, which allows the author of the dataset to write up their conclusions first. This is also recognized as a valid limitation, but is it really so?

I wrote in the past on the need for “sharing literacy” rather than sharing culture, because one should know when and where to be open.

Can we identify a set of valid arguments to limit openness across domains?
