Data Visualization Literacy

Supporting data visualization literacy is key to developing data visualization as a tool that helps the public understand critical information. As explained in this Financial Times article: “only 63 per cent of American adults can correctly interpret a scatter plot”. I'd guess the figure is similar across other Western countries, and even lower in the developing world. Expanding the visual vocabulary available to data journalists would allow for better visualizations that make sense of complex issues.

The Financial Times Visual Vocabulary

The article mentions the Graphic Continuum as a learning tool, which the Financial Times used as a basis to develop their own Visual Vocabulary. But these attempts fall short of the real objective, which should be educating the general public. The 37% of American adults who can’t understand what a scatter plot is showing won’t turn to these tools for help. Educational efforts should be part of the mission of data journalists, to ensure their work is widely understood. This mission can only be achieved day by day, by developing our visualizations with the end user in mind: not dumbing them down, but providing the tools that let anyone understand and navigate them.

Data visualization literacy tool from the Financial Times

Storytelling with data

Data journalism can be much more than an impressive, interactive visualization or an immersive longform piece. There’s also the option of letting the data and the visualizations lead the storytelling, allowing for a much deeper comprehension of the subject at hand, as is the case with this work from the Tampa Bay Times. There, the visualizations lead the story: they show us the case being investigated and explain why it’s important, taking us through each step.

Visit the story to see for yourself.


Data visualization of electoral census in Galicia, Spain (II)

Galicia is holding its regional elections on October 21st, and the National Statistics Institute has released a small set of data from the electoral census. Working with this data, we first saw how councils in Galicia have a varying percentage of their constituency living abroad, with some councils having more than 50% of their voters living outside Spain. We’ll continue to explore that data in the coming days, but today I want to take a look at how age also defines the voter profile in Galicia.

Migration is one of the key factors influencing the distribution of the Galician electoral census. The other one is age, which is also a consequence of the first. In the first graph we see the total census distribution by province and age. We can clearly see that the western provinces (A Coruña and Pontevedra) have much more weight in the census than the eastern ones (Lugo and Ourense). We can also see that the differences between these two groups are much more evident in the younger half of the census, but we can’t clearly see how important age is in the census distribution in Galicia.

In this second graph we can see the percentage of voters in each province by age, which shows the weight of age in each of the four provinces more clearly. Lugo and Ourense in the east are the most aged provinces, both because of demographic trends and because of young people leaving for the richer provinces of Pontevedra and A Coruña, other parts of Spain and abroad.

Relative data, as percentages, allow us to compare provinces and see the weight of each age group more clearly than when working with absolute numbers like the total number of voters.
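To make the difference between absolute and relative data concrete, here is a minimal sketch of the calculation behind a chart like the second one. The province names, age groups and voter counts are invented placeholders, not the census figures; only the arithmetic matters.

```python
# Placeholder voter counts per province and age group.
# These numbers are invented for illustration, NOT the real census data.
census = {
    "Province A": {"18-39": 300, "40-64": 400, "65+": 300},
    "Province B": {"18-39": 20, "40-64": 40, "65+": 40},
}


def age_shares(counts):
    """Return each age group's share of the province total, in percent."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


# One dictionary of percentages per province.
shares = {province: age_shares(counts) for province, counts in census.items()}

for province, by_age in shares.items():
    row = ", ".join(f"{group}: {pct:.1f}%" for group, pct in by_age.items())
    print(f"{province}: {row}")
```

In absolute numbers, the larger placeholder province has more voters in every age group, so the smaller one looks insignificant. Expressed as percentages, it becomes visible that the oldest group weighs more in the smaller province.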

Data visualization of electoral census in Galicia, Spain (I)

Galicia, an autonomous community within Spain, is having elections on October 21st. The Spanish National Statistics Institute has released some data about the electoral census of the community, specifically regarding its age and the population living in foreign countries. Galicia has a strong migration history and almost 15% of the electoral census lives abroad, although only 3.68% will vote, due to electoral laws that restrict and hinder the voting rights of a group that registered participation levels above 30% in the past.

In the map above we can see the percentage of electoral census living abroad by council. The data ranges from 0.88% in Burela, to almost 55% in Avión, Bande and Gomesende.

Digital Currency Timeline

I’m currently writing an article on digital currencies for a future, small-run magazine edited by Crazy Little Things. To complement the article, and to try to learn some new data-journalism skills, I also decided to build a timeline of the most relevant digital currencies of the past 20+ years. This is the result:

How it’s done

I used Inkscape to draw the SVG file (timeline, bars and text) and to create the layout. Then I added basic interactivity by hand using a text editor. The content is based on my own research for the article. In future releases I plan to include a CSV with the source data used in the timeline, so it’s easier for others to replicate it using other tools.

Regarding interactivity, right now you can reveal some contextual information by hovering your mouse over certain years, and click on the names of the digital currencies to go to their website or get more information. I plan to add more contextual information on the currencies, explaining the type of currency and, where relevant, the reason it disappeared.
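For anyone who would rather generate a similar timeline from a CSV than draw it by hand, here is a rough sketch in Python. It is not how the published file was made (that one was drawn in Inkscape and edited by hand). The column names, the rows, the URLs and the pixel sizes are all assumptions for illustration. It relies on two standard SVG features: a `<title>` child, which browsers usually show as a tooltip on hover, and an `<a>` wrapper, which makes the bar clickable (older viewers may need `xlink:href` instead of `href`).

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr

# Placeholder source data: names, years and URLs are invented for illustration.
SOURCE_CSV = """name,start,end,url
Currency A,1995,1999,https://example.org/a
Currency B,2003,2012,https://example.org/b
"""

YEAR_WIDTH = 30  # horizontal pixels per year (arbitrary choice)
BAR_HEIGHT = 20  # bar height in pixels (arbitrary choice)
ROW_GAP = 10     # vertical gap between bars (arbitrary choice)


def build_timeline(csv_text, first_year=1990):
    """Turn CSV rows into an SVG string with one linked bar per currency."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    parts = ['<svg xmlns="http://www.w3.org/2000/svg">']
    for i, row in enumerate(rows):
        start, end = int(row["start"]), int(row["end"])
        x = (start - first_year) * YEAR_WIDTH
        width = (end - start) * YEAR_WIDTH
        y = i * (BAR_HEIGHT + ROW_GAP)
        # Escape the text so characters like "&" or "<" keep the SVG valid.
        label = escape(f"{row['name']} ({start}-{end})")
        parts.append(
            f"  <a href={quoteattr(row['url'])}>"
            f'<rect x="{x}" y="{y}" width="{width}" height="{BAR_HEIGHT}">'
            f"<title>{label}</title></rect></a>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = build_timeline(SOURCE_CSV)
print(svg)
```

Saving the printed output as a `.svg` file and opening it in a browser should show two bars, each with a hover tooltip and a link.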

Improve it

The project (just an SVG file) is hosted on GitHub. You can download it, fork it, open a new issue, or send ideas and suggestions. The project is under a Creative Commons BY-NC-SA licence. This is my first time using Git and GitHub for a project like this, and I’ll share my experience in a separate post. I can tell you now that I’ll definitely keep using it.

I’m also open to criticism on the timeline content: Did I miss a critical digital currency project? Should I remove something from the timeline? Is any of the data wrong? I’m all ears.

CartoDB workshop in Barcelona

Last Monday I attended a workshop about CartoDB, organized by Media140 (with whom I’ve also collaborated: 1 and 2) and presented by Sergio Álvarez.

CartoDB is a powerful and open source geospatial data management and visualization tool. It does everything Google Fusion Tables does, and more. If you’re comfortable with SQL queries and CSS (CartoDB uses Carto, a stylesheet language from Mapbox similar to CSS), you can get amazing results, including hexagonal density grids, or editable and interactive maps. They have more case examples in the gallery, and you can find many more examples online.

In the workshop we learned how to do the basic stuff: upload different kinds of data, visualize them, merge them and tinker a bit with the SQL queries and Carto stylings. We did two maps during the workshop: Spanish unemployment by provinces and life-expectancy rates by country:

I had a CartoDB account long before the workshop, but I never got around to trying it. I don’t know why, but I thought it was harder to use than Google Fusion Tables, so I was pleasantly surprised to discover how easy it is to work with. Now I’m looking forward to seeing how I can use CartoDB in my projects.

Barcelona: income by neighborhood

Red hues: Below average
Green: Around average
Blue hues: Above average

I see a trend around the Diagonal. Did urban planning shape the distribution of wealth across the city, or was it the other way around? What has been the impact of gentrification (Olympic Games, Fórum, 22@) in Vila Olímpica, Diagonal Mar, and Parc i Llacuna del Poblenou?

This map is going to be one of the exercises I’ll teach at a data visualization workshop organized by Media140, on November 7th, at Vilaweb, in Barcelona.

#adoptaunsenador: an experiment in crowdsourcing

Four days ago, members of the Spanish Congress and Senate released statements about the property and assets they own, as well as any other employment or line of work besides public service. But there was a catch: they released them as separate PDF files, one for each Senator and Congressman. Developer David Cabo, a member of the ProBonoPublico collective and a key member of the data visualization community in Spain, quickly created a hashtag (#adoptaunsenador) and opened a Google Docs spreadsheet to crowdsource the extraction of the data from the individual PDF files into a single structured file.

The process was completed after just 4 days, but the experience was not entirely satisfactory:

  • The spreadsheet was too slow to be used with more than 50 people editing simultaneously.
  • Anonymous editing was allowed from the beginning, but it was difficult to cope with erased or lost data. Because of simultaneous editing, recovering an earlier version of the spreadsheet also meant losing data from more recent edits.
  • At some point, spam finally arrived and anonymous editing was closed. The number of people working on the spreadsheet dropped to 12-15.
  • Anonymous edits were also the laziest: unfinished sentences and skipped sections.
  • Data columns for currencies or dates need to be formatted before the spreadsheet is made public, to avoid confusion and inconsistent formatting by the contributors.
  • An editor is needed to oversee the process, make sure the transcription follows a standard and ensure that no data is missing.
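Part of the editor's job in the last two points could be automated once the spreadsheet is exported. The sketch below is only an illustration under assumptions: the column names, the agreed formats (dates as YYYY-MM-DD, amounts as plain digits with a "." for decimals) and the sample rows are all hypothetical and do not come from the actual spreadsheet.

```python
import re
from datetime import date

# Hypothetical column names; the real spreadsheet used its own layout.
COLUMNS = ["member", "declared_on", "amount_eur"]

# Assumed amount format: digits, optionally followed by "." and 1-2 decimals.
AMOUNT_RE = re.compile(r"\d+(\.\d{1,2})?")


def is_iso_date(text):
    """Return True if the text parses as an ISO date such as 2000-01-31."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def find_problems(rows):
    """List (row number, column, issue) for every missing or misformatted cell."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column in COLUMNS:
            value = row.get(column, "").strip()
            if not value:
                problems.append((number, column, "missing"))
            elif column == "declared_on" and not is_iso_date(value):
                problems.append((number, column, "expected YYYY-MM-DD"))
            elif column == "amount_eur" and not AMOUNT_RE.fullmatch(value):
                problems.append((number, column, "expected plain number"))
    return problems


# Invented sample rows showing the kinds of inconsistency described above.
rows = [
    {"member": "Member A", "declared_on": "2000-01-31", "amount_eur": "1500.00"},
    {"member": "Member B", "declared_on": "31/1/2000", "amount_eur": "1.500,00 €"},
    {"member": "Member C", "declared_on": "2000-01-31", "amount_eur": ""},
]

problems = find_problems(rows)
for number, column, issue in problems:
    print(f"row {number}, {column}: {issue}")
```

A check like this does not replace a human editor, but it gives that editor a list of cells to review instead of a whole spreadsheet.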

There’s another crowdsourcing process under way that aims to compile all the property and asset data from members of the Congress: spreadsheet and hashtag.

Data journalism workshop at Media140

This is the beta version of the presentation I used yesterday in the data journalism workshop that the people at Media140 invited me to give. At the last minute I made some changes to the structure and order of the presentation. I’ll try to update it on Friday with the structure I used in the workshop, notes, and the reference to Florence Nightingale that I forgot to include.