Juan Francisco Caro (Extremadura en Datos), speaking at the II Jornadas de Periodismo de Datos in Barcelona:
Data journalism is not just getting data out there. You have to verify sources and master statistical concepts to avoid publishing mistaken assumptions and interpretations. If not, information becomes disinformation.
Nicola Hughes (The Times), speaking at the II Jornadas de Periodismo de Datos in Barcelona:
If you can write it, record it or film it, it’s not from the web; you’re putting something else on the web.
Data journalism is becoming too popular, in the sense that some people think that it’s enough to do some line charts, bar charts, just putting data out there, but they are not telling the story. There’s a need for storytelling.
The internet is transient; there’s no control over the tools you use, and they can disappear. But knowing how to code solves that. It also helps you document, back up and reproduce projects, and reuse tools in different projects.
The problem right now is not that information is scarce; it’s the opposite. Organisations and institutions publish a ton of information, and because most journalists only look for press releases and copy to rewrite, interesting things become hidden in the deluge.
Advice to journalists: Take risks. Use your imagination. Think of yourself as a craftsman.
There’s no such thing as “I don’t know”, just “I haven’t googled it yet”.
Do one coding course, just one, and then start building things. You have to write a lot of bad poetry to start writing good poetry. It’s very much a craft.
Pedro J. was not pushed out because ministers stayed away from his awards ceremonies. That was merely symbolic. The pressure was much simpler: it consisted of turning off the tap of institutional advertising. According to internal calculations by Unidad Editorial, the war unleashed by the Bárcenas scandal, and especially by the SMS messages from the Prime Minister to the former PP treasurer, has cost the group some 14 million euros in institutional advertising.
Every administration governed by the PP, from the Ministry of Employment to Seville City Council, by way of Castilla-La Mancha and the Comunidad de Madrid, has cut off the subsidies to El Mundo. All that public money, which the PP hands out arbitrarily and uses to tame the media, has moved from El Mundo to ABC. And just as Esperanza Aguirre got rid of José Antonio Zarzalejos a few years ago, today Mariano Rajoy has ousted Pedro José.
Galicia is holding its Autonomy elections on October 21st, and the National Statistics Institute has released a small set of data from the electoral census. Working with this data we first saw how councils in Galicia have a varying percentage of their constituency living abroad, with some councils having more than 50% of their voters living outside Spain. We’ll continue to explore that data in the coming days, but today I want to take a look at how age also defines the voter profile in Galicia.
Migration is one of the key factors influencing the distribution of the Galician electoral census. The other one is age, which is also a consequence of the first one. In the first graph we see the total census distribution by province and age. We can see clearly that the western provinces (A Coruña and Pontevedra) have much more weight in the census than the eastern ones (Lugo and Ourense). We can also see that the differences between these two groups are much more evident in the younger half of the census, but we can’t see clearly how important age is in the census distribution in Galicia.
In this second graph we can see the percentage of voters in each province by age, so we can more clearly see the weight of age in each of the four provinces. Lugo and Ourense in the east are the older provinces, both because of demographic trends and because of young people leaving for the richer provinces of Pontevedra and A Coruña, other parts of Spain and abroad.
Relative data, as percentages, allow us to compare provinces and see the weight of each age group more clearly than when working with absolute numbers like the total number of voters.
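To make the point about relative data concrete, here is a minimal sketch in Python. The voter counts are invented for illustration and are not the INE figures; the age groups and the `age_shares` helper are also assumptions of this example.

```python
# Hypothetical voter counts by province and age group.
# Illustrative numbers only, NOT data from the electoral census.
voters = {
    "A Coruña": {"18-39": 300, "40-64": 400, "65+": 300},
    "Lugo": {"18-39": 50, "40-64": 70, "65+": 80},
}


def age_shares(counts):
    """Return each age group's share of the province total, in percent."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Absolute numbers: A Coruña has more voters aged 65+ (300 vs 80).
# Relative numbers: Lugo has the larger share of voters aged 65+.
shares = {province: age_shares(counts) for province, counts in voters.items()}

for province, by_age in shares.items():
    print(province, by_age)
```

In absolute terms the larger province dominates every age group, while the percentages show which province is actually older.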
Galicia, an Autonomy inside Spain, is having elections on October 21st. The Spanish National Statistics Institute has released some data about the electoral census of the Autonomy, specifically regarding its age and the population living in foreign countries. Galicia has a strong migration history and almost 15% of the electoral census lives abroad, although only 3.68% will vote, due to electoral laws that restrict and hinder the voting rights of a group that registered participation levels above 30% in the past.
In the map above we can see the percentage of the electoral census living abroad by council. The data ranges from 0.88% in Burela to almost 55% in Avión, Bande and Gomesende.
I’m currently writing an article on digital currencies for a future, small-run magazine edited by Crazy Little Things. To complement the article, and to try to learn some new data-journalism skills, I also decided to make a timeline of the most relevant digital currencies of the past 20+ years. This is the result:
How it’s done
I used Inkscape to draw the SVG file (timeline, bars and text) and create the layout. Then I added basic interactivity by hand using a text editor. The content is based on my own research for the article. In further releases I plan to include a CSV with the source data used in the timeline so it’s easier for others to replicate it using other tools.
Regarding interactivity, right now you can uncover some contextual information by hovering your mouse over certain years, and click on the names of the digital currencies to go to their website or get more information. I plan to add more contextual information on the currencies, explaining the type of currency and the reason it disappeared, if needed.
The project (just an SVG file) is hosted on GitHub. You can download it, fork it, open a new issue, send ideas or suggestions. The project is under a Creative Commons BY-NC-SA licence. This is my first time using Git and GitHub for a project like this, and I’ll share my experience in a separate post. I can tell you now that I’ll definitely keep using it.
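For anyone who would rather generate a similar timeline from a CSV than draw it by hand, here is a rough sketch in Python. It is not how the timeline above was made. The CSV columns, currency names, years and URLs are placeholders, and the `timeline_svg` function is hypothetical. It uses two standard SVG features: a `<title>` child element, which browsers show as a hover tooltip, and an `<a>` wrapper, which makes each bar a link.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr

# Hypothetical source data: names, years and URLs are placeholders.
CSV_TEXT = """name,start,end,url
Currency A,1995,1998,https://example.org/a
Currency B,2009,2013,https://example.org/b
"""


def timeline_svg(csv_text, first_year=1990, last_year=2015,
                 px_per_year=20, row_height=24):
    """Build an SVG string with one linked, tooltipped bar per CSV row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    parts = []
    for i, row in enumerate(rows):
        start, end = int(row["start"]), int(row["end"])
        x = (start - first_year) * px_per_year
        width = (end - start) * px_per_year
        y = i * row_height
        label = escape(row["name"])
        parts.append(
            f'<a xlink:href={quoteattr(row["url"])}>'
            f'<rect x="{x}" y="{y}" width="{width}" height="{row_height - 4}">'
            f'<title>{label}: {start}-{end}</title></rect></a>'
        )
    total_width = (last_year - first_year) * px_per_year
    total_height = len(rows) * row_height
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'width="{total_width}" height="{total_height}">'
        + "".join(parts)
        + "</svg>"
    )


svg = timeline_svg(CSV_TEXT)
print(svg)
```

Saving the printed string as a `.svg` file and opening it in a browser should show two bars, each with a tooltip and a link.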
I’m also open to criticism on the timeline content: Did I miss a critical digital currency project? Should I remove something from the timeline? Is any of the data wrong? I’m all ears.
Know that the most important part of data journalism is… journalism. Reporting. In other words, you know how to report a story, you understand how to treat data as a source. You know how to pick up a phone, and not just assume that everything you get in data form (especially government data) is complete and accurate.
You have at least basic data skills — meaning, you know your way around a spreadsheet. You can figure out for yourself how to import data, and do something with it. You also understand the basics of data analysis: rates, ratios, sums, averages, medians, and how to use them.
You have command of more advanced data analysis skills, such as GIS, basic statistics, advanced SQL, etc. You also may know some basic programming techniques (using the language of your choice… Python, Perl, Ruby. ILENE.. shoot, even .NET) to scrape the web, get and clean data.
You have some skills with a web framework (Django, Rails, Grails) in order to enhance your reporting online through data-driven applications that you create from scratch and host.
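The basic analysis measures mentioned above (sums, averages, medians and rates) take only a few lines in any language. Here is a small Python sketch with made-up figures for five towns. Neither the numbers nor the variable names come from a real dataset.

```python
from statistics import mean, median

# Hypothetical figures for five towns (illustrative only).
population = [1200, 800, 15000, 950, 1100]
burglaries = [6, 4, 45, 5, 11]

total_burglaries = sum(burglaries)   # sum
mean_population = mean(population)   # average, pulled up by the one big town
median_population = median(population)  # median, the "typical" town

# Rate per 1,000 residents: makes towns of different sizes comparable.
rates = [round(1000 * b / p, 1) for b, p in zip(burglaries, population)]

print(total_burglaries, mean_population, median_population, rates)
```

The largest town has by far the most burglaries but the lowest rate, and the mean population is more than three times the median. Both are reasons to know which measure you are reporting.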
CartoDB is a powerful and open source geospatial data management and visualization tool. It does everything Google Fusion Tables does, and more. If you’re comfortable with SQL queries and CSS (CartoDB uses Carto, a stylesheet language from Mapbox similar to CSS), you can get amazing results, including hexagonal density grids, or editable and interactive maps. They have more case examples in the gallery, and you can find many more examples online.
In the workshop we learned how to do the basic stuff: upload different kinds of data, visualize them, merge them and tinker a bit with the SQL queries and Carto stylings. We did two maps during the workshop: Spanish unemployment by provinces and life-expectancy rates by country:
I had a CartoDB account way before the workshop but I never got around to trying it. I don’t know why, but I thought it was harder to use than Google Fusion Tables, so I was pleasantly surprised to discover how easy it was to work with. Now I’m looking forward to seeing how I can use CartoDB in my projects.
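For readers who have not written SQL before, merging two tables like we did in the workshop comes down to a join on a shared column. The sketch below uses Python’s built-in `sqlite3` module as a stand-in, not CartoDB itself. The table names, column names and unemployment rates are invented for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a hosted SQL service.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provinces (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE unemployment (code TEXT, rate REAL);
INSERT INTO provinces VALUES ('15', 'A Coruña'), ('27', 'Lugo');
-- Hypothetical rates, not official figures.
INSERT INTO unemployment VALUES ('15', 20.0), ('27', 18.5);
""")

# Merge the two tables on the shared province code.
rows = conn.execute("""
    SELECT p.name, u.rate
    FROM provinces AS p
    JOIN unemployment AS u ON u.code = p.code
    ORDER BY u.rate DESC
""").fetchall()

for name, rate in rows:
    print(name, rate)
```

In a mapping tool the first table would also carry the province geometries, so the joined rate can drive the colour of each shape.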