Data visualization of electoral census in Galicia, Spain (II)

Galicia is holding its Autonomy elections on October 21st, and the National Statistics Institute has released a small set of data from the electoral census. Working with this data we first saw how councils in Galicia have a varying percentage of their constituency living abroad, with some councils having more than 50% of their voters living outside Spain. We’ll continue to explore that data in the coming days, but today I want to take a look at how age also defines the voter profile in Galicia.

Migration is one of the key factors influencing the distribution of the Galician electoral census. The other one is age, which is also a consequence of the first one. In the first graph we see the total census distribution by province and age. We can see clearly that the western provinces (A Coruña and Pontevedra) have much more weight in the census than the eastern ones (Lugo and Ourense). We can also see that the differences between these two groups are much more evident in the younger half of the census, but we can’t see clearly how important age is in the census distribution in Galicia.

In this second graph we can see the percentage of voters in each province by age, so we can more clearly see the weight of age in each of the four provinces. Lugo and Ourense in the east are the most aged provinces, both because of demographic trends and because of young people leaving for the richer provinces of Pontevedra and A Coruña, other parts of Spain and abroad.

Relative data, as percentages, allow us to compare provinces and see the weight of each age group more clearly than when working with absolute numbers like the total number of voters.
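As a rough illustration of that point, here is a minimal Python sketch that turns absolute voter counts into the share of each age group within its province. The numbers and the age brackets are made up for the example (they are not the real census figures), and the `age_shares` helper is just a hypothetical name.

```python
# Hypothetical voter counts (in thousands) by age group.
# These are NOT the real census figures; they only illustrate the idea.
census = {
    "A Coruña": {"18-39": 300, "40-64": 400, "65+": 300},
    "Lugo": {"18-39": 60, "40-64": 100, "65+": 140},
}


def age_shares(counts):
    """Return each age group's share of the province total, in percent."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


shares = {province: age_shares(counts) for province, counts in census.items()}

for province, groups in shares.items():
    print(province, groups)
```

In absolute terms the made-up A Coruña has more voters over 65 than Lugo, but as a share of each province the oldest group weighs far more in Lugo, which is the kind of difference the second graph makes visible.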

Data visualization of electoral census in Galicia, Spain (I)

Galicia, an Autonomy inside Spain, is holding elections on October 21st. The Spanish National Statistics Institute has released some data about the electoral census of the Autonomy, specifically regarding its age and the population living in foreign countries. Galicia has a strong history of migration and almost 15% of the electoral census lives abroad, although only 3.68% will vote, because electoral laws restrict and hinder the voting rights of a group that registered participation levels above 30% in the past.

In the map above we can see the percentage of electoral census living abroad by council. The data ranges from 0.88% in Burela, to almost 55% in Avión, Bande and Gomesende.

Digital Currency Timeline

I’m currently writing an article on digital currencies for a future, small-run magazine edited by Crazy Little Things. To complement the article, and to try to learn some new data-journalism skills, I also decided to build a timeline of the most relevant digital currencies of the past 20+ years. This is the result:

How it’s done

I used Inkscape to draw the SVG file (timeline, bars and text) and create the layout. Then I added basic interactivity by hand using a text editor. The content is based on my own research for the article. In further releases I plan to include a CSV with the source data used in the timeline, so it’s easier for others to replicate it using other tools.

Regarding interactivity, right now you can uncover some contextual information by hovering your mouse over certain years, and click on the names of the digital currencies to go to their website or get more information. I plan to add more contextual information on the currencies, explaining the type of currency and, where relevant, the reason it disappeared.

Improve it

The project (just an SVG file) is hosted on GitHub. You can download it, fork it, open a new issue, send ideas or suggestions. The project is under a BY-NC-SA Creative Commons licence. This is my first time using Git and GitHub for a project like this, and I’ll share my experience in a separate post. I can tell you now that I’ll definitely keep using it.

I’m also open to criticism on the timeline content: Did I miss a critical digital currency project? Should I remove something from the timeline? Is any of the data wrong? I’m all ears.

Top 5 essential skills for a data journalist

Aron Pilhofer of The New York Times answered that question on the NICAR-L mailing list:

My top five (in order of importance):

  1. Know that the most important part of data journalism is… journalism. Reporting. In other words, you know how to report a story, you understand how to treat data as a source. You know how to pick up a phone, and not just assume that everything you get in data form (especially government data) is complete and accurate.
  2. You have at least basic data skills — meaning, you know your way around a spreadsheet. You can figure out for yourself how to import data, and do something with it. You also understand the basics of data analysis: rates, ratios, sums, averages, medians, and how to use them.
  3. You have command of more advanced data analysis skills, such as GIS, basic statistics, advanced SQL, etc. You also may know some basic programming techniques (using the language of your choice… Python, Perl, Ruby. ILENE.. shoot, even .NET) to scrape the web, get and clean data.
  4. You can apply your basic programming techniques to the creation of data-driven news applications using off-the-shelf tools like Google maps, MapBox, Fusion Tables, etc. At this point, you are not running servers, or serving database-driven apps. But you are creatively using what is available to you to add to your reporting online. This is probably where you need to get on the Javascript train.
  5. You have some skills with a web framework (Django, Rails, Grails) in order to enhance your reporting online through data-driven applications that you create from scratch and host.

CartoDB workshop in Barcelona

Last Monday I attended a workshop about CartoDB, organized by Media140 (with whom I’ve also collaborated: 1 and 2) and presented by Sergio Álvarez.

CartoDB is a powerful and open source geospatial data management and visualization tool. It does everything Google Fusion Tables does, and more. If you’re comfortable with SQL queries and CSS (CartoDB uses Carto, a stylesheet language from Mapbox similar to CSS), you can get amazing results, including hexagonal density grids, or editable and interactive maps. They have more case examples in the gallery, and you can find many more examples online.

In the workshop we learned how to do the basic stuff: upload different kinds of data, visualize them, merge them and tinker a bit with the SQL queries and Carto stylings. We did two maps during the workshop: Spanish unemployment by provinces and life-expectancy rates by country:

I had a CartoDB account way before the workshop but I never got around to trying it. I don’t know why, but I thought it was harder to use than Google Fusion Tables, so I was pleasantly surprised to discover how easy it was to work with. Now I’m looking forward to seeing how I can use CartoDB in my projects.

Common sense: from apps to responsive design

When the iPad was unveiled a year and a half ago, it was received with enthusiasm by media companies, especially by their boards of directors, as it provided two essential things for them:

  1. A closed, comfortable and standardized environment to receive content. Like good old magazines, but with video and rich-media ads.
  2. An opportunity to charge for content again, by creating scarcity, taking advantage of the walled garden of the iTunes store and selling apps like they sold magazines in the past.

But this approach overlooked several important flaws:

  1. Apple, while the biggest player in the market (at least for now), is not the only one. Any effort would have to be repeated, and then maintained, for every other platform (Android, Blackberry…) to reach a larger potential audience. It’s not scalable.
  2. There is a bottleneck at the distribution stage, and you’re at the mercy of Apple’s internal app approval policies.
  3. The company has to give Apple a 30% cut of their subscription sales through the app, and probably will not have access to their subscribers’ data.
  4. By September, almost 40 million iPads had been sold worldwide since the tablet was introduced a year and a half earlier. Why would you limit yourself to a potential audience of 40 million when there are hundreds of millions of other devices capable of internet access?
  5. What will happen when the iPad and iOS are surpassed by newer, better technologies? Change is unavoidable, and in a world of planned obsolescence, it doesn’t make much sense to tie yourself to a product that will be obsolete in a few years.

Lately, there has been a trend toward a more thoughtful, long-term, sustainable approach called responsive design. The first to jump on board was the Financial Times, with an HTML5 app that avoids the iTunes app store and lets users access the app directly through the browser. That was a good start, but it was still rooted in the idea of developing a specific product for just one platform, in this case, iOS.

Instead, ProPublica made some changes to their site to allow it to adapt to the screen size of the visitor’s device, whether smartphone or tablet of any size, and independently of the device’s operating system.

The redesign of the Boston Globe was an even more ambitious project. It’s probably the first news website fully redesigned under the responsive web design paradigm, which means it’ll adapt its layout to the characteristics of the device used to access the site.

If journalism is not a product but a process, then a platform-agnostic approach makes much more sense: one that delivers the same news, reports, analysis and commentary with quality, consistency and coherence, no matter what you use to read them. It also allows the company to retain control and independence over their most important assets: their audience and how they access their content.

Over 2012 we’ll see more and more media companies sailing away from the siren songs of the iPad and getting control of their own future back with HTML5 responsive design websites. Those who don’t will see their efforts scattered ineffectively across a handful of platforms, draining precious resources away from meaningful innovation.

This article was previously published in the blog of the ESCACC Foundation (Espai Català de Cultura i Comunicació, in Catalan) and in ElEConomista / CanalPDA (Spanish).

Barcelona: income by neighborhood

Red hues: Below average
Green: Around average
Blue hues: Above average

I see a trend around the Diagonal. Did urban planning influence wealth distribution around the city, or was it the other way around? What has been the impact of gentrification (Olympic Games, Fórum, 22@) in Vila Olímpica, Diagonal Mar, and Parc i Llacuna del Poblenou?

This map is going to be one of the exercises I’ll teach at a data visualization workshop organized by Media140, on November 7th, at Vilaweb, in Barcelona.

Victims of ETA terrorism

As you all know by now, the terrorist group ETA has declared that it is definitively laying down its arms. In the 53 years since its founding, ETA has killed 829 people. No, 858. No, no, 952. Wait a moment, how many victims of ETA are there?

Ministerio del Interior 829
Fundación Víctimas del Terrorismo 828
Asociación Víctimas del Terrorismo 858
Colectivo Víctimas del Terrorismo en el País Vasco 952
El País 829
El Mundo 864
El Correo 858
Diario Vasco 829
ABC 857
La Voz de Galicia 829
El Periódico de Catalunya 829
La Vanguardia (1 / 2) 829 / 858
Wikipedia (ES) 839
Wikipedia (EN) 829

How can these differences be explained? In the case of the Colectivo de Víctimas del Terrorismo del País Vasco, counting the victims of the Comandos Autónomos Anticapitalistas as victims of ETA inflates the number to 952. La Vanguardia’s schizophrenia stems from the fact that one of its stories comes from the EFE news agency, which accepts the figure of 858, while its own reporting sticks with 829. That number is the one officially recognized by the Ministerio del Interior and the Basque Government, while the Asociación Víctimas del Terrorismo defends the figure of 858, probably including victims of incidents or terrorist actions that ETA has not acknowledged. The figures from the Fundación Víctimas del Terrorismo and ABC are probably just out of date with respect to their sources, while the origin of El Mundo’s figure is a mystery, since it does not come close to any of the other sources.
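To see at a glance how the sources cluster, the table above can be tallied with a few lines of Python. This is only a sketch: the figures are copied from the table, La Vanguardia is entered once per story, and the variable names are my own.

```python
from collections import defaultdict

# Figures copied from the table above; La Vanguardia appears once per story.
reported = [
    ("Ministerio del Interior", 829),
    ("Fundación Víctimas del Terrorismo", 828),
    ("Asociación Víctimas del Terrorismo", 858),
    ("Colectivo Víctimas del Terrorismo en el País Vasco", 952),
    ("El País", 829),
    ("El Mundo", 864),
    ("El Correo", 858),
    ("Diario Vasco", 829),
    ("ABC", 857),
    ("La Voz de Galicia", 829),
    ("El Periódico de Catalunya", 829),
    ("La Vanguardia (1)", 829),
    ("La Vanguardia (2)", 858),
    ("Wikipedia (ES)", 839),
    ("Wikipedia (EN)", 829),
]

# Group the sources by the figure they report.
sources_by_figure = defaultdict(list)
for source, figure in reported:
    sources_by_figure[figure].append(source)

# Print the figures, most widely reported first.
for figure in sorted(sources_by_figure, key=lambda f: -len(sources_by_figure[f])):
    names = sources_by_figure[figure]
    print(f"{figure}: {len(names)} source(s): {', '.join(names)}")
```

The tally makes the two main camps obvious (829 and 858) and isolates the figures that only a single source reports.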

La Información deserves a special mention for this set of charts, each of which shows a different total number of victims.

First fatality: 1960 or 1968?

The date of ETA’s first deadly attack is also disputed. Some media outlets accept the attribution to ETA of the death of the 2-year-old baby Begoña Urroz Ibarrola in an incendiary bomb attack at the Amara train station in 1960, even though ETA has never claimed responsibility for this attack and most sources point to the DRIL (Directorio Revolucionario Ibérico de Liberación). Officially, ETA’s first fatality is José Pardines Arcay, in 1968.

Innovation means collaboration for media companies

Media companies were never really innovative. They used to be quick to take advantage of technological developments in their content-distribution channels to enhance their content offering, like when printing presses made it possible to reproduce pictures, and later, color. But these were not developed by the media companies. They could have sparked this innovation, their needs may have pushed for these developments to happen, but they were not theirs.

They didn’t innovate much in content production or presentation either. Newspaper sections have remained virtually untouched for years, with the same classification and categorization of information today as in the newspapers that served society 100 years ago. When the internet emerged in the 90s, they used the same information architecture in the new channel. Ultimately, newsrooms are meant to produce, following a set of rules and processes, not to do research and development, and that kind of department is rare in all but the biggest media companies.

With revenue streams getting thinner and management struggling to keep media companies profitable (or reduce losses), even cutting newsroom resources, R&D is not a priority, if it ever was. Some of the most disruptive innovations in advertising, information architecture and content in recent years have not come from media companies:

  • Think of how Craigslist established a new standard for online classifieds, historically a business dominated by newspapers, and one of their main revenue streams.
  • How Google first, and Groupon later, developed ways to put local businesses in touch with consumers online. Both are natural newspaper customers, although at different ends of the product chain.
  • How newspapers didn’t get the need for CRM and analytics and a better understanding of their audiences until it was too late, and social networks appeared to provide advertisers with a profiled audience to develop targeted advertising programs.
  • How newspapers not only skipped the chance to bypass intermediaries, like distribution chains, when trying to sell digital subscriptions to consumers, but jumped on the bandwagon when Apple demanded a 30% cut of their subscription revenue from iPad apps.
  • And how that demonstrates that newspaper and media companies seem fixated on the idea of “channel” instead of focusing on creating platform-agnostic content.

Innovation happens at the fringes

But not everything is that bad. We have had our share of innovation in media in the last 10 years. It just hasn’t come from media companies, but from the fringes of the media ecosystem, or even outside of it.

  • Storify, a tool to organize and create a narrative around curated content from social networks, was co-founded by a journalist.
  • Google Living Stories was developed by Google in partnership with the New York Times and The Washington Post. It’s a way to organize news content around an ongoing issue in a way that makes it easy to understand the timeline of events, as well as to access all the related content.
  • One of the first mashups, ChicagoCrime.org, which had a clear impact on the current interest in and attention to data visualization, including its adoption in newspapers and media, was created single-handedly by one journalist, Adrian Holovaty. His later company, Everyblock, which geolocates content from several sources in several of the major US cities, was acquired by MSNBC.

Collaboration and partnerships

If we can learn something from these stories, it is that for media companies innovation can, and will, happen thanks to collaboration and partnership. In this process, institutions like the Knight Foundation have been critical, as they have provided funding to projects that otherwise might not have received it, and pushed a spirit of sharing and collaboration, demanding, for example, that software developed using its grants is released under the GPL license, and all the other material under Creative Commons licenses. A perfect example would be one of their funded projects, DocumentCloud. Also a partnership between ProPublica and the New York Times, it is an open source tool to share, analyze and annotate source documents, and is currently used by more than 200 newsrooms in the US.

Hacks / Hackers, an informal and loose network of meetups of journalists and programmers, which recently has seen the birth of a new chapter in Madrid, is another example of collaboration, very focused on software development for newsrooms. One of the most interesting projects lately is PDFSpy, which makes it possible to monitor changes to a large set of hosted PDF files, like the ones released by Spanish congressmen and senators. That way, a journalist would receive an alert if something is added to, or removed from, any of the 614 PDF documents.
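I don’t know how PDFSpy is actually implemented, but the general idea of watching a set of documents for changes can be sketched in a few lines of Python: keep a fingerprint (here a SHA-256 hash) of every file and compare it with the previous snapshot. The function names and file names below are hypothetical, and downloading the files is left out; the sketch assumes you already have their bytes.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the content changes."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(previous, current_files):
    """Compare the previous snapshot with the documents fetched now.

    previous:      dict mapping file name -> fingerprint from the last run
    current_files: dict mapping file name -> raw bytes fetched in this run
    Returns (report, new_snapshot).
    """
    current = {name: fingerprint(data) for name, data in current_files.items()}
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    modified = sorted(
        name for name in set(current) & set(previous)
        if current[name] != previous[name]
    )
    report = {"added": added, "removed": removed, "modified": modified}
    return report, current


# First run: nothing to compare against, so we only store the snapshot.
_, snapshot = detect_changes({}, {"a.pdf": b"version 1", "b.pdf": b"version 1"})

# Second run: a.pdf was edited, b.pdf disappeared, c.pdf is new.
report, snapshot = detect_changes(
    snapshot, {"a.pdf": b"version 2", "c.pdf": b"version 1"}
)
print(report)
```

A hash only tells you that a file changed, not what changed inside it; reporting the actual additions or removals would need a text extraction and diff step on top of this.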

In Spain we have meetups (Café y Periodismo, the upcoming Hacks / Hackers), research groups (1001 medios) and hybrids of both (BCNMediaLab, which I co-founded). These are outlets for journalists to meet, debate and share ideas and projects, but I feel we’re going to need to take these initiatives (or new ones) a step further in order to generate the collaboration and contribute to the innovation our industry needs.

This post was first published in Catalan in the blog of the ESCACC Foundation (Espai Català de Cultura i Comunicació).

QR codes could bridge the gap between print and online journalism

Wikipedia has just announced a project to link physical locations with their online Wikipedia articles. Using QR codes in stickers, a passer-by can access from her smartphone the Wikipedia article in her language relevant to the location or reference of the sticker: a landmark, a painting in a museum, a statue…

QR codes transform the physical world into a digital interface, linking atoms to bits just using a smartphone. As such, they could be used with success in printed media to cross the gap between publishing time and real time, between the limitations of print and the possibilities of online rich media. They require a smartphone with a QR code recognition app and internet access to retrieve the linked content. That means the use of QR codes could be limited to the US and Western Europe for now, but that will change over the next few years. These are a few ideas that could enrich the experience of reading a print newspaper.

In advertising, QR codes could change the print business model from selling audiences to advertisers to include additional cost per click (CPC) and cost per action (CPA) models:

  • Track advertising actions from consumers, so the newspaper could demonstrate a direct, proven link between advertising and sales. A QR code could be a link to an online store, but also to a coupon to be redeemed in a brick-and-mortar store.
  • Generate extra revenue through affiliate sales: from advertising, but also from shows and movie listings, product reviews…

Embedded in editorial content, the uses of QR codes can get more creative:

  • Provide a link to an updated online version of the print article. This would avoid the feeling of old news when covering fast-unfolding events. Think of a citizen revolt that may have already succeeded, or maybe been repressed, by the time the newspaper hits the stands. This would also relieve pressure on print to compete with other channels and allow it to focus on long, in-depth stories that provide context.
  • Provide a link to an archive or series of articles around the same topic. Useful for infrequent readers if the article is one in a series.
  • Or maybe you’d want to let users download an ebook or pdf of that coverage. Add Paypal or other easy-to-use payment system and you have another revenue stream that takes advantage of already produced content.
  • When there is a high profile event, the limits of print mean that only a small part of the work of the photography staff can be featured in the printed newspaper. Those limits don’t exist online, where picture galleries are common. You can provide a link to the online full picture gallery from the printed newspaper. Or to (curated, of course) user-generated content around that particular event.
  • Sometimes, graphic support can come in the form of an infographic, or static data visualization, like when presenting election results. The next step would be to provide a link for the online, interactive version of that data visualization.
  • The same idea could be applied when rich-media could provide useful context to the printed story, like footage about a riot, a sport event, or access to the full video interview with a high-profile individual.
  • It’s quite common, at least in Spain, to read (or, more likely, browse) the newspaper while in a café or a bar. If you see an interesting article, a QR code can be used to bookmark the online version of the story to read later. Most likely this would be a sort of partnership with services like Delicious, Reading, Instapaper or similar.
  • QR codes can also be used to collect feedback or content from print readers.

Also, copying Wikipedia’s idea would be a great marketing campaign for a newspaper: put ads for your paper with QR codes specific to each location, linking to coverage of that location. Or ads emphasizing an important topic (sports, education, politics…) with a QR code that links to online coverage of that topic.

These advertising and editorial uses of QR codes have an important consequence: they generate data that can be processed to obtain usage patterns and other insights, which can be quite valuable for advertising and content production.
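One simple way to generate that data, sketched here in Python, is to encode in each QR code a URL that carries a few query parameters identifying where the code was printed, so that the website’s own logs or analytics can attribute every scan. The parameter names (`src`, `edition`, `page`, `slot`), the `tracked_url` helper and the example.com address are all assumptions for the sake of the example, not an established convention; producing the QR image itself is left to whatever generator you prefer.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tracked_url(target, edition, page, slot):
    """Append print-placement parameters to the URL a QR code will point to."""
    parts = urlsplit(target)
    # Keep any query parameters the target URL already has.
    params = parse_qsl(parts.query) + [
        ("src", "print-qr"),   # hypothetical marker for scans from print
        ("edition", edition),  # date of the print edition
        ("page", str(page)),   # page where the code was printed
        ("slot", slot),        # what the code accompanied (ad, article...)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


url = tracked_url(
    "https://example.com/elections/results", "2011-11-05", 14, "infographic"
)
print(url)
# https://example.com/elections/results?src=print-qr&edition=2011-11-05&page=14&slot=infographic
```

Counting requests by `edition`, `page` and `slot` would then show which print placements actually send readers online.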

The idea of using QR codes in print newspapers has been around for more than two years, but so far, the main adopters of QR codes have been advertisers and print magazines. They have been using them to provide extra content for promotions, or enhanced magazine covers. In one such case, Esquire used codes similar to QR to provide augmented reality content for its cover and a few pages of content, but, besides being a one-time effort, it’s not really the same as linking to online content and services.

There’s a warning, though. QR codes are just an implementation of an idea: the link between the real world and relevant information about it online. It’s a technology with a great danger of being rendered obsolete by a better implementation. Think of Google Goggles, for example, which uses text and image recognition technology to do Google searches or text translations from a picture taken by your smartphone. As these implementations evolve, a more seamless link between atoms and bits could appear, a better interface for integrating print and online content, but the idea would remain the same.