Four days ago, members of the Spanish Congress and Senate released statements declaring their property and assets, as well as any employment or line of work besides public service. But there was a catch: they released them as individual PDF files, one per Senator and Congressman. Developer David Cabo, a member of the ProBonoPublico collective and a key member of the data visualization community in Spain, quickly created a hashtag (#adoptaunsenador) and opened a Google Docs spreadsheet to crowdsource the extraction of the data from the individual PDF files into a single structured file.
The process was completed in just four days, but the experience was not entirely satisfactory:
- The spreadsheet became too slow to use once more than 50 people were editing simultaneously.
- Anonymous editing was allowed from the beginning, but it was difficult to cope with erased or lost data. Because of simultaneous editing, restoring an earlier version of the spreadsheet also meant losing the data from more recent edits.
- At some point spam arrived and anonymous editing was closed. The number of people working on the spreadsheet then dropped to 12-15.
- Anonymous edits were also the laziest, with unfinished sentences and skipped sections.
- Data columns for currencies and dates need to be formatted before the spreadsheet is made public, to avoid confusion and inconsistent formatting among contributors.
- An editor is needed to oversee the process, make sure the transcription follows a standard, and ensure that no data is missing.
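
Part of the last two points can be automated once the spreadsheet is exported. The following is only a minimal sketch in Python, not the method used in this project. It assumes that amounts were typed Spanish-style (`.` for thousands, `,` for decimals) and that dates came in one of three common layouts. The column names (`name`, `savings`, `declared_on`) and all function names are hypothetical.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Assumption: contributors typed dates in one of these layouts.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")


def normalize_amount(raw):
    """Parse a Spanish-style amount such as '1.234,56 €' into a Decimal.

    Returns None when no number can be recovered.
    """
    # Keep only digits, separators and a minus sign.
    cleaned = re.sub(r"[^\d.,-]", "", raw)
    if not cleaned:
        return None
    # Assumption: '.' groups thousands and ',' marks the decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_date(raw):
    """Return the date as an ISO string (YYYY-MM-DD), or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def find_problems(rows, required, amount_cols=(), date_cols=()):
    """List (row index, column, reason) for every cell an editor should review."""
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            if not (row.get(col) or "").strip():
                problems.append((i, col, "missing"))
        for col in amount_cols:
            value = (row.get(col) or "").strip()
            if value and normalize_amount(value) is None:
                problems.append((i, col, "bad amount"))
        for col in date_cols:
            value = (row.get(col) or "").strip()
            if value and normalize_date(value) is None:
                problems.append((i, col, "bad date"))
    return problems
```

Here `rows` would be a list of dictionaries, for example the output of `csv.DictReader` on the exported file. The report from `find_problems` only tells the editor where to look. It does not replace checking the cells against the original PDF files.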
Another crowdsourcing effort is under way to compile the property and asset data of the members of Congress: spreadsheet and hashtag.