
Jeremy Singer-Vine’s Data Liberation Project

Jeremy Singer-Vine is spending his time on the Data Liberation Project (not to be confused with Canada’s Data Liberation Initiative), “an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.” There’s not yet a lot to look at there, but there’s plenty in the pipeline.

I just attended a webinar he gave in conjunction with MuckRock and DocumentCloud. (I think I’d missed the memo, or forgotten, that MuckRock and DocumentCloud had merged in 2018.) You can watch a recording of it here. Jeremy described how he built an add-on for DocumentCloud that lets him subscribe to an email distribution list, turn the content of the emails received into RSS, and then parse the RSS for PDFs, which are automatically uploaded to DocumentCloud, where they become searchable. Pipelines!
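To give a feel for the middle of that pipeline, here is a rough sketch of the "parse the RSS looking for PDFs" step. It is my own illustration, not Jeremy’s add-on: the function name `find_pdf_links`, the sample feed, and the `example.org` URLs are all made up, and it assumes the feed is plain RSS 2.0 with PDFs showing up either as enclosures or as links in the item text. I’ve left out the upload step rather than guess at DocumentCloud’s API.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: PDF links are absolute http(s) URLs ending in ".pdf".
PDF_URL = re.compile(r"https?://[^\s\"'<>]+\.pdf\b", re.IGNORECASE)


def find_pdf_links(rss_xml: str) -> list[str]:
    """Return the PDF URLs found in an RSS 2.0 feed, in feed order, without duplicates."""
    root = ET.fromstring(rss_xml)
    found = []
    for item in root.iter("item"):
        candidates = []
        # Attachments usually travel as <enclosure url="..." type="...">.
        for enclosure in item.findall("enclosure"):
            url = enclosure.get("url", "")
            if enclosure.get("type") == "application/pdf" or url.lower().endswith(".pdf"):
                candidates.append(url)
        # Emails converted to RSS may also mention PDFs in the link or body text.
        for tag in ("link", "description"):
            text = item.findtext(tag) or ""
            candidates.extend(PDF_URL.findall(text))
        for url in candidates:
            if url and url not in found:
                found.append(url)
    return found


# Hypothetical feed, standing in for the emails-turned-RSS described above.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example list</title>
<item><title>Notice A</title>
<link>https://example.org/notices/a</link>
<description>See https://example.org/files/a.pdf for details.</description>
</item>
<item><title>Notice B</title>
<link>https://example.org/notices/b</link>
<enclosure url="https://example.org/files/b.PDF" type="application/pdf" length="1"/>
</item>
<item><title>No attachment</title><link>https://example.org/notices/c</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in find_pdf_links(SAMPLE_FEED):
        print(url)
```

Each URL this returns would then be handed to whatever does the uploading, which is where DocumentCloud takes over and makes the PDFs searchable.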

Unfortunately, at this time Jeremy’s pretty much focussed on US data sources, which is completely understandable for a one-man show, but I couldn’t help thinking about the recently-launched Canadian Investigative Journalism Foundation. Even if they don’t end up working together, there are definitely some tools available in Jeremy’s project that could be of use to IJF, and indeed anyone who’s interested in surfacing data that should be public.

What’s that, you say, you don’t already know about DocumentCloud? That’s probably because it’s geared towards journalists, but I think I got in when I was working on a project I thought it would help with. Your mileage may vary for gaining full access as a non-journalist, but if you’ve got a good use-case, your odds are probably good. It’s pretty frickin’ cool, and I’ve just made a mental note to spend some more time with it 🙂
