Data scraping: Using Google Docs to grab table data
Sometimes I like to point out that there are tools out there that let you accomplish tasks you'd normally have to write a script for. Interestingly enough, Google Docs comes with a set of spreadsheet functions (importHTML, importData, importXML and importFeed) that let you grab data from the Web and drop it straight into your spreadsheet.
The OUseful.info blog has a good post on how you can use importHTML to pull a table from Wikipedia into a spreadsheet. It even goes a step further and pumps the resulting spreadsheet into Yahoo! Pipes to map the data. If you don't know what Yahoo! Pipes is, it's basically another service that lets you manipulate data without writing code. I'll be sure to cover it in an upcoming blog post.
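To give a sense of what importHTML saves you from, here's roughly the script you'd otherwise write. In the spreadsheet the call looks like =IMPORTHTML(url, "table", 1), where the second argument is "table" or "list" and the third is the 1-based position of that element on the page. The Python below is only a minimal sketch of the same idea using the standard library (html.parser and urllib.request). The names TableExtractor, import_html_table and fetch are my own inventions, not part of any Google API, and it assumes the table is in the page's raw HTML rather than built by JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableExtractor(HTMLParser):
    """Collects every top-level <table> as a list of rows (lists of cell text)."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True, so &amp; arrives as &
        self.tables = []            # tables in document order
        self._depth = 0             # <table> nesting depth
        self._row = None            # cells of the row being read
        self._cell = None           # text fragments of the cell being read

    def _end_cell(self):
        # Close the current cell (HTML lets authors omit </td>).
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        # Close the current row (HTML lets authors omit </tr>).
        self._end_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._depth == 0:
                self.tables.append([])
            self._depth += 1
        elif self._depth == 1:      # ignore rows/cells of nested tables
            if tag == "tr":
                self._end_row()
                self._row = []
            elif tag in ("td", "th"):
                self._end_cell()
                if self._row is None:   # tolerate cells without a <tr>
                    self._row = []
                self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self._end_row()
        elif self._depth == 1:
            if tag in ("td", "th"):
                self._end_cell()
            elif tag == "tr":
                self._end_row()

    def handle_data(self, data):
        # Text inside a cell (including any nested table) goes into that cell.
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index=1):
    """Return the index-th table (1-based, like IMPORTHTML) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not 1 <= index <= len(parser.tables):
        raise IndexError(f"page has {len(parser.tables)} table(s), asked for #{index}")
    return parser.tables[index - 1]


def fetch(url):
    """Download a page as text. Assumption: no retries, no JavaScript rendering."""
    # Some sites (Wikipedia included) want a descriptive User-Agent header.
    req = Request(url, headers={"User-Agent": "table-scraper-sketch/0.1"})
    with urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

To mirror the Wikipedia example, you'd call import_html_table(fetch(url), index=1) with the article's URL and get back plain Python lists, which you could then write out with the csv module. That's a lot more moving parts than one spreadsheet formula, which is exactly the point.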
In any case, being able to scrape text using Google Docs is a great way to get moving quickly on data-based projects.
Has anyone done a project using this method? I’m working on one right now at the Alligator. I’ll update this post when I get it up and running.
Filed under: Info
Tags: data scraping, google docs, scraping