PackRat

by Chris Murphy, Andrew Correa, and Robin Stewart

PackRat is a tool for collecting and browsing Microformats found on the web. It acts as a structured bookmarking tool that records not just the page title and URL, but all of the structured data found on the page. For example, marked-up contact information (e.g. name, address, phone number) or citation information (author, journal, year) can be stored and used later, either to look up the corresponding web pages or to use the information as is.

What's a Microformat?

Microformats provide a simple, standardized way to identify common types of data in an HTML document. For example, contact information can be identified using the hCard microformat, which is based on the vCard standard used by most address book software but adapted to be embedded in normal HTML. Microformat elements are specified by keywords in the "class" attribute of regular nested HTML elements such as <a>, <div>, and <span>. Read more...
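
To make the "class" attribute idea concrete, here is a minimal sketch in Python of what an hCard looks like and how its properties could be read out. It is not PackRat's or Operator's code. The sample markup, the helper name extract_hcards, and the decision to handle only the fn, org, and tel properties are all assumptions made for illustration. It also assumes hCards are not nested inside each other.

```python
from html.parser import HTMLParser

# Illustrative markup (not from a real page): an hCard is ordinary HTML
# whose class attributes carry hCard property names.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <span class="tel">555-0100</span>
</div>
"""

# Only a small subset of hCard properties, chosen for this sketch.
HCARD_PROPERTIES = {"fn", "org", "tel"}

# Elements that never have a closing tag, so they must not be pushed.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per hCard found
        self._stack = []  # per open element: "vcard", a property name, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            # The root element of an hCard starts a new record.
            self.cards.append({})
            self._stack.append("vcard")
            return
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        prop = self._stack[-1]
        # Keep text only if it sits directly in a property element
        # that is itself inside a vcard element.
        if prop in HCARD_PROPERTIES and "vcard" in self._stack:
            card = self.cards[-1]
            card[prop] = (card.get(prop, "") + " " + text).strip()


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    return parser.cards
```

Calling extract_hcards(SAMPLE_HTML) returns a single dictionary with the fn, org, and tel values from the sample. Elements with a property class that are not inside a "vcard" element are ignored.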

How does PackRat work?

PackRat is a "user script" that extends Operator, a Firefox extension that can extract Microformats from web pages. PackRat is installed into Operator as a single JavaScript file.

We expect that users will accumulate a large number of microformats over time as they "stash" web pages, so we designed PackRat with scalability in mind. We use a SQLite database to store the microformats.
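
As a rough illustration of how stashed microformats might be kept in SQLite, here is a sketch in Python. It is not PackRat's actual schema or code. The table names, column names, and the functions open_store, stash, and find_pages are hypothetical. The idea is one row per stashed microformat plus one row per property, with an index on the property table so that lookups by value stay fast as the collection grows.

```python
import sqlite3

# Hypothetical schema: one row per stashed microformat, and one row
# per (name, value) property belonging to it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS microformat (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,      -- e.g. 'hcard' or 'citation'
    url   TEXT NOT NULL,      -- page the microformat was found on
    title TEXT                -- title of that page
);
CREATE TABLE IF NOT EXISTS property (
    microformat_id INTEGER NOT NULL REFERENCES microformat(id),
    name  TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS property_name_value
    ON property(name, value);
"""


def open_store(path=":memory:"):
    """Open (or create) the store; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def stash(conn, kind, url, title, properties):
    """Store one microformat and its properties; return its row id."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO microformat (kind, url, title) VALUES (?, ?, ?)",
            (kind, url, title),
        )
        mf_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO property (microformat_id, name, value) "
            "VALUES (?, ?, ?)",
            [(mf_id, name, value) for name, value in properties.items()],
        )
    return mf_id


def find_pages(conn, name, value):
    """Return the URLs of pages holding a microformat with this property."""
    rows = conn.execute(
        "SELECT DISTINCT m.url FROM microformat AS m "
        "JOIN property AS p ON p.microformat_id = m.id "
        "WHERE p.name = ? AND p.value = ? ORDER BY m.url",
        (name, value),
    )
    return [row[0] for row in rows]
```

For example, after stash(conn, "hcard", "http://example.com/jane", "Jane's page", {"fn": "Jane Doe"}), calling find_pages(conn, "fn", "Jane Doe") returns the URL of the stashed page. This is the "look up the corresponding web pages" use described above.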

Demos

For hCards, go to whitepages.com.

For citations, go to Sam Madden's publications.

The source is available from SVN.