DiggStripper
JavaScript for stripping Digg pages using jQuery
Web scraping is a common practice used by search engines and other web utilities to extract content from sites. Traditionally it is done on the server side, relying heavily on regular expressions and string comparisons for pattern matching and content extraction.
DiggStripper is an experiment that uses Digg.com pages as an example to see how page scraping can be done in JavaScript, taking advantage of access to the DOM tree to parse and extract information from a page. jQuery is used to access and manipulate the DOM and to build a JSON object containing the scraped information.
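
As a rough illustration of the idea, the sketch below walks the DOM with jQuery and collects one plain object per story, ready to be serialized as JSON. It is not DiggStripper's actual code: the selectors (`.story`, `.title a`, `.digg-count`), the field names, and the `scrapeStories` function are all hypothetical, since Digg's real markup is not described here.

```javascript
// Hypothetical selectors: placeholders, not Digg's real markup.
var SELECTORS = {
  story: '.story',
  title: '.title a',
  diggs: '.digg-count'
};

// Collect one plain object per story element found under `root`.
// `$` is the jQuery function, passed in so the sketch does not rely on a global.
function scrapeStories($, root) {
  var stories = [];
  $(root).find(SELECTORS.story).each(function () {
    var $story = $(this);
    var $link = $story.find(SELECTORS.title).first();
    stories.push({
      title: $link.text().trim(),
      url: $link.attr('href') || null,
      // Fall back to 0 when the count is missing or not a number.
      diggs: parseInt($story.find(SELECTORS.diggs).first().text(), 10) || 0
    });
  });
  return stories;
}
```

In a browser with jQuery loaded, `JSON.stringify(scrapeStories(jQuery, document))` would produce the JSON text. No regular expressions are involved: the structure of the page, rather than the shape of its source text, decides what gets picked up.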
