NEWS FLASHES
19 Nov 2009: Added support for gzip encoding in HTTP GETs. This will speed up many types of spidering activities by 10-20%. (v0.90.07.04) No changes were made to the User Guide.
29 Oct 2009: The official 0.90 beta has now begun. See files to download on the right side of this page. The User Guide is available as a separate download, but is also included in the install zip. Unzip the install zip and view the readme file and User Guide for instructions on how to install.
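For readers curious what gzip support in an HTTP GET involves, here is a minimal sketch using only the Python 3 standard library. It is not UrlNet's own code: the function names `build_gzip_request`, `decode_body`, and `fetch` are hypothetical and chosen for illustration. The client advertises gzip in the `Accept-Encoding` header and decompresses the body only when the server reports that it actually used gzip.

```python
import gzip
import urllib.request


def build_gzip_request(url):
    """Build a GET request that tells the server gzip responses are acceptable."""
    # Hypothetical helper, not part of UrlNet.
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw_bytes, content_encoding):
    """Decompress the body only if the server says it used gzip."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(raw_bytes)
    return raw_bytes


def fetch(url):
    """Fetch a URL, handling a gzip-compressed response if one comes back."""
    with urllib.request.urlopen(build_gzip_request(url)) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

Any savings come from transferring fewer bytes, so they depend on how compressible the fetched pages are and on whether the server honors the header.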
Mining the Web for Social Networking Data
When studying aspects of the World Wide Web using network analysis tools and techniques, building networks for analysis can be tedious, time-consuming, and error-prone. The UrlNet library, written in the Python scripting language, is intended to provide a powerful, flexible, easy-to-use, spider-like mechanism for generating such networks.
UrlNet was originally written to aid in the analysis of search engine result set quality. That work relied on the time-tested method of harvesting links from static or dynamically generated Web pages. Many of UrlNet's features stem from the demands of this research domain, but they have much wider applicability.
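As a rough illustration of link harvesting in general, the sketch below extracts the outgoing links from one page and returns them as (source, target) pairs, which is the raw material for a network's edges. It uses only the Python 3 standard library and does not reflect UrlNet's actual API; the names `LinkHarvester` and `harvest_edges` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect the absolute targets of <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def harvest_edges(page_url, html):
    """Return (source, target) pairs, one per outgoing link on the page."""
    parser = LinkHarvester(page_url)
    parser.feed(html)
    return [(page_url, target) for target in parser.links]
```

A spider would repeat this step for each newly discovered target, up to some depth limit, and accumulate the pairs into a graph.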
A Tool for the Semantic Web
UrlNet's usefulness is not limited to spidering and link harvesting. Because many data sources covering a wide range of subjects are also available via Web Services over the Internet, UrlNet can be employed to build networks for many different topic domains. For example, it has been used to examine blogger conversations and biomedical researcher co-citation networks.
Want to Learn More?
- See the Wiki for more information. The UrlNet Google Group is a good place to ask questions, make suggestions, and see what people are doing with UrlNet.
- Visit CurrentKnownBugsAndWorkarounds to see a current list of known bugs and workarounds. This is also a good place to put enhancement requests.