This dataset has three sub-datasets: Contact, Pizza, and Hotel.

For each sub-dataset:

The file named "url" lists all downloaded URLs and their assigned IDs, and also indicates whether the saved file was tagged correctly by the rule-based extractor.
format: <id> <0/1(tagged correctly/not)> <total number of addresses extracted correctly> <total number of addresses extracted> <total number of addresses in the web page> <url>

The path named "original" contains all collected web pages without any tagging.

The path named "tagged" contains all tagged web pages.
file name format: f0000id
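The "url" file format described above could be read with something like the following minimal sketch. It assumes that fields are separated by whitespace, that each record sits on its own line, and that the URL is the last field; none of this is stated explicitly in the description. The names `UrlRecord`, `parse_url_line`, and `precision_recall` are hypothetical and are not part of the dataset. The 0/1 flag is kept as a raw integer because the description does not say unambiguously which value means "tagged correctly".

```python
from typing import Iterable, NamedTuple, Tuple


class UrlRecord(NamedTuple):
    """One line of a "url" file (hypothetical container, not part of the dataset)."""
    page_id: str       # kept as a string so any leading zeros survive
    tagged_flag: int   # raw 0/1 value; its meaning is not interpreted here
    n_correct: int     # addresses extracted correctly
    n_extracted: int   # addresses extracted in total
    n_in_page: int     # addresses actually present in the web page
    url: str


def parse_url_line(line: str) -> UrlRecord:
    # Assumption: six whitespace-separated fields, with the URL last.
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    page_id, flag, n_correct, n_extracted, n_in_page, url = parts
    return UrlRecord(page_id, int(flag), int(n_correct),
                     int(n_extracted), int(n_in_page), url.strip())


def precision_recall(records: Iterable[UrlRecord]) -> Tuple[float, float]:
    # Aggregate counts over all pages; return 0.0 when a denominator is zero.
    correct = extracted = in_page = 0
    for r in records:
        correct += r.n_correct
        extracted += r.n_extracted
        in_page += r.n_in_page
    precision = correct / extracted if extracted else 0.0
    recall = correct / in_page if in_page else 0.0
    return precision, recall
```

Reading a whole file is then a matter of calling `parse_url_line` on every non-empty line. Treating the three counts as the inputs to precision and recall is one plausible use of them, not something the dataset description prescribes.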