Introduction  

RuralCafe is a platform for web search and browsing over extremely slow or intermittent networks.

RuralCafe improves web search and browsing by providing an expanded search query interface that lets a user specify additional query terms to maximize the utility of the results returned by a search query. Given knowledge of the limited available network resources, RuralCafe prefetches pages to best satisfy a search query based on the user's search preferences. In addition, RuralCafe does not require modifications to the web browser, and can provide single-round search results tailored to various types of networks and economic constraints.

RuralCafe consists of two main components: the local proxy and the remote proxy. Each proxy is responsible for a different set of web optimizations. The local proxy is meant to be deployed at the local area network gateway to perform caching, enable local search, and manage client requests. The remote proxy resides on a separate machine on the other end of the slow network link, and is presumably well-connected to the Internet. The remote proxy coordinates with the local proxy by accepting its requests, prefetching useful pages, and filtering unwanted content on its behalf. Together, the two proxies coordinate across the slow link to maximize the link's overall effectiveness.

The web optimizations implemented at the proxies may be broadly configured for low-bandwidth, high-latency, and/or intermittent connections. For example, a low-bandwidth connection setting would cause the prefetching algorithm to prefetch pages more conservatively or stop prefetching altogether. We are in the process of implementing automatic detection of the link characteristics to reduce the level of technical knowledge required for installing RuralCafe.

System Requirements

RuralCafe is designed to be deployable in three basic configurations, depending on the situation.

The first is the configuration described above, with both a local proxy on the LAN gateway machine and a remote proxy on a well-connected machine. RuralCafe's optimizations are fully available in this configuration, and details may be found in the technical paper here.

Having access to the LAN gateway and a remote machine is not always possible. In the case where no remote machine is available, the remote proxy may be deployed on the same machine as the local proxy at the gateway. In this case, filtering is no longer beneficial, but the benefits of prefetching remain intact.

If access to even a LAN gateway is not available (e.g. in the case of a user with a single machine), RuralCafe may be installed completely on one client machine. With this setup RuralCafe is still able to manage requests and perform prefetching and local search, but the quality may be reduced due to the lack of aggregate browsing statistics and a shared cache.

Each machine where the RuralCafe software will be running requires Windows XP or above, with .NET Framework 3.5 installed. Visual Studio 2008 or later is needed only for debugging and modifying the source code.

Installation

Install .NET 3.5.

  • RuralCafe needs to be configured via the configuration file in "RuralCafe/config.txt".
  • To use RuralCafe as a disconnected CIP, set "DEFAULT_SEARCH_PAGE=cip.html" in "RuralCafe/config.txt".
  • After the configuration is completed, start RuralCafe by executing "RuralCafe/bin/Debug/RuralCafe.exe".

To interact with RuralCafe, the browser must be configured to use RuralCafe as its proxy, using the IP address and port of the local proxy set in the configuration (see Configuration below). These settings can be found in the connection settings of any modern browser (IE, Firefox, Chrome). Also, set the browser's homepage to "http://www.ruralcafe.net/". Note that for deployments where RuralCafe is running the local and remote components on separate machines, the installation steps above must be followed on the remote proxy machine as well.
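
A script can be pointed at the local proxy in the same way as a browser, which may be handy for checking that RuralCafe is listening before changing any browser settings. The following is a minimal Python sketch, not part of RuralCafe; the function names are hypothetical, and the address and port are assumed to match the LOCAL_PROXY_IP_ADDRESS and LOCAL_PROXY_LISTEN_PORT defaults in config.txt.

```python
import urllib.request

# Assumed values: these mirror the default local proxy settings in config.txt.
LOCAL_PROXY_IP_ADDRESS = "127.0.0.1"
LOCAL_PROXY_LISTEN_PORT = 8080


def proxy_settings(ip, port):
    """Return the HTTP proxy mapping a client would use to reach the local proxy."""
    return {"http": "http://%s:%d" % (ip, port)}


def make_proxy_opener(ip=LOCAL_PROXY_IP_ADDRESS, port=LOCAL_PROXY_LISTEN_PORT):
    """Build a urllib opener that sends HTTP requests through the local proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(ip, port))
    return urllib.request.build_opener(handler)


def fetch_through_proxy(opener, url="http://www.ruralcafe.net/", timeout=30):
    """Request a page via the proxy and return (status code, body bytes)."""
    with opener.open(url, timeout=timeout) as response:
        return response.status, response.read()


# Example use (requires RuralCafe to be running):
#   status, body = fetch_through_proxy(make_proxy_opener())
```

If the request fails with a connection error, the proxy is most likely not running or is listening on a different address or port than the one assumed here.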

Configuration

The configuration fields are fairly straightforward and default to values that work as a standalone service on a single machine (as in this example). In the case where the local and remote proxies are set up on separate machines, the config.txt should be identical on both machines.

The local proxy settings are where RuralCafe listens for requests from the browser. These are the IP address and port settings that the browser should be configured with to use RuralCafe.

  • LOCAL_PROXY_IP_ADDRESS=127.0.0.1
  • LOCAL_PROXY_LISTEN_PORT=8080

The remote proxy settings are where RuralCafe's remote proxy listens for requests from the local proxy. The local proxy also uses these settings to know where to forward requests.

  • REMOTE_PROXY_IP_ADDRESS=127.0.0.1
  • REMOTE_PROXY_LISTEN_PORT=8081

The external proxy settings are used by the remote proxy to connect to the Internet if there is another upstream proxy in the network.

  • EXTERNAL_PROXY_IP_ADDRESS=
  • EXTERNAL_PROXY_LISTEN_PORT=
  • EXTERNAL_PROXY_LOGIN=
  • EXTERNAL_PROXY_PASS=

The following settings control RuralCafe's behavior. The DEFAULT_SEARCH_PAGE may be changed to "cip.html" if RuralCafe is to be used in a completely disconnected fashion as an information portal. DEFAULT_QUOTA is the number of bytes of data the remote proxy downloads per query. DEFAULT_DEPTH is the prefetching depth the remote proxy downloads per query. MAXIMUM_DOWNLOAD_SPEED is used to throttle the transfer between the local and remote proxies if the bandwidth is constrained and some moderation is necessary.

  • DEFAULT_SEARCH_PAGE=searchpage.html
  • DEFAULT_QUOTA=2000000
  • DEFAULT_DEPTH=1
  • MAXIMUM_DOWNLOAD_SPEED=5000000
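
To get a feel for what a quota means on a slow link, it can help to estimate how long one query's worth of data takes to cross it. The sketch below is only arithmetic: the link rates are illustrative assumptions, not RuralCafe settings, and it ignores latency and protocol overhead. It takes DEFAULT_QUOTA in bytes, as described above.

```python
def transfer_seconds(num_bytes, link_bits_per_second):
    """Idealized time to move num_bytes over a link (no latency or overhead)."""
    return num_bytes * 8 / link_bits_per_second


DEFAULT_QUOTA = 2000000  # bytes per query, from config.txt

# Assumed example link rates in bits per second (not from RuralCafe).
EXAMPLE_LINKS = [
    ("56 kbit/s", 56_000),
    ("256 kbit/s", 256_000),
    ("1 Mbit/s", 1_000_000),
]

for label, bits_per_second in EXAMPLE_LINKS:
    seconds = transfer_seconds(DEFAULT_QUOTA, bits_per_second)
    print("%-10s %7.1f s" % (label, seconds))
```

Under these assumptions the default quota takes roughly 286 seconds at 56 kbit/s, 62.5 seconds at 256 kbit/s, and 16 seconds at 1 Mbit/s, which suggests lowering the quota on very slow links.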

The index and cache path settings tell RuralCafe where to store the caches and indices used by the proxies for local caching and search. In the case of a completely disconnected RuralCafe information portal, these settings must be changed to point to existing sources (e.g. the output from a focused crawler). Paths relative to RuralCafe.exe are acceptable.

  • INDEX_PATH=c:\cygwin\home\jchen\index-mathematics\
  • LOCAL_CACHE_PATH=c:\cygwin\home\jchen\files-mathematics\
  • REMOTE_CACHE_PATH=Cache\
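
The example above mixes absolute paths with a relative one. The following Python sketch illustrates how such values could be resolved, assuming a relative path is interpreted against the directory that contains RuralCafe.exe. The helper name and the install directory are hypothetical, and RuralCafe's own resolution logic may differ.

```python
import ntpath  # Windows path rules, usable on any platform


def resolve_path(configured, exe_dir):
    """Return configured unchanged if absolute, else join it onto exe_dir."""
    if ntpath.isabs(configured):
        return configured
    return ntpath.join(exe_dir, configured)


# Hypothetical install location, used only for illustration.
EXE_DIR = r"C:\RuralCafe\bin\Debug"

for name, value in [
    ("INDEX_PATH", "c:\\cygwin\\home\\jchen\\index-mathematics\\"),
    ("REMOTE_CACHE_PATH", "Cache\\"),
]:
    print(name, "->", resolve_path(value, EXE_DIR))
```

With the assumed install directory, the absolute INDEX_PATH is left as-is, while "Cache\" resolves to a Cache folder next to the executable.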

Finally, if a local copy of Wikipedia is available, the WIKI_DUMP_FILE may be set to point to it. Note that to be able to search through this Wikipedia dump, it must first be indexed by bzReader (see Wikipedia Indexing below). Otherwise, the copy of Wikipedia may be accessed via direct URL requests, but its pages will not be found during search requests.

  • WIKI_DUMP_FILE=d:\wikipedia\enwiki-20090520-pages-articles.xml.bz2
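
Since config.txt is a list of KEY=VALUE lines, it is easy to inspect from a script, for instance to confirm that two machines hold identical settings. The sketch below is a minimal Python reader written for illustration; it is not RuralCafe's own parser, and the function name and sample text are assumptions based on the fields shown above.

```python
def parse_config(text):
    """Parse KEY=VALUE lines into a dict; lines without '=' are skipped."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Sample text assembled from the default values listed in this section.
SAMPLE = """\
LOCAL_PROXY_IP_ADDRESS=127.0.0.1
LOCAL_PROXY_LISTEN_PORT=8080
REMOTE_PROXY_IP_ADDRESS=127.0.0.1
REMOTE_PROXY_LISTEN_PORT=8081
EXTERNAL_PROXY_IP_ADDRESS=
DEFAULT_SEARCH_PAGE=searchpage.html
DEFAULT_QUOTA=2000000
DEFAULT_DEPTH=1
"""

settings = parse_config(SAMPLE)
print(settings["LOCAL_PROXY_IP_ADDRESS"], settings["LOCAL_PROXY_LISTEN_PORT"])
```

To compare the local and remote machines, one could parse both files this way and check that the two dictionaries are equal.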

Wikipedia Indexing

Before a Wikipedia dump from wikipedia.org can be searched, it must first be indexed. To do this, run bzReader and enter the location of the dump.

Bugs

Please help us by sending in bug reports along with your operating system, config.txt, a description of the physical deployment scenario, and a description of the bug.

