Updated Mar 27, 2009 by sebast...@alombra.de
Labels: Featured
FAQ  
frequently asked questions
  • Why is this piece of software called dine?
Actually, it has nothing to do with eating ;) The name is inspired by Diné bizaad, the Navajo people's own name for their language, which was used in World War II for secret communication. To outsiders the language was so cryptic and hard to understand that no one could work out what the code talkers were saying. Making a piece of code understand information found on the internet sometimes seems to me an equally hard task, so I chose this name.
  • Why do I have to write a step for each website I want to parse? Couldn't I parse more than one in a step?
dine executes your steps concurrently, and the most straightforward way to parallelize execution is to use a single thread for each website to parse.
In fact, each step is an implementation of the command pattern.
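To illustrate the command-pattern idea, here is a minimal sketch in plain JavaScript. It is not dine's actual API: `newsStep`, `weatherStep`, `getUrl()`, `run()`, `runAll()` and `fakeFetch()` are hypothetical names, and the runner below calls the steps one after another instead of on separate threads. It only shows why one self-contained step per website is easy to hand out to workers.

```javascript
// Hypothetical sketch: none of these names are part of the dine API.

// Each step is a command: a self-contained object that knows how to
// process exactly one website.
const newsStep = {
  getUrl: function () { return "http://news.example.com/"; },
  run: function (page) { return "news: " + page.length + " chars"; }
};

const weatherStep = {
  getUrl: function () { return "http://weather.example.com/"; },
  run: function (page) { return "weather: " + page.length + " chars"; }
};

// The runner relies only on the shared interface (getUrl + run), so it
// could give each step to its own worker without knowing what it does.
// Here the steps are simply executed in sequence.
function runAll(steps, fetchPage) {
  return steps.map(function (step) {
    return step.run(fetchPage(step.getUrl()));
  });
}

// Stand-in for the HTTP download so the sketch runs offline.
function fakeFetch(url) {
  return "<html>" + url + "</html>";
}

const results = runAll([newsStep, weatherStep], fakeFetch);
console.log(results);
```

Because no step depends on another, the order in which they finish does not matter, which is what makes a one-thread-per-step model simple.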
  • Why should steps ONLY consist of methods and NEVER save information in instance variables?
dine compiles your steps only once at startup and reuses a singleton instance every time a step is executed. That means that steps need to be stateless!
This architecture keeps dine's memory footprint very low.
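The difference between a stateful and a stateless step can be sketched as follows. The object shapes and the `extractTitle()` method are hypothetical and only serve as an illustration; dine's real step interface may look different.

```javascript
// Hypothetical step shapes: extractTitle() is not part of the dine API.

// Problematic: the result is stored on the shared instance. Since the
// same singleton serves every execution, a second run overwrites the
// value a first run may still be relying on.
const statefulStep = {
  lastTitle: null,
  extractTitle: function (page) {
    this.lastTitle = page.title;
    return this.lastTitle;
  }
};

// Stateless: everything the method needs arrives as a parameter and
// everything it produces leaves as the return value.
const statelessStep = {
  extractTitle: function (page) {
    const title = page.title; // local variable, private to this call
    return title;
  }
};

statefulStep.extractTitle({ title: "first page" });
statefulStep.extractTitle({ title: "second page" });
// The shared field now only remembers the most recent call.
console.log(statefulStep.lastTitle);

console.log(statelessStep.extractTitle({ title: "first page" }));
```

Local variables inside a method are fine, because they exist only for the duration of a single call.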
  • What XML parser is the JavaScript code using?
It uses E4X (ECMAScript for XML). See Wikipedia and the in-depth tutorial from Mozilla to learn more.
  • Why is there no XPath support?
I think that E4X is much more intuitive and easier to use than XPath, so the dine API currently only offers support for E4X.
An XPath API may be added in the future, because a lot of people are familiar with XPath and it is much more powerful than E4X, though harder to handle. If you're interested in implementing this, feel free to join this project!
  • Does it support POST or HTTP Auth?
It supports POST. Add a function called getMethod() to your step that returns the string "POST".
You can then implement another function called getPostParams() that returns a map of the parameters to send with the POST request.
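A minimal sketch of the two functions might look like this. Only the names getMethod() and getPostParams() come from this FAQ; the parameter names and values are made up, and using a plain JavaScript object as the "map" is an assumption about what dine accepts.

```javascript
// getMethod() and getPostParams() are the function names from this FAQ.
// The parameters below and the plain-object "map" are assumptions.

// Tells dine to send the request for this step as a POST.
function getMethod() {
  return "POST";
}

// Returns the form parameters as name/value pairs (hypothetical values).
function getPostParams() {
  return {
    query: "dine",
    page: "1"
  };
}

console.log(getMethod(), getPostParams());
```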
HTTP Auth is currently not supported, but it could be added with very little effort, since dine uses commons-httpclient internally.
If you need this functionality, drop me a mail at spam at alombra.de; maybe I will sit down for a few hours and add it.


Hosted by Google Code