|
UsingPythonExamples
Demonstrates the usage of code in the python-examples directory.
Introduction

Using the example code in the python-examples directory is straightforward and lets you run both aggregate and entity style queries against the Recorded Future API. You'll need Python 2.6 (or greater) installed to use these scripts. You will also need a Recorded Future API token. To obtain a token, please e-mail sales@recordedfuture.com.

Instructions

Get the code

The easiest way to get the code is to save the following files to the same directory:

You can also browse for it and view the files online if you'd prefer. Alternatively, if you have Subversion installed, you can check out the code as follows:

    svn checkout http://recordedfuture.googlecode.com/svn/trunk/python-examples/ recordedfuture-read-only

A new directory called "recordedfuture-read-only" will be created in your current working directory, and the Python files will be placed in it. In either case, open a terminal (cmd.exe on Windows, Terminal on Mac, or xterm/gnome-terminal on Linux) and cd to the directory that holds these four files for the remainder of this tutorial.

Run the code

Running the code is simple once you have Python 2.6+ installed. We'll go over the two types of queries separately.

Instance Queries

An entity style query pulls information about occurrences of an entity from our database, subject to the constraints of the query itself. If, for instance, you only want to see occurrences published in a particular date range, you set that in the query. By default, our program entquery.py is set up to pull all occurrences of a list of entities (identified by market ticker) over a user-specified date range. The list of entities is provided in a file that contains the tickers, one per line, for which you want to see entity occurrences. A sample file, tickerfile.txt, is provided in the python-examples directory.
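As a rough illustration of the ticker file format described above, the following sketch reads one ticker per line. The function name read_tickers is hypothetical and is not taken from the example scripts; ignoring blank lines and surrounding whitespace is an assumption about how tolerant such a reader should be.

```python
def read_tickers(path):
    """Return the tickers listed in a text file, one per line.

    Hypothetical helper for illustration; not part of the example scripts.
    """
    tickers = []
    with open(path) as handle:
        for line in handle:
            ticker = line.strip()
            if ticker:  # skip blank lines (assumed to be harmless)
                tickers.append(ticker)
    return tickers
```

Calling read_tickers("tickerfile.txt") from the directory that holds the sample file would return the tickers as a list of strings.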
To run the example entity style query, do the following:

    python company-entquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > entoutputfile.txt

Replace MYTOKEN with your API token. The two dates (2010-06-14 and 2010-06-20 above) are the date range over which you want to run the query. Query output will be placed in the file entoutputfile.txt, and you should see something like this in your terminal:

    Running ticker C
    Running ticker GOOG
    Running ticker MSFT
    Running ticker YHOO
    Running ticker CHK
    Running ticker XOM
    Running ticker GE
    Running ticker INTC

Aggregate Queries

Aggregate queries are similar to entity queries. The main difference is that they aggregate all occurrences on a particular date for a particular ticker and provide total counts of the types of events you query for, as well as average momentum and sentiment metrics for those instances on those days. Running one is similar to running the entity query, but the output will look slightly different. To run:

    python company-aggquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > aggrawoutputfile.txt

This time you'll see no output in your terminal.
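In the entity query run, the "Running ticker" lines reach the terminal even though standard output is redirected to a file. One common way to get that behavior, sketched below, is to write result rows to standard output and progress messages to standard error. This is an assumption about how such a script could be structured, not a description of the actual example code; run_queries and fetch_rows are hypothetical names, and fetch_rows stands in for the real API call.

```python
import sys


def run_queries(tickers, fetch_rows, out=None, err=None):
    """Write result rows to `out` and progress messages to `err`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for ticker in tickers:
        # Progress goes to stderr, so `> file` redirection does not capture it.
        err.write("Running ticker %s\n" % ticker)
        for row in fetch_rows(ticker):
            # Each result row is written as one tab-delimited line.
            out.write("\t".join(row) + "\n")
```

A script built this way shows its progress on screen while `> entoutputfile.txt` captures only the result rows.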
Examine the output

Entity Query Output

Some lines of the entity query output should look something like this:

    ticker id document.published document.source.name start stop type momentum positive negative
    AMZN 52200795 "2009-01-05T13:15:04.000Z "seeking_alpha_market_currents "2009-01-05T13:15:04.000Z "2009-01-05T13:15:04.000Z "EntityOccurrence 0.0645560727197 0.5142857 0.0
    AMZN 37057385 "2009-01-03T06:33:00.000Z "engadget "2009-01-03T06:33:00.000Z "2009-01-03T06:33:00.000Z "CompanyProduct 0.173733129615 0.0 0.0
    AMZN 52200797 "2009-01-05T13:15:04.000Z "seeking_alpha_market_currents "2009-01-05T13:15:04.000Z "2009-01-05T13:15:04.000Z "AnalystRecommendation 0.384920634921 0.5142857 0.0
    AMZN 37772575 "2009-01-05T13:36:22.000Z "24_7 "2009-01-05T13:36:22.000Z "2009-01-05T13:36:22.000Z "CompanyTicker 0.111111111111 0.0 0.0
    AMZN 48943642 "2009-01-05T14:00:00.000Z "pr_newswire "2009-01-05T14:00:00.000Z "2009-01-05T14:00:00.000Z "CompanyProduct 0.071276371308 0.0 0.0
    ....

This is a tab-delimited file with column headings at the top. It should be straightforward to open it in Excel or read it in with R's read.delim() command. The output columns are the ticker, the document ID, the publish date of the document, the document source, the time period that the document references (start and stop), the occurrence type, the momentum of the entity, and the positive and negative sentiment behind the entity. Not every field will be present in every row.
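For readers who would rather stay in Python than use Excel or R, the tab-delimited output can also be read with the standard csv module. The sketch below is written for current Python 3 and assumes the file starts with the header row and separates fields with tab characters. The name read_entity_output is hypothetical, and stripping the quote characters that appear before some values in the sample is an assumption about the file's contents.

```python
import csv


def read_entity_output(handle):
    """Yield one dict per data row of tab-delimited entity query output."""
    # QUOTE_NONE keeps quote characters as plain text instead of parsing them.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Strip stray quote characters and whitespace from each value;
        # fields missing from a short row come back as empty strings.
        yield {
            key: (value or "").strip().strip('"').strip()
            for key, value in row.items()
            if key is not None  # ignore surplus fields beyond the header
        }
```

Passing it a file opened with open("entoutputfile.txt", newline="") yields dictionaries keyed by the header names, so a value can be looked up as row["ticker"] or row["momentum"]. All values are returned as strings; converting momentum or sentiment to float is left to the caller.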
Aggregate Query Output

Some aggregate results will look something like this:

    Ticker,Entity,Time,Count,Momentum,Positive,Negative
    MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
    GOOG,33321272,2011-11-01 19:30:00,1707,0.72436,0.07052,0.0254
    AMZN,33328212,2011-11-01 19:30:00,344,0.20139,0.05491,0.01374
    CHK,33511577,2011-11-01 19:30:00,6,0.00817,0,0
    MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
    GOOG,33321272,2011-11-02 19:30:00,2602,0.80317,0.06482,0.02282
    AMZN,33328212,2011-11-02 19:30:00,619,0.22222,0.06884,0.00787
    CHK,33511577,2011-11-02 19:30:00,45,0.02334,0,0.02581
    ...

The columns here are the ticker, the entity identifier, the date for the aggregate, the count of articles referencing the entity associated with the ticker, and the average momentum and positive and negative sentiment associated with the ticker on that date. In this case, results are in CSV format, again easily ingested by your favorite statistics software.

Modify the code

Users will likely want to try running their own queries, and the neat thing about the JSON query API is how flexible it is. You can query the data in the Recorded Future database from all sorts of directions; changing dates and tickers only scratches the surface. Before modifying these queries, I recommend reading our API documentation. There you will find what types of data you can ask for and what the various metadata fields in the results mean.

To change the output columns, you currently need to change two lines in entquery.py. First, change the line

    outfields = ["id","time","source.name", "document.published","type", "momentum", "sentiment"]

to match the output fields you decide on after reading the API documentation. Next, change the line

    outorder = ["id","document.published", "document.source.name", "start","stop","type", "momentum", "positive", "negative"]

to match the column order you'd like in your output file.
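To make the role of the output order concrete, here is a minimal sketch of how a list like outorder could drive the columns of each output line. The helper format_row is hypothetical and is not the code in entquery.py or recfut.py. The record is assumed to be a flat dictionary keyed by the same dotted field names, its values are illustrative only, and fields missing from a record are written as empty strings.

```python
def format_row(record, outorder):
    """Return one tab-delimited line with the fields in `outorder`."""
    # Missing fields become empty strings so the columns stay aligned.
    return "\t".join(str(record.get(field, "")) for field in outorder)


# Illustrative values only; a real record would come from the API response.
outorder = ["id", "document.published", "type", "momentum"]
record = {"id": 37057385, "type": "CompanyProduct", "momentum": 0.25}
line = format_row(record, outorder)
```

Because the column order lives in a single list, reordering or dropping a column only means editing outorder. Here the record has no "document.published" value, so that column is left empty in the resulting line.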
Planned enhancements to the helper functions in recfut.py should eliminate the need to set the output fields manually in more than one place, but for now this should do the trick. |