UsingPythonExamples  
Demonstrates the usage of code in the python-examples directory.
Phase-Support
Updated Nov 21, 2011 by evan.sparks@gmail.com

Introduction

The example code in the python-examples directory is straightforward to use and lets you run both aggregate and entity style queries against the Recorded Future API. You'll need Python 2.6 (or greater) installed in order to use these scripts. You will also need a Recorded Future API token. To obtain a token, please e-mail sales@recordedfuture.com.

Instructions

Get the code

The easiest way to get the code is to save the following files to the same directory:

If you'd prefer, you can also browse the code and view the files online.

Alternatively, if you have Subversion installed, you can check out the code as follows:

svn checkout http://recordedfuture.googlecode.com/svn/trunk/python-examples/ recordedfuture-read-only 

A new directory called "recordedfuture-read-only" will be created in your current working directory and the python files will be included in it.

In any case, you'll want to fire up a terminal (cmd.exe on Windows, Terminal on Mac, or Xterm/gnome-terminal on Linux) and cd to the directory that houses these four files for the remainder of this tutorial.

Run the code

Running the code is simple when you've got Python 2.6+ installed. We'll go over the two types of queries separately.

Entity Queries

An entity style query pulls every occurrence of an entity from our database, subject to the constraints of the query itself. If, for instance, you only want to see occurrences published in a particular date range, you set that range in the query. By default, our program company-entquery.py is set up to pull all occurrences of a list of entities (identified by market ticker) over a user-specified date range. The list of entities is provided in a file containing one ticker per line. A sample file, tickerfile.txt, is provided in the python-examples directory.
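As a rough illustration of the one-ticker-per-line format described above, the sketch below reads tickers from a sequence of lines. The helper name read_tickers is hypothetical and is not taken from the python-examples code; skipping blank lines is an assumption about how you would want stray whitespace handled.

```python
def read_tickers(lines):
    """Return the tickers found in an iterable of lines, one ticker per line.

    Hypothetical helper for illustration; not part of the python-examples code.
    """
    tickers = []
    for line in lines:
        ticker = line.strip()
        # Skip blank lines so a trailing newline does not yield an empty ticker.
        if ticker:
            tickers.append(ticker)
    return tickers


# A small in-memory stand-in for the contents of a ticker file.
sample = "C\nGOOG\nMSFT\n\nYHOO\n"
tickers = read_tickers(sample.splitlines())
print(tickers)
```

To read a real file, pass an open file object instead, for example read_tickers(open("tickerfile.txt")).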

To run the example entity style query you'll do the following:

python company-entquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > entoutputfile.txt

Replace MYTOKEN with your API token. The last two arguments are the start and end dates of the range over which you want to run the query. Query output will be placed in the file entoutputfile.txt, and you should see something like this in your terminal:

Running ticker C
Running ticker GOOG
Running ticker MSFT
Running ticker YHOO
Running ticker CHK
Running ticker XOM
Running ticker GE
Running ticker INTC

Aggregate Queries

Aggregate queries are similar to entity queries. The main difference is that they aggregate all occurrences on a particular date for a particular ticker and provide total counts of the types of events you query for, as well as average momentum and sentiment metrics for those occurrences on those days.

Running an aggregate query is similar to running an entity query, but the output will look slightly different. To run:

python company-aggquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > aggrawoutputfile.txt

This time you'll see no output in your terminal.

Examine the output

Entity Query Output

Some lines of the entity query output should look something like this:

ticker	id	document.published	document.source.name	start	stop	type	momentum	positive	negative
AMZN	52200795	"2009-01-05T13:15:04.000Z	"seeking_alpha_market_currents	"2009-01-05T13:15:04.000Z	"2009-01-05T13:15:04.000Z	"EntityOccurrence	0.0645560727197	0.5142857	0.0
AMZN	37057385	"2009-01-03T06:33:00.000Z	"engadget	"2009-01-03T06:33:00.000Z	"2009-01-03T06:33:00.000Z	"CompanyProduct	0.173733129615	0.0	0.0
AMZN	52200797	"2009-01-05T13:15:04.000Z	"seeking_alpha_market_currents	"2009-01-05T13:15:04.000Z	"2009-01-05T13:15:04.000Z	"AnalystRecommendation	0.384920634921	0.5142857	0.0
AMZN	37772575	"2009-01-05T13:36:22.000Z	"24_7	"2009-01-05T13:36:22.000Z	"2009-01-05T13:36:22.000Z	"CompanyTicker	0.111111111111	0.0	0.0
AMZN	48943642	"2009-01-05T14:00:00.000Z	"pr_newswire	"2009-01-05T14:00:00.000Z	"2009-01-05T14:00:00.000Z	"CompanyProduct	0.071276371308	0.0	0.0

....

The output is a tab-delimited file with column headings in the first row. It should be pretty straightforward to open in Excel or read with R's read.delim() command. The columns are the ticker, the document ID, the publish date for the document, the document source, the start and stop of the time period that the document references, the type of occurrence, the momentum of the entity, and the positive and negative sentiment behind the entity. Not every field will be populated in every row.
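If you would rather stay in Python, the tab-delimited output can be read with the standard csv module. The sketch below is written for Python 3 and uses one row copied from the sample above. The function name read_entity_rows is hypothetical. The sample rows show a leading double quote on some fields, so this sketch assumes that quote is not meaningful: it passes quoting=csv.QUOTE_NONE so the reader does not treat it as an opening quote, and then strips it.

```python
import csv
import io

NUMERIC_COLUMNS = ("momentum", "positive", "negative")


def read_entity_rows(handle):
    """Parse tab-delimited entity query output into a list of dicts.

    Hypothetical helper for illustration; not part of the python-examples code.
    """
    # QUOTE_NONE makes the reader treat '"' as an ordinary character.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        cleaned = {}
        for name, value in row.items():
            if name is None:
                continue  # ignore cells beyond the header width
            # Assumption: the leading quote seen in the sample is not data.
            value = (value or "").lstrip('"')
            if name in NUMERIC_COLUMNS:
                # Not all fields are always present, so allow empty cells.
                cleaned[name] = float(value) if value else None
            else:
                cleaned[name] = value
        rows.append(cleaned)
    return rows


header = ["ticker", "id", "document.published", "document.source.name",
          "start", "stop", "type", "momentum", "positive", "negative"]
row = ["AMZN", "37057385", '"2009-01-03T06:33:00.000Z', '"engadget',
       '"2009-01-03T06:33:00.000Z', '"2009-01-03T06:33:00.000Z',
       '"CompanyProduct', "0.173733129615", "0.0", "0.0"]
sample = "\t".join(header) + "\n" + "\t".join(row) + "\n"

rows = read_entity_rows(io.StringIO(sample))
print(rows[0]["type"], rows[0]["momentum"])
```

For a real run, replace the in-memory sample with open("entoutputfile.txt", newline="").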

Aggregate Query Output

Some aggregate results will look something like this:

Ticker,Entity,Time,Count,Momentum,Positive,Negative
MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
GOOG,33321272,2011-11-01 19:30:00,1707,0.72436,0.07052,0.0254
AMZN,33328212,2011-11-01 19:30:00,344,0.20139,0.05491,0.01374
CHK,33511577,2011-11-01 19:30:00,6,0.00817,0,0
MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
GOOG,33321272,2011-11-02 19:30:00,2602,0.80317,0.06482,0.02282
AMZN,33328212,2011-11-02 19:30:00,619,0.22222,0.06884,0.00787
CHK,33511577,2011-11-02 19:30:00,45,0.02334,0,0.02581
...

The columns here are the ticker, the ID of the entity associated with the ticker, the date for the aggregate, the count of articles referencing that entity, and the average momentum and positive and negative sentiment associated with the ticker on that date. In this case, results are in CSV format, again easily ingestible by your favorite statistics software.
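As a small example of working with the aggregate CSV, the sketch below sums the Count column per ticker using a few rows copied from the sample above. It is written for Python 3, and the function name total_counts is hypothetical rather than something provided by the example scripts.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample aggregate output.
sample = """Ticker,Entity,Time,Count,Momentum,Positive,Negative
MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
GOOG,33321272,2011-11-01 19:30:00,1707,0.72436,0.07052,0.0254
MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
GOOG,33321272,2011-11-02 19:30:00,2602,0.80317,0.06482,0.02282
"""


def total_counts(handle):
    """Sum the Count column for each ticker in aggregate query output.

    Hypothetical helper for illustration; not part of the python-examples code.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["Ticker"]] += int(row["Count"])
    return dict(totals)


totals = total_counts(io.StringIO(sample))
print(totals)
```

For a real run, replace the in-memory sample with open("aggrawoutputfile.txt", newline="").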

Modify the code

Users will likely want to try running their own queries, and the neat thing about the JSON query API is how flexible it is. You can query the data in the Recorded Future database from all sorts of directions. Changing dates and tickers really just scratches the surface.

Before modifying these queries, I recommend reading our API documentation. There you will find what type of data you can ask for and what the various metadata fields in the results mean.

To change the output columns, you currently need to change two lines in company-entquery.py. First, change the line

outfields = ["id","time","source.name", "document.published","type", "momentum", "sentiment"]

to match the output fields you decide on after reading the API documentation.

Next, you'll need to change the line:

outorder = ["id","document.published", "document.source.name", "start","stop","type", "momentum", "positive", "negative"]

to list the columns in the order you'd like them to appear in your output file. Planned enhancements to the helper functions in recfut.py should eliminate the need to manually set these in multiple places, but for now this should do the trick.
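To make the role of the column order list concrete, here is a minimal sketch of how a single list of dotted field names could drive both the header row and the order of cells in each output line. This is not how recfut.py or the query scripts are implemented. The helpers lookup and format_rows are hypothetical, and the nested-dict shape of the sample record is an assumption made for illustration, not a documented API response.

```python
def lookup(record, dotted_name):
    """Follow a dotted name such as "document.source.name" through nested dicts.

    Hypothetical helper; returns an empty string when a field is missing.
    """
    value = record
    for key in dotted_name.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""  # missing fields become empty cells
        value = value[key]
    return value


def format_rows(records, outorder):
    """Build tab-delimited lines: a header, then one line per record."""
    lines = ["\t".join(outorder)]
    for record in records:
        cells = [str(lookup(record, name)) for name in outorder]
        lines.append("\t".join(cells))
    return lines


outorder = ["id", "document.published", "document.source.name",
            "type", "momentum"]

# Assumed record shape, for illustration only.
records = [{
    "id": 37057385,
    "type": "CompanyProduct",
    "momentum": 0.25,
    "document": {
        "published": "2009-01-03T06:33:00.000Z",
        "source": {"name": "engadget"},
    },
}]

lines = format_rows(records, outorder)
print("\n".join(lines))
```

With this arrangement, reordering or removing a column only requires editing the one list.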

