Updated Sep 02, 2008 by edannin
SplunkPythonSDK  

Embedded Python SDK

You can find the most current version of this page here.

Splunk ships with an embedded Python-based SDK that you can use for development work. The web application framework inside the splunkd process uses the same internal SDK.

Note: There is a bug in the embedded Python SDK that ships with the 3.2.x versions of Splunk. If you aren't running Splunk 3.3 or higher, it is recommended that you download the latest version. If you can't upgrade to 3.3, you can download a patch file and replace the offending code yourself.

To use the internal Python SDK from the command line, enter the following at your command prompt:

source $SPLUNK_HOME/bin/setSplunkEnv

Now start up Python and try getting a session key.

root@ulysses [~]# source /opt/splunk/bin/setSplunkEnv 
root@ulysses [~]# python 
Python 2.5.1 (r251:54863, Apr  4 2008, 00:16:06) 
[GCC 4.0.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from splunk import auth, search
>>> import time
>>> 
>>> auth.getSessionKey('admin', 'changeme')
'43d7ea46ff602238ca5d1de56e17f692'
>>> 

Here's an example that gets a session key, then performs a search for events from the last minute. The search is performed synchronously, so your code will block until Splunk is done returning results. Stick this code in something like example.py:

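from splunk import auth, search

# authenticate first; the session key is not passed to dispatch() explicitly
auth.getSessionKey('admin', 'changeme')

# dispatch a search over the events from the last minute
job = search.dispatch('search * startminutesago=1')

# this will stream events back until the last event is reached
for event in job:
    print event

# cancel the job now that we are done with it
job.cancel()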

Running this outputs the raw events from the last minute:

root@ulysses [~]# source /opt/splunk/bin/setSplunkEnv 
root@ulysses [~]# python example.py 
111.111.111.111 - - [17/Jun/2008:13:26:09 -0500] "GET http://photos.zoto.com/kordless/img/28/40aab3c632b6fc2215cc850545793c31.jpg HTTP/1.0" 200 19429 "http://splunk.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FDM; .NET CLR 2.0.50727; InfoPath.1)"
.
.
.
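
If the loop body raises an exception, the script above exits without ever calling job.cancel(). Here is a minimal sketch of one way to guard against that with try/finally. It assumes only what the example shows: that the job can be iterated and has a cancel() method. The helper name stream_and_cancel and the FakeStreamJob class are made up for illustration so the sketch runs without Splunk; they are not part of the SDK.

```python
def stream_and_cancel(job, handle):
    """Pass every event from job to handle(), then cancel the job.

    The cancel() call sits in a finally block, so it also runs when
    handle() raises. Returns the number of events handled.
    """
    count = 0
    try:
        for event in job:
            handle(event)
            count += 1
    finally:
        job.cancel()
    return count


class FakeStreamJob(object):
    """Hypothetical stand-in for a dispatched job; NOT a Splunk class."""

    def __init__(self, events):
        self._events = events
        self.cancelled = False

    def __iter__(self):
        return iter(self._events)

    def cancel(self):
        self.cancelled = True


seen = []
fake_job = FakeStreamJob(['first raw event', 'second raw event'])
handled = stream_and_cancel(fake_job, seen.append)
```

With the real SDK you would pass the object returned by search.dispatch() in place of the fake job.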

Limiting Output to Extracted Fields

The next example limits output to a single field that was extracted at index time. Here Splunk has extracted the 'clientip' field:

from splunk import auth, search

auth.getSessionKey('admin', 'changeme')

job = search.dispatch('search * startminutesago=1')

# this will stream events back until the last event is reached
for event in job:
    # print only the extracted clientip field of each event
    print event['clientip']

job.cancel()

Running this code outputs only the IP addresses that were extracted:

root@ulysses [~]# python example.py 
111.111.111.111
222.222.222.222
.
.
.

Limiting Output to Fields Extracted at Search Time

If Splunk hasn't extracted a particular field, you can use the rex command to extract it at search time:

* startminutesago=1 | rex field=_raw "(?<imageid>[0-9a-f]{32})"

This search string assumes an MD5 exists in the event stream. Use your own regular expressions to extract a custom field from your own data.
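
To get a feel for what this pattern captures, you can try it locally with Python's re module before involving Splunk at all. This is only a rough sketch: Python spells named groups (?P<name>...) rather than the (?<name>...) form used in the rex command, and Splunk's regex engine may differ from Python's in other details. The function name extract_imageid is made up for this example, and the sample string is the request portion of the log line shown earlier.

```python
import re

# Same pattern as the rex example: 32 lowercase hex characters, captured
# as "imageid". Python needs the (?P<name>...) spelling for named groups.
IMAGE_ID = re.compile(r'(?P<imageid>[0-9a-f]{32})')


def extract_imageid(raw):
    """Return the first 32-character hex string in raw, or None."""
    match = IMAGE_ID.search(raw)
    if match is None:
        return None
    return match.group('imageid')


sample = ('"GET http://photos.zoto.com/kordless/img/28/'
          '40aab3c632b6fc2215cc850545793c31.jpg HTTP/1.0" 200 19429')

print(extract_imageid(sample))          # 40aab3c632b6fc2215cc850545793c31
print(extract_imageid('no hash here'))  # None
```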

You can test your rex extractions with the Splunk UI to ensure you are getting back the correct results before starting to code. To see the extracted field in the Splunk UI, you'll need to select extracted fields from the fields pulldown.

Be aware that the events object used above returns untransformed data. In this example, the rex command is a transforming command, so you need to use the results object type instead of events.

You'll need to wait on Splunk to finish the search before you get back these transformed results. Splunk provides a method for checking to see if a job is done or not, and we use it to hang out until the results are back and transformed:

from splunk import auth, search
import time

auth.getSessionKey('admin', 'changeme')

job = search.dispatch('search * startminutesago=1 | rex field=_raw "(?<imageid>[0-9a-f]{32})" | where imageid > ""')

# at this point, Splunk is running the search in the background; how long it
# takes depends on how much data is indexed, and the scope of the search

# wait until the job has completed before reading job.results
while not job.isDone:
    time.sleep(1)

# this will iterate through the completed results - with transforms applied
for result in job.results:
    print result['imageid']

job.cancel()

Notice we use a where clause to filter out results that don't contain an extracted imageid field. We do this because some events may not provide a match to our regular expression!

root@ulysses [~]# python example.py 
2387d1e5d205d5d9e803e6535f66aacc
71d6cad91460f5f9873fb57c5ebcf446
2e63a453e5292da64292deb724a7bb9b
d518ed5fb7e21548b5efbe8f7d2c232b
ae0b6c614a69b579aae4a01ffc4a07ba
f8efa202eb3dbc8d50f298ee762d683c
f3422c85de5e843c0c43c91ebe89aac3
.
.
.
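
The polling loop in the example above waits for as long as the search takes, which can be a problem if a job never finishes. Here is a minimal sketch of a wait with a deadline. It relies only on the job.isDone attribute used in the example; the helper name wait_for_job, its timeout and poll_interval parameters, and the FakePollJob class are all made up for illustration and are not part of the SDK.

```python
import time


def wait_for_job(job, timeout=60.0, poll_interval=1.0):
    """Poll job.isDone until it is true or timeout seconds have passed.

    Returns True if the job finished, False if the deadline was reached.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


class FakePollJob(object):
    """Hypothetical stand-in that reports done after a few polls."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    @property
    def isDone(self):
        self._remaining -= 1
        return self._remaining <= 0


finished = wait_for_job(FakePollJob(3), timeout=5.0, poll_interval=0.01)
```

With the real SDK you would call wait_for_job(job) on the dispatched job and only read job.results when it returns True; on False you would probably still want to call job.cancel().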
