|
Shed Skin Documentation

Version 0.9.1, January 15 2012, Mark Dufour and James Coughlan
Introduction

Shed Skin is an experimental Python-to-C++ compiler designed to speed up the execution of computation-intensive Python programs. It converts programs written in a restricted subset of Python to C++. The C++ code can be compiled to executable code, which can be run either as a standalone program or as an extension module easily imported and used in a regular Python program.

Shed Skin uses type inference techniques to determine the implicit types used in a Python program, in order to generate the explicit type declarations needed in a C++ version. Because C++ is statically typed, Shed Skin requires Python code to be written such that all variables are (implicitly!) statically typed.

Besides the typing and subset restrictions, supported programs cannot freely use the Python standard library, although 25 common modules are supported, such as random and re (see Library Limitations).

Additionally, the type inference techniques employed by Shed Skin currently do not scale very well beyond several thousand lines of code (the largest compiled program is about 4,000 lines (sloccount)). In all, this means that Shed Skin is currently mostly useful for compiling smallish programs and extension modules that do not make extensive use of dynamic Python features or of the standard or external libraries. A collection of more than 60 example programs is available.

Because Shed Skin is still in an early stage of development, it can also improve a lot. At the moment, you will probably run into some bugs when using it. Please report these, so they can be fixed!

At the moment, Shed Skin is compatible with Python versions 2.4 to 2.7, behaves like 2.6, and should work on Windows and most UNIX platforms, such as GNU/Linux and OSX. On UNIX platforms, GCC version 4.2 or higher is required to compile the resulting C++ code.

Typing Restrictions

Shed Skin translates pure, but implicitly statically typed, Python programs into C++.
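As a rough sketch of what "implicitly statically typed" can look like in practice, consider the small program below. It is a hypothetical example, not taken from the Shed Skin examples, and the names mean and count_words are made up for illustration. Each variable keeps one type for its whole lifetime, which is the property the restrictions in this section are about. Whether Shed Skin accepts this exact program has not been verified here.

```python
# Hypothetical example program (not from the Shed Skin distribution).
# Every variable below keeps a single type for its whole lifetime.

def mean(values):
    # 'values' is only ever called with a list of floats
    total = 0.0                  # float, and stays a float
    for v in values:
        total += v
    return total / len(values)   # float divided by int gives a float


def count_words(text):
    # 'counts' maps str keys to int values throughout
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == '__main__':
    # each function is called once, with arguments of its intended types
    print(mean([1.0, 2.5, 4.0]))
    print(count_words('a b a'))
```

Note that total starts out as 0.0 rather than 0, so that it is never an integer at one point and a float at another.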
The static typing restriction means that variables can only ever have a single, static type. So, for example,

    a = 1
    a = '1' # bad

is not allowed. However, as in C++, types can be abstract, so that for example,

    a = A()
    a = B() # good

where A and B have a common base class, is allowed.

The typing restriction also means that the elements of some collection (list, set, etc.) cannot have different types (because their subtype must also be static). Thus:

    a = ['apple', 'b', 'c'] # good
    b = (1, 2, 3) # good
    c = [[10.3, -2.0], [1.5, 2.3], []] # good

is allowed, but

    d = [1, 2.5, 'abc'] # bad
    e = [3, [1, 2]] # bad
    f = (0, 'abc', [1, 2, 3]) # bad

is not allowed. The keys of a dictionary may be of a different type than its values, but all keys must share a single type, and so must all values:

    g = {'a': 1, 'b': 2, 'c': 3} # good
    h = {'a': 1, 'b': 'hello', 'c': [1, 2, 3]} # bad

In the current version of Shed Skin, mixed types are also permitted in tuples of length two:

    a = (1, [1]) # good

In the future, mixed tuples up to a certain length will probably be allowed.

None may only be mixed with non-scalar types (i.e., not with int, float, bool or complex):

    l = [1]
    l = None # good

    m = 1
    m = None # bad

    def fun(x = None): # bad: use a special value for x here, e.g. x = -1
        pass
    fun(1)

Integers and floats can often be mixed, but it is better to avoid this where possible, as it may confuse Shed Skin:

    a = [1.0]
    a = [1] # wrong - use a float here, too

Python Subset Restrictions

Shed Skin will only ever support a subset of all Python features. The following common features are currently not supported:
Some other features are currently only partially supported:
self.class_attr # bad
SomeClass.class_attr # good
SomeClass.some_static_method() # good
var = lambda x, y: x+y # good
var = some_func # good
var = self.some_method # bad, method reference
[var] # bad, contained

Library Limitations

At the moment, the following 25 modules are largely supported. Several of these, such as os.path, were compiled to C++ using Shed Skin.
Note that any other module, such as pygame, pyqt or pickle, may be used in combination with a Shed Skin generated extension module. For examples of this, see the Shed Skin examples.

See How to help out in Development on how to help improve or add to the set of supported modules.

Installation

There are four types of downloads available: a self-extracting Windows installer, a Debian (Ubuntu) package, an RPM package, and a UNIX tarball.

Windows

To install the Windows version, simply download and start it. (If you use ActivePython or some other non-standard Python distribution, or MinGW, please uninstall it first.)

Debian (Ubuntu)

To install the Debian package, simply download and install it using your package manager. Make sure the following packages are installed (at least version 4.2 of g++):

    sudo apt-get install g++ libpcre3-dev libgc-dev python-dev

RPM

(Note that several RPM distributions, such as Fedora, contain Shed Skin in their package repositories.)

To install the RPM package, simply download and install it using your package manager. Make sure the following packages are installed (at least version 4.2 of gcc-c++):

    sudo yum install gcc-c++ pcre-devel gc-devel python-devel

UNIX

To install the UNIX tarball on a GNU/Linux or OSX system, take the following steps:
If the Boehm garbage collector is not available via your package manager, the following is known to work. Download for example version 7.2alpha6 from the website, unpack it, and install it as follows:

    ./configure --prefix=/usr/local --enable-threads=posix --enable-cplusplus --enable-thread-local-alloc --enable-large-config
    make
    make check
    sudo make install

If the PCRE library is not available via your package manager, the following is known to work. Download for example version 8.12 from the website, unpack it, and build as follows:

    ./configure --prefix=/usr/local
    make
    sudo make install

Compiling a Stand-Alone Program

Under Windows, first execute (double-click) the init.bat file in the directory where you installed Shed Skin.

To compile the following simple test program, called test.py:

    print 'hello, world!'

Type:

    shedskin test

This will create two C++ files, called test.cpp and test.hpp, as well as a Makefile.

To create an executable file, called test (or test.exe), type:

    make

Generating an Extension Module

To compile the following program, called simple_module.py, as an extension module:

    # simple_module.py

    def func1(x):
        return x+1

    def func2(n):
        d = dict([(i, i*i) for i in range(n)])
        return d

    if __name__ == '__main__':
        print func1(5)
        print func2(10)

Type:

    shedskin -e simple_module
    make

For 'make' to succeed on a non-Windows system, make sure to have the Python development files installed (under Debian, install python-dev; under Fedora, install python-devel).

Note that for type inference to be possible, the module must (indirectly) call its own functions. This is accomplished in the example by putting the function calls under the if __name__=='__main__' statement, so that they are not executed when the module is imported. Functions only have to be called indirectly, so if func2 calls func1, the call to func1 can be omitted.

The extension module can now be simply imported and used as usual:

    >>> from simple_module import func1, func2
    >>> func1(5)
    6
    >>> func2(10)
    {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

Limitations

There are some important differences between using the compiled extension module and the original.
Numpy Integration

Shed Skin does not currently come with direct support for Numpy. It is possible, however, to pass a Numpy array to a Shed Skin compiled extension module as a list, using its tolist method. Note that this is very inefficient (see above), so it is only useful if a relatively large amount of time is spent inside the extension module. Consider the following example:

    # simple_module2.py

    def my_sum(a):
        """ compute sum of elements in list of lists (matrix) """
        h = len(a) # number of rows in matrix
        w = len(a[0]) # number of columns
        s = 0.0
        for i in range(h):
            for j in range(w):
                s += a[i][j]
        return s

    if __name__ == '__main__':
        print my_sum([[1.0, 2.0], [3.0, 4.0]])

After compiling this module as an extension module with Shed Skin, we can pass in a Numpy array as follows:

    >>> import numpy
    >>> import simple_module2
    >>> a = numpy.array(([1.0, 2.0], [3.0, 4.0]))
    >>> simple_module2.my_sum(a.tolist())
    10.0

Distributing Binaries

Windows

To use a generated Windows binary on another system, or to start it without having to double-click init.bat, place the following files into the same directory as the binary:

    shedskin-0.9\shedskin\gc.dll
    shedskin-0.9\shedskin-libpcre-0.dll
    shedskin-0.9\bin\libgcc_s_dw-1.dll
    shedskin-0.9\bin\libstdc++.dll

UNIX

To use a generated binary on another system, make sure libgc and libpcre3 are installed there. If they are not, and you cannot install them globally, you can place copies of these libraries into the same directory as the binary, using the following approach:

    $ ldd test
    libgc.so.1 => /usr/lib/libgc.so.1
    libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3
    $ cp /usr/lib/libgc.so.1 .
    $ cp /lib/x86_64-linux-gnu/libpcre.so.3 .
    $ LD_LIBRARY_PATH=. ./test

Note that for this to work, both systems must have the same word size (both 32-bit or both 64-bit). If they do not, Shed Skin must be installed on the other system to recompile the binary.

Parallel Processing

Suppose we have defined the following function in a file, called meuk.py:

    def part_sum(start, end):
        """ calculate partial sum """
        sum = 0
        for x in xrange(start, end):
            if x % 2 == 0:
                sum -= 1.0 / x
            else:
                sum += 1.0 / x
        return sum

    if __name__ == '__main__':
        part_sum(1, 10)

To compile this into an extension module, type:

    shedskin -e meuk
    make

To use the generated extension module with the multiprocessing standard library module, simply add a pure-Python wrapper:

    from multiprocessing import Pool

    def part_sum((start, end)):
        import meuk
        return meuk.part_sum(start, end)

    pool = Pool(processes=2)
    # xrange excludes its end value, so the second range starts where the first one ends
    print sum(pool.map(part_sum, [(1, 10000000), (10000000, 20000000)]))

Calling C/C++ Code

To call manually written C/C++ code, follow these steps:
#stuff.py

def more_primes(n, nr=10):
    return [1]

#test.py

import stuff
print stuff.more_primes(100)

shedskin test
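The type model in stuff.py above returns a placeholder list of integers, which appears to be all that type inference needs; the real work is left to the manually written C/C++ code. When writing that code, a pure-Python reference can be handy for comparing results. The sketch below assumes that more_primes(n, nr) is meant to return the nr smallest primes larger than n. This document does not spell out the intended behaviour, so treat that, and the helper is_prime, as assumptions made for illustration.

```python
# Hypothetical pure-Python reference for more_primes.
# Assumed behaviour: return the nr smallest primes larger than n.

def is_prime(k):
    # simple trial division; adequate for small inputs
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def more_primes(n, nr=10):
    # same signature and return type (list of int) as the type model
    primes = []
    candidate = n + 1
    while len(primes) < nr:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    return primes


if __name__ == '__main__':
    print(more_primes(100))
```

The reference keeps the signature and the list-of-integers return type of the type model, so its output can be compared directly with that of a manual implementation.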
Standard Library

By moving stuff.* to lib/, we have in fact added support for an arbitrary library module to Shed Skin. Other programs compiled by Shed Skin can now import stuff and use more_primes. In fact, in the lib/ directory, you can find type models and implementations for all supported modules. As you may notice, some have been partially converted to C++ using Shed Skin.

Shed Skin Types

Shed Skin reimplements the Python builtins with its own set of C++ classes. These have a similar interface to their Python counterparts, so they should be easy to use (provided you have some basic C++ knowledge). See the class definitions in lib/builtin.hpp for details. If in doubt, convert some equivalent Python code to C++, and have a look at the result!

Command-line Options

The shedskin command can be given the following options:

    -a --ann        Output annotated source code (.ss.py)
    -b --nobounds   Disable bounds checking
    -e --extmod     Generate extension module
    -f --flags      Provide alternate Makefile flags
    -l --long       Use long long ("64-bit") integers
    -m --makefile   Specify alternate Makefile name
    -n --silent     Silent mode, only show warnings
    -o --noassert   Disable assert statements
    -r --random     Use fast random number generator (rand())
    -s --strhash    Use fast string hashing algorithm (murmur)
    -w --nowrap     Disable wrap-around checking
    -x --traceback  Print traceback for uncaught exceptions
    -L --lib        Add a library directory

For example, to compile the file test.py as an extension module, type shedskin -e test or shedskin --extmod test.

Using -b or --nobounds is also very common, as it disables out-of-bounds exceptions (IndexError), which can have a large impact on performance.

    a = [1, 2, 3]
    print a[5] # invalid index: out of bounds

Performance Tips and Tricks

Performance Tips
To use Gprof2dot, download gprof2dot.py from the website, and install Graphviz. Then:

    shedskin program
    make program_prof
    ./program_prof
    gprof program_prof | gprof2dot.py | dot -Tpng -ooutput.png

To use OProfile, install it and use it as follows.

    shedskin -e extmod
    make
    sudo opcontrol --start
    python main_program_that_imports_extmod
    sudo opcontrol --shutdown
    opreport -l extmod.so

Tricks
statistics = {'nodes': 28, 'solutions': set()}
class statistics: pass
s = statistics(); s.nodes = 28; s.solutions = set()
print 'hoei', raw_input() # raw_input is called before printing 'hoei'!
class mytuple:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
print "x =", x
print "y =", y
#{
import pylab as pl
pl.plot(x, y)
pl.show()
#}

How to help out in Development

Open source projects thrive on feedback. Please send in bug reports, patches or other code, or suggestions about this document; or join the mailing list and start or participate in discussions. There is also a page with suggestions for possible tasks to start out with.

If you are a student, you might want to consider applying for the yearly Google Summer of Code or GHOP projects. Shed Skin has so far successfully participated in one Summer of Code and one GHOP. |