Use sendfile() to speed up transfers #152

Closed
giampaolo opened this issue May 28, 2014 · 9 comments
Labels
Component-Library enhancement imported imported from old googlecode site and very likely outdated Performance Priority-High

Comments

@giampaolo
Owner

From g.rodola on January 24, 2011 08:06:14

Many Unix kernels provide sendfile(2), a system call that offers a "zero-copy" 
way of copying data from one file (or socket) descriptor to another. This 
should result in a considerable speed-up when sending a file from server to 
client: http://www.proftpd.org/docs/howto/Sendfile.html
This kind of function is particularly useful for software such as FTP servers, 
and it would be good if pyftpdlib could take advantage of it.
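
As a rough illustration of the idea, here is a minimal sketch that pushes a file into a socket with os.sendfile(), the stdlib wrapper that landed in Python 3.3 on Unix platforms. The helper names (send_file_zero_copy, demo) and the chunk size are assumptions made for this example; this is not pyftpdlib code.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path, chunk_size=65536):
    """Send the file at `path` over `sock` via os.sendfile(); return bytes sent."""
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # The kernel moves file data to the socket directly, so the
            # payload never becomes a Python bytes object.
            sent = os.sendfile(sock.fileno(), f.fileno(), sent_total,
                               min(chunk_size, size - sent_total))
            if sent == 0:  # file shrank underneath us; stop instead of spinning
                break
            sent_total += sent
    return sent_total


def demo():
    """Transfer a small temporary file across a local socket pair."""
    payload = b"hello sendfile\n" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    a, b = socket.socketpair()
    try:
        sent = send_file_zero_copy(a, tmp.name)
        a.close()  # signal EOF to the receiving side
        received = b""
        while True:
            chunk = b.recv(4096)
            if not chunk:
                break
            received += chunk
        return payload, sent, received
    finally:
        a.close()
        b.close()
        os.unlink(tmp.name)
```

The loop is there because sendfile() may transfer fewer bytes than requested, much like send().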

= Implementation =

sendfile() is available on several UNIX platforms, such as Linux, *BSD and AIX. 
There are a number of Python wrappers available. Here is one: 
http://pypi.python.org/pypi/py-sendfile/
Medusa also provides one, along with an example of integration with asynchat: 
http://www.nightmare.com/medusa/
As for Python itself, sendfile() has been proposed for inclusion in the stdlib; 
here's a patch which is supposed to be added in Python 3.3: 
http://bugs.python.org/issue10882

= Windows =

Windows provides a similar function (TransmitFile): 
http://msdn.microsoft.com/en-us/library/ms740565(v=vs.85).aspx
...and it is exposed via the pywin32 extension: 
http://sourceforge.net/tracker/index.php?func=detail&aid=1962146&group_id=78018&atid=551956

= Integration with pyftpdlib =

pyftpdlib can either implement its own wrapper for sendfile/TransmitFile or 
depend on the py-sendfile/pywin32 modules.
The latter choice seems to make more sense.
In this case, the changes would consist of modifying the DTPHandler class so 
that it uses the new function when one of the two modules is installed, and 
otherwise falls back on the default transfer method/implementation.
Use of sendfile can be enabled/disabled via a DTPHandler.use_sendfile class 
attribute defaulting to True or False depending on whether the required module 
is installed.
pyftpdlib itself will explicitly avoid/disable the use of sendfile() in case 
of:

- transfers in ASCII mode
- throttled transfers via ThrottledDTPHandler class
- when SSL/TLS is used on the data channel
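
The decision logic described above could be sketched roughly as follows. SketchDTPHandler, its constructor arguments and should_use_sendfile() are hypothetical names invented for this illustration, and the `sendfile` module name for py-sendfile is an assumption; none of this is the actual pyftpdlib implementation.

```python
try:
    # Assumed module name of the optional third-party py-sendfile extension.
    import sendfile as _sendfile_module
except ImportError:
    _sendfile_module = None


class SketchDTPHandler:
    """Hypothetical stand-in for a data-channel handler (not pyftpdlib code)."""

    # Defaults to True only when the optional module could be imported.
    use_sendfile = _sendfile_module is not None

    def __init__(self, transfer_type="i", tls_enabled=False, throttled=False):
        self.transfer_type = transfer_type  # "a" = ASCII, "i" = binary
        self.tls_enabled = tls_enabled
        self.throttled = throttled

    def should_use_sendfile(self):
        """Return True only if none of the listed exclusions applies."""
        if not self.use_sendfile:
            return False
        if self.transfer_type == "a":
            # ASCII mode has to rewrite line endings in user space.
            return False
        if self.throttled:
            # A throttled handler needs to meter each chunk it sends.
            return False
        if self.tls_enabled:
            # With TLS the payload must be encrypted before it hits the socket.
            return False
        return True
```

A handler would call should_use_sendfile() once per transfer and fall back to plain send() whenever it returns False.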

Original issue: http://code.google.com/p/pyftpdlib/issues/detail?id=152

@giampaolo giampaolo self-assigned this May 28, 2014
@giampaolo
Owner Author

From g.rodola on January 24, 2011 04:01:20

Attached are a patch for UNIX and a benchmark script which tests the 
performance difference by transferring:
- a big 1GB file 1 time
- some other smaller 10 MB files 100 times

Results:

Without sendfile:

bigfile: 23.71 usec/pass
smallfile: 25.25 usec/pass

With sendfile:

bigfile: 14.14 usec/pass
smallfile: 15.69 usec/pass


If I'm not mistaken the speed-up should be around 65%.
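
The estimate can be checked with a few lines of arithmetic. This sketch assumes "speed-up" means the relative gain in throughput (old time divided by new time, minus one); the function names are made up for the example.

```python
def speedup_percent(before, after):
    """Throughput gain, in percent, when time per pass drops from before to after."""
    return (before / after - 1.0) * 100.0


def time_saved_percent(before, after):
    """Reduction in elapsed time, in percent."""
    return (1.0 - after / before) * 100.0


# Figures taken from the results above (time per pass).
big = speedup_percent(23.71, 14.14)
small = speedup_percent(25.25, 15.69)
average = (big + small) / 2.0
big_time_saved = time_saved_percent(23.71, 14.14)

print("bigfile:   %.1f%% faster" % big)
print("smallfile: %.1f%% faster" % small)
print("average:   %.1f%% faster" % average)
print("bigfile elapsed time reduced by %.1f%%" % big_time_saved)
```

Under that definition the two runs come out at roughly 68% and 61%, which is consistent with the "around 65%" figure; measured as elapsed time saved, the big-file case is closer to 40%.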

Attachment: bench.py sendfile.patch

@giampaolo
Owner Author

From anacrolix@gmail.com on January 24, 2011 18:27:18

This is a great idea, especially given the issue regarding multithreading. I 
imagine the GIL is released for the duration of the sendfile() syscall, 
significantly reducing Python buffer/garbage collection shenanigans, and 
enabling it to serve other requests in the meantime. Shame it requires PyPI, 
and that Python-3k won't be mainstream for at least 2 years yet :P

@giampaolo
Owner Author

From g.rodola on January 30, 2011 17:23:55

Attached is a ctypes-based patch implemented for Linux and FreeBSD.
I decided not to use the py-sendfile extension as it is broken on FreeBSD.
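
For readers curious what a ctypes wrapper can look like, here is a minimal Linux-only sketch. It is not the attached patch: the names linux_sendfile and demo are invented for the example, off_t is assumed to match C long (true for the plain glibc sendfile symbol), and the FreeBSD and OSX variants, which have a different signature, are deliberately left out.

```python
import ctypes
import ctypes.util
import os
import socket
import sys
import tempfile

_IS_LINUX = sys.platform.startswith("linux")
_libc = None
if _IS_LINUX:
    # use_errno=True lets us read errno after a failed call.
    _libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
    _libc.sendfile.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.POINTER(ctypes.c_long), ctypes.c_size_t]
    _libc.sendfile.restype = ctypes.c_ssize_t


def linux_sendfile(out_fd, in_fd, offset, count):
    """Call Linux sendfile(2); return (bytes_sent, new_offset)."""
    if not _IS_LINUX:
        raise NotImplementedError("this sketch only covers the Linux signature")
    off = ctypes.c_long(offset)
    sent = _libc.sendfile(out_fd, in_fd, ctypes.byref(off), count)
    if sent == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # The kernel advances `off` by the number of bytes it transferred.
    return sent, off.value


def demo():
    """Push a small temporary file through a local socket pair (Linux only)."""
    payload = b"0123456789" * 50
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    a, b = socket.socketpair()
    try:
        with open(tmp.name, "rb") as f:
            sent, new_offset = linux_sendfile(a.fileno(), f.fileno(),
                                              0, len(payload))
        a.close()  # signal EOF to the receiving side
        received = b""
        while True:
            chunk = b.recv(4096)
            if not chunk:
                break
            received += chunk
        return payload, sent, new_offset, received
    finally:
        a.close()
        b.close()
        os.unlink(tmp.name)
```

A real handler would call this in a loop and handle EAGAIN on non-blocking sockets; both are omitted here to keep the sketch short.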

Results:

Without patch:

big-file:    14.47 usec/pass
small-files: 15.08 usec/pass
download:    760.32 Mb/sec
cpu:         4.01 seconds
memory:      9.21 MB

With patch, using sendfile():

big-file:    7.61 usec/pass
small-files: 9.93 usec/pass
download:    1501.95 Mb/sec
cpu:         2.61 seconds
memory:      9.22 MB


Yep, that's right: 1.5 gigabytes per second! =)
That's about 2x faster than using plain send() calls, with roughly a third less 
CPU time (2.61 vs. 4.01 seconds).
Out of curiosity, I ran the bench.py script against proftpd (which is written 
in C, and should use multiple processes for handling concurrency) and the 
results are almost the same.

Attachment: sendfile_ctypes.patch bench.py

@giampaolo
Owner Author

From g.rodola on February 03, 2011 14:37:31

Adds OSX support.
On OSX use of sendfile() results in a 3x speedup.

Attachment: sendfile_ctypes.patch

@giampaolo
Owner Author

From g.rodola on April 08, 2011 07:42:07

The attached patch uses the third-party py-sendfile extension, which I rewrote 
in C and which is available here: https://code.google.com/p/py-sendfile/

Attachment: sendfile.patch

@giampaolo
Owner Author

From g.rodola on January 03, 2012 03:27:30

This is now finally committed in r943.
sendfile() usage is governed by the FTPHandler.use_sendfile option (a bool), 
which defaults to True if the https://code.google.com/p/py-sendfile/ C 
extension module is installed.
A final benchmark on Linux shows a 2x speedup:

send()

big-file:    21.12 usec/pass
small-files: 21.36 usec/pass
download:    544.07 Mb/sec
cpu:         4.83 seconds
memory:      12.76 MB


sendfile()

big-file:    10.98 usec/pass
small-files: 10.35 usec/pass
download:    1189.83 Mb/sec
cpu:         2.8 seconds
memory:      12.74 MB

Status: FixedInSVN
Labels: -Priority-Medium Priority-High Milestone-0.6.1 Version-0.6.0

@giampaolo
Owner Author

From g.rodola on January 03, 2012 03:40:13

Labels: Milestone-0.7.0

@giampaolo
Owner Author

From g.rodola on January 11, 2012 04:14:29

A new benchmark after some further tuning shows the speedup is actually closer to 3x:

send()

big-file:    56.18 usec/pass
small-files: 62.49 usec/pass
download:    187.26 Mb/sec
cpu:         12.59 seconds
memory:      12.43 MB

sendfile()

big-file:    20.02 usec/pass
small-files: 24.22 usec/pass
download:    506.4 Mb/sec
cpu:         5.19 seconds
memory:      12.44 MB

@giampaolo
Owner Author

From g.rodola on January 25, 2012 11:24:18

0.7.0 is out. Closing this out as definitively fixed.

Status: Fixed
