Issue 78: Feature request: please, allow parallelization of external commands
Reported by rbr...@gmail.com, Feb 26, 2013
Some PDF files, especially those that were created by scanning a document, take a lot of time to be processed by pdfsizeopt, since each file is compressed one after the other by pngout etc.

If pdfsizeopt could use a program like GNU parallel (already in Debian and in Ubuntu) to "dispatch" the external commands, we could have a tremendous speedup on computers with more than 1 thread/core/etc.

If depending on an external tool like GNU parallel is not desired (although it would avoid reinventing the wheel, at least in the short term), then perhaps a thread pool for the external commands could prove useful.
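
A minimal sketch of the thread-pool idea, assuming the external commands are independent of each other and each is given as an argv list. The function name run_commands_in_parallel and the max_workers default are made up for illustration and are not part of pdfsizeopt. The demo commands just start the Python interpreter so the sketch runs anywhere; in practice they would be the image optimizer invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands, max_workers=4):
    """Run each command (an argv list) and return the exit codes in input order.

    Hypothetical helper for illustration; not taken from pdfsizeopt.
    """
    def run_one(argv):
        # subprocess.run blocks only this worker thread while the child runs,
        # so several child processes can be active at the same time.
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(run_one, commands))


# Stand-in commands: each one simply starts Python and exits with status 0.
demo_commands = [[sys.executable, "-c", "pass"] for _ in range(3)]
print(run_commands_in_parallel(demo_commands, max_workers=2))
```

Threads are enough here because the heavy work happens in the child processes, not in Python itself. The sketch does not address the harder part, which is reworking the surrounding program so that the commands can be issued independently.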

Thanks.

Feb 27, 2013
Project Member #1 pts...@gmail.com
Thank you for coming up with this idea.

Implementing this feature is not as easy as it sounds. It would need a major redesign of how pdfsizeopt processes the objects in the input PDF. The reimplementation would be error-prone, and lots of concurrency bugs would need to be diagnosed and fixed.

I think I'll postpone this because of the lack of free time and motivation on my part.

Another similar improvement: for each slow conversion step, save the SHA-1 hash of the input, and save the output to a cache directory, so if pdfsizeopt is run again on the same input, it will reuse the output from the cache. This is much easier to implement, but still a lot of work.
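
The caching idea could look roughly like the sketch below. It assumes a conversion step can be modeled as a function from input bytes to output bytes. The names cached_convert, convert and cache_dir are hypothetical and do not come from pdfsizeopt.

```python
import hashlib
import os
import tempfile


def cached_convert(data, convert, cache_dir):
    """Return convert(data), reusing a cached result keyed by SHA-1 of data.

    Hypothetical helper for illustration; not taken from pdfsizeopt.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, hashlib.sha1(data).hexdigest())
    if os.path.exists(cache_path):
        # Cache hit: skip the slow conversion entirely.
        with open(cache_path, "rb") as f:
            return f.read()
    output = convert(data)
    # Write to a temporary file and rename it into place, so an interrupted
    # run does not leave a truncated cache entry behind.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(output)
    os.replace(tmp_path, cache_path)
    return output
```

A real cache key would probably also need to cover the name of the conversion step, its settings and the tool version, since the same input can produce different output under different options. That detail is left out here.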
Labels: -Type-Defect -Priority-High Type-Enhancement Priority-Medium