Issue 39: PdfTokenParseError when trying to optimize pdf
Status:  Fixed
Owner:
Closed:  Feb 2011
Reported by dr.dan.drake@gmail.com, Nov 29, 2010
The PDF is the book "Analytic Combinatorics", available from http://algo.inria.fr/flajolet/Publications/books.html

I used "./pdfsizeopt.py --use-pngout=true --use-jbig2=true --use-multivalent=true --do-unify-fonts=false" to optimize.

What does pdfsizeopt display when running the command above?

I get:

info: This is pdfsizeopt.py rUNKNOWN.
info: loading PDF from: book.pdf
info: loaded PDF of 12141468 bytes
info: separated to 3673 objs
Traceback (most recent call last):
  File "./pdfsizeopt.py", line 6157, in <module>
    main(sys.argv)
  File "./pdfsizeopt.py", line 6133, in main
    ).Load(file_name)
  File "./pdfsizeopt.py", line 3037, in Load
    do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "./pdfsizeopt.py", line 344, in __init__
    (other[start : start + 32], file_ofs))
__main__.PdfTokenParseError: X Y obj expected, got '&nJ\xd2\xde\x12w\xfeFX?T\xd9\x06\x13\xd4\xdbf\xbe\xca\x80\x18\xe7\xb8k\xf7\\\xb87\xda\xa7\x8c' at ofs=688

...and no output PDF is produced.

I checked out the sources, and my copy of pdfsizeopt is identical to the current checkout: $Id: pdfsizeopt.py 134 2009-11-29 11:48:12Z ptspts $


Feb 10, 2011
Project Member #1 pts...@gmail.com
Sorry for taking so long to respond.

Thank you for reporting this bug. It's fixed in r143.

Please note that book.pdf is invalid: its xref table contains the offset 688, but there is no valid object at that offset. pdfsizeopt.py now ignores this kind of problem and prints a warning instead of failing.
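
For readers who want to see the kind of check involved, here is a minimal sketch of validating xref offsets before parsing. It is not the code from r143: the helper name filter_xref_offsets, the regular expression, and the assumption that the xref table has already been parsed into a dict of object number to byte offset are all hypothetical, chosen only for illustration.

```python
import re
import warnings

# Hypothetical helper, NOT taken from pdfsizeopt.py.
# Matches an object header such as b"12 0 obj" starting at a given offset.
OBJ_HEADER_RE = re.compile(rb'(\d+)\s+(\d+)\s+obj\b')


def filter_xref_offsets(data, xref_offsets):
    """Keep only the xref entries whose offset points at an 'X Y obj' header.

    data: the whole PDF file as bytes.
    xref_offsets: dict mapping object number to byte offset (assumed to be
      parsed from the xref table already).
    """
    valid = {}
    for obj_num, ofs in sorted(xref_offsets.items()):
        match = None
        if 0 <= ofs < len(data):
            # Pattern.match(data, pos) anchors the match at byte offset pos.
            match = OBJ_HEADER_RE.match(data, ofs)
        if match is None or int(match.group(1)) != obj_num:
            # Warn and skip the entry instead of aborting the whole load.
            warnings.warn(
                'ignoring xref entry for obj %d: X Y obj expected at ofs=%d'
                % (obj_num, ofs))
            continue
        valid[obj_num] = ofs
    return valid
```

With a file like book.pdf, an entry such as the one pointing at offset 688 would be dropped with a warning under this sketch, and the remaining objects would still be loaded.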
Status: Fixed