Issue 76: Stack trace when running pdfsizeopt
Status:  Fixed
Owner:
Closed:  Feb 2013
Reported by rbr...@gmail.com, Feb 25, 2013
Dear Peter,

Using revision 224 of pdfsizeopt with the document at http://www.sp4comm.org/docs/sp4comm.pdf

I am getting the following output:

----

info: This is pdfsizeopt.py rUNKNOWN size=318356.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: sp4comm.pdf
info: loaded PDF of 4175835 bytes
info: using Ghostscript gs: GPL Ghostscript 9.05 (2012-02-08)
info: decompressing 72 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: decompressing 2211 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: found 1914 obj offsets and 39 obj streams in xref stream
Traceback (most recent call last):
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 7887, in <module>
    main(sys.argv)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 7849, in main
    ).Load(file_name)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 3504, in Load
    data, do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 3859, in ParseUsingXref
    xref_ofs, xref_obj_num, xref_generation)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 3790, in ParseUsingXrefStream
    obj_num)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 2834, in ParseObjStm
    objstm_data = self.GetUncompressedStream()
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 2327, in GetUncompressedStream
    return PermissiveZlibDecompress(self.stream)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 229, in PermissiveZlibDecompress
    uncompressed = zlib.decompressobj().decompress(data)
zlib.error: Error -3 while decompressing: incorrect header check

----

As the document may change/be updated, the md5sum of the copy that I have here is f22d1e72e38601a4b32861c653e2b24d.  I don't know if I am allowed to post the file here, but can gladly post it if you ask me.

I suspect that the problem with the decompression *may* be related to the fact that the document is (apparently) encrypted, as the output of pdfinfo is:

----

pdfinfo sp4comm.pdf 
Title:          livre.dvi
Creator:        dvips(k) 5.94b Copyright 2004 Radical Eye Software
Producer:       Acrobat Distiller 8.1.0 (Macintosh)
CreationDate:   Tue Apr 29 17:05:46 2008
ModDate:        Sun Nov 22 17:43:59 2009
Tagged:         no
Pages:          388
Encrypted:      yes (print:yes copy:no change:no addNotes:no)
Page size:      453.543 x 680.315 pts
File size:      4175835 bytes
Optimized:      yes
PDF version:    1.6

----

Is that the case?

If you need any further information, please let me know.


Thanks,
Rogério Brito.


P.S.: As a last resort, it is of course easy to work around such restrictions with, for example, a pass through Ghostscript (e.g., ps2pdf), but it would be nice for pdfsizeopt to warn the user that the input may be encrypted (perhaps with a message suggesting the workaround above) instead of dumping a stack trace.

Feb 27, 2013
Project Member #1 pts...@gmail.com
Thank you for reporting this bug.

Indeed, pdfsizeopt doesn't support encrypted PDFs. Issue 51 is already open for that.
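
For readers wondering why an encrypted file ends in a zlib error rather than a clearer message, here is a minimal sketch. It assumes that an encrypted stream looks like arbitrary bytes to the decompressor. The scramble helper is a hypothetical stand-in that flips every bit; it is not the RC4 or AES encryption that PDF actually uses, and none of this is pdfsizeopt's own code.

```python
import zlib


def scramble(data: bytes) -> bytes:
    """Hypothetical stand-in for encryption: flip every bit (NOT real RC4/AES)."""
    return bytes(b ^ 0xFF for b in data)


# A small FlateDecode-style payload: plain zlib round-trips without trouble.
original = b"BT /F1 12 Tf (Hello) Tj ET\n" * 10
compressed = zlib.compress(original)
assert zlib.decompress(compressed) == original

# Once the bytes are scrambled, the two-byte zlib header no longer passes its
# checksum, so decompression is rejected before any data is decoded.
encrypted_like = scramble(compressed)
try:
    zlib.decompress(encrypted_like)
    error_message = None
except zlib.error as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should end with "incorrect header check", the same complaint as in the traceback above. The stream has to be decrypted before it is handed to zlib.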

However, the error message printed for your input file was not informative enough, so I fixed that in r226. The new error message is:

info: This is pdfsizeopt.py r226 size=321397.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: issue76.pdf
info: loaded PDF of 4175835 bytes
Traceback (most recent call last):
  File "./pdfsizeopt.py", line 8097, in <module>
    main(sys.argv)
  File "./pdfsizeopt.py", line 8059, in main
    ).Load(file_name)
  File "./pdfsizeopt.py", line 3692, in Load
    '.decrypted.pdf')))
NotImplementedError: encrypted PDF input not supported, use this command to decrypt first: qpdf --decrypt issue76.pdf issue76.decrypted.pdf
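
As a rough illustration of this kind of early check (not the actual r226 code), the sketch below fails with the same style of message before any stream is decompressed. The function name check_not_encrypted and the crude byte search for an /Encrypt key are assumptions made for the example; a real implementation would inspect the parsed trailer dictionary instead. The output name is assumed to be the input name with its extension replaced by .decrypted.pdf, as the message above suggests.

```python
import os
import re

# Assumption: an encrypted PDF carries an /Encrypt key in its trailer.
# The \b keeps /EncryptMetadata from matching. Searching the raw bytes is a
# simplification; it could also match text inside an unrelated stream.
ENCRYPT_KEY_RE = re.compile(rb"/Encrypt\b")


def check_not_encrypted(data: bytes, file_name: str) -> None:
    """Hypothetical helper: raise early if the PDF data looks encrypted."""
    if ENCRYPT_KEY_RE.search(data):
        decrypted_name = os.path.splitext(file_name)[0] + ".decrypted.pdf"
        raise NotImplementedError(
            "encrypted PDF input not supported, use this command to decrypt "
            "first: qpdf --decrypt %s %s" % (file_name, decrypted_name)
        )


# Usage: an unencrypted trailer passes silently, an encrypted one raises.
check_not_encrypted(b"trailer << /Root 1 0 R >>", "plain.pdf")
try:
    check_not_encrypted(b"trailer << /Encrypt 12 0 R /Root 1 0 R >>", "issue76.pdf")
except NotImplementedError as exc:
    print(exc)
```

File names containing spaces or shell metacharacters would need quoting before the suggested command could be pasted into a shell; the sketch leaves that out.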

Status: Fixed
Feb 27, 2013
#2 rbr...@gmail.com
Great, much more informative, and separating the code into the extra class makes it a bit easier to read.

Thanks.