Issue 79: Multivalent: java.io.IOException: invalid distance too far back @ 0
Reported by rbr...@gmail.com, Feb 26, 2013
Hi, Peter.

I'm getting a stack trace when running pdfsizeopt revision 244 with some files:

----

info: This is pdfsizeopt.py rUNKNOWN size=318356.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: Galois.pdf
info: loaded PDF of 518573 bytes
warning: problem with xref table: xref table not found at 508214
warning: trying to load objs without the xref table
info: separated to 513 objs + trailer
info: found 0 Type1 fonts loaded
info: found 30 Type1C fonts loaded
info: writing Type1CParser (90093 font bytes) to: pso.conv.parse.tmp.ps
info: using Ghostscript gs: GPL Ghostscript 9.05 (2012-02-08)
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 905 20120208
Type1CParser: all OK
info: parsed 30 Type1C fonts
info: eliminated 5 duplicate objs
info: saving PDF with 508 objs with Multivalent to: Galois.psom.pdf
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: generated object stream of 8541 bytes in 364 objects (9%)
info: written 462924 bytes to Multivalent input PDF: pso.conv.mi.tmp.pdf
info: executing Multivalent to optimize PDF: /usr/bin/java -cp /home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/Multivalent.jar -Djava.awt.headless=true tool.pdf.Compress -nopagepiece -noalt -mon pso.conv.mi.tmp.pdf
file:/home/rbrito/Dropbox/documents-to-sort-out/pso.conv.mi.tmp.pdf, 462924 bytes
PDF 1.5, producer=MiKTeX-xdvipdfmx (0.7.8), creator= XeTeX output 2012.12.16:1357
511 objects / 106 pagesjava.io.IOException: invalid distance too far back @ 0 while reading object #204: {Filter=FlateDecode, DATA=118811, Length=3781}
pso.conv.mi.tmp.pdf: java.io.IOException: invalid distance too far back @ 0
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 0 bytes (0%)
Traceback (most recent call last):
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7887, in <module>
    main(sys.argv)
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7880, in main
    is_flate_ok=not do_decompress_flate)
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7579, in Save
    multivalent_java=multivalent_java)
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7513, in _RunMultivalent
    'Multivalent generated empty output (see its error above)')
AssertionError: Multivalent generated empty output (see its error above)

----

I don't know whether the problem here is with Multivalent or with pdfsizeopt, but I have been getting this java.io.IOException a lot with some new PDF files that I am trying (yes, I am now hitting pdfsizeopt quite hard).

The offending file is attached.

Please let me know if any other information is needed.


Thanks.

Attachment: Galois.pdf (506 KB)
Feb 26, 2013
Project Member #1 pts...@gmail.com
Thank you for reporting this bug, and thank you for attaching the sample input PDF. I could reproduce the problem. Indeed there is something wrong in Multivalent, and pdfsizeopt doesn't recover from it. I'll take a closer look later.

Please note that the Galois.pdf you have attached seems to be invalid: evince can't display page 37 properly (see the attached screenshot).

In order to isolate bugs in pdfsizeopt, could you please upload a valid sample PDF (possibly by regenerating it without page 37) for which it fails?

In the meantime, you can run `pdfsizeopt --use-multivalent=no' (without the quotes) as a workaround, but it won't fix your PDF if it was already broken.
Summary: Multivalent: java.io.IOException: invalid distance too far back @ 0 (was: (Another) Stack trace when running pdfsizeopt)
Status: Accepted
Feb 26, 2013
Project Member #2 pts...@gmail.com
The attached Galois.pdf file is corrupt: object 136, with /Length 3781, contains a corrupt /FlateDecode stream that cannot be decompressed.

The behavior and output of pdfsizeopt are not defined when it receives invalid input (such as Galois.pdf). All I can do for this issue is to improve the error message that pdfsizeopt prints a bit.

To get this PDF optimized, please regenerate it correctly first, or run it through a converter which removes invalid parts, and run pdfsizeopt only after that.
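
For anyone who wants to check a suspect stream before running pdfsizeopt, here is a minimal sketch in Python. It assumes the raw bytes of the /FlateDecode stream have already been extracted from the PDF by some other means (that step is not shown). The function name `flate_stream_error` is hypothetical and is not part of pdfsizeopt. "invalid distance too far back" is one of the messages zlib reports when the compressed data is damaged.

```python
import zlib


def flate_stream_error(data):
    """Return None if `data` inflates cleanly, else a short error description.

    Hypothetical helper for illustration; not part of pdfsizeopt.
    `data` is the raw content of a /FlateDecode stream (zlib format).
    """
    decompressor = zlib.decompressobj()
    try:
        decompressor.decompress(data)
        decompressor.flush()
    except zlib.error as exc:
        # zlib rejected the data, e.g. "invalid distance too far back".
        return str(exc)
    if not decompressor.eof:
        # No error was raised, but the end of the stream was never reached.
        return "stream is truncated"
    return None
```

A stream for which this returns something other than None is a likely candidate for the kind of corruption described above, though a clean result does not prove that the rest of the PDF is valid.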
Labels: -Priority-High Priority-Medium
Feb 27, 2013
#3 rbr...@gmail.com
Hi, Peter.

I just got another copy of the document from the author and this new one is fine.

I guess that what we can take from this episode is that pdfsizeopt could print an error message instead of dumping a stack trace.

Thanks.

Mar 3, 2013
Project Member #4 pts...@gmail.com
pdfsizeopt already prints a useful error message: ``Multivalent generated empty output (see its error above)''. It also prints a stack trace, which is even more useful, because it can be copy-pasted to the issue tracker. Removing the stack trace would make the output less useful, thus worse.

Making this particular error message (or the corresponding Multivalent error message) more useful would be too much work. Maybe I could add an ``Is the input PDF corrupt?'' clause here, but that's also too much work to do consistently, because PDFs can be corrupt in many ways.

The easy improvement is to add the following sentence to the documentation: ``If your input PDF is corrupt, pdfsizeopt may succeed or it may fail, possibly with an error message which is difficult to understand. If you think your PDF is correct, then please report a bug in the pdfsizeopt issue tracker.''
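
To make the ``Is the input PDF corrupt?'' idea concrete, here is a rough sketch in Python of what such a clause could look like for this one class of failure. It is only an illustration under stated assumptions: `add_corruption_hint` and `ZLIB_ERROR_MARKERS` are hypothetical names, not existing pdfsizeopt code, and the marker list covers only a few well-known zlib inflate messages, so it would miss the many other ways a PDF can be corrupt.

```python
# Hypothetical helper for illustration; not part of pdfsizeopt.
# A few messages that zlib emits when compressed data is damaged.
ZLIB_ERROR_MARKERS = (
    "invalid distance too far back",
    "invalid block type",
    "incorrect header check",
    "incorrect data check",
)


def add_corruption_hint(message, tool_output):
    """Append a hint to `message` if `tool_output` looks like a zlib failure."""
    lowered = tool_output.lower()
    if any(marker in lowered for marker in ZLIB_ERROR_MARKERS):
        return message + " Is the input PDF corrupt?"
    return message
```

With the Multivalent output from this report as `tool_output`, the returned message would end with the extra question. Output without any of the markers leaves the message unchanged.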

Do you have any specific suggestions on how to better report the failure in this particular case?
Labels: -Type-Defect Type-Enhancement