Issue 58: Ghostscript fails to uncompress stream on Windows
Status:  Fixed
Owner: ----
Closed:  Jun 2012
Reported by fdnc...@gmail.com, Jun 21, 2012
What steps will reproduce the problem?
1. pdfsizeopt.py Pages1-7.pdf

What is the expected output? What do you see instead?
I expect it not to fail.  I realize that this may be a gs problem but I'm not sure what else to try.

This is the output I get.
C:\Users\x991808\Desktop\PDFSizeOpt>pdfsizeopt.py Pages1-7.pdf
info: This is pdfsizeopt.py rUNKNOWN size=306026.
info: loading PDF from: Pages1-7.pdf
info: loaded PDF of 257118 bytes
info: decompressing 36 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 3/Predictor 12>>
Traceback (most recent call last):
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 7594, in <module>
    main(sys.argv)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 7564, in main
    ).Load(file_name)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 3336, in Load
    data, do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 3655, in ParseUsingXref
    xref_ofs, match)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 3488, in ParseUsingXrefStream
    w0, w1, w2, index, xref_data = trailer_obj.GetAndClearXrefStream()
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 1357, in GetAndClearXrefStream
    xref_tuple = self.GetXrefStream()
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 1343, in GetXrefStream
    xref_data = self.GetUncompressedStream()
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 2218, in GetUncompressedStream
    gs_defilter_cmd)
AssertionError: Ghostscript decompression failed: gswin32c -dNODISPLAY -q -sINFN=pso.filter.tmp.bin -c '/i INFN(r)file<</CloseSource true /Intent 2/Filter /FlateDecode/DecodeParms <</Columns 3/Predictor 12>>>>/ReusableStreamDecode filter def /o(%stdout)(w)file def/s 4096 string def {i s readstring exch o exch writestring not{exit}if}loop o closefile quit'

What version of the product are you using? On what operating system?
The version currently in source control, on Windows 7.

Please provide any additional information below.
Using Ghostscript 8.53; also tried 9.05 to no avail.
Using sam2p.exe 0.49, as well as the exes in image-decode-win32.zip from the sam2p site.
Using pngout.exe from 7/2/2011.
Using Python 2.7.
Using Multivalent20060102.jar; also tried Multivalent20091027.jar.

Thanks,
Darren

Attachment: Pages1-7.pdf (251 KB)
Jun 25, 2012
Project Member #1 pts...@gmail.com
Thank you for reporting this problem, and thank you for sending a detailed bug report.

Using the attached file Pages1-7.pdf I could identify and fix (in r194) an xref object parsing bug.

Based on the information provided I could diagnose and fix (in r195) a Windows-only bug (GetUncompressedStream was calling Ghostscript incorrectly).
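
The exact change made in r195 is not shown in this thread. As an illustration of how a Windows-only failure of this kind can arise: the failing command in the report wraps the -c argument in single quotes, which POSIX shells treat as quoting but which Windows command-line parsing does not. The following is a minimal sketch, not the actual fix, that passes the arguments as a list so that no shell quoting is involved. The helper names build_gs_defilter_argv and run_gs_defilter are hypothetical, and the PostScript string is a shortened placeholder.

```python
import subprocess


def build_gs_defilter_argv(gs_cmd, input_filename, postscript_code):
    """Build an argv list for a Ghostscript decode run (hypothetical helper)."""
    # The flags are copied from the failing command in the report above.
    return [gs_cmd, '-dNODISPLAY', '-q', '-sINFN=' + input_filename,
            '-c', postscript_code]


def run_gs_defilter(argv):
    """Run Ghostscript without a shell and return the bytes it writes to stdout."""
    # Passing a list with the default shell=False lets Python apply the
    # platform-specific quoting instead of relying on shell quote characters.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    data, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError('Ghostscript failed with status %d' % proc.returncode)
    return data


# Placeholder PostScript code; the real program is much longer.
argv = build_gs_defilter_argv(
    'gswin32c', 'pso.filter.tmp.bin', '/i INFN(r)file def quit')

# subprocess.list2cmdline is an undocumented helper in the standard library
# that shows how Python joins an argument list into a Windows command line.
print(subprocess.list2cmdline(argv))
```

In the printed command line the -c argument is wrapped in double quotes, which is the form Windows programs parse as a single argument. run_gs_defilter is not called here because it needs a Ghostscript installation.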

Please download the newest pdfsizeopt, and run the following command:

  pdfsizeopt.py --use-multivalent=no --do-optimize-images=no Pages1-7.pdf

For me it succeeds and prints this on Linux:

info: This is pdfsizeopt.py r195 size=309327.
info: loading PDF from: Pages1-7.pdf
info: loaded PDF of 257118 bytes
info: using Ghostscript gs: GPL Ghostscript 8.71 (2010-02-10)
info: decompressing 36 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 3/Predictor 12>>
info: decompressing 97 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: found 43 obj offsets and 3 obj streams in xref stream
info: separated to 38 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: eliminated 6 duplicate objs
info: eliminated 2 unused objs in 2 classes
info: saving PDF with 30 objs to: Pages1-7.pso.pdf
info: generated object stream of 560 bytes in 21 objects (12%)
info: generated 253953 bytes (99%)

If that command doesn't work for you, please reply (and include the full output).

If that command works for you, then you can remove the flags --use-multivalent=no and --do-optimize-images=no one by one. If removing either flag makes it fail, please open another issue about that.
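
For reference, the log lines above show the xref streams being decoded with /FlateDecode and /DecodeParms <</Columns 3/Predictor 12>>. In the PDF specification, Predictor values of 10 and above select PNG-style prediction, where each decompressed row starts with a filter-type byte. The following is a minimal sketch of that decoding in pure Python, assuming one 8-bit component per column (the defaults) and handling only PNG filter types 0 (None) and 2 (Up). The function name is hypothetical, and this is not how pdfsizeopt does it: the log shows it delegating the work to Ghostscript.

```python
import zlib


def decode_flate_png_up(data, columns):
    """Inflate data, then undo PNG None/Up row prediction (hypothetical helper)."""
    raw = zlib.decompress(data)
    row_size = columns + 1  # One filter-type byte precedes each row.
    if len(raw) % row_size:
        raise ValueError('decompressed size is not a multiple of the row size')
    prev = bytearray(columns)  # The row above the first row is all zeros.
    out = bytearray()
    for i in range(0, len(raw), row_size):
        row = bytearray(raw[i:i + row_size])
        filter_type, cur = row[0], row[1:]
        if filter_type == 2:  # Up: each byte is a delta from the byte above.
            for j in range(columns):
                cur[j] = (cur[j] + prev[j]) & 255
        elif filter_type != 0:  # Type 0 (None) stores the row unchanged.
            raise ValueError('unsupported PNG filter type %d' % filter_type)
        out += cur
        prev = cur
    return bytes(out)


# Two made-up 3-byte rows, (1, 0, 16) and (1, 0, 32), encoded with the Up filter.
sample = zlib.compress(bytes(bytearray([2, 1, 0, 16, 2, 0, 0, 16])))
decoded = decode_flate_png_up(sample, 3)
```

Here the second encoded row stores only the difference from the first, so identical leading bytes become zeros, which is why this predictor tends to compress xref tables well.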
Summary: Ghostscript fails to uncompress stream on Windows
Status: Fixed
Labels: OpSys-Windows
Jun 25, 2012
#2 fdnc...@gmail.com
Thanks for your help.

I get the same output you do when I don't use Multivalent and don't optimize images. However, with multivalent=yes and optimize-images=yes I get this error string:

"Error in findFileFormatStream: failed to read first 12 bytes of file"

and the program keeps running. I'm assuming this is an error with jbig2.exe, because the PDF file that is created has 7 pages but I get an Acrobat error, "Insufficient data for an image", and all pages are blank. I see that error string in Leptonica, but the function looks pretty simple, so I'm not sure why it's failing.

Attached are the log and the PDF file.
Attachment: Pages1-7.psom.pdf (183 KB)
Attachment: image_optimze_error.txt (23.9 KB)
Jun 25, 2012
#3 fdnc...@gmail.com
One last note. I just finished compiling Adam Langley's jbig2 encoder with VS2010 on my system. That got rid of the "Error in findFileFormatStream" problems, but the PDF still fails to open with the "Insufficient data for an image" error. Not quite sure where to go from here.

Thanks,
Darren
Jun 25, 2012
Project Member #4 pts...@gmail.com
Please open a new issue, attach the original PDF (again), the PDF generated by pdfsizeopt+jbig2, and the jbig2.exe you use. Don't forget to include the console output messages of pdfsizeopt. I'll start by trying to reproduce the problem on Linux.