Issue 57: Add generation of object streams (/Type/ObjStm) with --use-multivalent={yes,no}
Status:  Fixed
Owner:
Closed:  Apr 2012
Reported by TTSten...@gmail.com, Apr 3, 2012
What command do you run to optimize the PDF?
pdfsizeopt.py --use-pngout=false --use-jbig2=false --use-multivalent=false example.pdf

What does pdfsizeopt display when running the command above?
info: This is pdfsizeopt.py rUNKNOWN size=281270.
info: loading PDF from: example.pdf
info: loaded PDF of 4093 bytes
info: found 22 obj offsets and 1 obj streams in xref stream
info: separated to 20 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 2 Type1C fonts loaded
info: saving PDF with 20 objs to: example.pso.pdf
info: generated 4856 bytes (119%)

What's wrong with the optimized PDF?
It's bigger than the original

TeX-File, compiled with XeLaTeX (same problem with LuaTeX):
\documentclass{article}
\usepackage{fontspec}
\begin{document}
\begin{section}{Section}
\end{section}
\end{document}
Attachment: example.pdf (4.0 KB)
Apr 8, 2012
Project Member #1 pts...@gmail.com
What default behavior (of pdfsizeopt) would you expect in this case?
Apr 10, 2012
Project Member #2 pts...@gmail.com
I think it's nearly impossible to avoid the ``optimized PDF bigger than original'' case in general, because the original PDF might contain images or other bulk data with a very cleverly optimized ZIP compression, and when pdfsizeopt recompresses those objects (with ZIP), they become larger. If that really bothers you, I can suggest a workaround: add a flag to pdfsizeopt (disabled by default) so that it will use the original PDF if the optimized one turns out to be larger. Please request this in another issue if you need that.
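
A minimal sketch of that fallback, assuming it runs as a separate step after optimization. The function name keep_smaller and its arguments are hypothetical; pdfsizeopt has no such flag or function at this point.

```python
import os
import shutil


def keep_smaller(original_path, optimized_path, output_path):
    """Copy whichever of the two files is smaller to output_path.

    Returns the path that was chosen. Ties go to the optimized file.
    (Hypothetical helper, not part of pdfsizeopt.)
    """
    original_size = os.path.getsize(original_path)
    optimized_size = os.path.getsize(optimized_path)
    if optimized_size <= original_size:
        chosen = optimized_path
    else:
        chosen = original_path
    shutil.copyfile(chosen, output_path)
    return chosen
```

With the sizes from the report (4093 bytes in, 4856 bytes out), this would keep the original file.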

Another improvement would be maintaining a cache of (uncompressed, compressed) stream data pairs, and reusing the compressed data if it's smaller than what pdfsizeopt can produce. This has already been implemented for images. But even implementing this for all streams wouldn't completely avoid the ``optimized PDF bigger than original'' case, it would just make it rarer.
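
The cache idea could look roughly like the sketch below. It is an illustration only, assuming streams are plain zlib (/FlateDecode) data without predictors; the class and method names are hypothetical and do not come from pdfsizeopt.

```python
import zlib


class CompressedStreamCache:
    """Maps uncompressed stream data to the smallest compressed form seen.

    Hypothetical helper for illustration, not pdfsizeopt's actual code.
    """

    def __init__(self):
        self._best = {}

    def remember(self, compressed):
        """Record a zlib-compressed stream found in the input PDF."""
        uncompressed = zlib.decompress(compressed)
        known = self._best.get(uncompressed)
        if known is None or len(compressed) < len(known):
            self._best[uncompressed] = compressed

    def compress(self, uncompressed):
        """Return the smaller of a fresh compression and the cached one."""
        fresh = zlib.compress(uncompressed, 9)
        cached = self._best.get(uncompressed)
        if cached is not None and len(cached) < len(fresh):
            return cached
        return fresh
```

The output of compress() is never larger than the stream that was remembered for the same data, which is the property that matters here.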

I've analyzed the example.pdf attached to your previous post. The reason why it is smaller than the optimized one is that pdfsizeopt (with --use-multivalent=no) can't generate object streams (/Type/ObjStm). Adding this feature would be easy, it would solve the problem in this specific case, and it would be a good general improvement. I'm narrowing the scope of this issue as a feature request for that.
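
For readers unfamiliar with object streams: many small non-stream objects are concatenated and compressed together, which usually beats storing each one uncompressed. Below is a minimal sketch of the layout as described in the PDF 1.5 specification (a header of ``object number, offset'' pairs, then the object bodies, with /N and /First in the stream dictionary). The function name and arguments are hypothetical, and this is not the code pdfsizeopt uses.

```python
import zlib


def build_object_stream(objs, objstm_num):
    """Pack non-stream PDF objects into a single /Type/ObjStm object.

    objs: list of (object_number, body) pairs, where body is the bytes
      between `obj' and `endobj'. Stream objects are not allowed, and
      all generation numbers must be 0.
    objstm_num: object number for the object stream itself.
    Returns the serialized indirect object as bytes.
    """
    header_items = []
    bodies = []
    offset = 0  # Offset of the next body, relative to the first body.
    for obj_num, body in objs:
        header_items.append(('%d %d' % (obj_num, offset)).encode('ascii'))
        bodies.append(body)
        offset += len(body) + 1  # +1 for the newline separator.
    header = b' '.join(header_items) + b'\n'
    data = zlib.compress(header + b'\n'.join(bodies), 9)
    stream_dict = (
        '<</Type/ObjStm/N %d/First %d/Filter/FlateDecode/Length %d>>'
        % (len(objs), len(header), len(data))).encode('ascii')
    return (('%d 0 obj\n' % objstm_num).encode('ascii') + stream_dict +
            b'\nstream\n' + data + b'\nendstream\nendobj\n')
```

A file that stores objects this way also needs a cross-reference stream so that readers can locate them.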
Summary: Add generation of object streams (/Type/ObjStm) with --use-multivalent=no
Status: Accepted
Labels: -Type-Defect -Priority-High Type-Enhancement Priority-Medium
Apr 11, 2012
Project Member #3 pts...@gmail.com
The original reported issue has been fixed in r183, which adds object stream generation to pdfsizeopt:

$ ./pdfsizeopt.py --use-multivalent=no example.pdf 
info: This is pdfsizeopt.py r183 size=292014.
info: loading PDF from: example.pdf
info: loaded PDF of 4093 bytes
info: found 22 obj offsets and 1 obj streams in xref stream
info: separated to 20 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 2 Type1C fonts loaded
info: saving PDF with 20 objs to: example.pso.pdf
info: generated object stream of 702 bytes in 13 objects (21%)
info: generated 4019 bytes (98%)

However, it's not fixed when Multivalent is enabled:

$ ./pdfsizeopt.py --use-multivalent=yes example.pdf 
info: This is pdfsizeopt.py r183 size=292014.
info: loading PDF from: example.pdf
info: loaded PDF of 4093 bytes
info: found 22 obj offsets and 1 obj streams in xref stream
info: separated to 20 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 2 Type1C fonts loaded
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: saving PDF with 20 objs to: pso.conv.mi.tmp.pdf
info: generated object stream of 702 bytes in 13 objects (21%)
info: generated 4019 bytes (98%)
info: executing Multivalent to optimize PDF: java -cp .../Multivalent.jar -Djava.awt.headless=true tool.pdf.Compress -nopagepiece -noalt pso.conv.mi.tmp.pdf
file:.../pso.conv.mi.tmp.pdf, 4019 bytes
PDF 1.5, producer=xdvipdfmx (0.7.8), creator= XeTeX output 2012.04.03:1909
additional compression may be possible with:
         -compact
=> new length = 4818, saved -19%, elapsed time = 0 sec
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 4839 bytes (120%)
info: compressed xref stream from 44 to 159 bytes (361%)
info: optimized to 4760 bytes after Multivalent (98%)
info: saving PDF to: example.psom.pdf
info: generated 4760 bytes (116%)

That's because Multivalent has decided not to emit an object stream this time. I'm keeping the issue open until I implement a workaround for that (i.e. pdfsizeopt will post-process the output of Multivalent, forcibly creating an object stream).
Status: Started
Apr 15, 2012
Project Member #4 pts...@gmail.com
I've just committed r185, which generates an object stream with --use-multivalent=yes, even if Multivalent hasn't generated one.
Apr 15, 2012
Project Member #5 pts...@gmail.com
(No comment was entered for this change.)
Summary: Add generation of object streams (/Type/ObjStm) with --use-multivalent={yes,no}
Apr 15, 2012
Project Member #6 pts...@gmail.com
As of r190, which I've just submitted, pdfsizeopt tries all combinations of --do-generate-xref-stream= and --do-generate-object-stream= for small files, and picks the one with the smallest output size. This way the probability that the optimized PDF is larger than the original is much lower in cases like the example.pdf attached.
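
The selection strategy can be sketched as follows. This is a simplified illustration, not the r190 code: save_pdf is a hypothetical callback that serializes the PDF with the given settings and returns the output bytes, and the keyword names merely mirror the two command-line flags.

```python
import itertools


def pick_smallest_output(save_pdf):
    """Try all four combinations of the two options, keep the smallest.

    save_pdf: hypothetical callable taking the keyword arguments
      do_generate_xref_stream and do_generate_object_stream (both bool)
      and returning the serialized PDF as bytes.
    Returns (output, do_generate_xref_stream, do_generate_object_stream).
    """
    candidates = []
    for xref_stream, object_stream in itertools.product(
            (False, True), repeat=2):
        output = save_pdf(do_generate_xref_stream=xref_stream,
                          do_generate_object_stream=object_stream)
        candidates.append((len(output), xref_stream, object_stream, output))
    # min() with a key returns the first candidate of minimal size.
    _, xref_stream, object_stream, output = min(
        candidates, key=lambda c: c[0])
    return output, xref_stream, object_stream
```

Serializing four times is cheap for small files, which is where the fixed overhead of an xref stream or object stream can outweigh the savings.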

Again, thank you very much for reporting this issue and providing the necessary details, so that I could investigate and prepare fixes. I'm closing this issue now. If you find something which is still wrong (or has gone wrong since), please comment on the issue, and I'll reopen it.
Status: Fixed