Issue 13: PDF/A compliance: ID in file trailer missing or incomplete
Status:  Accepted
Owner:  pts...@gmail.com


 
Reported by lev.bishop, Oct 31, 2009
What command do you run to optimize the PDF?
user@ubuntu804server:~/pdfsizeopt$ ./pdfsizeopt.py test.pdf

What does pdfsizeopt display when running the command above?
info: This is pdfsizeopt.py r102.
info: loading PDF from: test.pdf
info: loaded PDF of 16776 bytes
info: separated to 19 objs
info: found 1 Type1 fonts loaded
info: writing Type1CConverter (9062 font bytes) to: pso.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=pso.conv.tmp.pdf -f pso.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 861 20071121
Type1CConverter: converting font /PGWBAM+CMR10 to /Obj0000000016
Type1CConverter: all OK
info: loading PDF from: pso.conv.tmp.pdf
info: loaded PDF of 3943 bytes
info: separated to 14 objs
info: found 1 fonts in GS output
info: optimized total Type1 font size 9035 to Type1C font size 895 (10%)
info: optimized Type1 font XObject 16,15: new size=1132 (12%)
info: found 1 Type1C fonts loaded
info: writing Type1CParser (909 font bytes) to: pso.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 861 20071121
Type1CParser: all OK
info: parsed 1 Type1C fonts
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: saving PDF with 18 objs to: pso.conv.mi.tmp.pdf
info: generated 8290 bytes (49%)
info: executing Multivalent to optimize PDF: java -cp /home/user/pdfsizeopt/Multivalent.jar tool.pdf.Compress pso.conv.mi.tmp.pdf
file:/home/user/pdfsizeopt/pso.conv.mi.tmp.pdf, 8290 bytes
PDF 1.4, producer=pdfTeX, creator=pdfTeX
additional compression may be possible with:
         -compact
=> new length = 7963, saved 3%, elapsed time = 0 sec
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 7984 bytes (96%)
info: compressed xref stream from 40 to 157 bytes (393%)
info: optimized to 7906 bytes after Multivalent (99%)
info: saving PDF to: test.psom.pdf
info: generated 7906 bytes (47%)

What's wrong with the optimized PDF?
It fails to validate as PDF/A-1b (using Acrobat 7.1.0 for the validation).
I get the message:
ID in file trailer missing or incomplete
test.pdf
16.4 KB   Download
Oct 31, 2009
#1 lev.bishop
Patch:
Index: pdfsizeopt.py
===================================================================
--- pdfsizeopt.py       (revision 102)
+++ pdfsizeopt.py       (working copy)
@@ -3284,7 +3284,7 @@
       trailer_obj.Set('Compress', None)  # emitted by Multivalent.jar
       # Emitted by Multivalent.jar etc., see section 10.3 in
       # pdf_reference_1-7.pdf .
-      trailer_obj.Set('ID', None)
+      # trailer_obj.Set('ID', None)
       assert trailer_obj.head.startswith('<<')
       assert trailer_obj.head.endswith('>>')
       output.append('trailer\n%s\n' % trailer_obj.head)
@@ -5777,7 +5777,7 @@
         # Please note that we save the space of the removed /ID and /Compress
         # below, because /Type/XRef is usually the last object, so we don't
         # need to add padding.
-        pdf_obj.Set('ID', None)
+        # pdf_obj.Set('ID', None)
         pdf_obj.Set('Compress', None)
         if pdf_obj.Get('Index') != None:
           raise NotImplementedError('unexpected /Index in xref object')
Nov 15, 2009
Project Member #2 pts...@gmail.com
Thank you for the bug report and the patch.

pdfsizeopt.py doesn't strive for PDF/A compliance. But if all you need is the /ID,
please add a command-line flag that enables keeping the ID, turned off by default.
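One way such an opt-in flag could be wired up is sketched below. This is only an illustration under assumptions: `strip_trailer_keys` and `keep_id` are hypothetical names, and a plain dict stands in for pdfsizeopt.py's trailer object.

```python
# Sketch only: `strip_trailer_keys` and `keep_id` are hypothetical names,
# and a plain dict stands in for pdfsizeopt.py's trailer object.
def strip_trailer_keys(trailer, keep_id=False):
    """Return a copy of `trailer` without the entries dropped to save space."""
    result = dict(trailer)
    result.pop('Compress', None)  # emitted by Multivalent.jar
    if not keep_id:
        result.pop('ID', None)  # default behavior: drop /ID
    return result


trailer = {'Size': 19, 'Root': '1 0 R', 'ID': '[(abc)(abc)]', 'Compress': '<<>>'}
default_result = strip_trailer_keys(trailer)             # /ID removed
kept_result = strip_trailer_keys(trailer, keep_id=True)  # /ID preserved
```

Keeping the default at `keep_id=False` matches the request that the flag be turned off by default, so output size is unchanged unless the user opts in.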


Status: Accepted
Jan 24, 2010
#3 lev.bishop
In addition to the /ID, PDF/A requires PDF version 1.4 or lower. Therefore, the -old option
should be passed to tool.pdf.Compress. However, this causes problems that I don't yet
understand, so I am still investigating.
Feb 10, 2011
Project Member #4 pts...@gmail.com
It would be nice to add PDF/A compatibility to pdfsizeopt's output, provided that its input PDF is also PDF/A-compliant and the user explicitly asks for PDF/A output by specifying a command-line flag. However, I definitely don't want it enabled by default, because it increases the file size.

I won't start adding this feature on my own. If you'd like to contribute, please attach some (preferably tiny) example PDFs to this bug, for which pdfsizeopt.py currently doesn't produce PDF/A. I'm closing this bug until you reply.

Do you have software that checks for PDF/A compatibility? Is there free software for that?
Labels: -Type-Defect Type-Enhancement
Feb 10, 2011
#5 lev.bishop
Thanks for considering this. I would be glad to work with you on getting it working. I attach a small file that verifies as PDF/A-1b (using the Acrobat 9.4.1 preflight tool), the result of running pdfsizeopt --use-multivalent=false on this, and the resulting PDF/A-1b conformance failure report from Acrobat. The problems are: 
1) ID in file trailer missing or incomplete
2) Syntax problem: Stream dictionary improperly formatted
3) Syntax problem: Stream dictionary has improper length entry
4) Syntax problem: Indirect object “endobj” keyword not preceded by an EOL marker
5) Indirect object “endobj” keyword not followed by an EOL marker

As I said in the previous comment, with --use-multivalent=true it would be necessary to give the -old option to Multivalent, but that breaks other parts of pdfsizeopt.py. Perhaps, as a first step, it would be enough to support only --use-multivalent=false for PDF/A.

I have Acrobat Pro 9.4.1 so I can certainly verify any fixes you implement. I'm not aware of any free conformance tools, but I can't say that I've looked very hard.
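
In the absence of a free conformance tool, complaints 4) and 5) could be roughly approximated with a small script. The sketch below is not a substitute for Acrobat preflight: `find_endobj_eol_problems` is a hypothetical helper that scans raw bytes, so it can be fooled if the text `endobj` happens to occur inside stream data.

```python
import re


def find_endobj_eol_problems(data):
    """Return a list of (offset, problem) for `endobj` keywords lacking an EOL."""
    problems = []
    for match in re.finditer(br'endobj', data):
        start, end = match.start(), match.end()
        # The byte just before the keyword must be CR or LF.
        if data[start - 1:start] not in (b'\r', b'\n'):
            problems.append((start, 'not preceded by EOL'))
        # The byte just after the keyword must be CR or LF.
        if data[end:end + 1] not in (b'\r', b'\n'):
            problems.append((start, 'not followed by EOL'))
    return problems
```

Running it over the bytes of a file such as test1.pso.pdf would list the byte offsets worth inspecting by hand.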

test1.pdf
22.2 KB   Download
test1.pso_report.txt
8.1 KB   View   Download
Feb 10, 2011
#6 lev.bishop
Sorry, here's the pdfsizeopt output that I forgot to attach
test1.pso.pdf
22.1 KB   Download
Feb 10, 2011
Project Member #7 pts...@gmail.com
Cool, thanks for the details.

I'm happy to make changes to pdfsizeopt.py so that Acrobat preflight won't complain. But since I don't have that software, the most straightforward way is for us to prepare test input and output files.

I'll implement solutions to complaints 1) ... 5). Stay tuned for an update to this bug.

I'll add support to pdfsizeopt.py for generating xref streams, no matter if Multivalent is used.

I'll make sure that pdfsizeopt won't use %PDF-1.5 features, and it would fail if the input is newer than %PDF-1.4.

I'll try to figure out what kind of an /ID should be added if there was none.

I'll also patch pdfsizeopt.py so that it accepts the output of Multivalent tool.pdf.Compress -old.
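
The planned check for input newer than %PDF-1.4 could start from the file header. A minimal sketch, assuming the version is read only from the `%PDF-x.y` header at the very start of the file (a /Version entry in the document catalog can override it, which this ignores); `pdf_header_version` and `is_pdf_1_4_or_older` are hypothetical names, not existing pdfsizeopt.py functions.

```python
import re

# Matches the header at the very start of the file, e.g. b'%PDF-1.4'.
_HEADER_RE = re.compile(br'%PDF-(\d+)\.(\d+)')


def pdf_header_version(data):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    match = _HEADER_RE.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_pdf_1_4_or_older(data):
    """Return True only if a header was found and it is at most 1.4."""
    version = pdf_header_version(data)
    return version is not None and version <= (1, 4)
```

Comparing the version as a tuple of integers avoids the pitfalls of comparing version strings or floats.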
Feb 10, 2011
#8 lev.bishop
It's probably not necessary to add an /ID if there was none, since this would mean that the input already did not conform to PDF/A.
Feb 10, 2011
Project Member #9 pts...@gmail.com
> It's probably not necessary to add an /ID if there was none, since this would mean that the input already did not conform to PDF/A.

You are correct that it's not necessary. But I'd do so anyway, because it's just a simple modification to pdfsizeopt.py, and can be helpful just in case.
Feb 10, 2011
Project Member #10 pts...@gmail.com
Could you please check whether Acrobat preflight accepts /ID[()()] in the trailer without complaining? What about /ID[(A)(A)]?
Feb 16, 2011
#11 lev.bishop
Sorry it took me a while to figure out how to do this.
/ID[()()]   : not accepted
/ID[(A)(A)] : accepted
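
Given that empty strings are rejected, a generated /ID needs non-empty content. One possible approach is sketched below, assuming an MD5 digest of the file bytes is an acceptable identifier and that hex strings are treated like the literal strings tested above (neither was verified with Acrobat in this thread); `make_trailer_id` is a hypothetical helper.

```python
import hashlib


def make_trailer_id(pdf_bytes):
    """Build an /ID entry whose two strings are the same MD5 digest, in hex."""
    digest = hashlib.md5(pdf_bytes).hexdigest().upper()
    # Hex strings avoid having to escape parentheses and backslashes.
    return '/ID[<%s><%s>]' % (digest, digest)
```

The entry costs 73 bytes, which is one more reason to emit it only when the user asks for it.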
Mar 4, 2011
Project Member #12 pts...@gmail.com
Issue 38 has been merged into this issue.
