Issue 13: PDF/A compliance: ID in file trailer missing or incomplete
What command do you run to optimize the PDF?

user@ubuntu804server:~/pdfsizeopt$ ./pdfsizeopt.py test.pdf

What does pdfsizeopt display when running the command above?

info: This is pdfsizeopt.py r102.
info: loading PDF from: test.pdf
info: loaded PDF of 16776 bytes
info: separated to 19 objs
info: found 1 Type1 fonts loaded
info: writing Type1CConverter (9062 font bytes) to: pso.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=pso.conv.tmp.pdf -f pso.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 861 20071121
Type1CConverter: converting font /PGWBAM+CMR10 to /Obj0000000016
Type1CConverter: all OK
info: loading PDF from: pso.conv.tmp.pdf
info: loaded PDF of 3943 bytes
info: separated to 14 objs
info: found 1 fonts in GS output
info: optimized total Type1 font size 9035 to Type1C font size 895 (10%)
info: optimized Type1 font XObject 16,15: new size=1132 (12%)
info: found 1 Type1C fonts loaded
info: writing Type1CParser (909 font bytes) to: pso.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 861 20071121
Type1CParser: all OK
info: parsed 1 Type1C fonts
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: saving PDF with 18 objs to: pso.conv.mi.tmp.pdf
info: generated 8290 bytes (49%)
info: executing Multivalent to optimize PDF: java -cp /home/user/pdfsizeopt/Multivalent.jar tool.pdf.Compress pso.conv.mi.tmp.pdf
file:/home/user/pdfsizeopt/pso.conv.mi.tmp.pdf, 8290 bytes
PDF 1.4, producer=pdfTeX, creator=pdfTeX
additional compression may be possible with: -compact
=> new length = 7963, saved 3%, elapsed time = 0 sec
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 7984 bytes (96%)
info: compressed xref stream from 40 to 157 bytes (393%)
info: optimized to 7906 bytes after Multivalent (99%)
info: saving PDF to: test.psom.pdf
info: generated 7906 bytes (47%)

What's wrong with the optimized PDF?

It fails to validate as PDF/A-1b (using Acrobat 7.1.0 for the validation). I get the message: ID in file trailer missing or incomplete
Oct 31, 2009
#1
lev.bishop
Nov 15, 2009
Thank you for the bug report and the patch. pdfsizeopt.py doesn't strive for PDF/A compliance. But if all you need is the /ID, please add a command-line flag that enables keeping the ID, turned off by default.
Status:
Accepted
Jan 24, 2010
In addition to the /ID, PDF/A requires PDF version 1.4 or lower. Therefore, the -old option should be passed to tool.pdf.Compress. However, this causes problems that I don't yet understand, so I am still investigating.
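To illustrate the version requirement mentioned above, here is a minimal Python 3 sketch that reads the `%PDF-M.N` header and checks that it is 1.4 or lower. The function names are hypothetical and not part of pdfsizeopt.py. It only looks at the header line, so it would miss a newer version declared through a /Version entry in the document catalog.

```python
import re


def pdf_header_version(data):
    """Return (major, minor) parsed from a '%PDF-M.N' header, or None."""
    match = re.match(rb'%PDF-(\d+)\.(\d+)', data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_pdfa1_version_ok(data):
    """True if the header declares PDF 1.4 or lower (hypothetical helper)."""
    version = pdf_header_version(data)
    # Tuples compare element by element, so (1, 3) <= (1, 4) < (1, 5).
    return version is not None and version <= (1, 4)
```

For example, `is_pdfa1_version_ok(open('test.pdf', 'rb').read(16))` would be enough to inspect the header of the file from the original report.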
Feb 10, 2011
It would be nice to add PDF/A compatibility to pdfsizeopt's output -- provided that its input PDF is also compliant with PDF/A, and the user explicitly asks for PDF/A output by specifying a command-line flag. However, I definitely don't want it enabled by default, because it increases the file size. I'm not going to start adding this feature on my own. If you'd like to contribute, please attach some (preferably tiny) example PDFs to this bug, for which pdfsizeopt.py currently doesn't produce PDF/A. I'm closing this bug until you reply. Do you have software that checks for PDF/A compatibility? Is there free software for that?
Labels:
-Type-Defect Type-Enhancement
Feb 10, 2011
Thanks for considering this. I would be glad to work with you on getting it working. I attach a small file that verifies as PDF/A-1b (using the Acrobat 9.4.1 preflight tool), the result of running pdfsizeopt --use-multivalent=false on this, and the resulting PDF/A-1b conformance failure report from Acrobat. The problems are:
1) ID in file trailer missing or incomplete
2) Syntax problem: Stream dictionary improperly formatted
3) Syntax problem: Stream dictionary has improper length entry
4) Syntax problem: Indirect object “endobj” keyword not preceded by an EOL marker
5) Indirect object “endobj” keyword not followed by an EOL marker
As I said in the previous comment, with --use-multivalent=true it would be necessary to give the -old option to Multivalent, but that breaks other parts of pdfsizeopt.py. Perhaps to begin with it would be enough to support only --use-multivalent=false for PDF/A. I have Acrobat Pro 9.4.1, so I can certainly verify any fixes you implement. I'm not aware of any free conformance tools, but I can't say that I've looked very hard.
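Since no free conformance checker is known in this thread, complaints 4) and 5) could be roughly reproduced with a small script. The following Python 3 sketch (the function name is hypothetical) scans the raw bytes for `endobj` and reports keywords that are not directly preceded or followed by CR or LF. It is a naive byte scan, not a PDF parser, so it would also flag the byte sequence if it happens to appear inside stream data or a string.

```python
import re

EOL_BYTES = (b'\r', b'\n')


def find_endobj_eol_problems(data):
    """Return (offset, problem) pairs for 'endobj' keywords lacking EOL markers."""
    problems = []
    for match in re.finditer(rb'endobj', data):
        start, end = match.span()
        # The byte just before the keyword must be CR or LF.
        if start == 0 or data[start - 1:start] not in EOL_BYTES:
            problems.append((start, 'not preceded by EOL'))
        # The byte just after the keyword must be CR or LF.
        if data[end:end + 1] not in EOL_BYTES:
            problems.append((start, 'not followed by EOL'))
    return problems
```

Running it on the output of pdfsizeopt and on the original input would show whether the optimizer introduced the missing line breaks, without needing Acrobat for every iteration.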
Feb 10, 2011
Sorry, here's the pdfsizeopt output that I forgot to attach
Feb 10, 2011
Cool, thanks for the details. I'm happy to make changes to pdfsizeopt.py so that Acrobat preflight won't complain. But since I don't have that software, the most straightforward way is for us to prepare test input and output files. I'll implement solutions to complaints 1) ... 5). Stay tuned for an update to this bug. I'll add support to pdfsizeopt.py for generating xref streams, no matter if Multivalent is used. I'll make sure that pdfsizeopt won't use %PDF-1.5 features, and that it fails if the input is newer than %PDF-1.4. I'll figure out what kind of /ID should be added if there was none. I'll also patch pdfsizeopt.py so that it accepts the output of Multivalent tool.pdf.Compress -old.
Feb 10, 2011
It's probably not necessary to add an /ID if there was none, since this would mean that the input already did not conform to PDF/A.
Feb 10, 2011
> It's probably not necessary to add an /ID if there was none, since this would mean that the input already did not conform to PDF/A.

You are correct that it's not necessary. But I'd do so anyway, because it's just a simple modification to pdfsizeopt.py, and can be helpful just in case.
Feb 10, 2011
Could you please check whether Acrobat preflight accepts /ID[()()] in the trailer without complaining? What about /ID[(A)(A)]?
Feb 16, 2011
Sorry it took me a while to figure out how to do this. /ID[()()] : not accepted /ID[(A)(A)] : accepted
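Given that an empty /ID is rejected and a non-empty one is accepted, one way to produce a non-empty /ID when the input had none is to derive it from a hash of the file contents. The sketch below (Python 3, hypothetical function name, not taken from pdfsizeopt.py) builds the trailer entry from an MD5 digest written as two hex strings. Whether Acrobat preflight accepts the hex-string form was not tested in this thread, so treat that as an assumption; only the literal-string form (A)(A) is confirmed above.

```python
import hashlib


def build_trailer_id(pdf_bytes):
    """Return a '/ID[<hex><hex>]' trailer entry derived from the file bytes."""
    # A 16-byte MD5 digest gives 32 hex digits, so the strings are never empty.
    digest = hashlib.md5(pdf_bytes).hexdigest().upper()
    # Both elements are identical here, mirroring the (A)(A) test case.
    return '/ID[<%s><%s>]' % (digest, digest)
```

Hashing the content keeps the result deterministic, so running the optimizer twice on the same input yields the same /ID, which makes output files easier to compare in tests.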
Mar 4, 2011
Issue 38 has been merged into this issue.