Issue 38: PDF/A validation broken after running pdfsizeopt
Status:  Duplicate
Merged:  issue 13
Owner: ----
Closed:  Mar 2011
Reported by j.lefebv...@gmail.com, Jul 17, 2010
pdfsizeopt breaks PDF/A validation:

- It removes the /ID entry.
- It removes the line break before 'endobj' and 'endstream'.

I updated William Bader's patch so that PDF/A files stay valid after optimization:

./pdfsizeopt.py --use-multivalent=false test.pdfa.pdf test.opt.pdfa.pdf

(Multivalent breaks PDF/A.)


Attachment: pdfsizeopt.pat (14.3 KB)
Feb 10, 2011
Project Member #1 pts...@gmail.com
Which patch are you updating? Please give a URL to the original patch, and please attach the original pdfsizeopt.py you are patching.

In the long run, I think pdfsizeopt should not generate PDF/A files by default. So if you want this patch integrated into mainline pdfsizeopt, please change it so that the PDF/A behavior has to be enabled by a command-line flag.
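
A minimal sketch of such an opt-in switch, written in the --name=true|false style of the existing --use-multivalent flag. The flag name keep-pdfa and the helper parse_bool_flag are hypothetical; neither exists in pdfsizeopt. This only illustrates the "off unless asked for" default.

```python
def parse_bool_flag(argv, name, default=False):
    """Return the value of --NAME=true|false found in argv, else default.

    The flag name passed in by the caller is hypothetical; this is not
    pdfsizeopt's actual option parser.
    """
    prefix = "--%s=" % name
    value = default
    for arg in argv:
        if arg.startswith(prefix):
            text = arg[len(prefix):].lower()
            if text not in ("true", "false"):
                raise ValueError("expected true or false in %r" % arg)
            # A later occurrence of the flag overrides an earlier one.
            value = (text == "true")
    return value


# PDF/A-preserving output stays off unless the user asks for it.
keep_pdfa = parse_bool_flag(["--keep-pdfa=true", "in.pdf", "out.pdf"], "keep-pdfa")
```

With this shape, the size-optimizing behavior stays the default and the PDF/A-friendly output is only produced on request.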
Mar 4, 2011
Project Member #2 pts...@gmail.com
I've integrated most of the attached patch (pdfsizeopt.pat) into the trunk in r158. The exceptions are /Type/Page unification (it has to be disabled explicitly with --do-unify-pages=false), Multivalent -nocore14, and these hunks:

@@ -475,9 +475,9 @@
       output.append(self.stream)
       # We don't need '\nendstream' after a non-compressed content stream,
       # 'Qendstream endobj' is perfectly fine (accepted by gs and xpdf).
-      output.append('endstream endobj\n')
+      output.append('\nendstream\nendobj\n')
     else:
-      output.append('%sendobj\n' % space)
+      output.append('%s\nendobj\n' % space)
 
   def __GetHead(self):
     if self._head is None and self._cache is not None:
@@ -3302,7 +3310,7 @@
       trailer_obj.Set('Compress', None)  # emitted by Multivalent.jar
       # Emitted by Multivalent.jar etc., see section 10.3 in
       # pdf_reference_1-7.pdf .
-      trailer_obj.Set('ID', None)
+      #trailer_obj.Set('ID', None)
       assert trailer_obj.head.startswith('<<')
       assert trailer_obj.head.endswith('>>')
       output.append('trailer\n%s\n' % trailer_obj.head)
@@ -5816,7 +5871,7 @@
         # Please note that we save the space of the removed /ID and /Compress
         # below, because /Type/XRef is usually the last object, so we don't
         # need to add padding.
-        pdf_obj.Set('ID', None)
+        #pdf_obj.Set('ID', None)
         pdf_obj.Set('Compress', None)
         if pdf_obj.Get('Index') != None:
           raise NotImplementedError('unexpected /Index in xref object')
@@ -2592,15 +2592,17 @@
     else:
       pdf_obj.Set('BitsPerComponent', pdf_image_data['BitsPerComponent'])
       pdf_obj.Set('ColorSpace', pdf_image_data['ColorSpace'])
-      pdf_obj.Set('Decode', pdf_image_data.get('Decode'))
+      if pdf_obj.Get('Decode') == None:
+        # Update Decode only if it is currently not set
+        pdf_obj.Set('Decode', pdf_image_data.get('Decode'))
     pdf_obj.Set('Filter', pdf_image_data['Filter'])
     pdf_obj.Set('DecodeParms', pdf_image_data.get('DecodeParms'))
     pdf_obj.Set('Length', len(pdf_image_data['.stream']))
     # Don't pdf_obj.Set('Decode', ...): it is good as is.
     pdf_obj.stream = pdf_image_data['.stream']
 
   def CompressToZipPng(self):

About PDF/A compatibility: yes, /ID has to be present and endobj/endstream must have whitespace in front of them for PDF/A compatibility. These features will be added to pdfsizeopt later, please post further comments about PDF/A support to https://code.google.com/p/pdfsizeopt/issues/detail?id=13 .
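
The two conditions named above can be spot-checked on an output file with a rough sketch like the following. It is a heuristic, not a PDF/A validator: it scans raw bytes without parsing the file, so an 'endstream' or '/ID' byte sequence inside binary stream data can produce a false result. The function name find_pdfa_marker_problems is made up for this example, and "whitespace" is narrowed to an end-of-line byte, matching what the patch emits.

```python
import re

# Bytes accepted in front of the keywords; an assumption taken from the
# patch above, which writes '\n' before 'endstream' and 'endobj'.
EOL_BYTES = (b"\n", b"\r")


def find_pdfa_marker_problems(data):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    for match in re.finditer(rb"endstream|endobj", data):
        start = match.start()
        # data[start - 1:start] is b"" at offset 0, which also counts as a problem.
        if data[start - 1:start] not in EOL_BYTES:
            problems.append("%s at offset %d is not preceded by an end-of-line"
                            % (match.group().decode("ascii"), start))
    # Look for an /ID array anywhere in the file (trailer or xref stream).
    if re.search(rb"/ID\s*\[", data) is None:
        problems.append("no /ID array found")
    return problems


good = b"1 0 obj\n<<>>\nstream\nQ\nendstream\nendobj\ntrailer\n<</ID[<00><00>]>>\n"
bad = b"1 0 obj\n<<>>\nstream\nQendstream endobj\ntrailer\n<<>>\n"
good_problems = find_pdfa_marker_problems(good)
bad_problems = find_pdfa_marker_problems(bad)
```

Here good is laid out the way the patched serializer writes objects, while bad uses the compact 'Qendstream endobj' form and has no /ID.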
Mar 4, 2011
Project Member #3 pts...@gmail.com
The /Decode issue has been fixed in https://code.google.com/p/pdfsizeopt/issues/detail?id=37 , and the PDF/A issues are to be discussed in https://code.google.com/p/pdfsizeopt/issues/detail?id=13 , so closing this issue now.
Status: Duplicate
Mergedinto: 13