flying-saucer - issue #259

Hindi characters appear incorrectly even after adding 'Arial Unicode MS' font when converting from HTML to PDF

Posted on Apr 2, 2015 by Massive Ox

What steps will reproduce the problem?
1. Take any HTML that contains Hindi characters. An input HTML file is attached.
2. Convert this HTML to PDF in the usual way, registering the font with renderer.getFontResolver().addFont().

I tried 'Arial Unicode MS' and also other Hindi fonts such as Samyak Devanagari and Sarai, but the result is the same. I have verified that the fonts are embedded correctly in the PDF, and the Hindi content is visible, but the words are rendered incorrectly.

What is the expected output? What do you see instead?
I am attaching files for the expected output and the actual output, which should make things clearer. The expected output PDF was generated from HTML with a tool called pdfcrowd, which renders it correctly.

What version of the product are you using? On what operating system?
Product version: Release 8 (R8)
OS: Ubuntu 14.04

Please provide any additional information below.

Thank you for this. It is a wonderful tool, but I am really stuck on this part. I have tried almost every solution available online and am filing an issue here as a last resort. The problem is that the fonts do get embedded in the generated PDF, but the content is not displayed correctly. If I copy the content from the generated PDF and paste it into a browser, everything is displayed correctly.
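To make the copy-and-paste observation above easier to verify, here is a minimal sketch in plain Java that lists the code points of a string, so the text copied from the PDF can be compared with the text in the source HTML. The class name DevanagariCheck, its methods, and the sample word are hypothetical helpers for illustration only; they are not part of Flying Saucer. If the two code point lists match, that would suggest the stored text is intact and the difference lies in how the glyphs are drawn.

```java
import java.util.ArrayList;
import java.util.List;

public class DevanagariCheck {

    /** Returns true if the text contains at least one Devanagari code point. */
    public static boolean containsDevanagari(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp)
                        == Character.UnicodeScript.DEVANAGARI);
    }

    /** Lists the code points of the text as U+XXXX strings, in stored order. */
    public static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // Sample Hindi word taken from the source HTML (assumed for illustration).
        String fromHtml = "\u0939\u093F\u0928\u094D\u0926\u0940";
        // Paste the text copied from the generated PDF here; this is a placeholder.
        String fromPdf = "\u0939\u093F\u0928\u094D\u0926\u0940";

        System.out.println("Contains Devanagari: " + containsDevanagari(fromHtml));
        System.out.println("HTML code points:    " + codePoints(fromHtml));
        System.out.println("PDF code points:     " + codePoints(fromPdf));
        System.out.println("Same stored text:    " + fromHtml.equals(fromPdf));
    }
}
```

Replacing the fromPdf placeholder with text copied from actualoutput.pdf gives a character-by-character comparison that can be attached to the report.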

Attachments

input.html 847

actualoutput.pdf 69.91KB

expectedoutput.pdf 11.03KB

Status: New

Labels:
Type-Defect Priority-Medium
