Issue 604: East asian characters are not aligned correctly in console output
Status:  Done
Owner:  jussi.ao...@gmail.com
Closed:  Aug 2010

Reported by xieya...@gmail.com, Aug 1, 2010
Some Unicode characters, notably East Asian ones, have a display width of 2 columns, which causes pybot's console output to be misaligned.

Robot Framework 2.5 (Python 2.6.5 on darwin)

Demo:

0$ cat test_east_asian_width.txt 
*** test cases ***
汉字应该正确对齐
        Log  Hello world!
0$ pybot test_east_asian_width.txt 
==============================================================================
Test East Asian Width                                                         
==============================================================================
汉字应该正确对齐                                                              | PASS |
------------------------------------------------------------------------------
Test East Asian Width                                                 | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed


After applying the patch:
0$ pybot test_east_asian_width.txt 
==============================================================================
Test East Asian Width                                                         
==============================================================================
汉字应该正确对齐                                                      | PASS |
------------------------------------------------------------------------------
Test East Asian Width                                                 | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed


The patch and the test case are attached.
robotframework-output-chinese-width.patch (2.7 KB)
test_east_asian_width.txt (63 bytes)
Aug 1, 2010
#1 xieya...@gmail.com
see also: East Asian Width http://unicode.org/reports/tr11/
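A minimal sketch of the width rule from the report, assuming Python 3 and the standard unicodedata module (the thread itself targets Python 2.6). The helper name display_width is hypothetical and is not taken from the attached patch.

```python
import unicodedata


def display_width(text):
    """Return the number of terminal columns `text` is expected to occupy."""
    width = 0
    for char in text:
        # 'W' (Wide) and 'F' (Fullwidth) characters take two columns;
        # everything else is counted as one column in this sketch.
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            width += 2
        else:
            width += 1
    return width


title = '汉字应该正确对齐'
print(len(title), display_width(title))  # prints "8 16"
```

Padding by len() treats the test name as 8 columns wide although it occupies 16, which matches the 8-column overshoot of the "| PASS |" marker in the unpatched output.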
Aug 4, 2010
Project Member #2 pekka.klarck
Thanks for the bug report and patch. I was able both to verify the problem and to confirm that the patch fixes it.

Importing the unicodedata module used here is, unfortunately, _very_ slow with Jython:

$ time jython -c "import sys"
real	0m5.243s
user	0m5.664s
sys	0m0.392s

$ time jython -c "from unicodedata import east_asian_width"
real	0m10.867s
user	0m15.545s
sys	0m0.488s

Applying the patch in its current form would thus slow down start-up with Jython by about 5 seconds, which clearly is not acceptable. Do you know whether there is any other way to find out how wide these characters actually are? If there isn't, we need to use this fix only with Python.

Because this problem apparently only affects the console output I consider it relatively low priority. 
Summary: East asian characters are not aligned correctly in console output
Status: Accepted
Labels: -Priority-Medium Priority-Low Target-2.6
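The "use this fix only with Python" fallback mentioned above could look roughly like the sketch below. It assumes that Jython can be detected because its sys.platform value starts with "java". The function name is_wide is hypothetical, and this is not the code that was eventually committed.

```python
import sys

if sys.platform.startswith('java'):
    # On Jython, skip the slow unicodedata import and keep the old
    # behavior: every character is treated as one column wide.
    def is_wide(char):
        return False
else:
    from unicodedata import east_asian_width

    def is_wide(char):
        # 'W' (Wide) and 'F' (Fullwidth) characters take two columns.
        return east_asian_width(char) in ('W', 'F')
```

With this approach alignment stays wrong on Jython, but start-up time there is unaffected.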
Aug 5, 2010
#3 xieya...@gmail.com
I'm sure I can optimize this code with pre-compiled data, and I have written a script that dumps all wide characters for that purpose. I'd be glad to hear any suggestions. The script is attached for anyone interested.
analyze_eaw_chars.py (1.0 KB)
Aug 16, 2010
Project Member #4 pekka.klarck
Pre-compiled data sounds like a good solution. I modified the attached script to print the number of characters and there were only 261 of them. I think it would be best to have a new module that contains both the characters and a single function to cut (and justify) the text correctly. xieyanbo, are you interested in trying that out? We are going to release RF 2.5.2 in the near future and getting this in is still possible.
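A rough sketch of what a "cut and justify" function could look like, assuming Python 3. For brevity it uses unicodedata directly rather than pre-compiled data, and the names char_width and cut_and_justify are hypothetical, not Robot Framework APIs.

```python
import unicodedata


def char_width(char):
    """Return 2 for Wide/Fullwidth characters and 1 for everything else."""
    return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1


def cut_and_justify(text, columns):
    """Truncate `text` to fit `columns` and pad it with spaces on the right."""
    used = 0
    kept = []
    for char in text:
        width = char_width(char)
        if used + width > columns:
            break  # the next character would overflow the field
        kept.append(char)
        used += width
    # Pad with spaces so the result always occupies exactly `columns`.
    return ''.join(kept) + ' ' * (columns - used)
```

When a two-column character does not fit in the last remaining column, the sketch drops it and pads with one space, so the field width stays constant.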
Aug 16, 2010
#5 xieya...@gmail.com
Actually, that script prints 261 ranges of wide characters; the total number of characters is 45647. I have implemented a prototype to replace the east_asian_width function. The attached generate_wild_chars.py outputs the source code of a module that includes a function "is_wild_char". "is_wild_char(c)" behaves the same as "eaw(c) in 'WF'". It could be optimized further, but I think "is_wild_char" is good enough to be used in the product. Give it a try.
generate_wild_chars.py (1.7 KB)
wild_chars.py (4.4 KB)
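The range-based lookup described above can be sketched as follows, assuming Python 3. The table is a small illustrative subset, not the 261 ranges from the attachment, and the name is_wide_char and the bisect-based lookup are assumptions; the contents of wild_chars.py are not shown in this thread.

```python
from bisect import bisect_right

# Illustrative subset only, NOT the full pre-compiled table.
# Each entry is an inclusive (first, last) range of code points,
# sorted by the first code point.
_WIDE_RANGES = [
    (0x1100, 0x115F),  # Hangul Jamo initial consonants
    (0x3041, 0x3096),  # Hiragana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0xFF01, 0xFF60),  # Fullwidth ASCII variants
]
_RANGE_STARTS = [first for first, _ in _WIDE_RANGES]


def is_wide_char(char):
    """Return True if `char` falls inside one of the pre-compiled ranges."""
    code = ord(char)
    # Find the last range whose first code point is <= code.
    index = bisect_right(_RANGE_STARTS, code) - 1
    return index >= 0 and code <= _WIDE_RANGES[index][1]
```

Storing ranges instead of individual characters keeps the module small, and the binary search avoids importing unicodedata at start-up.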
Aug 23, 2010
Project Member #6 pekka.klarck
We will try to get this into 2.5.2, which we must release this week. No promises at this point, though.
Labels: -Target-2.6 Target-2.5.2
Aug 27, 2010
Project Member #7 pekka.klarck
Unfortunately we don't have time to get this into 2.5.2. =(
Labels: -Target-2.5.2 Target-2.6
Aug 31, 2010
Project Member #8 jpran...@gmail.com
(No comment was entered for this change.)
Owner: jussi.ao.malinen
Labels: -Target-2.6 Target-2.5.3
Aug 31, 2010
Project Member #9 jussi.ao...@gmail.com
This is now committed in r4005, r4006, and r4007. We also implemented a check for combining characters, which have a width of 0. (These caused problems on Mac, which uses NFD normalization for file names.) The fix is now coming out in 2.5.3.

Thanks for the brilliant patch xieyanbo!
Status: Done
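The combining-character case can be illustrated with the sketch below, assuming Python 3. It approximates "width 0" with a non-zero canonical combining class from unicodedata.combining, which may differ from the check in the committed revisions. The helper names are hypothetical.

```python
import unicodedata


def char_width(char):
    """Return the assumed column width of a single character."""
    if unicodedata.combining(char):
        return 0  # combining marks are drawn on top of the previous character
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        return 2
    return 1


def display_width(text):
    return sum(char_width(char) for char in text)


composed = unicodedata.normalize('NFC', '\u00e9')    # single code point
decomposed = unicodedata.normalize('NFD', '\u00e9')  # 'e' + U+0301

print(len(composed), len(decomposed))                      # prints "1 2"
print(display_width(composed), display_width(decomposed))  # prints "1 1"
```

A file name in NFD form therefore has more code points than visible columns, and padding by len() pads it too little.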
Aug 31, 2010
#10 xieya...@gmail.com
Great job, thanks to you guys!
Mar 21, 2012
#11 xieya...@gmail.com
The generator script and the East Asian character list on this page are not correct; don't use them. The correct version is in issue #1096; use that one.