READ-ONLY: This project has been archived. For more information see this post.
Issue 43: TouchXML has problems with XML substrings using ISO-8859-1 (Latin-1) encoding
Status:  Invalid
Owner:  ----
Closed:  Apr 2009


 
Reported by baro...@wanadoo.fr, Jan 12, 2009
I use TouchXML to parse XML strings from a French web site. The site uses UTF-8 encoding, but some
of the XML strings contain substrings in ISO-8859-1 encoding.

For example, if I use:

CXMLElement *aElement=[[[document rootElement] childAtIndex:0] childAtIndex:0];
	
and [aElement stringValue]

I get:  DÃ©tail d'un article
instead of: Détail d'un article

Using CFShow I get:

D\u00c3\u00a9tail d'un article

Using NSXMLDocument I get a correct UTF-8 string with no problem.
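
The CFShow output above (\u00c3\u00a9 where é, U+00E9, was expected) matches the usual double-encoding pattern: the UTF-8 bytes C3 A9 were each read as one Latin-1 character and then encoded as UTF-8 a second time. As a minimal sketch of undoing one such layer, assuming that is really what happened here, the plain C below (which also compiles in an Objective-C file) maps every two-byte sequence for U+0080..U+00FF back to a single byte. The function name undo_double_utf8 is hypothetical and is not part of TouchXML.

```c
#include <stddef.h>

/*
 * Undo one layer of "UTF-8 read as Latin-1, then re-encoded as UTF-8".
 * Writes a NUL-terminated result to out and returns its length in bytes.
 * Returns -1 if the input contains anything other than ASCII or two-byte
 * sequences for U+0080..U+00FF (so it is not this kind of damage), or if
 * out is too small.
 */
long undo_double_utf8(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p) {
        unsigned char b;

        if (*p < 0x80) {
            /* ASCII is unaffected by the double encoding. */
            b = *p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF: collapse to one byte. */
            b = (unsigned char)(((p[0] & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;
        }

        if (n + 1 >= out_size)
            return -1;          /* keep room for the terminating NUL */
        out[n++] = (char)b;
    }

    if (out_size == 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Given the 22 bytes of the garbled string, this yields the 20-byte UTF-8 encoding of "Détail d'un article". It only hides the symptom: the better fix is to find where the bytes are decoded with the wrong encoding in the first place.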



Feb 3, 2009
#1 sunil....@gmail.com
I have the same issue when parsing ISO-8859-1 (e.g. any BBC RSS feed). Any
thoughts?

S
Feb 11, 2009
#3 mathias....@gmail.com
I've been struggling with the same problem for a while now. The best workaround I could find was to call
setShouldResolveExternalEntities:YES
on the parser and add this method to the delegate:
- (NSData *)parser:(NSXMLParser *)parser resolveExternalEntityName:(NSString *)eName systemID:(NSString *)sID
{
	NSString *result;

	if ([eName isEqualToString:@"eacute"]) // example: the XHTML entity &eacute; (é, as at the end of café)
	{
		result = @"é";
	} else
	{
		// Log the unresolved name; the format string needs %@ to print eName.
		NSLog(@"Unresolved entity in EniroPoIXMLParserDelegate: %@", eName);
		result = @"";  // Empty, or any character you want to return in this case
	}

	return [result dataUsingEncoding:NSUTF8StringEncoding];
}

I also had problems receiving literal escape sequences such as \u00xx in the text, which I had to replace before
parsing (I'm assuming a double encoding was made on the server side):
[aString replaceOccurrencesOfString:@"\\u00E5" withString:@"å" options:0 range:NSMakeRange(0, [aString length])];
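
Instead of one replaceOccurrencesOfString: call per character, the literal \uXXXX escapes could be decoded generically before parsing. The following is a minimal sketch in plain C (callable from Objective-C), not something taken from NSXMLParser or TouchXML. The helper name decode_u_escapes is hypothetical. It assumes exactly four hex digits per escape, and it copies surrogate escapes and \u0000 through unchanged rather than trying to interpret them.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Replace literal \uXXXX escapes in `in` with the UTF-8 encoding of the
 * code point. All other bytes are copied unchanged.
 * Returns the output length in bytes, or -1 if out is too small.
 */
long decode_u_escapes(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return -1;

    while (*in) {
        int h0, h1, h2, h3;
        unsigned cp = 0;
        int is_escape = 0;

        /* Short-circuit evaluation stops at the terminating NUL. */
        if (in[0] == '\\' && in[1] == 'u'
            && (h0 = hex_value(in[2])) >= 0 && (h1 = hex_value(in[3])) >= 0
            && (h2 = hex_value(in[4])) >= 0 && (h3 = hex_value(in[5])) >= 0) {
            cp = (unsigned)((h0 << 12) | (h1 << 8) | (h2 << 4) | h3);
            /* Leave \u0000 and surrogate halves as literal text. */
            is_escape = cp != 0 && (cp < 0xD800 || cp > 0xDFFF);
        }

        if (!is_escape) {
            if (n + 1 >= out_size) return -1;
            out[n++] = *in++;
            continue;
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;
        if (n + need >= out_size) return -1;

        if (cp < 0x80) {
            out[n++] = (char)cp;
        } else if (cp < 0x800) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
        in += 6;
    }

    out[n] = '\0';
    return (long)n;
}
```

The output is UTF-8, so it would need to be handed to the parser as UTF-8 data. If the escapes appear because the server encodes the payload twice, fixing that on the server side remains the cleaner option.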

These solutions seem crazy to me, but despite searching I could find no other way. I even tried
adding a header to the XML file before parsing, such as <?xml version="1.0" encoding="UTF-8" ?> (or the same
with ISO-8859-1), but no luck.


Feb 11, 2009
#4 mathias....@gmail.com
Just to clarify the above: I encountered this problem with parsing based on NSXMLParser, where it is a general
issue. I am assuming the issue is the same for TouchXML.
Feb 28, 2009
Project Member #5 jwight
Can you please provide links to the XML files in question?
Apr 18, 2009
Project Member #6 jwight
Closing this bug as no test data was provided. Please reopen or create a new ticket with ALL data required for 
testing. 
Status: Invalid
