What steps will reproduce the problem?
1. Try the basic demo from https://code.google.com/p/gwtwiki/wiki/MediaWikiDumpSupport#Example_how_to_print_the_wiki_articles_title_and_raw_text
What is the expected output? What do you see instead?
Expected: title and text for all articles.
Instead, the following exception (the German message translates roughly to "XML document structures must start and end within the same entity"):
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 30; columnNumber: 1; XML-Dokumentstrukturen müssen innerhalb derselben Entität beginnen und enden.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(XMLDocumentFragmentScannerImpl.java:865)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(XMLDocumentScannerImpl.java:564)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(XMLEntityManager.java:1355)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1774)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1426)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2754)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
    at info.bliki.wiki.dump.WikiXMLParser.parse(WikiXMLParser.java:221)
    at sd.wikitest.App.main(App.java:54)
What version of the product are you using? On what operating system?
3.0.20-snapshot
Please provide any additional information below.
Can be fixed by using another constructor:
    Reader reader = new InputStreamReader(new BZip2CompressorInputStream(new FileInputStream(bz2Filename), true));
    WikiXMLParser wxp = new WikiXMLParser(reader, handler);
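A minimal sketch of the failure mode, assuming the parser was handed XML that ends before the root element is closed. The report does not state the cause; one plausible reading is that only the first bzip2 stream of a multi-stream dump was decompressed, which is what the `true` (decompress concatenated streams) argument in the workaround addresses. The sketch uses only the JDK's SAX parser, and `TruncatedXmlDemo` and `parses` are hypothetical names that are not part of bliki.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class TruncatedXmlDemo {

    /**
     * Returns true if the XML parses cleanly, false if the SAX parser
     * reports a fatal well-formedness error (for example, premature end of input).
     */
    public static boolean parses(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content events and rethrows fatal errors.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // The message text depends on the JVM locale, as in the report above.
            System.out.println("line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what this sketch is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a complete dump: the root element is closed.
        String complete = "<mediawiki>\n<page><title>A</title></page>\n</mediawiki>";
        // Stand-in for a dump cut off mid-document: the root element is never closed.
        String truncated = "<mediawiki>\n<page><title>A</title></page>\n";

        System.out.println(parses(complete));
        System.out.println(parses(truncated));
    }
}
```

If the truncated case fails in the same way as the real dump, that would suggest checking how the compressed file is opened before suspecting the XML itself.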
Comment #1
Posted on May 25, 2013 by Happy Dog
(No comment was entered for this change.)
Comment #2
Posted on May 25, 2013 by Happy Dog
I committed r9056 and r9057. If the dump file has the extension ".bz2", new BZip2CompressorInputStream(new FileInputStream(bz2Filename), true) is used.
Status: Accepted
Labels:
Type-Defect
Priority-Medium