Creating XML documents

Creating a new XML document

The newDocument method creates a new XML document. You can then optionally add a default namespace, and then choose the name of the root element.

System.out.println(XMLDoc.newDocument(true).addRoot("html").toString());

gives:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>

Loading an existing XML document

The from methods can load an XML document from any of the following types:
Example:

URL yahooGeoCode = new URL("http://local.yahooapis.com/MapsService/V1/geocode?appid=YD-9G7bey8_JXxQP6rxl.fBFGgCdNjoDMACQA--&state=QC&country=CA&zip=H1W3B8");
System.out.println(XMLDoc.from(yahooGeoCode, true).toString());
System.out.println(XMLDoc.from(yahooGeoCode, true).getText("Result/City"));

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ResultSet xmlns="urn:yahoo:maps" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:yahoo:maps http://api.local.yahoo.com/MapsService/V1/GeocodeResponse.xsd">
<Result precision="zip">
<Latitude>45.543289</Latitude>
<Longitude>-73.543098</Longitude>
<Address/>
<City>Montreal</City>
<State>QC</State>
<Zip>H1W 3B8</Zip>
<Country>CA</Country>
</Result>
</ResultSet>
<!-- ws04.search.re2.yahoo.com uncompressed Tue Dec 9 13:39:12 PST 2008 -->
Montreal

Ignoring namespaces

All creational methods, XMLDoc.newDocument and XMLDoc.from, require a boolean parameter, ignoreNamespaces. If it is set to true, all namespaces in the document are ignored. This is really useful if you use XPath a lot, since you can avoid prefixing every element name in your XPath expressions. Example:

System.out.println(XMLDoc.newDocument(true)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("html"));
System.out.println(XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("html"));

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/"/>

Navigating in a document with namespaces using XPath is quite a pain:

doc.gotoTag("ns0:body").addTag("child")
.gotoParent().addCDATA("with special characters")
.gotoTag("ns0:body").addCDATA("<\"!@#$%'^&*()>")

whereas if you load the same document with ignoreNamespaces set to true, you can simply navigate like this when you use XPath:

doc.gotoTag("body").addTag("child")
.gotoParent().addCDATA("with special characters")
.gotoTag("body").addCDATA("<\"!@#$%'^&*()>")

Using namespaces

When you create or load a document and decide not to ignore namespaces, you can add a default namespace for your document and add other namespaces afterwards. Namespace management is quite a challenge, especially when using XPath (see the Navigation, XPath and Callback support section). When you have an XMLTag instance, you can use the following methods to manage namespaces in the document.

Adding and retrieving namespaces and prefixes

addDefaultNamespace

When you create an empty document, you can define a default namespace for it. For example:

XMLTag doc = XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("html");

will produce:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/"/>

addNamespace

Once you have an XMLTag instance, you can add any namespace you want. For example:

XMLTag doc = XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
.addRoot("html")
.addTag("wicket:border")
.gotoRoot().addTag("head")
.addNamespace("other", "http://other-ns.com")
.gotoRoot().addTag("other:foo");

will produce:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/">
<wicket:border xmlns:wicket="http://wicket.sourceforge.net/wicket-1.0"/>
<head/>
<other:foo xmlns:other="http://other-ns.com"/>
</html>

Namespace prefix generation

When you load an existing XML document, or when you define a default namespace in a new document, prefixes and namespaces are automatically discovered across the whole document. XML documents often have a default namespace. XHTML documents like the one below are a typical case. For such documents, XMLDoc generates a prefix that you can use for XPath navigation and registers the namespace as the default one. For example, the following document has a default namespace, and the prefix ns0 is generated to access it:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title/>
</head>
<body/>
</html>

XMLTag doc = XMLDoc.from(...);
assertEquals("ns0", doc.getPrefix("http://www.w3.org/1999/xhtml"));
assertEquals("http://www.w3.org/1999/xhtml", doc.getContext().getNamespaceURI("ns0"));

The prefix ns0 has been generated in the namespace context of the document so that XPath expressions can use it. You can access the javax.xml.namespace.NamespaceContext like this:

NamespaceContext ctx = doc.getContext();

Prefix constraints

You cannot override a prefix that is already defined in a context, and you cannot override the reserved XML prefixes. Each of the following 3 attempts throws an exception:

// these prefixes are reserved
XMLDoc.newDocument(false).addRoot("html").addNamespace("xml", "http://ns0");
XMLDoc.newDocument(false).addRoot("html").addNamespace("xmlns", "http://ns0");
// prefix generation: adding a default namespace also creates the prefix 'ns0' (or another one if 'ns0' is already taken), so we cannot bind another namespace to 'ns0'
XMLDoc.newDocument(false)
.addDefaultNamespace("http://def")
.addRoot("html")
.addNamespace("ns0", "http://ns0");

XML elements operations

On elements

Operations affecting elements: hasTag, addTag, getCurrentTag, getCurrentTagName, deleteChilds, delete, renameTo

hasTag

Checks for the existence of a tag.

addTag

Creates a new tag.

System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head")
.toString());

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html>
<head/>
</html>

getCurrentTag

Returns the current org.w3c.dom.Element.

getCurrentTagName

Returns the current tag name.

System.out.println(XMLDoc.newDocument(true).addRoot("html").getCurrentTagName());

outputs:

html

delete

Deletes the current tag. The parent of the deleted tag becomes the current tag. Calling delete on the root tag throws an exception, because the root node can only be renamed.

System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head")
.delete()
.toString());

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>

deleteChilds

Deletes all tags under the current tag.

System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head").addTag("title")
.toString());
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head").addTag("title")
.gotoRoot().deleteChilds()
.toString());

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html>
<head>
<title/>
</head>
</html>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>

renameTo

Renames the current tag.

System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.renameTo("xhtml")
.toString());

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xhtml/>

On attributes

Operations affecting attributes: hasAttribute, getAttributeNames, getAttribute, deleteAttributes, deleteAttribute

Suppose we load the following XML file (test.xhtml):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>my title</title>
</head>
<body>
<div id="header" class="banner"></div>
<div id="content" class="cool"></div>
<div id="footer" class="end"></div>
</body>
</html>

hasAttribute

Checks for the existence of an attribute.

getAttributeNames

Returns the attribute names of the current tag as an array.

String[] names = XMLDoc.from(resource("test.xhtml"), true)
.gotoTag("body/div[1]")
.getAttributeNames();
System.out.println(Arrays.toString(names));

outputs:

[class, id]

getAttribute

Returns the value of an attribute of the current tag, or of the tag selected by an XPath expression. If the attribute does not exist, an exception is thrown.

System.out.println(XMLDoc.from(resource("test.xhtml"), true)
.gotoTag("body/div[1]")
.getAttribute("class"));
System.out.println(XMLDoc.from(resource("test.xhtml"), true)
.getAttribute("class", "body/div[2]"));

outputs:

banner
cool

deleteAttributes

Deletes all attributes of the current tag.

System.out.println(XMLDoc.from(resource("test.xhtml"), true)
.gotoTag("body/div[1]")
.deleteAttributes()
.toString());

outputs:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>my title</title>
</head>
<body>
<div/>
<div class="cool" id="content"/>
<div class="end" id="footer"/>
</body>
</html>

deleteAttribute

Deletes a specific attribute. If it does not exist, an exception is thrown.

System.out.println(XMLDoc.from(getClass().getResource("/test.xhtml"), true)
.hasAttribute("id", "body/div[1]"));
System.out.println(XMLDoc.from(getClass().getResource("/test.xhtml"), true)
.gotoTag("body/div[1]").deleteAttribute("id")
.hasAttribute("id"));

outputs:

true
false

On text and data

Operations affecting text and data: addText, addCDATA, getText, getCDATA

addText, addCDATA

These methods add text or CDATA sections to the document. As you have seen above, you can mix text, data and tags under one tag. When you add text or data, the parent of the current tag automatically becomes the current tag. This behavior makes document creation easier, since most of the time you will add one text or data node per tag, like this:

System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head").addText("<\"!@#$%'^&*()>")
.addTag("body").addCDATA("<\"!@#$%'^&*()>")
.toString());

which gives:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html>
<head>&lt;"!@#$%'^&amp;*()&gt;</head>
<body><![CDATA[<"!@#$%'^&*()>]]></body>
</html>

getText, getCDATA

These methods return the text or data contained in the current tag, or in the tag targeted by an XPath expression. If the tag has no text, "" is returned. Given:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xmlns:ns0="http://wicket.sourceforge.net/wicket-1.0">
<head>
<title ns0:id="titleID">my special title: &lt;"!@#$%'^&amp;*()&gt;</title>
</head>
<body>
<![CDATA[my special data: ]]>
<ns0:border>
<div/>
child1
</ns0:border>
<ns0:border>child2</ns0:border>
<![CDATA[<"!@#$%'^&*()>]]>
<ns0:border>child3</ns0:border>
</body>
</html>

The following assertions are true:

assertEquals(Document.ELEMENT_NODE, doc.getCurrentTag().getNodeType());
assertEquals("html", doc.getCurrentTagName());
assertEquals("ns1", doc.getPrefix("http://www.w3.org/2002/06/xhtml2/")); // ns0 is already used in the document
assertEquals("my special title: <\"!@#$%'^&*()>", doc.gotoTag("ns1:head/ns1:title").getText());
assertEquals("my special title: <\"!@#$%'^&*()>", doc.getText("."));
assertEquals("my special data: <\"!@#$%'^&*()>", doc.getCDATA("../../ns1:body"));
assertEquals("titleID", doc.getAttribute("ns0:id"));

NB: we loaded the document without ignoring namespaces. That is why the XPath expressions require namespace prefixes.

Navigation, XPath and Callback support

Raw XPath

You can execute raw XPath expressions directly through the Java XPath API by using the rawXpath methods:
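The rawXpath methods belong to xmltool. For comparison only, here is a minimal sketch of what running raw XPath through the plain JDK javax.xml.xpath API involves on a namespace-aware document. This is not xmltool's implementation. The class RawXPathSketch, its count helper and the hard-wired ns0 binding are illustrative assumptions, chosen to mimic the generated prefix described in Namespace prefix generation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class RawXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Assumption: a hand-written context binding "ns0" to the XHTML namespace,
    // mimicking the kind of prefix generated for a default namespace.
    static final NamespaceContext CTX = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if ("ns0".equals(prefix)) {
                return XHTML_NS;
            }
            if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
                return XMLConstants.XML_NS_URI;
            }
            return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "ns0" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    /** Parses xml with namespaces kept and returns how many nodes the expression selects. */
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespaces, i.e. do not ignore them
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CTX);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html xmlns='http://www.w3.org/1999/xhtml'><head><title/></head><body/></html>";
        System.out.println(count(xml, "/html/body"));         // prints 0: unprefixed names do not match
        System.out.println(count(xml, "/ns0:html/ns0:body")); // prints 1: prefixed names match
    }
}
```

The unprefixed expression selects nothing because, in XPath 1.0, a name without a prefix only matches elements in no namespace. This is why prefixes are needed even when a document only has a default namespace.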
Gotos

Navigation in the document is done with the goto methods.

gotoParent

Returns to the parent tag, or stays on the root tag if we are already there.

gotoRoot

As its name says, goes to the root tag.

gotoChild

Goes to the only child element of the current tag. It is a handy method for traversing an XML document from child to child when each element has only one child. If you call this method on a tag that does not contain exactly one child element, it throws an exception.

gotoChild(int i)

Goes to the Nth child of the current element. The index runs from 1 to the number of children, exactly like XPath positional selection (child[i]). If there is no child at the given position, an exception is thrown.

gotoChild(String name)

Goes to the unique child element having the given name. If there is no child with this name, or if there is more than one, an exception is thrown.

gotoTag(String relativeXpath, Object... arguments)

Goes to a tag element given an XPath expression. The arguments are useful to parametrize the XPath expression, for example with namespace prefixes. Formatting is done with String.format(). Remember that when using XPath on a document with namespaces, you must always use prefixes, even when the document has a default namespace.

Example. Given:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xmlns:w="http://wicket.sourceforge.net/wicket-1.0">
<head>
<title w:id="title"/>
</head>
<body>
<w:border>
<div/>
child1
</w:border>
<w:border>child2</w:border>
<w:border>child3</w:border>
</body>
</html>

We can browse the above document like this:

XMLTag doc = XMLDoc.from(getClass().getResource("/goto.xml"), false);
String ns = doc.getPrefix("http://www.w3.org/2002/06/xhtml2/");
doc.gotoChild("head") // jump to the only 'head' tag under 'html'
.gotoChild() // jump to the only child of 'head'
.gotoRoot() // go to 'html'
.gotoChild(2) // go to child 'body'
.gotoChild(3) // go to third child 'w:border' having text 'child3'
.gotoRoot() // return to root
.gotoTag("%1$s:body/w:border[1]/%1$s:div", ns); // XPath navigation with namespaces

Notice the XPath expression when we use namespaces. Since we load an existing document, we can get the generated prefix for a namespace with the getPrefix method. We can then use this generated prefix in our XPath expression. %1$s means that we take the first argument provided (see the String.format() documentation). If you debug, you will see that the XPath expression is ns0:body/w:border[1]/ns0:div.

Callbacks on selected nodes

Callbacks: forEach, forEachChild

XMLTool lets you execute callback actions for each selected node or for each child node. For example, if we take back the XHTML example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>my title</title>
</head>
<body>
<div id="header" class="banner"></div>
<div id="content" class="cool"></div>
<div id="footer" class="end"></div>
</body>
</html>

and we execute:

XMLDoc.from(getClass().getResource("/test.xhtml"), true).forEachChild(new CallBack() {
public void execute(XMLTag doc) {
System.out.println(doc.getCurrentTagName());
}
});
XMLDoc.from(getClass().getResource("/test.xhtml"), true).forEach(new CallBack() {
public void execute(XMLTag doc) {
System.out.println(doc.getAttribute("id"));
}
}, "//div");

We obtain:

head
body
header
content
footer

Converting your XML document

Document conversion is done through the to* methods.

toDocument

Converts to an org.w3c.dom.Document instance.

toString, toString(String encoding)

Converts to a formatted string, optionally using the given encoding.

toBytes

Converts to a byte array.

toResult, toStream

Write the document to a javax.xml.transform.Result (toResult) or to a stream or writer (toStream).

Example:

XMLDoc.newDocument(true).addRoot("html")
.toResult(new DOMResult())
.toStream(new StringWriter())
.toStream(new ByteArrayOutputStream());

Validating your XML document

XML validation lets you validate the current document against a schema. To use this functionality, you need a document that does not ignore namespaces.

validate

This method validates the document against one or more schemas. It returns a ValidationResult instance containing all the warnings and errors issued during validation.

Example, with the goto.xml document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xmlns:w="http://wicket.sourceforge.net/wicket-1.0">
<head>
<title w:id="title"/>
</head>
<body>
<w:border>
<div/>
child1
</w:border>
<w:border>child2</w:border>
<w:border>child3</w:border>
</body>
</html>
If we validate the XML document goto.xml seen above:

ValidationResult results = XMLDoc.from(getClass().getResource("/goto.xml"), false).validate(
new URL("http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd"),
new URL("http://wicket.sourceforge.net/wicket-1.0.xsd")
);
assertFalse(results.hasError());

If we validate the following document, which we create ourselves:

results = XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("htmlZZ")
.validate(new URL("http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd"));
assertTrue(results.hasError());
System.out.println(Arrays.deepToString(results.getErrorMessages()));

The output is:

[cvc-elt.1: Cannot find the declaration of element 'htmlZZ'.]

Exception handling

Each operation that fails throws an XMLDocumentException with a descriptive message.

Evolution, other methods

The library keeps evolving and has several other methods, such as clone(), copy constructors, and adding an XMLTag into another XMLTag (either from the current tag or by adding the whole document), ...

2009-03-04:

XMLTag XMLDoc.fromCurrentTag(XMLTag tag, boolean ignoreNamespaces);
XMLTag tag.getInnerDocument();
String tag.getInnerText();
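The validate method described in Validating your XML document reports schema errors such as cvc-elt.1. Here is a rough, JDK-only sketch of that kind of check. It is not xmltool's implementation. It validates a document with javax.xml.validation and collects errors instead of stopping at the first one. The inline schema, its namespace urn:example:xhtml and the class SchemaValidationSketch are hypothetical stand-ins for xhtml2.xsd, chosen so that the example runs offline.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SchemaValidationSketch {
    // Hypothetical minimal schema: it only declares <html> in the made-up namespace urn:example:xhtml.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:xhtml' elementFormDefault='qualified'>"
            + "<xs:element name='html'/>"
            + "</xs:schema>";

    /** Validates xml against XSD and returns the collected error messages (empty if valid). */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Record recoverable errors and keep validating; fatal (well-formedness) errors still abort.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("<html xmlns='urn:example:xhtml'/>"));   // declared root: no errors
        System.out.println(validate("<htmlZZ xmlns='urn:example:xhtml'/>")); // undeclared root: a cvc-elt error
    }
}
```

The exact error code and wording depend on the Xerces version bundled with the JDK. Newer JDKs may report cvc-elt.1.a, for example. The message names the undeclared element 'htmlZZ' in either case.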
The link to the test/example code using raw XPath expressions doesn't work: http://code.google.com/p/xmltool/source/browse/trunk/src/test/java/com/google/code/xmltool/XMLDocXPathTest.java
Use this link instead: http://code.google.com/p/xmltool/source/browse/trunk/src/test/java/com/mycila/xmltool/XMLDocXPathTest.java