XMLDog  
Using SAX to Sniff XML
SAX, XPath, Streaming
Updated Apr 30, 2012 by santhosh...@gmail.com

XMLDog is a dog that is trained to sniff xml documents.

You give XMLDog a set of XPaths and ask it to sniff an XML document. It uses SAX and evaluates all the given XPaths in a single pass over the document.

Whether you use Xalan or XMLDog, you first need to define a javax.xml.namespace.NamespaceContext. This interface defines the binding from prefix to URI.

import jlibs.xml.DefaultNamespaceContext;
import jlibs.xml.Namespaces;

DefaultNamespaceContext nsContext = new DefaultNamespaceContext(); // an implementation of javax.xml.namespace.NamespaceContext
nsContext.declarePrefix("xs", Namespaces.URI_XSD); // the XPaths below use the "xs" prefix
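
If you do not have jlibs at hand, the same prefix-to-URI binding can be sketched with the JDK alone. The class below is a hypothetical helper (SimpleNamespaceContext is not part of jlibs), and it deliberately skips the special handling of the "xml" and "xmlns" prefixes that the full NamespaceContext contract asks for:

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of jlibs: a minimal prefix-to-URI binding.
public class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new LinkedHashMap<>();

    // Binds a prefix to a namespace URI; returns this to allow chaining.
    public SimpleNamespaceContext declarePrefix(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null)
            throw new IllegalArgumentException("prefix must not be null");
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty namespace URI.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null)
            throw new IllegalArgumentException("namespaceURI must not be null");
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI))
                prefixes.add(entry.getKey());
        }
        return prefixes.iterator();
    }
}
```

Since it implements javax.xml.namespace.NamespaceContext, an instance should be usable wherever that interface is expected.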

Now create an instance of XMLDog, and add the XPaths that need to be evaluated. Note that XMLDog can evaluate multiple XPaths in a single SAX parse of the given XML document.

import jlibs.xml.sax.dog.XMLDog;
import jlibs.xml.sax.dog.expr.Expression;

XMLDog dog = new XMLDog(nsContext);

Expression xpath1 = dog.addXPath("/xs:schema/@targetNamespace");
Expression xpath2 = dog.addXPath("/xs:schema/xs:complexType/@name");
Expression xpath3 = dog.addXPath("/xs:schema/xs:*/@name");

When you add an XPath to XMLDog, it returns an Expression object. This object is the compiled XPath.

You can get the original XPath from the Expression using getXPath():

System.out.println(xpath1.getXPath()); // prints "/xs:schema/@targetNamespace"

You can ask the Expression about its result type:

import javax.xml.namespace.QName;

QName resultType = xpath1.resultType.qname;
System.out.println(resultType); // prints "{http://www.w3.org/1999/XSL/Transform}NODESET"

The QName returned will be one of the constants in javax.xml.xpath.XPathConstants.

To evaluate given xpaths on some xml document:

import jlibs.xml.sax.dog.XPathResults;
import org.xml.sax.InputSource;

XPathResults results = dog.sniff(new InputSource("note.xml"));

The XPathResults object contains the results of all XPath evaluations.

To get the result of a particular XPath:

Object result = results.getResult(xpath1);

The return type of getResult(Expression) is java.lang.Object.

Depending on the Expression's resultType, this result can be safely cast to a particular type.

Below is the actual Java type of the result for each resultType:

Expression.resultType.qname    result can be cast to
XPathConstants.STRING          java.lang.String
XPathConstants.BOOLEAN         java.lang.Boolean
XPathConstants.NUMBER          java.lang.Double
XPathConstants.NODESET         java.util.Collection<NodeItem>
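
The table above can be read as a dispatch on the result-type QName. Here is a minimal sketch of that dispatch using only JDK types; ResultCasts and describe are hypothetical names, and the node set is handled as a plain Collection<?> so that the sketch does not depend on jlibs:

```java
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import java.util.Collection;

// Hypothetical helper: casts an untyped result according to its declared result type.
public class ResultCasts {
    public static String describe(QName resultType, Object result) {
        if (XPathConstants.STRING.equals(resultType)) {
            String text = (String) result;
            return "string of length " + text.length();
        } else if (XPathConstants.BOOLEAN.equals(resultType)) {
            Boolean flag = (Boolean) result;
            return "boolean " + flag;
        } else if (XPathConstants.NUMBER.equals(resultType)) {
            Double number = (Double) result;
            return "number " + number;
        } else if (XPathConstants.NODESET.equals(resultType)) {
            // With XMLDog the items would be NodeItem instances, per the table above.
            Collection<?> nodes = (Collection<?>) result;
            return "node set with " + nodes.size() + " item(s)";
        }
        throw new IllegalArgumentException("unexpected result type: " + resultType);
    }
}
```

With XMLDog you would presumably call it as describe(xpath1.resultType.qname, results.getResult(xpath1)).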

NodeItem represents an XML node in the XML document. NodeItem has the following properties:

NodeItem.type:
    the type of the XML node; one of the following constants in NodeType:
    COMMENT, PI, DOCUMENT, ELEMENT, ATTRIBUTE, NAMESPACE, TEXT

NodeItem.location:
    the unique XPath to the XML node, e.g. /xs:schema[1]/xs:complexType[1]/@name
    the prefixes in this XPath can be resolved using results.getNamespaceContext()
    this can be used to create a DOM, in case you need it

NodeItem.value, NodeItem.localName, NodeItem.namespaceURI, NodeItem.qualifiedName:
    the value/localName/namespaceURI/qualifiedName of the XML node it represents

NodeItem.toString() simply returns its location.

XPathResults has a handy print method to print the results to a given java.io.PrintStream:

results.print(dog.getExpressions(), System.out);

will print:

XPath: /xs:schema/@targetNamespace
      1: /xs:schema[1]/@targetNamespace

XPath: /xs:schema/xs:complexType/@name
      1: /xs:schema[1]/xs:complexType[1]/@name

XPath: /xs:schema/xs:*/@name
      1: /xs:schema[1]/xs:element[1]/@name
      2: /xs:schema[1]/xs:element[2]/@name
      3: /xs:schema[1]/xs:element[3]/@name
      4: /xs:schema[1]/xs:element[4]/@name
      5: /xs:schema[1]/xs:complexType[1]/@name

Multi Threading:

XMLDog supports multi-threading. You can add multiple XPaths once,
and sniff multiple documents in parallel with the same XMLDog instance.


XPath Support:

XMLDog supports a subset of XPath 1.0.

The supported axes are:

  • self
  • child
  • descendant
  • descendant-or-self
  • following
  • following-sibling
  • attribute
  • namespace

All functions except id() are supported. Predicates and all operators are supported as well.

XMLDog tells you clearly if a given XPath is not supported. For example:

Expression xpath = dog.addXPath("/xs:schema/../@targetNamespace");

throws the following exception:

java.lang.UnsupportedOperationException: unsupported axis: parent

This is very useful. For example, you can first try XMLDog and, if it throws UnsupportedOperationException,
fall back to DOM.
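
The fallback idea can be sketched as follows. This is only an illustration under stated assumptions: XPathFallback is a hypothetical class, and the streamingEvaluator parameter stands in for your own XMLDog-based code (it is expected to throw UnsupportedOperationException for XPaths it cannot handle). The fallback branch uses the JDK's DOM and javax.xml.xpath APIs:

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.function.BiFunction;

// Hypothetical helper: try a streaming evaluator first, fall back to DOM if it refuses.
public class XPathFallback {
    public static String evaluate(String xpath, String xml,
                                  BiFunction<String, String, String> streamingEvaluator) {
        try {
            // First attempt: the streaming evaluator (placeholder for XMLDog-based code).
            return streamingEvaluator.apply(xpath, xml);
        } catch (UnsupportedOperationException unsupported) {
            // Fallback: build a DOM and evaluate with the JDK XPath engine.
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document document = factory.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                return XPathFactory.newInstance().newXPath().evaluate(xpath, document);
            } catch (Exception e) {
                throw new IllegalStateException("DOM fallback failed", e);
            }
        }
    }
}
```

The fallback costs a full DOM build, so it only pays off when unsupported XPaths are rare.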


DOM Results

By default XMLDog does not construct DOM nodes for results.
You can configure it to produce DOM results as follows:

import jlibs.xml.sax.dog.sniff.Event;

Event event = dog.createEvent();
results = new XPathResults(event);
event.setListener(results);
event.setXMLBuilder(new DOMBuilder());
dog.sniff(event, new InputSource("note.xml"));

List<NodeItem> items = (List<NodeItem>)results.getResult(xpath1);

You can get the DOM node for a given NodeItem as follows:

NodeItem item = ...
org.w3c.dom.Node domNode = (org.w3c.dom.Node)item.xml;

Note that DOM nodes are created only for the portions of the XML which are hit by XPaths.

Event.setXMLBuilder(...) takes an argument of type jlibs.xml.sax.dog.sniff.XMLBuilder.
So if you want JDOM to be constructed instead of DOM, write an implementation of XMLBuilder
which constructs JDOM and use it.


Instant Results

The XPathResults object holds the results of all XPaths in memory. This might not always be feasible.

Let us say you are searching employees.xml for employees with more than 5 years of experience.
If employees.xml has 10000 employees and more than 5000 of them match this criterion,
holding 5000 employees in memory may cause an OutOfMemoryError.

To solve this problem, register your own InstantEvaluationListener with the Event. Your listener
will then be notified as soon as an employee matching the criterion is found. Thus you can process that employee
and discard it.

import jlibs.xml.sax.dog.expr.InstantEvaluationListener;

Event event = dog.createEvent();
event.setXMLBuilder(new DOMBuilder());
event.setListener(new InstantEvaluationListener(){
    @Override
    public void onNodeHit(Expression expression, NodeItem nodeItem){
        org.w3c.dom.Node node = (org.w3c.dom.Node)nodeItem.xml;
        System.out.println("XPath: "+expression.getXPath()+" has hit: "+node);
    }

    @Override
    public void finishedNodeSet(Expression expression){
        System.out.println("Finished Nodeset: "+expression.getXPath());
    }

    @Override
    public void onResult(Expression expression, Object result){
        // this method is called only for xpaths which returns primitive result
        // i.e result will be one of String, Boolean, Double
        System.out.println("XPath: "+expression.getXPath()+" result: "+result);
    }
});
dog.sniff(event, new InputSource("note.xml"), false/*useSTAX*/); // this version of the sniff method returns void

You can use variables and custom functions in XPaths. For this, use the following constructor:

import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPathVariableResolver;
import javax.xml.xpath.XPathFunctionResolver;

NamespaceContext nsContext = ...;
XPathVariableResolver variableResolver = ...;
XPathFunctionResolver functionResolver = ...;

XMLDog dog = new XMLDog(nsContext, variableResolver, functionResolver);

Note that functions are not supposed to expect arguments of type NodeSet.
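
The two resolver interfaces come from the JDK, so they can be implemented without jlibs. Below is a minimal sketch; the Resolvers class, the urn:example:ext namespace and the upper function are all made up for illustration. The custom function takes a string argument, in line with the note about NodeSet arguments:

```java
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds simple variable and function resolvers.
public class Resolvers {
    // Made-up namespace for the custom function.
    public static final String EXT_NS = "urn:example:ext";

    // Resolves $name style variables by local name from a fixed map.
    public static XPathVariableResolver variables(Map<String, Object> values) {
        Map<String, Object> copy = new HashMap<>(values);
        return variableName -> copy.get(variableName.getLocalPart());
    }

    // Resolves a single one-argument function, upper(string), in EXT_NS.
    public static XPathFunctionResolver functions() {
        XPathFunction upper = args -> String.valueOf(args.get(0)).toUpperCase(Locale.ROOT);
        return (functionName, arity) ->
                EXT_NS.equals(functionName.getNamespaceURI())
                        && "upper".equals(functionName.getLocalPart())
                        && arity == 1
                        ? upper
                        : null; // null tells the engine the function is unknown
    }
}
```

The resulting objects could then be passed as the variableResolver and functionResolver arguments of the constructor; the prefix you bind to urn:example:ext in the NamespaceContext is what you would write in the XPath.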


Command Line Utility

You can find xmldog.sh/xmldog.bat in the $JLIBS_HOME/bin directory.

This is useful for playing with XMLDog on various XML documents/XPaths.


Conformance

The XMLDog results conform to the XPath spec. This is covered by jlibs.xml.sax.dog.tests.XPathConformanceTest

You can look here to see the types of XPaths it has been tested with.

You can find xmldog-conformance.sh/xmldog-conformance.bat in the jlibs installation.


Performance:

You can find xmldog-performance.sh/xmldog-performance.bat in the jlibs installation,
which can be used to test XMLDog performance against Xalan that is shipped with the JDK.
This test reads its configuration from xpaths.xml in the jlibs installation.

Here is sample output of this performance test:

Average Execution Time over 20 runs:
--------------------------------------------------------------------------------
                                   File | XPaths XMLDog  SAXON   Diff Percentage
--------------------------------------------------------------------------------
            resources/xmlFiles/note.xml |    290     35     84    -49   -2.42
          resources/xmlFiles/simple.xml |     29      4     13     -8   -3.04
       resources/xmlFiles/positions.xml |    110     16     27    -10   -1.64
          resources/xmlFiles/sample.xml |   2197    195    176     18   +1.11
         resources/xmlFiles/sample1.xml |   2197     25     76    -51   -3.04
         resources/xmlFiles/sample2.xml |   2197     29     77    -47   -2.59
         resources/xmlFiles/sample3.xml |   2197     28     72    -44   -2.53
         resources/xmlFiles/numbers.xml |     83      1      3     -2   -2.22
      resources/xmlFiles/underscore.xml |     80      2      3     -1   -1.41
        resources/xmlFiles/contents.xml |    160      4      9     -4   -1.99
              resources/xmlFiles/pi.xml |     31      0      1     -1   -2.63
        resources/xmlFiles/evaluate.xml |     40      1      2      0   -1.32
             resources/xmlFiles/web.xml |    431      7     23    -16   -3.30
            resources/xmlFiles/fibo.xml |     94      4     12     -7   -2.58
resources/xmlFiles/defaultNamespace.xml |     80      0      2     -1   -3.28
      resources/xmlFiles/namespaces.xml |    150      3      6     -3   -1.78
            resources/xmlFiles/text.xml |     35      0      1      0   -2.16
    resources/xmlFiles/organization.xml |    110      4      6     -2   -1.53
        resources/xmlFiles/moreover.xml |    130     10     16     -6   -1.61
              resources/xmlFiles/id.xml |     40      0      2     -1   -3.45
        resources/xmlFiles/much_ado.xml |     78     14     17     -3   -1.24
             resources/xmlFiles/sum.xml |     17      0      1     -1   -5.00
  resources/xmlFiles/purchase_order.xml |    510      7     18    -11   -2.61
            resources/xmlFiles/roof.xml |     20      0      1     -1   -2.64
            resources/xmlFiles/nitf.xml |     60      1      3     -1   -2.28
         resources/xmlFiles/message.xml |     10      0      1     -1   -3.42
            resources/xmlFiles/lang.xml |     80      1      3     -2   -2.96
  resources/xmlFiles/testNamespaces.xml |     22      0      1     -1   -3.65
            resources/xmlFiles/test.xml |     20      0      1     -1   -3.06
          resources/xmlFiles/jaxen3.xml |     10      0      1      0   -2.73
         resources/xmlFiles/jaxen24.xml |     30      0      1      0   -3.01
             resources/xmlFiles/pi2.xml |     10      0      1     -1   -5.00
         resources/xmlFiles/library.xml |     20      1      1      0   -1.17
            resources/xmlFiles/axis.xml |     32      0      1      0   -2.30
               resources/xmlFiles/t.xml |     10      0      0      0   -2.58
--------------------------------------------------------------------------------
                                  Total |  11610    410    683   -273   -1.67

This shows that XMLDog is about 1.67 times faster than Saxon 9 overall.
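
Judging from the rows of the table, the last column looks like a signed speed ratio rather than a percentage: negative when XMLDog is faster, positive when it is slower. That reading is an inference from the numbers, not something the test documents. A small sketch of the arithmetic (SpeedRatio and signedRatio are hypothetical names):

```java
import java.util.Locale;

// Hypothetical helper reproducing the last column of the table from two timings.
public class SpeedRatio {
    // Negative: XMLDog is that many times faster. Positive: that many times slower.
    public static double signedRatio(double xmldogMillis, double otherMillis) {
        if (xmldogMillis <= 0 || otherMillis <= 0)
            return Double.NaN; // rounded-to-zero timings cannot be compared
        return xmldogMillis <= otherMillis
                ? -(otherMillis / xmldogMillis)
                : xmldogMillis / otherMillis;
    }

    public static void main(String[] args) {
        // Totals row: 410 ms (XMLDog) vs 683 ms
        System.out.println(String.format(Locale.ROOT, "%.2f", signedRatio(410, 683))); // prints -1.67
        // sample.xml row: 195 ms (XMLDog) vs 176 ms
        System.out.println(String.format(Locale.ROOT, "%.2f", signedRatio(195, 176))); // prints 1.11
    }
}
```

Per-file rows may differ slightly from this calculation because the timings shown in the table are rounded averages.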

The source code of the test case is here


Future:

  • ability to specify min number of items required in NODESET

I am looking forward to knowing who is interested in XMLDog, and why/where you are using it. This will give me a boost to add more features, because it takes most of my free time.

Your comments are welcome.

Comment by bchang2...@gmail.com, Mar 26, 2010

Have you looked at vtd-xml? It is the fastest!!

Comment by mros...@gmail.com, Nov 30, 2010

Hi, we are evaluating XMLDog as a fast XPath evaluator, in order to use it in an ESB as a way to implement content-based routing of web services. We need fast XPath evaluation to minimize the overhead added by the ESB.

Regards, Martin

Comment by project member santhosh...@gmail.com, Dec 1, 2010

Hi Martin,

Good to hear this. Let me know if you need any help. Your feedback will be helpful in improving it further...

Comment by thebluem...@gmail.com, Feb 14, 2011

Hi, I am currently trying to perform updates on (very large) XML documents being read on the fly (therefore through a streamsource) before writing them (to an output stream).

The elements to update are defined by XPath expressions (obviously a subset of XPath that would require access to parent, ancestor and forward axes only). The update would typically be performed using a custom function (embedding a stateful object). Do you know whether your library could be used to perform such processing?

Comment by project member santhosh...@gmail.com, Feb 15, 2011

You can get a notification as soon as a particular XPath result is evaluated. This can be done as follows:

Expression expr = xmldog.addXPath(xpathStr);
Event event = xmldog.createEvent();
event.addListener(new EvaluationListener(){
    @Override
    public void finished(Evaluation evaluation){
        Expression expr = evaluation.expression;
        Object result = evaluation.getResult();
        System.out.println("Result is: "+result);
    }
});
xmldog.sniff(event, inputSource);

For some XPaths, the result might be notified after the actual node has passed. For example, if the XPath is "/a[b]", then when the result is notified you will be either in <b> startElement or <a> endElement.

If your XPaths don't need forward lookup, then you can use it to update the XML in streaming fashion.

Comment by john.ala...@gmail.com, Apr 19, 2011

Hi Santhosh, I need to get the result of an XPath which returns a node either as an XMLElement object or an XML string, but the node objects only return location path strings. I have tried to modify your code to enable this, but beyond the Event class, which seems to hold the current element data, I couldn't understand your classes, in particular how the XPath query is able to produce an accurate location string. The one from the Event class doesn't seem to go beyond the current class. Regards, John.

Comment by project member santhosh...@gmail.com, Apr 19, 2011

This can be done using NodeSetListener.

NodeSetListener.mayHit() is called if the current SAX event might be a possible outcome of the XPath expression to which it is attached. You can start populating XML objects after this call. If the XPath engine finds that it is not a hit, then NodeSetListener.discard() is called. It is not so straightforward to implement this. Currently NodeSetListener is used internally, and SAX events have to be multicast so that the XML object can be populated. I will try this at my end and let you know the status...

Comment by asankha....@gmail.com, Apr 20, 2011

I am executing the default bin/xmldog.sh script with the 3rd party dependencies downloaded. However, I do not get the result I expect from XMLDog.

asankha@asankha:~/java/jlibs/bin$ ./xmldog.sh /home/asankha/code/XMLPerf/src/test/resources/test1.xml
Namespaces:
soapenv = http://schemas.xmlsoap.org/soap/envelope/
z = http://somez
y = http://someothery
y1 = http://somey
x = http://somex
m = http://services.samples/xsd

XPaths: //order[1]/symbol

| XPath-Results |

XPath: //order[1]/symbol

  1. /soapenv:Envelope[1]/soapenv:Body[1]/m:buyStocks[1]/order[1]/symbol[1]

Evaluated in 40 milliseconds

The XML input file is:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:z="http://somez" xmlns:y="http://someothery">
  <soapenv:Header xmlns:y="http://somey">
    <x:one xmlns:x="http://somex">eka</x:one>
    <y:two>deka</y:two>
    <z:three>thuna</z:three>
    <four>hathara</four>
  </soapenv:Header>
  <soapenv:Body>
    <m:buyStocks xmlns:m="http://services.samples/xsd">
      <order><symbol>IBM</symbol><buyerID>asankha</buyerID><price>140.34</price><volume>2000</volume></order>
      <order><symbol>MSFT</symbol><buyerID>ruwan</buyerID><price>23.56</price><volume>8030</volume></order>
      <order><symbol>SUN</symbol><buyerID>indika</buyerID><price>14.56</price><volume>500</volume></order>
      <order><symbol>GOOG</symbol><buyerID>chathura</buyerID><price>60.24</price><volume>40000</volume></order>
      <order><symbol>IBM</symbol><buyerID>asankha</buyerID><price>140.34</price><volume>2000</volume></order>
      <order><symbol>MSFT</symbol><buyerID>ruwan</buyerID><price>23.56</price><volume>803000</volume></order>
      <order><symbol>SUN</symbol><buyerID>indika</buyerID><price>14.56</price><volume>50000</volume></order>
      <order><symbol>GOOG</symbol><buyerID>saliya</buyerID><price>60.24</price><volume>400000</volume></order>
    </m:buyStocks>
  </soapenv:Body>
</soapenv:Envelope>

Comment by project member santhosh...@gmail.com, Apr 20, 2011

NodeItem.value will be non-null only for NodeTypes COMMENT, PI, ATTRIBUTE, NAMESPACE, TEXT, i.e. NodeItem.value will be null for NodeTypes DOCUMENT, ELEMENT.

XMLDog doesn't create DOM nodes for elements or the document if they are hit. I am currently working on creating partial DOM nodes (i.e. creating DOM nodes only for those which are results of XPaths).

So if you are trying to evaluate XPaths whose resulting nodes are elements, as results you will get only the exact location of the elements hit (not the entire element data).

Comment by asankha....@gmail.com, Apr 20, 2011

So what should I do to obtain the element's text data - which is what I am after? If XMLDog parses the XML once, I'd like it to be able to give me the result too from this single parse

Comment by project member santhosh...@gmail.com, Apr 20, 2011

ashank,

use xpath: //order[1]/symbol/text()

Comment by antonin....@gmail.com, Jun 7, 2011

Hi M. Santhosh Kumar,

I am trying to parse a GPX file of about 150 MB. I would like not to load all of it into memory. I understood that your XMLDog would parse such files in a streaming way (i.e. without holding them in memory). So I made a little program but I get a java.lang.OutOfMemoryError.

Can you tell me where the problem is ?

I give you the code.

public class XMLDogParser {

        public static void main(String[] s) throws Exception {

                boolean useSTAX = true;
                String file = "/file.gpx";
                DefaultNamespaceContext nsContext = new DefaultNamespaceContext();
                nsContext.declarePrefix(Namespaces.URI_XSI);
                DefaultNamespaceContext resultNSContext = new DefaultNamespaceContext();
                List<Object> dogResult;
                dogResult = new ArrayList<Object>(3);
                XMLDog dog = new XMLDog(nsContext);
                Expression xpath1 = dog.addXPath("/gpx/wpt");
                Expression xpath2 = dog.addXPath("/gpx/rte/rtept");
                Expression xpath3 = dog.addXPath("gpx/trk/trkseg/trkpt");

                Event event = dog.createEvent();
                event.setXMLBuilder(new DOMBuilder());
                XPathResults dogResults = new XPathResults(event, dog.getDocumentXPathsCount(), dog.getXPaths());
                System.out.println("Initialisation succeded.");
                dog.sniff(event, file, useSTAX);
                System.out.println("'snif' succeded.");
                dogResult.add(dogResults.getResult(xpath1));
                dogResult.add(dogResults.getResult(xpath2));
                dogResult.add(dogResults.getResult(xpath3));

                System.out.println("Result : " + dogResult);
                System.out.println("End of program.");
        }
}

Moreover, XMLDog doesn't succeed in sniffing the file because of the content of the first tag. The first tag is:

<gpx 
version="1.1" 
creator="CartoExploreur 3 3.20" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns="http://www.topografix.com/GPX/1/1" 
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">

In fact, the bug is caused by only one line:

xmlns="http://www.topografix.com/GPX/1/1"

Do you have an idea how to solve my problem?

Thank you in advance, Antonin

Comment by project member santhosh...@gmail.com, Jun 7, 2011

Comment out the following line and try again. This might solve the OutOfMemoryError:

event.setXMLBuilder(new DOMBuilder());

What is the error you got for the line:

xmlns="http://www.topografix.com/GPX/1/1"

Can you post the stack trace?

Comment by antonin....@gmail.com, Jun 8, 2011

When I comment out the line, I don't get the OutOfMemoryError.

But when I do, the 'dogResult' is different: I can't access the data contained in the tags.

For instance,

- before I comment out the line, I get something like this (test on a small GPX file):

Result : [[/gpx[1]/wpt[1]
<wpt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" lat="48.8345" lon="2.365">
        <ele>0.0</ele>
        <name>Wpt 0</name>
    </wpt>
, /gpx[1]/wpt[2]
<wpt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" lat="48.865" lon="2.465">
        <ele>0.0</ele>
        <name>Wpt 1</name>
    </wpt>
, /gpx[1]/wpt[3]
...
...
- when I comment out the line, I get this:
Result : [[/gpx[1]/wpt[1], /gpx[1]/wpt[2], /gpx[1]/wpt[3], ...

Is it possible to get the same information without using the line?

event.setXMLBuilder(new DOMBuilder());

When the GPX file contains the line

xmlns="http://www.topografix.com/GPX/1/1"

I don't get an error. I only get nothing, as if the GPX file were empty. I get :

Result : [[], [], []]

Comment by project member santhosh...@gmail.com, Jun 9, 2011

> I don't get an error. I only get nothing, as if the GPX file were empty

do the following:

nsContext.declarePrefix("ns", "http://www.topografix.com/GPX/1/1");

and then change the xpaths as below:

Expression xpath1 = dog.addXPath("/ns:gpx/ns:wpt");
Expression xpath2 = dog.addXPath("/ns:gpx/ns:rte/ns:rtept");
Expression xpath3 = dog.addXPath("ns:gpx/ns:trk/ns:trkseg/ns:trkpt");

Comment by project member santhosh...@gmail.com, Jun 9, 2011

Regarding the OutOfMemoryError:

It is caused by DOM node creation. The XPaths are hitting lots of DOM elements.

Currently XMLDog doesn't support notifying intermediate results. Let us say /gpx/wpt hits 1000 elements: XMLDog will create 1000 DOM elements and then give you the result. If XMLDog supported intermediate results, then as each element is hit it could give you the DOM element for it, and you could process it and discard it. This would make it possible to get huge results without OutOfMemoryError issues.

I can check over the weekend whether notifying intermediate results is possible and let you know the status...

Comment by antonin....@gmail.com, Jun 10, 2011

Thank you very much. I changed the XPath expressions and the prefix declaration as you said and it works now.

It would be great if you can see if notifying intermediate results is possible.

Best regards,

Antonin

Comment by yaa...@gmail.com, Aug 20, 2011

Is there any way to iterate over the matches instead of buffering them into these result objects? The XML document I'm dealing with is large and has many matching nodes.

Comment by project member santhosh...@gmail.com, Aug 25, 2011

No. Currently iterating over results without buffering is not supported.

Comment by project member santhosh...@gmail.com, Aug 27, 2011

Hi Antonin,

with revision@1570 XMLDog supports notifying intermediate results without buffering. So now it can evaluate dom results for large documents with less memory.

you can see how to configure XMLDog for intermediate results in XMLDogTest.java

Comment by fresherf...@gmail.com, Oct 5, 2011

Line 1 : <?xml version="1.0"?>

Line 2 : <catalog>

Line 3 : <book id="bk101">

Line 4 : <author>Gambardella, Matthew</author>

Line 5 : <title>XML Developer's Guide</title>

Line 6 : <genre>Computer</genre>

Line 7 : <price>44.95</price>

Line 8 : <publish_date>2000-10-01</publish_date>

Line 9 : <description>An in-depth look at creating applications

Line 10 : with XML.</description>

Line 11 : </book>

Line 12 : </catalog>

How can I use XMLDog to extract Line 3 to Line 11 from the above XML (i.e. the book section)?

Comment by project member santhosh...@gmail.com, Oct 5, 2011

try using path: /catalog/book[@id='bk101']

Comment by yur...@gmail.com, Dec 5, 2011

Hi Santosh,

Great stuff. I was looking for a single-pass XPath engine and found yours here. Almost everything works great except this kind of construct:

Let's say I have xml:

<Root>
<Text>abc</Text> 
<Number>123</Number>
</Root>

Now I want to select one of /Root/Text or /Root/Number whichever comes first. Normally I would do that using the following XPath:

(/Root/Text/text() | /Root/Number/text())[1]

Using standard DOM-based XPathFactory I get single result as expected which is String 'abc'

Using event-based XMLDog however, I am getting two calls of onNodeHit() one for /Root/Text another for /Root/Number. This is not expected and not correct. Instead onNodeHit() should be called only once for the first position in the context of the temporary result node list that is {'abc', '123'}[1] = 'abc' .

Also Expression argument passed to the onNodeHit should match one of the objects returned by XMLDog.addXPath() whereas currently for this kind of expression it does not. So there is no way to figure out which requested XPath is evaluated.

Thanks!

Comment by project member santhosh...@gmail.com, Dec 6, 2011

Hi yurgis,

Yes. the xpath that you specified is giving wrong results. will verify and let you know.

thanks santhosh

Comment by project member santhosh...@gmail.com, Dec 6, 2011

Hi yurgis,

you can use the following workaround till it gets resolved:

listener = new InstantEvaluationListener(){
    int nodeCounts[] = new int[dog.getDocumentXPathsCount()];
    @Override
    public void onNodeHit(Expression expression, NodeItem nodeItem){
        if(expression.getXPath()==null)
            return;
        // now we can use the result    
    }

    @Override
    public void finished(Evaluation evaluation){
        if(evaluation.expression.getXPath()==null)
            return;
        Object result = evaluation.getResult();
        if(printResults){
            if(result!=null){
                if(result instanceof List){
                    for(NodeItem nodeItem: (List<NodeItem>)result)
                        onNodeHit(evaluation.expression, nodeItem);
                }else{
                    // we reach here if xpath result is not dom node
                    // now we can use result
                }
            }
        }
    }
};

use the above implementation of listener. you can place your code where you can see the comment "now we can use result";

Let me know if this works for you.

Comment by yur...@gmail.com, Dec 6, 2011

Fantastic! The workaround worked for me.

Comment by project member santhosh...@gmail.com, Dec 7, 2011

Hi Yurgis,

The mentioned issue is fixed with revision 1620.

finished(Evaluation evaluation) in InstantEvaluationListener is now final.

Instead, the following two new abstract methods are introduced:

public abstract void finishedNodeSet(Expression expression);
public abstract void onResult(Expression expression, Object result);

Comment by urg...@gmail.com, Jan 23, 2012

Any plans to support XPath 2.0 ?

Comment by surko...@gmail.com, Jan 24, 2012

I found out that incorrect XML (without closed tag like "<root>") can be parsed by XMLDog without any exception. Is it correct?

Comment by project member santhosh...@gmail.com, Jan 25, 2012

> Any plans to support XPath 2.0 ?

No. Frankly speaking I haven't used XPath 2.0 yet. No plans in near future...

Comment by project member santhosh...@gmail.com, Jan 25, 2012

> Any plans to support XPath 2.0 ?

XMLDog supports a for loop which somewhat mimics XPath 2.0; see XMLDog.forEach(String forEach, String xpath)

This feature is not yet documented in this wiki page.

Comment by rohit.sa...@wisethink.in, Feb 7, 2012

Hi, I am using this with InstantEvaluationListener

but I am not getting any result for any XPath.

My XML is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
<!DOCTYPE mdc SYSTEM "MeasDataCollection.dtd">
<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">
<mfh>
<ffv>32.401 V6.2</ffv>
<sn>SubNetwork=ONRM_RootMo_R,SubNetwork=RABD201,MeContext=RABD201</sn>
<st></st>
<vn></vn>
<cbt>20110714171500Z</cbt>
</mfh>

and MeasDataCollection.xsl begins:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

Please guide me on how to get values for XPaths.

Comment by project member santhosh...@gmail.com, Feb 7, 2012

Hi Rohit Saxena,

XMLDog is not an XSLT engine. It seems you are expecting an XSLT transformation to be performed because your input XML has: <?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>

but that is not the case.

Comment by karl...@gmail.com, Apr 30, 2012

Looks like your example for DOM parsing isn't up to date. It goes more like this:

Expression xpath1 = dog.addXPath("/path");
Event event = dog.createEvent();
results = new XPathResults(event);
event.setXMLBuilder(new DOMBuilder());
event.setListener(results);
dog.sniff(event, new InputSource("note.xml"));
List<NodeItem> items = (List<NodeItem>)results.getResult(xpath1);

Also note: if you add more XPaths to the dog after creating the event, it will throw ArrayIndexOutOfBoundsException when parsing because the results array doesn't grow after creation (Bug?)

Comment by project member santhosh...@gmail.com, Apr 30, 2012

thanks karlkfi,

corrected the DOM parsing code.

Regarding ArrayIndexOutOfBoundsException: you have to create the event only after adding all XPaths, because the Event class allocates objects based on the XPaths added to XMLDog.

Comment by naveen.m...@gmail.com, Apr 30, 2012

Hi Santosh, I have the XML below:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PurchaseOrderStatusNotification SYSTEM "MS_V02_02_PurchaseOrderStatusNotification.dtd">
<PurchaseOrderStatusNotification>
<fromRole> Role1 </fromRole>
</PurchaseOrderStatusNotification>

While applying the XPath I will not have the DTD file, and by default the XPath engine will look for the DTD file.

Generally in a DOM model we use an entity resolver to avoid this. Is there a way to do the same with XMLDog in the SAX way?

Comment by project member santhosh...@gmail.com, Apr 30, 2012

currently there is no api for this.

you can modify SAXEngine class with your entity resolver implementation.

I will be working to provide a way to set entity resolver.

Comment by naveen.m...@gmail.com, Apr 30, 2012

Thanks. Also, does this allow compressed (Zip/Gzip) files as input, applying the XPath without opening them in memory?

Comment by project member santhosh...@gmail.com, Apr 30, 2012

You can use ZipInputStream/GZIPInputStream for that.
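
A minimal sketch of that idea with the JDK alone (not taken from the jlibs code base; GzipSax and its methods are hypothetical names). The gzip stream is decompressed on the fly and handed to a SAX parser through an org.xml.sax.InputSource, so the decompressed document is never held in memory as a whole:

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: stream-parse gzip-compressed XML with SAX.
public class GzipSax {
    // Counts the elements of a gzip-compressed XML document while streaming it.
    public static int countElements(InputStream gzipped) {
        final int[] count = {0};
        try (InputStream in = new GZIPInputStream(gzipped)) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(in), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse gzipped XML", e);
        }
        return count[0];
    }

    // Compresses a string so the example can be tried without a file.
    public static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (java.io.IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }
}
```

Since the sniff methods shown on this page take an InputSource, the same new InputSource(new GZIPInputStream(...)) should work with XMLDog too. For a zip archive, ZipInputStream.getNextEntry() has to be called first to position the stream on an entry.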

Comment by naveen.m...@gmail.com, Apr 30, 2012

Hi Santosh, I need some help with the earlier problem. I modified the code as below:

SAXUtil.newSAXParser(true, false, false).parse(new InputSource(file),
    new DefaultHandler() {
        public org.xml.sax.InputSource resolveEntity(String publicId, String systemId)
                throws org.xml.sax.SAXException, java.io.IOException {
            System.out.println("Ignoring: " + publicId + ", " + systemId);
            return new org.xml.sax.InputSource(new java.io.StringReader(""));
        }
    });

final XMLDog dog = new XMLDog(nsContext, null, null);

Also, in the SAXEngine class:

public void start(InputSource is) throws XPathException {
    SAXParser parser = null;
    try {
        parser = getParser();
        // parser.parse(is, this);
        parser.parse(is, new org.xml.sax.helpers.DefaultHandler() {
            public org.xml.sax.InputSource resolveEntity(String publicId, String systemId)
                    throws org.xml.sax.SAXException, java.io.IOException {
                System.out.println("Ignoring: " + publicId + ", " + systemId);
                return new org.xml.sax.InputSource(new java.io.StringReader(""));
            }
        });
    } catch (Exception ex) {
        if (ex != Event.STOP_PARSING)
            throw new XPathException(ex);
    } finally {
        if (parser != null)
            parser.reset();
    }
}

With this change the error seems to have vanished, but I don't see any NodeItem output. If I use parser.parse(is, this) then it works, otherwise it fails. Is there something I am messing up in your code base?

Comment by project member santhosh...@gmail.com, Apr 30, 2012

you should download the jlibs sources and add following method to SAXEngine.java:

public org.xml.sax.InputSource resolveEntity(String publicId, String systemId) throws org.xml.sax.SAXException,java.io.IOException {
    System.out.println("Ignoring: " + publicId + ", " + systemId);
    return new org.xml.sax.InputSource(new java.io.StringReader(""));
}
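
The point of adding the method to SAXEngine itself is that one and the same handler both resolves the DTD and receives the SAX events. Here is a JDK-only sketch of that pattern (IgnoreDtd is a hypothetical class, not jlibs code): the handler answers every external entity request with an empty source and still sees all elements.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parse XML that references a DTD which is not available.
public class IgnoreDtd {
    // Returns the qualified names of all elements, in document order.
    public static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Hand the parser an empty DTD instead of letting it fetch systemId.
                return new InputSource(new StringReader(""));
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }
}
```

Passing a second, separate DefaultHandler to parse(...) instead, as in the attempt above, would leave the original handler without events, which matches the missing NodeItem output described there.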
