.. _faq:

Frequently Asked Questions
==========================

* :ref:`Why do I get "Cannot parse regexp..."? <faq_regexp>`
* :ref:`Why isn't my parser matching the full expression? <faq_lefttoright>`
* :ref:`Why does using Or() stop a full match from happening? <faq_or_bug>`
* :ref:`How do I parse an entire file? <faq_file>`
* :ref:`When I change from > to >> my function isn't called <faq_precedence>`
* :ref:`How do I choose between > and >> ? <faq_apply>`
* :ref:`Why am I seeing "No handlers could be found..." messages? <faq_logging>`
* :ref:`Why does my matcher take so long to compile? <faq_slowregexp>`


.. _faq_regexp:

Why do I get "Cannot parse regexp..."?
--------------------------------------

*Why do I get "Cannot parse regexp '(' using ..." for Token('(')?*

String arguments to `Token() <api/redirect.html#lepl.lexer.matchers.Token>`_
are treated as regular expressions. Because ``(`` has a special meaning in a
regular expression you must escape it, like this: ``Token('\\(')``, or like
this: ``Token(r'\(')``.
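For example, with the escaped token (a minimal sketch; the result is simply
the matched text of each token, collected in a list)::

  >>> matcher = Token(r'\(') & Token('[a-z]+') & Token(r'\)')
  >>> matcher.parse('(hello)')
  ['(', 'hello', ')']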


.. _faq_lefttoright:

Why isn't my parser matching the full expression?
-------------------------------------------------

*In the code below*::

  word = Token('[a-z]+')
  lpar = Token('\\(')
  rpar = Token('\\)')
  expression = word | (word & lpar & word & rpar)

*why does expression.parse('hello(world)') match just 'hello'*?

In general Lepl is greedy (it tries to match the longest possible string),
but for `Or() <api/redirect.html#lepl.matchers.combine.Or>`_ it will try alternatives left-to-right. So in this case you
should rewrite the parser as::

  expression = (word & lpar & word & rpar) | word

Alternatively, you can force the parser to match the entire input by ending
with `Eos() <api/redirect.html#lepl.matchers.core.Eos>`_::

  expression = word | (word & lpar & word & rpar)
  complete = expression & Eos()
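
A quick check of the reordered grammar (a sketch; the exact output is an
assumption, but each token should contribute its matched text and the whole
input should now be consumed)::

  >>> expression = (word & lpar & word & rpar) | word
  >>> expression.parse('hello(world)')
  ['hello', '(', 'world', ')']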

See also the next answer.


.. _faq_or_bug:

Why does using Or() stop a full match from happening?
-----------------------------------------------------

*Why does this code*::

  >>> matcher = Letter() | "ab"
  >>> matcher.parse("a")
  ['a']
  >>> matcher.parse("ab")
  lepl.stream.maxdepth.FullFirstMatchException: The match failed at 'b'

*fail?*

OK, so this behaviour does seem odd, I agree. But it's a logical consequence
of some other design decisions, all of which seem individually reasonable. So
I'll explain those and, hopefully, that will shed some light on this.

#. `Or() <api/redirect.html#lepl.matchers.combine.Or>`_ is not greedy

   Repetition in Lepl is greedy by default, but Or() isn't. If it can match
   the first option, it will do so. But it will try other possibilities if
   that fails, or if all possible parses are requested.

   This is because there is no way to predict which option will return "most".
   So if `Or() <api/redirect.html#lepl.matchers.combine.Or>`_ were greedy it
   would need to evaluate every possible option, measure them, and return the
   "largest". This could require a lot of memory and time. Instead, it
   returns the first match it finds, but then supports backtracking.

   (Note that this is similar to alternation in regular expressions, which
   also commits to the first branch that succeeds; unlike a regexp engine,
   though, Lepl lets you backtrack and ask for the other alternatives.)

   If that's not what you want there is, fortunately, a solution. Please read
   on...

#. Lepl doesn't force you to match the entire input

The "fundamental" parsing operation in Lepl is `matcher.match() <api/redirect.html#lepl.core.config.ParserMixin.match>`_. This returns a
list of pairs. Each pair combines a result list with `the remaining
input`. There's nothing in that that says you need to match the entire
input, because that's not the most general behaviour.

For example::

>>> matcher = Letter() | "ab"
>>> matcher.config.no_full_first_match()
>>> matcher.match("ab")
<generator object trampoline at 0x916640>
>>> list(matcher.match("ab"))
[([u'a'], (1, <helper>)), (['ab'], (2, <helper>))]

Here you can see, in detail, what Lepl is doing. The ``(n, <helper>)``
values are the remaining input, from index 1 and 2 respectively.

If you *want* to match the whole input you can add `Eos() <api/redirect.html#lepl.matchers.core.Eos>`_ to the matcher::

>>> matcher = (Letter() | "ab") & Eos()
>>> list(matcher.match("ab"))
[(['ab'], ''[0:])]

#. The "full first match" implementation is very simple. It checks the
remaining stream (see above) for the first match. If it is not empty, then
the error is raised.

   Why didn't I make this also add `Eos() <api/redirect.html#lepl.matchers.core.Eos>`_? I could have done so, and
   then I wouldn't have had to write this explanation, but it would have meant
   adding more "magic" to the configuration system. I did start to do this,
   but then I realised that *disabling the check could change the parse
   results*. And I think that's a worse problem than the current (imperfect)
   compromise.

In summary, then, this is a consequence of the way `Or() <api/redirect.html#lepl.matchers.combine.Or>`_ works (for efficiency),
the way that Lepl does backtracking (for generality), and a desire to keep the
"full first match" code separate from "what the parser matches". I know it's
a little confusing at first, but I don't see a better solution. Sorry!

See also the previous answer.


.. _faq_file:

How do I parse an entire file?
------------------------------

*I understand how to parse a string, but how do I parse an entire file?*

Instead of `matcher.parse() <api/redirect.html#lepl.core.config.ParserMixin.parse>`_ or
`matcher.parse_string() <api/redirect.html#lepl.core.config.ParserMixin.parse_string>`_ use
`matcher.parse_file() <api/redirect.html#lepl.core.config.ParserMixin.parse_file>`_. For example::

  >>> with open('myfile') as input:
  ...     results = matcher.parse_file(input)

Matchers extend `ParserMixin() <api/redirect.html#lepl.core.config.ParserMixin>`_, which provides these
methods.
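
If you parse files in several places, a small helper keeps the details in one
spot (a hypothetical wrapper, not part of Lepl's API)::

  def parse_path(matcher, path):
      # Open the file and hand the open file object to parse_file().
      with open(path) as input:
          return matcher.parse_file(input)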


.. _faq_precedence:

When I change from > to >> my function isn't called
---------------------------------------------------

*Why, when I change my code from*::

  inverted = Drop('[^') & interval[1:] & Drop(']') > invert

*to*::

  inverted = Drop('[^') & interval[1:] & Drop(']') >> invert

*is the `invert` function no longer called?*

This is because of operator precedence. ``>>`` binds more tightly than ``>``,
so ``>>`` is applied only to the result from `Drop(']')
<api/redirect.html#lepl.matchers.derived.Drop>`_, which is an empty list
(because `Drop() <api/redirect.html#lepl.matchers.derived.Drop>`_ discards the
results). Since the list is empty, the function ``invert`` is not called.

To fix this, place the entire expression in parentheses::

  inverted = (Drop('[^') & interval[1:] & Drop(']')) >> invert
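
The same precedence rules can be seen with simpler matchers (a sketch;
``str.upper`` is just an illustrative function to apply)::

  >>> (Any() & Any() >> str.upper).parse('ab')    # >> binds to the second Any() only
  ['a', 'B']
  >>> ((Any() & Any()) >> str.upper).parse('ab')
  ['A', 'B']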


.. _faq_apply:

How do I choose between > and >> ?
----------------------------------

To understand ``>`` and ``>>`` it's important that you first see that Lepl is
designed to work with lists of results. For example, `Any() <api/redirect.html#lepl.matchers.core.Any>`_, the most basic
matcher, places the matched character in a list::

  >>> Any().parse('a')
  ['a']

Similarly, repetition returns a list of results::

  >>> Any()[:].parse('ab')
  ['a', 'b']

as does `And() <api/redirect.html#lepl.matchers.combine.And>`_::

  >>> (Any() & Any()).parse('ab')
  ['a', 'b']

Even when the strings are joined, they are still in a list::

  >>> Any()[:, ...].parse('ab')
  ['ab']
  >>> (Any() + Any()).parse('ab')
  ['ab']

You may not want this -- you may want a parser that returns a single object
rather than a list. The best way to return a single value is to wrap the
*final* parser in an extra function that returns the first value from the
list::

  >>> def my_letter_parser(text):
  ...     return Any().parse(text)[0]
  ...
  >>> my_letter_parser('a')
  'a'

What does all this have to do with ``>`` and ``>>``? It matters because *after
applying a function, you still want a list of results*.

Given that, there are two obvious ways to apply functions to results.

The first way is to take a list of results (which might contain just one
value -- that's completely normal and OK) and **apply the function to each
result in the list**. This is what ``>>`` does::

  >>> def add_x(text):
  ...     return text + 'x'
  ...
  >>> ( Any() >> add_x ).parse('a')
  ['ax']
  >>> ( (Any() & Any()) >> add_x ).parse('ab')
  ['ax', 'bx']

This (``>>``) is useful when:

* You want to modify each result, one at a time, all in the same way.

* You know that your matcher gives a *single* result, and you want to change
  it. For example:

  * Translating escaped characters.

  * Converting a number in a string to a float value.

Usually, if you are calling a *function* (``float()``, a ``lambda``, etc.) you
want to use ``>>``.
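
For example, to convert a matched number into a Python float (a small sketch;
`Real() <api/redirect.html#lepl.matchers.derived.Real>`_ returns the matched text, and the output shown assumes the usual
single-element result list)::

  >>> (Real() >> float).parse('1.25e1')
  [12.5]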

The second way that you can process a list of results is by **passing the
entire list to a function**. Because we still want a list afterwards, Lepl
*adds an extra list around the result*. This is what ``>`` does::

  >>> def first(my_list):
  ...     return my_list[0]
  ...
  >>> ( Any() > first ).parse('a')
  ['a']
  >>> ( (Any() & Any()) > first ).parse('ab')
  ['a']

This is also useful for structuring results::

  >>> ( (Any() & Any()) > tuple ).parse('ab')
  [('a', 'b')]
  >>> ( (Any() & Any()) > list ).parse('ab')
  [['a', 'b']]
  >>> (( (Any() & Any()) > list ) & Any()).parse('abc')
  [['a', 'b'], 'c']

So ``>`` is useful when:

* You want to select some results.

* You want to build data structures around the results.

Usually, if you are calling a *constructor* (`Node() <api/redirect.html#lepl.support.node.Node>`_, ``tuple()`` etc.) you want to
use ``>``.

.. _faq_logging:

Why am I seeing "No handlers could be found..." messages?
---------------------------------------------------------

*Why do I see this warning printed to stderr?*

::

  No handlers could be found for logger "lepl.parser.trampoline"

This is because Lepl is sending messages to the Python logging system (usually
debug information), but you don't have logging configured.

You can suppress the warning by adding the following somewhere in your code::

  from logging import basicConfig, ERROR
  basicConfig(level=ERROR)

but only do this if you are not already configuring logging elsewhere!
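
If, instead, you want to see what Lepl is logging (a sketch using only the
standard library; nothing Lepl-specific is assumed)::

  from logging import basicConfig, DEBUG
  basicConfig(level=DEBUG)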

.. _faq_slowregexp:

Why does my matcher take so long to compile?
--------------------------------------------

*Why is the matcher taking several seconds just to compile?*

You are probably using `Float() <api/redirect.html#lepl.support.warn.Float>`_ or `Real() <api/redirect.html#lepl.matchers.derived.Real>`_, which are compiled
internally to regular expressions. The current regexp implementation is very
inefficient when compiling such values.

In the future Lepl will move to a new regular expression engine. For now, if
you don't need backtracking within the number and you are using a simple
parser without tokens (i.e. no lexer), you can use these replacements (which
delegate to the system ``re`` library)::

  Real = lambda: Regexp(r'[\+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.|[0-9]+)(?:[eE][\+\-]?[0-9]+)?')
  Float = lambda: Regexp(r'[\+\-]?(?:[0-9]*\.[0-9]+(?:[eE][\+\-]?[0-9]+)?|[0-9]+\.(?:[eE][\+\-]?[0-9]+)?|[0-9]+[eE][\+\-]?[0-9]+)')

However, those will not improve the speed of the lexer (which will convert
them back to the internal DFA implementation).

Another alternative is to use `.config.no_compile_regexp() <api/redirect.html#lepl.core.config.ConfigBuilder.no_compile_regexp>`_ which will avoid
the compilation in some circumstances. Again, this won't help when the lexer
is used.

Finally, remember that you can avoid recompiling your parser by constructing
your matcher just once and then re-using it. It may be worth, for example,
creating the matcher in a global variable (or during set-up for the entire
suite) and re-using it across a series of unit tests.
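
A minimal sketch of that pattern (the matcher and test names are illustrative;
the expected results assume the usual single-element list of matched text)::

  import unittest
  from lepl import Regexp

  # Build (and so compile) the matcher once, at module level, rather than
  # inside each test. REAL reuses the Real() replacement regexp from above.
  REAL = Regexp(r'[\+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.|[0-9]+)(?:[eE][\+\-]?[0-9]+)?')

  class RealTests(unittest.TestCase):

      def test_plain(self):
          self.assertEqual(REAL.parse('1.5'), ['1.5'])

      def test_exponent(self):
          self.assertEqual(REAL.parse('-2e10'), ['-2e10'])

  if __name__ == '__main__':
      unittest.main()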
