My favorites | Sign in
Project Home Downloads Wiki Issues Source
Repository:
Checkout   Browse   Changes   Clones  
Changes to /doc-src/manual/resources.rst
ea2d4bea011b vs. 50cf79968d0f Compare: vs.  Format:
Revision 50cf79968d0f
Go to: 
Project members, sign in to write a code review
/doc-src/manual/resources.rst   ea2d4bea011b /doc-src/manual/resources.rst   50cf79968d0f
1 1
2 .. index:: resources 2 .. index:: resources
3 .. _resources: 3 .. _resources:
4 4
5 Resource Management 5 Resource Management
6 =================== 6 ===================
7 7
8 .. index:: GeneratorWrapper, backtracking, generators 8 .. index:: GeneratorWrapper, backtracking, generators
9 9
10 The Input Problem 10 The Input Problem
11 ----------------- 11 -----------------
12 12
13 :ref:`backtracking` within Lepl is implemented using generators. These are 13 :ref:`backtracking` within Lepl is implemented using generators. These are
14 semi--autonomous *loop--like* blocks of code that can be paused and restarted. 14 blocks of code that can be paused and restarted. Each matcher has, at its
15 Each matcher has, at its core, one of these generators. Backtracking is 15 core, one of these generators. Backtracking is achieved by calling the
16 achieved by calling the generator to continue searching the input for 16 generator to continue searching the input for alternative matches.
17 alternative matches.
18 17
19 This can be a problem, particularly when parsing input from files. The 18 This can be a problem, particularly when parsing input from files. The
20 ``parse_file()`` method calls ``parse_iterable()`` which iterates over the 19 `parse_file() <api/redirect.html#lepl.core.config.ParserMixin.parse_file>`_
21 lines in the file. The streams created by ``parse_iterable()`` are lazy 20 method calls `parse_iterable()
21 <api/redirect.html#lepl.core.config.ParserMixin.parse_iterable>`_ which
22 iterates over the lines in the file. The streams created by `parse_iterable()
23 <api/redirect.html#lepl.core.config.ParserMixin.parse_iterable>`_ are lazy
22 linked lists that only reference later parts of the input. That means that 24 linked lists that only reference later parts of the input. That means that
23 Python can reclaim (garbage--collect) the memory used by earlier input while 25 Python can reclaim (garbage--collect) the memory used by earlier input while
24 the parser is still matching later input. However, this cannot happen (memory 26 the parser is still matching later input. However, this cannot happen (memory
25 cannot be freed) if a generator keeps a reference to earlier input (which it 27 cannot be freed) if a generator keeps a reference to earlier input (which it
26 needs in case it is called to backtrack and provide a new match). 28 needs in case it is called to backtrack and provide a new match).
27 29
28 So the generators used in Lepl to provide backtracking frustrate Python's 30 So the generators used in Lepl to provide backtracking frustrate Python's
29 garbage collection and lead to parsers consuming much more memory than would 31 garbage collection and lead to parsers consuming much more memory than would
30 otherwise be necessary. 32 otherwise be necessary.
31 33
32 The Output Problem 34 The Output Problem
33 ------------------ 35 ------------------
34 36
35 As a parser consumes input it also constructs the output. In Lepl the output 37 As a parser consumes input it also constructs the output. In Lepl the output
36 is a list of data held in memory. This output list is built by the parser 38 is a list of data held in memory. This output list is built by the parser
37 until all the input has been consumed, and then returned as a result. 39 until all the input has been consumed, and then returned as a result.
38 40
39 A large input often implies a large output. So even if we solve the "input 41 A large input often implies a large output. So even if we solve the "input
40 problem" above, a parser will still use a lot of memory to store the output. 42 problem" above, a parser will still use a lot of memory to store the output.
41 43
42 The Input Solution 44 The Input Solution
43 ------------------ 45 ------------------
44 46
45 To solve the "input problem" we need to discard generators, but we need 47 To solve the "input problem" we need to discard generators, but we need
46 generators while parsing because the parser needs to backtrack when it fails 48 generators while parsing because the parser needs to backtrack when it fails
47 to match something. The solution to this apparent contradiction is to discard 49 to match something. The solution to this apparent contradiction is to discard
48 only the generators that we will not need. 50 only the generators that we will not need.
49 51
50 In general we cannot find an exact solution (I believe; finding which 52 In general we cannot find an exact solution (I believe; finding which
51 generators we do not need is probably related to the halting problem), but we 53 generators we do not need is probably related to the halting problem), but we
52 can implement a simple heuristic that works well in practice: we discard 54 can implement a simple heuristic that works well in practice: we discard
53 generators that have not been used "for a long time". 55 generators that have not been used "for a long time".
54 56
55 To do this, Lepl maintains a list of generators, sorted by last access time. 57 To do this, Lepl maintains a list of generators, sorted by last access time.
56 When the list exceeds some size the least-used generator is deleted. This 58 When the list exceeds some size the least-used generator is deleted. This
57 reduces memory use while allowing a reasonable amount of backtracking. 59 reduces memory use while allowing a reasonable amount of backtracking.
58 60
59 The list of generators is maintained by a ``GeneratorManager()``, but you do 61 The list of generators is maintained by a `GeneratorManager()
60 not need to use this class directly. Instead, call ``.config.low_memory()``. 62 <api/redirect.html#lepl.core.manager.GeneratorManager>`_, but you do not need
61 That configuration method takes a parameter ``queue_len`` which controls the 63 to use this class directly. Instead, call `.config.low_memory()
62 size of the list of "valid" generators. 64 <api/redirect.html#lepl.core.config.ConfigBuilder.low_memory>`_. That
65 configuration method takes a parameter ``queue_len`` which controls the size
66 of the list of "valid" generators.
63 67
64 The Output Solution 68 The Output Solution
65 ------------------- 69 -------------------
66 70
67 We could reduce the amount of memory needed to hold the output if there were 71 We could reduce the amount of memory needed to hold the output if there were
68 some way of returning the output incrementally. In normal use this is not 72 some way of returning the output incrementally. In normal use this is not
69 possible with Lepl, but there is a "neat hack" that works quite well. 73 possible with Lepl, but there is a "neat hack" that works quite well.
70 74
71 Remember that a Lepl parser can produce a series of matches. These are 75 Remember that a Lepl parser can produce a series of matches. These are
72 alternative ways of parsing the input, given the parser. But in most cases we 76 alternative ways of parsing the input, given the parser. But in most cases we
73 are only interested in the first match --- this is particularly true when 77 are only interested in the first match --- this is particularly true when
74 using ``.config.low_memory()`` as described above, because the alternatives 78 using `.config.low_memory()
75 will be unavailable. 79 <api/redirect.html#lepl.core.config.ConfigBuilder.low_memory>`_ as described
80 above, because the alternatives will be unavailable.
76 81
77 The hack is to return fragments of the output as alternative matches. This 82 The hack is to return fragments of the output as alternative matches. This
78 means: 83 means:
79 84
80 * Adding a matcher at the "top level" of the parser that can return each 85 * Adding the matcher ``Iterate()`` at the "top level" of the parser. This
81 fragment in turn. 86 will return each fragment in turn.
82 87
83 * Calling ``parse_all()`` and then treating the generator as a sequence of 88 * Calling `parse_all()
84 fragments, rather than a sequence of complete matches. 89 <api/redirect.html#lepl.core.config.ParserMixin.parse_all>`_ and then
90 treating the generator as a sequence of fragments, rather than a sequence
91 of complete matches.
85 92
86 An example will make this clear. Here is a matcher that simply matches every 93 An example will make this clear. Here is a matcher that simply matches every
87 line:: 94 line::
88 95
89 >>> all_lines = Line(Token(AnyBut('\n')[:,...]))[:] 96 >>> all_lines = Line(Token(Any()[:,...]))[:]
90 97
91 and here is the equivalent matcher than returns each line as a separate 98 and here is the equivalent matcher than returns each line as a separate
92 fragment: 99 fragment. Note how the final ``[:]`` is replaced by ``Iterate()``::
93 100
94 >>> each_line = Iterate(Line(Token(AnyBut('\n')[:,...]))) 101 >>> each_line = Iterate(Line(Token(Any()[:,...])))
95 102
96 We can use ``each_line`` as follows:: 103 We can use ``each_line`` as follows::
97 104
98 >>> input = '''line one 105 >>> input = '''line one
99 ... line two 106 ... line two
100 ... line three''' 107 ... line three'''
101 >>> each_line.config.no_full_first_match().lines() 108 >>> each_line.config.no_full_first_match().lines()
102 >>> parser = each_line.get_parse_all() 109 >>> parser = each_line.get_parse_all()
103 >>> for line in parser(input): 110 >>> for line in parser(input):
104 >>> print(line) 111 >>> print(line)
105 ['line one\n'] 112 ['line one\n']
106 ['line two\n'] 113 ['line two\n']
107 ['line three'] 114 ['line three']
Powered by Google Project Hosting