#! /usr/bin/env python
# File: bibgrammar.py

"""
Provides an EBNF description of the bibtex bibliography format.
The grammar draws largely from
the grammar description in Nelson Beebe's `Lex/Yacc parser`_
and also from
Greg Ward's btOOL_ documentation.

:author: Dylan Schwilk
:contact: http://www.schwilk.org
:author: Alan G Isaac
:contact: http://www.american.edu/cas/econ/faculty/faculty.htm#isaac
:license: MIT (see `license.txt`_)
:date: 2008-06-28


.. _license.txt: ./license.txt
.. _`Lex/Yacc parser`: http://www.math.utah.edu/~beebe/
.. _btOOL: http://www.tug.org/tex-archive/biblio/bibtex/utils/btOOL/
"""
__docformat__ = "restructuredtext en"
__needs__ = '2.4'
__version__ = "1.7"
__author__ = ["Dylan W. Schwilk", "Alan G Isaac"]


################### IMPORTS ##################################################
#import from standard library
# (some if run as main; see below)

#import dependencies
from simpleparse.parser import Parser
from simpleparse.common import numbers, strings, chartypes

#local imports
################################################################################


# EBNF description of a bibtex file

# 2008-06-27: There may be a bug in simpleparse that sometimes causes certain
# entries not to be recognized. The problem can disappear, however, if the order
# of entries in a bibfile is changed! I believe this is a bug in simpleparse
# itself rather than a problem with the grammar.

#modification 2009-01-01
# change `key` to `citekey`
# add `alpha_name`
# change `macro` def (use case-insensitive string)
# change `macro_contents` def (field instead of fields)
# change `fields` def (since comma is allowed after last field)
#modification 2009-02-11
# change braces_string and esp. quotes_string def because old def was *very* slow
# also, gives better match to format described at
# http://artis.imag.fr/~Xavier.Decoret/resources/xdkbibtex/bibtex_summary.html

dec = r"""
bibfile := entry_or_junk+
>entry_or_junk< := (tb, object) / (tb, junk)
>object< := entry / macro / preamble / comment_entry
entry := '@', entry_type, tb, ( '{' , tb, contents, tb, '}' ) / ( '(' , tb, contents, tb, ')' )
macro := c'@string', tb, ( '{' , tb, macro_contents, tb, '}' ) / ( '(' , tb, macro_contents, tb, ')' )
preamble := '@', entry_type, tb, ( '{' , tb, preamble_contents, tb, '}' ) / ( '(' , tb, preamble_contents, tb, ')' )
comment_entry := '@', entry_type, tb, string
>contents< := citekey , tb, ',' , tb, fields
>macro_contents< := field
>preamble_contents< := value
entry_type := alpha_name
citekey := number / name
fields := (field_comma / field)+
>field_comma< := field , tb, ',', tb
field := name, tb, '=' , tb, value
value := simple_value / (simple_value, (tb,'#', tb, simple_value)+)
>simple_value< := string / number / name
alpha_name := [a-zA-Z]+
name := []-[a-z_A-Z!$&+./:;<>?^`|'] , []-[a-z_A-Z0-9!$&+./:;<>?^`|']*
number := [0-9]+ / ([0-9]+, tb, [-]+, tb, [0-9]+)
string := ('\"' , quotes_string?, '\"') / ('{' , braces_string?, '}')
<braces_string> := (-[{}@]+ / string)+
<quotes_string> := (-[\"{}]+ / ('{', braces_string,'}'))+
<junk> := -[ \t\r\n]+
<tb> := (comment / ws)*
<ws> := [ \t\n\r]
<comment> := '%' , -[\n]*, '\n'
"""


## instantiate SimpleParse parsers
parser = Parser(dec, 'bibfile')
entry_parser = Parser(dec, 'entry')

## offer a default parse function
def Parse(src, processor=None):
    '''Parse the bibtex string in src, process with processor.'''
    return parser.parse(src, processor=processor)

## self-test
if __name__ == "__main__":
    import sys, pprint
    if len(sys.argv) > 1:
        src = open(sys.argv[1]).read()
        taglist = Parse(src)
        pprint.pprint(taglist)
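
To make the `entry`, `entry_type`, `citekey`, `name`, and `tb` rules above more concrete, here is a minimal sketch that approximates them with the standard `re` module. It is not part of bibgrammar.py and does not use SimpleParse. The names `ENTRY_HEAD`, `entry_heads`, and `sample` are hypothetical, and the regexes are only stand-ins: `re` backtracks, whereas SimpleParse's `/` takes the first alternative that matches, so the two can disagree on edge cases.

```python
import re

# Regex approximations of a few productions from the grammar above.
# Assumption: illustration only; these are not generated by SimpleParse
# and may differ from it on edge cases (e.g. cite keys starting with digits).
ALPHA_NAME = r"[a-zA-Z]+"                          # alpha_name
NAME_START = r"[\]\-\[a-zA-Z_!$&+./:;<>?^`|']"     # first character of name
NAME_REST = r"[\]\-\[a-zA-Z_0-9!$&+./:;<>?^`|']"   # later characters of name
NAME = NAME_START + NAME_REST + "*"
NUMBER = r"[0-9]+"                                 # first alternative of number
TB = r"(?:%[^\n]*\n|[ \t\r\n])*"                   # tb: comments or whitespace

# '@', entry_type, tb, '{' or '(', tb, citekey, tb, ','
ENTRY_HEAD = re.compile(
    "@(?P<entry_type>" + ALPHA_NAME + ")" + TB + r"[{(]" + TB
    + "(?P<citekey>" + NUMBER + "|" + NAME + ")" + TB + ","
)

def entry_heads(src):
    """Return (entry_type, citekey) pairs for entry headers found in src."""
    return [(m.group("entry_type"), m.group("citekey"))
            for m in ENTRY_HEAD.finditer(src)]

# Hypothetical input: one brace-delimited and one paren-delimited entry.
sample = '@Article{smith2009,\n  author = {Smith},\n}\n@book ( 1984 , title = "x" )\n'
print(entry_heads(sample))
```

A `@string{foo = "bar"}` macro yields no pair here, because no comma follows the first name; this mirrors how `contents` requires `citekey , tb, ','` while `macro_contents` is a single `field`.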

Change log

r54 by alan.isaac on Mar 11, 2009   Diff
Small change to quotes_string in
bibgrammar.py. Add a test to example.

Older revisions

r49 by alan.isaac on Feb 12, 2009   Diff
Slight change in string def in
bibgrammar.py -> much better speed.
Add default post-processing to
DEFAULT_CITATION_TEMPLATE in
default_templates.py.
r45 by alan.isaac on Jan 14, 2009   Diff
Allow apostrophe in cite key. (Is this
BibTeX conformable? Check.)
r44 by alan.isaac on Jan 2, 2009   Diff
Try to fix name parsing (for special
characters).

File info

Size: 3939 bytes, 106 lines

File properties

svn:eol-style
native