Chapter 14
|
|
Figure 14.2 shows token precedence (see 6.2). Tokens are declared in order of appearance, except for inline tokens, which are declared before predefined tokens.
Grammar rules (see 7.3) define what a symbol is composed of. Each rule is translated into a method of the parser class (see figure 14.3). The attributes of the symbol become the parameters of the method, and the docstring of the method is the grammar rule.
|
Terminal symbols (see 6.2) are recognized by calling the _eat method with the name of the token to match (see figure 14.4). A terminal symbol can return the token text as a string. If the current token is not the expected token, _eat raises a TPGWrongMatch exception. This exception is caught either by an outer choice point, which then tries another choice, or by TPG itself, which turns it into a ParserError exception.
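The matching step described above can be sketched in plain Python. This is only an illustration, not TPG's generated code: the class MiniParser, the exception WrongMatch (a stand-in for TPGWrongMatch), the token table and the attribute pos are all hypothetical names chosen for the example.

```python
import re


class WrongMatch(Exception):
    """Stand-in for TPGWrongMatch: the current token is not the expected one."""


class MiniParser:
    # Hypothetical token table: (name, regular expression), tried in order.
    TOKENS = [("number", r"\d+"), ("add", r"[+-]"), ("space", r"\s+")]

    def __init__(self, text):
        self.tokens = list(self._lex(text))
        self.pos = 0  # index of the current token (the "token number")

    def _lex(self, text):
        """Split the input into (name, text) pairs, dropping separators."""
        i = 0
        while i < len(text):
            for name, pattern in self.TOKENS:
                m = re.compile(pattern).match(text, i)
                if m:
                    if name != "space":
                        yield name, m.group()
                    i = m.end()
                    break
            else:
                raise ValueError(f"unexpected character {text[i]!r} at {i}")

    def _eat(self, name):
        """Consume the current token if it is called `name`; return its text."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            text = self.tokens[self.pos][1]
            self.pos += 1
            return text
        raise WrongMatch(name)
```

With this sketch, MiniParser("12 + 3")._eat("number") returns the token text, while asking for a token that is not the current one raises WrongMatch and leaves pos unchanged.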
|
Non-terminal symbols (see 7.5) are recognized by calling their rules (see figure 14.5). A non-terminal symbol can have attributes, a return value, or both.
|
The _eat method updates the token number each time it is called, so a sequence (see 7.6) in a rule translates directly into a sequence of Python statements (see figure 14.6).
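As a rough illustration of a sequence, the sketch below writes a rule by hand as a method whose docstring holds the rule and whose body is one statement per symbol. The class SeqParser, the rule ASSIGN, the token names and the pre-lexed token list are assumptions made for the example; the code TPG actually generates may look different.

```python
class WrongMatch(Exception):
    """Stand-in for TPGWrongMatch."""


class SeqParser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (name, text) pairs, already lexed
        self.pos = 0          # current token number

    def _eat(self, name):
        """Consume the current token if it is called `name`; return its text."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            self.pos += 1
            return self.tokens[self.pos - 1][1]
        raise WrongMatch(name)

    def ASSIGN(self):
        """ASSIGN -> ident '=' number"""
        # Each symbol of the sequence becomes one statement; _eat advances
        # the token number, so no extra bookkeeping is needed between them.
        name = self._eat("ident")
        self._eat("equal")
        value = self._eat("number")
        return name, int(value)
```

Calling ASSIGN() on the tokens for `x = 42` consumes three tokens in order and returns the pair built from the first and last.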
|
The cut mechanism (see 7.7) is implemented as a shortcut to the TPGWrongMatch exception. When the sequence following a cut fails, i.e. when it raises a TPGWrongMatch exception, TPG turns this exception into a ParserError exception to immediately abort parsing (see figure 14.7).
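The effect of a cut can be sketched as an inner try block that converts the recoverable exception into a fatal one. The names CutParser, STATEMENT, WrongMatch and ParserError below are hypothetical stand-ins, and the layout is an assumption rather than TPG's real output.

```python
class WrongMatch(Exception):
    """Stand-in for TPGWrongMatch (recoverable: another choice may be tried)."""


class ParserError(Exception):
    """Stand-in for the fatal error that aborts parsing."""


class CutParser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (name, text) pairs, already lexed
        self.pos = 0

    def _eat(self, name):
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            self.pos += 1
            return self.tokens[self.pos - 1][1]
        raise WrongMatch(name)

    def STATEMENT(self):
        """STATEMENT: 'print' <cut> number | ident"""
        start = self.pos
        try:
            self._eat("print")
            # Cut: past this point a failure is no longer recoverable.
            try:
                return ("print", self._eat("number"))
            except WrongMatch as exc:
                raise ParserError(f"expected {exc} at token {self.pos}") from exc
        except WrongMatch:
            # Only reached when 'print' itself did not match.
            self.pos = start
            return ("ident", self._eat("ident"))
```

Because ParserError is not a WrongMatch, it passes straight through the outer handler, so the second branch is never tried once the cut has been crossed.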
|
Alternatives (see 7.8) are tried in the order in which they are declared. Before trying the first branch, TPG saves the current token number. If the first branch fails, the token number is restored before the second branch is tried. When a branch fails, the _eat method raises a TPGWrongMatch exception, which is caught by the alternative structure. This backtracking algorithm is very simple to implement and avoids computing any prediction table, but it isn't very efficient.
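The save, try and restore steps can be sketched as follows. AltParser, TERM, WrongMatch and the token names are hypothetical, and the first branch deliberately consumes a token before failing so that the restore step is visible.

```python
class WrongMatch(Exception):
    """Stand-in for TPGWrongMatch."""


class AltParser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (name, text) pairs, already lexed
        self.pos = 0

    def _eat(self, name):
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            self.pos += 1
            return self.tokens[self.pos - 1][1]
        raise WrongMatch(name)

    def TERM(self):
        """TERM -> ident '(' ')' | ident"""
        start = self.pos              # save the current token number
        try:
            name = self._eat("ident")
            self._eat("lpar")         # may fail after 'ident' was consumed
            self._eat("rpar")
            return ("call", name)
        except WrongMatch:
            self.pos = start          # restore before trying the next branch
        # Last branch: a failure here propagates to the caller.
        return ("var", self._eat("ident"))
```

On a lone identifier the first branch consumes it, fails on the missing parenthesis, and the parser rewinds before the second branch matches the same token again. That repeated work is the cost of avoiding a prediction table.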
|
Repetitions (see 7.9) are implemented in a similar way to alternatives. A TPGWrongMatch exception tells TPG when to leave the loop. See figures 14.9 and 14.10 for repetition examples.
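A repetition can be sketched as a loop that runs until the exception is raised. RepParser, LIST and WrongMatch are hypothetical names, and restoring the token number after a partially matched iteration is an assumption carried over from the description of alternatives.

```python
class WrongMatch(Exception):
    """Stand-in for TPGWrongMatch."""


class RepParser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (name, text) pairs, already lexed
        self.pos = 0

    def _eat(self, name):
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            self.pos += 1
            return self.tokens[self.pos - 1][1]
        raise WrongMatch(name)

    def LIST(self):
        """LIST -> number ( ',' number )*"""
        items = [int(self._eat("number"))]
        while True:
            start = self.pos          # token number before this iteration
            try:
                self._eat("comma")
                items.append(int(self._eat("number")))
            except WrongMatch:
                self.pos = start      # undo a partially matched iteration
                break                 # the exception ends the loop
        return items
```

The loop has no explicit end condition: it stops the first time an iteration cannot be matched, leaving any trailing tokens for the caller.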
|
|
Abstract syntax trees (see 7.11.1) are simply Python objects. Figure 14.11 shows the instantiation of a node. Figure 14.12 shows how a node is updated with the add method.
|
|
Text can be extracted (see 7.11.2) from the input string, separators included. The prefix @ operator puts a mark on the current token. The infix .. operator extracts the text between two marks.
Figure 14.13 shows how this extraction works.
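One way to picture marks is as character offsets into the input string, so that extraction is a plain slice. In the sketch below, MarkParser and its mark and extract methods are hypothetical analogues of the @ and .. operators, and storing a mark as the start offset of the current token is an assumption, not a description of TPG's internals.

```python
import re


class MarkParser:
    def __init__(self, text):
        self.text = text
        # Each token is (name, start offset, end offset); separators are
        # skipped by the lexer but remain present in self.text.
        self.tokens = [("word", m.start(), m.end())
                       for m in re.finditer(r"\w+", text)]
        self.pos = 0

    def _eat(self, name):
        """Consume the current token if it is called `name`; return its text."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            _, start, end = self.tokens[self.pos]
            self.pos += 1
            return self.text[start:end]
        raise ValueError(f"expected {name}")

    def mark(self):
        """Rough analogue of prefix @: offset where the current token starts."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][1]
        return len(self.text)

    def extract(self, begin, end):
        """Rough analogue of infix ..: input text between two marks."""
        return self.text[begin:end]


parser = MarkParser("alpha  beta gamma")
begin = parser.mark()          # mark placed on 'alpha'
parser._eat("word")
parser._eat("word")
end = parser.mark()            # mark placed on 'gamma'
extracted = parser.extract(begin, end)
```

Here extracted holds the first two words together with the whitespace around them, because the slice is taken from the raw input rather than rebuilt from token texts.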
|
TPG has an adapted syntax for some Python expressions (see 7.11.3).
Figure 14.14 shows how they are implemented.
|