The Parser

The parser, residing in the module CmpPrs, transforms a token sequence into an abstract program, i.e. a parse tree.

Our parser will use the technique referred to as a recursive-descent analysis.

Consider, for example, the following grammar:

Sentence = Subject Predicate.
Subject = "cats" | "dogs".
Predicate = "sleep" | "eat".

Suppose we are given the token sequence

    "dogs" "eat"

and want to determine whether this sequence is a well-formed sentence. This amounts to determining whether this sequence can be derived from the non-terminal Sentence. But, the grammar specifies that the set of token sequences generated by the nonterminal Sentence is equal to the set of sequences generated by the non-terminal sequence Subject Predicate. Thus, the original problem can be reduced to determining whether the input sequence can be divided into two subsequences such that the first one can be derived from the non-terminal Subject, and the second one from the non-terminal Predicate.

How can a sequence be divided into two parts, of which the first is generated by the non-terminal Subject? It, can, obviously, be done by testing whether the sequence begins with one of the tokens "cats" or "dogs".

Thus we come to the following method of analyzing token sequences.

Each non-terminal A appearing in the grammar is associated with a function A having the following declaration:

$func? A e.Token = e.Rest;

This function A tests whether the input token sequence e.Token begins with a sequence derivable from the non-terminal A, and, if so, deletes this beginning and returns the rest of the input sequence thus obtained. Otherwise, if the input sequence does not begin with a sequence derivable from the non-terminal A, the function A returns a failure.

It goes without saying that the above method is applicable only in cases where, for each non-terminal A and each input sequence Z there exists no more than one way of dividing Z into two subsequences, of which the first is derivable from A. In many cases, however, the grammar can be rewritten in such a way that this restriction will be satisfied. An interested reader may find further details in [Wir1976].

Proceeding from the above consideration, we can now define the function Sentence either deleting from the input sequence the beginning derivable from the non-terminal Sentence, or failing, if this is unfeasible.

$func? Sentence e.Token = e.Rest;
$func? Subject  e.Token = e.Rest;
$func? Predicate   e.Token = e.Rest;
$func? Token  s  e.Token = e.Rest;

Sentence  eZ =
  <Subject eZ> :: eZ,
  <Predicate  eZ> :: eZ,
    = eZ;

Subject  eZ =
  \{
  <Token "dogs" eZ> :: eZ = eZ;
  <Token "cats"  eZ> :: eZ = eZ;
  };

Predicate  eZ =
  \{
  <Token "sleep" eZ> :: eZ = eZ;
  <Token "eat" eZ> :: eZ = eZ;
  };

Token  s eZ =
  eZ : s eZ0
    = eZ0;

The function Token is used for deleting a terminal symbol, which is passed as the first argument.

Now we can return to considering the module CmpPrs, in which we have to deal with two additional problems.

First, instead of returning the input token sequence as a whole, the scanner produces tokens one by one. Thus, each of the parsing functions, instead of taking as argument the whole token sequence, takes as argument a single token, the one that has been read last. This token is the one to be analyzed next. Similarly, each of the parsing functions, instead of returning the whole rest of the token sequence, returns only the first unparsed token. (It should be kept in mind, however, that each token is represented by two Refal Plus symbols.)

Second, in addition to checking the syntax correctness of the source program, the parser has to transform the token sequence into the corresponding abstract program, i.e. into an abstract syntax tree. Thus, the parsing function associated with a non-terminal A is usually declared as follows:

$func A sC sI = sC sI tX;

where sC sI represent the current token, and tX is the result of translating the token sequence consumed by the function into an abstract syntax tree.

Third, if a syntax error is detected, the parser, instead of returning a failure, must produce an error $error(Ge), where Ge is an error message describing the error. For this reason, the parsing functions are declared as unfailing ones.

Here is the syntax of the abstract programs produced by the parser:

t.Program = (Program t.Statement).
t.Statement =
     (Assign s.Name t.Expr) |
     (If t.Test t.Statement t.Statement) |
     (While t.Test t.Statement) |
     (Read s.Name) |
     (Write t.Expr) |
     (Seq t.Statement t.Statement)
     (Skip).
t.Test = (Test s.CompOper t.Expr t.Expr).
t.Expr =
     (Const s.Value) |
     (Name s.Name) |
     (Op t.Oper t.Expr t.Expr).
s.CompOper = Eq | Ne | Gt | Ge | Lt | Le.
s.Oper = Add | Sub | Div | Mul.
s.Name = s.Word.
s.Value = s.Int.

Thus, a construction written in abstract syntax usually has the form

    (KeyWord Gt₁ Gt₂ ... Gt_N)

where the key word _KeyWord is a word symbol representing the construct's name, and the ground terms Gt₁, Gt₂, ..., Gt_N represent the component constructs also written in abstract syntax. Since the correspondence between the constructs written in concrete and abstract syntax is evident, we won't dwell on this point.

Here is the implementation of the module CmpPrs:

//
// File: CmpPrs.rf
//

$use CmpScn;

$func  Program        sC sI         = sC sI tX;
$func  StatementSeq   sC sI         = sC sI tX;
$func  RestStSeq      sC sI tX0     = sC sI tX;
$func  Statement      sC sI         = sC sI tX;
$func  Test           sC sI         = sC sI tX;
$func  Expr           sC sI         = sC sI tX;
$func  RestExpr       sC sI tX1     = sC sI tX;
$func  Term           sC sI         = sC sI tX;
$func  RestTerm       sC sI tX1     = sC sI tX;
$func  Factor         sC sI         = sC sI tX;
$func  CompOp         sC sI         = sC sI s.CompOper;
$func? AddOp          sC sI         = sC sI s.Oper;
$func? MultOp         sC sI         = sC sI s.Oper;
$func? Token          sX  sC sI     = sC sI ;
$func  Accept         sX  sC sI     = sC sI ;
$func? Name           sC sI         = sC sI s.Name;
$func? Value          sC sI         = sC sI s.Value;


Parse  s.Chl =
  <InitScanner s.Chl>,
  <Program <ReadToken>> :: sC sI t.Program,
  <TermScanner>,
  {                      // Is the rest of the program
  sC sI : Key Eof        // empty?
    = t.Program;
    =  $error sC sI " instead of Eof after the program";
  };

Program  sC sI =                       // Program.
  <StatementSeq sC sI> :: sC sI tX,
    = sC sI (Program tX);

StatementSeq  sC sI =                  // Statement
  <Statement sC sI> :: sC sI tX0,      // sequence.
    = <RestStSeq sC sI tX0>;

RestStSeq  sC sI tX0 =
  \? {
  <Token ";" sC sI> :: sC sI \!
    <StatementSeq sC sI> :: sC sI tX,
    = sC sI (Seq tX0 tX);
  \!
    = sC sI tX0;
  };

Statement  sC sI =                      // Statement.
  \? {
  <Name sC sI> :: sC sI s.Name \!
    <Accept ":=" sC sI> :: sC sI,
    <Expr sC sI> :: sC sI t.Expr,
      = sC sI (Assign s.Name t.Expr);
  <Token "IF" sC sI> :: sC sI \!
    <Test sC sI> :: sC sI t.Test,
    <Accept "THEN" sC sI> :: sC sI,
    <Statement sC sI> :: sC sI t.Then,
    <Accept "ELSE" sC sI> :: sC sI,
    <Statement sC sI> :: sC sI t.Else,
      = sC sI (If t.Test t.Then t.Else);
  <Token "WHILE" sC sI> :: sC sI \!
    <Test sC sI> :: sC sI t.Test,
    <Accept "DO" sC sI> :: sC sI,
    <Statement sC sI> :: sC sI t.Do,
      = sC sI (While t.Test t.Do);
  <Token "READ" sC sI> :: sC sI \!
    <Name sC sI> :: sC sI s.Name,
      = sC sI (Read s.Name);
  <Token "WRITE" sC sI> :: sC sI \!
    <Expr sC sI> :: sC sI t.Expr,
      = sC sI (Write t.Expr);
  <Token "(" sC sI> :: sC sI \!
    <StatementSeq sC sI> :: sC sI t.Stmt,
    <Accept ")" sC sI> :: sC sI,
      = sC sI t.Stmt;
  \!
      = sC sI (Skip);
  };

Test  sC sI =                           // Test.
  <Expr sC sI> :: sC sI t.Expr1,
  <CompOp sC sI> :: sC sI t.Op,
  <Expr sC sI> :: sC sI t.Expr2,
    = sC sI (Test t.Op t.Expr1 t.Expr2);

Expr  sC sI =                           // Expression.
  <Term sC sI> :: sC sI t.X0,
    = <RestExpr sC sI t.X0>;

RestExpr  sC sI t.X1 =
  \? {
  <AddOp sC sI> :: sC sI s.Op \!
    <Term sC sI> :: sC sI t.X2,
      = <RestExpr sC sI (Op s.Op t.X1 t.X2)>;
  \!
      = sC sI t.X1;
  };

Term  sC sI =                           // Term.
  <Factor sC sI> :: sC sI t.X0,
    = <RestTerm sC sI t.X0>;

RestTerm  sC sI t.X1 =
  \? {
  <MultOp sC sI> :: sC sI s.Op \!
    <Factor sC sI> :: sC sI t.X2,
      = <RestTerm sC sI (Op s.Op t.X1 t.X2)>;
  \!
      = sC sI t.X1;
  };

Factor  sC sI =                    // Factor.
  \? {
  <Name sC sI> :: sC sI s.Name \!
    = sC sI (Name s.Name);
  <Value sC sI> :: sC sI s.Value \!
    = sC sI (Const s.Value);
  <Token "(" sC sI> :: sC sI \!
    <Expr sC sI> :: sC sI t.Expr,
    <Accept ")" sC sI> :: sC sI,
      = sC sI t.Expr;
  \!
      $error "Invalid factor start: " sC sI;
  };

CompOp  sC sI =              // Comparison operator.
  {
  sC : Key,
  ("=" Eq) ("<>" Ne) ("<=" Le) ("<" Lt) (">=" Ge) (">" Gt)
  : e (sI s.Op) e
    = <ReadToken> s.Op;
    = $error "Invalid comparison operation: " sC sI;
  };

AddOp  Key sI =             // Additive operator.
  ("+" Add) ("-" Sub) : e (sI s.Op) e
    = <ReadToken> s.Op;

MultOp  Key sI =            // Multiplicative operator.
  ("*" Mult) ("/" Div) : e (sI s.Op) e
    = <ReadToken> s.Op;

// Tries to consume a key word sI?, and
// returns a failure, if this is impossible.

Token  sI Key sI = <ReadToken>;

// Tries to consume a key word sI?, and
// generates an error, if this is impossible.

Accept
  {
  sI Key sI = <ReadToken>;
  sX sC  sI  = $error sC sI " instead of " Key sX;
  };

// Variable name.

Name  Name sI = <ReadToken> sI;

// Value.

Value  Value sI = <ReadToken> sI;

Bibliography

[Wir1976] N.Wirth. Algorithms + Data Structures = Programs. Prentice-Hall, Inc.. Englewood Cliffs, New Jersey. 1976.