The Scanner

The scanner produces a sequence of tokens, each represented by two symbols: the first symbol indicates the class of the token, and the second identifies the particular token within that class.

In what follows, we describe the syntax of ground expressions in an extended Backus-Naur form (EBNF), writing non-terminals as Refal Plus variables. The ground expressions denoted by a non-terminal are assumed to agree with the type of that variable: a non-terminal written as an s-variable denotes a single symbol, while one written as an e-variable denotes an arbitrary expression.

The syntax of the token sequence produced by the scanner can thus be described as follows (braces { } denote zero or more repetitions):
e.Tokens = { e.Token }.
e.Token =
     Key s.Key | Name s.Name | Value s.Value |
     Char s.Char.
s.Key  = s.Word.
s.Name = s.Word.
s.Value = s.Int.
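The grammar is simple enough to check mechanically. The Python sketch below models Refal Plus symbols under stated assumptions: word symbols are wrapped in a hypothetical `Word` class, characters are one-character strings, and numeric symbols are Python ints. The function `is_token_sequence` (also hypothetical) accepts a flat list of symbols exactly when it consists of zero or more two-symbol tokens permitted by the rule for e.Token.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Word:
    """Hypothetical stand-in for a Refal Plus word symbol such as Key or IF."""
    text: str


# Assumed model: a symbol is a word, a character (1-char str) or an integer.
Symbol = Union[Word, str, int]


def is_token(kind: Symbol, info: Symbol) -> bool:
    """Check one two-symbol token against the rule for e.Token."""
    if not isinstance(kind, Word):
        return False
    if kind.text in ("Key", "Name"):      # Key s.Key | Name s.Name (both s.Word)
        return isinstance(info, Word)
    if kind.text == "Value":              # Value s.Value, where s.Value = s.Int
        return isinstance(info, int) and not isinstance(info, bool)
    if kind.text == "Char":               # Char s.Char
        return isinstance(info, str) and len(info) == 1
    return False


def is_token_sequence(symbols: Sequence[Symbol]) -> bool:
    """Check e.Tokens = { e.Token }: zero or more two-symbol tokens."""
    if len(symbols) % 2 != 0:             # every token is exactly two symbols
        return False
    return all(is_token(symbols[i], symbols[i + 1])
               for i in range(0, len(symbols), 2))
```

For example, the symbols for `x := 10` in this model are `[Word("Name"), Word("X"), Word("Key"), Word(":="), Word("Value"), 10]`, which the check accepts.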

A token of the form Key s.Key represents a keyword or a delimiter (such as ;, := or <=), s.Key being the word symbol whose character representation is that keyword or delimiter. A token of the form Name s.Name represents a variable name, s.Name being the word symbol whose character representation is the variable name (syntactically, an identifier). Since identifiers are converted to upper case before being turned into words, keywords and variable names are case-insensitive: both if and IF yield the token Key IF. A token of the form Value s.Value represents a numeric constant, s.Value being the corresponding numeric symbol. A token of the form Char s.Char represents a character s.Char that the scanner does not recognize as part of any token.

When the end of the source program is reached, the scanner generates the token Key Eof.
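Signalling the end of input with an ordinary token, rather than with a failure, lets a consumer treat end of input like any other token. The Python sketch below is illustrative only: tokens are modelled as (class, info) pairs, and the names `tokens_until_eof` and `reader_from` are hypothetical. The fake reader keeps returning Key Eof once its items are exhausted; judging from ReadToken in the listing below, the real scanner presumably behaves the same way, since the stored rest of the line stays empty and ReadLineCh fails again at end of file.

```python
from typing import Callable, Iterator, List, Tuple, Union

# Hypothetical Python view of a token: (class, info), e.g. ("Name", "X").
Token = Tuple[str, Union[str, int]]
EOF: Token = ("Key", "Eof")


def tokens_until_eof(read_token: Callable[[], Token]) -> Iterator[Token]:
    """Pull tokens until the Key Eof sentinel; the sentinel is not yielded."""
    while True:
        tok = read_token()
        if tok == EOF:
            return
        yield tok


def reader_from(items: List[Token]) -> Callable[[], Token]:
    """Fake token source: returns the items in order, then Key Eof forever."""
    it = iter(items)
    return lambda: next(it, EOF)
```

A parser built on this convention needs no separate "end of input" test: it simply expects Key Eof at the point where the program should end.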

The module CmpScn has the following implementation:
//
// File: CmpScn.rf
//

$use StdIO Class Convert Box;

$func  ScanToken
          s.Chl e.Line = s.TokenKey s.TokenInfo (e.Line1);
$func  ScanIdRest
          (e.IdChars)  e.Chars = s.TokenKey s.Word (e.Rest);
$func  ScanIntRest
          (e.IntChars) e.Chars = s.TokenKey s.Int  (e.Rest);
$func? IsBlank           s.Char = ;
$func? IsOneCharToken  s.Char = ;
$func? CompoundToken  s.Char e.Line = s.Word e.Rest;
$func? IsKeyWord         s.Word = ;

// Boxes for storing the channel to be read,
// and the rest of the current line.

$box ScanChl ScanLine;

InitScanner  s.Chl =              // Scanner initialization.
  <Store &ScanChl s.Chl>,         // Storing the channel in a box.
  <Store &ScanLine >;             // The current line is empty.

TermScanner  =                    // Scanner termination.
  <Store &ScanChl   >,            // Forgetting the channel
  <Store &ScanLine  >;            // and the current line.

ReadToken  =                      // Reading a token.
  <Get &ScanChl> : s.Chl,
  <Get &ScanLine> :: e.Line,
  <ScanToken s.Chl e.Line>
        :: s.TokenKey s.TokenInfo (e.Line),
  <Store &ScanLine e.Line>,
    = s.TokenKey s.TokenInfo;

ScanToken  s.Chl e.Line =
  e.Line :
  {
  =                                     // The line rest is
    {                                   // empty. Reading the
    <ReadLineCh s.Chl> :: e.Line        // next line.
      = <ScanToken s.Chl e.Line>;
      = Key Eof ();                     // End of file.
    };
  s.Char e.Rest =                       // Examining the
    {                                   // current character.
    <IsBlank s.Char>
      = <ScanToken s.Chl e.Rest>;
    <IsLetter s.Char>
      = <ScanIdRest (s.Char) e.Rest>;
    <IsDigit s.Char>
      = <ScanIntRest (s.Char) e.Rest>;
    <IsOneCharToken s.Char>
      = Key <ToWord s.Char> (e.Rest);
    <CompoundToken s.Char e.Rest> :: s.Word e.Rest
      = Key s.Word (e.Rest);
      = Char s.Char (e.Rest);          // Unidentified character.
    };
  };

// Getting the rest of an identifier.

ScanIdRest  (e.IdChars) e.Rest =
  {
  e.Rest : s.Char e.Rest1,
              \{<IsLetter s.Char>; <IsDigit  s.Char>;}
    = <ScanIdRest (e.IdChars s.Char) e.Rest1>;
    = <ToWord <ToUpper e.IdChars>> : s.Word,
      {<IsKeyWord s.Word> = Key; = Name;} :: s.TokenKey,
      = s.TokenKey s.Word (e.Rest);
  };

// Getting the rest of an integer.

ScanIntRest  (e.IntChars) e.Rest =
  {
  e.Rest : s.Char e.Rest1, <IsDigit  s.Char>
    = <ScanIntRest (e.IntChars s.Char) e.Rest1>;
    = Value <ToInt e.IntChars> (e.Rest);
  };

IsBlank  s.Char =             // A whitespace character?
  ' \n\t' : e s.Char e;

IsOneCharToken  s.Char =      // A one-character token?
  ';()+-*/' : e s.Char e;

CompoundToken                 // Tokens that begin with ':', '<',
  \{                          // '>' or '='; longer ones first.
  ':=' e.Rest = ":=" e.Rest;
  '<=' e.Rest = "<=" e.Rest;
  '<>' e.Rest = "<>" e.Rest;
  '<'  e.Rest = "<"  e.Rest;
  '>=' e.Rest = ">=" e.Rest;
  '>'  e.Rest = ">"  e.Rest;
  '='  e.Rest = "="  e.Rest;
  };

IsKeyWord            // Is the identifier a keyword?
  \{
  DO ; ELSE ; IF ; READ ; THEN ; WHILE ; WRITE ;
  };
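For readers who want to experiment with the scanning rules outside Refal Plus, here is a rough Python transliteration of CmpScn. It is a sketch, not the module itself: the class `Scanner`, its method `read_token`, and the use of a Python iterable of strings in place of a Refal Plus channel are all assumptions of this illustration. Letters and digits are taken to be ASCII, which may differ from what IsLetter and IsDigit actually accept. The attribute `_line` plays the role of the box ScanLine.

```python
import string
from typing import Iterable, Iterator, Tuple, Union

# A token as a (class, info) pair, e.g. ("Name", "X") or ("Value", 10).
Token = Tuple[str, Union[str, int]]

KEYWORDS = {"DO", "ELSE", "IF", "READ", "THEN", "WHILE", "WRITE"}  # IsKeyWord
BLANKS = " \n\t"                                                  # IsBlank
ONE_CHAR = ";()+-*/"                                              # IsOneCharToken
# CompoundToken: a longer token is tried before its one-character prefix.
COMPOUND = (":=", "<=", "<>", "<", ">=", ">", "=")
LETTERS = string.ascii_letters   # assumption: IsLetter accepts ASCII letters
DIGITS = string.digits           # assumption: IsDigit accepts ASCII digits


class Scanner:
    """Hypothetical Python model of CmpScn (not part of the original module)."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines: Iterator[str] = iter(lines)  # role of the box ScanChl
        self._line = ""                           # role of the box ScanLine

    def read_token(self) -> Token:
        """Like ReadToken: scan one token and keep the rest of the line."""
        line = self._line
        while True:
            if not line:                      # line rest empty: read next line
                nxt = next(self._lines, None)
                if nxt is None:               # no more lines: end of file
                    self._line = ""
                    return ("Key", "Eof")
                line = nxt
            elif line[0] in BLANKS:           # skip a whitespace character
                line = line[1:]
            else:
                token, self._line = self._scan(line)
                return token

    @staticmethod
    def _scan(line: str) -> Tuple[Token, str]:
        """Examine the first (non-blank) character, as ScanToken does."""
        ch = line[0]
        if ch in LETTERS:                     # ScanIdRest
            i = 1
            while i < len(line) and (line[i] in LETTERS or line[i] in DIGITS):
                i += 1
            word = line[:i].upper()           # identifiers are upper-cased
            kind = "Key" if word in KEYWORDS else "Name"
            return (kind, word), line[i:]
        if ch in DIGITS:                      # ScanIntRest
            i = 1
            while i < len(line) and line[i] in DIGITS:
                i += 1
            return ("Value", int(line[:i])), line[i:]
        if ch in ONE_CHAR:                    # IsOneCharToken
            return ("Key", ch), line[1:]
        for op in COMPOUND:                   # CompoundToken
            if line.startswith(op):
                return ("Key", op), line[len(op):]
        return ("Char", ch), line[1:]         # unrecognized character
```

The order of `COMPOUND` mirrors the sentence order in CompoundToken: since the first matching sentence wins, '<=' and '<>' have to be tried before '<', and '>=' before '>'. As in ReadToken, the unscanned rest of the current line is kept between calls, so each call consumes only as much input as one token requires.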