The scanner produces a sequence of tokens, each token being represented by two
symbols. The first symbol indicates the class of the token, and the second carries the
information associated with the token.
In what follows we describe the syntax of ground expressions by means of an extended
Backus-Naur form (EBNF), with non-terminals written as Refal Plus variables. Each
non-terminal is assumed to denote only ground expressions that agree with its type: a
non-terminal written as an s-variable denotes a single symbol, and one written as an
e-variable denotes an arbitrary ground expression.
Thus the syntax of the token sequence produced by the scanner can be described as follows:
    e.Tokens = { e.Token }.
    e.Token =
        Key s.Key | Name s.Name | Value s.Value |
        Char s.Char.
    s.Key = s.Word.
    s.Name = s.Word.
    s.Value = s.Int.
A token of the form Key s.Key represents a keyword, s.Key being the word symbol whose
character representation corresponds to the keyword. (In the implementation given in
this section, operators and delimiters such as ; and := are also delivered as Key
tokens.) A token of the form Name s.Name represents a variable name, s.Name being the
word symbol whose character representation corresponds to the variable name (which,
syntactically, is an identifier). A token of the form Value s.Value represents a numeric
constant, s.Value being the corresponding numeric symbol. A token of the form
Char s.Char represents an unidentified character s.Char.
When the whole source program has been read, the scanner generates the token
Key Eof.
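To make the representation concrete, here is a minimal sketch in Python (not Refal Plus) that models a token sequence as a list of (class, value) pairs and checks each pair against the grammar above. The source line `x := x + 12 ?`, the list `tokens`, and the helpers `is_token` and `is_token_sequence` are hypothetical names introduced only for this illustration. The name appears as `X` because the implementation given in this section converts identifiers to upper case.

```python
# Hypothetical token sequence for the source line:  x := x + 12 ?
# Each token is a pair: (token class, associated symbol).
tokens = [
    ("Name", "X"),      # variable name (a word symbol)
    ("Key", ":="),      # operators are delivered as Key tokens
    ("Name", "X"),
    ("Key", "+"),
    ("Value", 12),      # numeric constant (a numeric symbol)
    ("Char", "?"),      # unidentified character
    ("Key", "Eof"),     # generated once the source has been read
]


def is_token(token):
    """Check one pair against the e.Token rule of the grammar."""
    token_class, info = token
    if token_class in ("Key", "Name"):
        return isinstance(info, str)                    # s.Word
    if token_class == "Value":
        return isinstance(info, int)                    # s.Int
    if token_class == "Char":
        return isinstance(info, str) and len(info) == 1  # s.Char
    return False


def is_token_sequence(sequence):
    """Check a whole sequence against the e.Tokens rule."""
    return all(is_token(token) for token in sequence)
```

Under these assumptions, `is_token_sequence(tokens)` holds for the list above, while a pair such as `("Value", "12")` is rejected because the value of a `Value` token must be a number, not its character representation.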
The module CmpScn has the following implementation:
//
// File: CmpScn.rf
//

$use StdIO Class Convert Box;

$func  ScanToken
         s.Chl e.Line = s.TokenKey s.TokenInfo (e.Line1);
$func  ScanIdRest
         (e.IdChars) e.Chars = s.TokenKey s.Word (e.Rest);
$func  ScanIntRest
         (e.IntChars) e.Chars = s.TokenKey s.Int (e.Rest);
$func? IsBlank s.Char = ;
$func? IsOneCharToken s.Char = ;
$func? CompoundToken s.Char e.Line = s.Word e.Rest;
$func? IsKeyWord s.Word = ;

// Boxes for storing the channel to be read,
// and the rest of the current line.

$box ScanChl ScanLine;

InitScanner s.Chl =         // Scanner initialization.
  <Store &ScanChl s.Chl>,   // Store the channel in its box.
  <Store &ScanLine >;       // The current line is empty.

TermScanner =               // Scanner termination.
  <Store &ScanChl >,        // Forgetting the channel
  <Store &ScanLine >;       // and the current line.

ReadToken =                 // A token is read.
  <Get &ScanChl> : s.Chl,
  <Get &ScanLine> :: e.Line,
  <ScanToken s.Chl e.Line>
      :: s.TokenKey s.TokenInfo (e.Line),
  <Store &ScanLine e.Line>, // Keeping the unread rest of the line.
    = s.TokenKey s.TokenInfo;

ScanToken s.Chl e.Line =
  e.Line :
  {
    =                       // The line rest is
      {                     // empty. Reading the
      <ReadLineCh s.Chl> :: e.Line  // next line.
        = <ScanToken s.Chl e.Line>;
        = Key Eof ();       // End of file.
      };
  s.Char e.Rest =           // Examining the
    {                       // current character.
    <IsBlank s.Char>
      = <ScanToken s.Chl e.Rest>;
    <IsLetter s.Char>
      = <ScanIdRest (s.Char) e.Rest>;
    <IsDigit s.Char>
      = <ScanIntRest (s.Char) e.Rest>;
    <IsOneCharToken s.Char>
      = Key <ToWord s.Char> (e.Rest);
    <CompoundToken s.Char e.Rest> :: s.Word e.Rest
      = Key s.Word (e.Rest);
      = Char s.Char (e.Rest);  // Unidentified character.
    };
  };

// Getting the rest of an identifier.

ScanIdRest (e.IdChars) e.Rest =
  {
  e.Rest : s.Char e.Rest1,
    \{<IsLetter s.Char>; <IsDigit s.Char>;}
      = <ScanIdRest (e.IdChars s.Char) e.Rest1>;
    = <ToWord <ToUpper e.IdChars>> : s.Word,
      {<IsKeyWord s.Word> = Key; = Name;} :: s.TokenKey,
      = s.TokenKey s.Word (e.Rest);
  };

// Getting the rest of an integer.

ScanIntRest (e.IntChars) e.Rest =
  {
  e.Rest : s.Char e.Rest1, <IsDigit s.Char>
      = <ScanIntRest (e.IntChars s.Char) e.Rest1>;
    = Value <ToInt e.IntChars> (e.Rest);
  };

IsBlank s.Char =            // A whitespace character?
  ' \n\t' : e s.Char e;

IsOneCharToken s.Char =     // A one-character token?
  ';()+-*/' : e s.Char e;

// Trying to get a multi-character token.
CompoundToken
  \{
  ':=' e.Rest = ":=" e.Rest;
  '<=' e.Rest = "<=" e.Rest;
  '<>' e.Rest = "<>" e.Rest;
  '<'  e.Rest = "<"  e.Rest;
  '>=' e.Rest = ">=" e.Rest;
  '>'  e.Rest = ">"  e.Rest;
  '='  e.Rest = "="  e.Rest;
  };

IsKeyWord                   // Is the identifier a key word?
  \{
  DO ; ELSE ; IF ; READ ; THEN ; WHILE ; WRITE ;
  };
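
For readers less familiar with Refal Plus, the control flow of the module can be mirrored in a short Python sketch. This is an analogue written for illustration, not a translation endorsed by the source: the class `Scanner`, its methods `read_token` and `_take_while`, and the constants below are hypothetical names. The input channel is replaced by any iterable of lines, and `IsLetter` and `IsDigit` are assumed to cover the ASCII letters and digits only.

```python
import string

KEYWORDS = {"DO", "ELSE", "IF", "READ", "THEN", "WHILE", "WRITE"}
ONE_CHAR_TOKENS = ";()+-*/"
# Same order as in CompoundToken: "<=" and "<>" are tried before "<".
COMPOUND_TOKENS = [":=", "<=", "<>", "<", ">=", ">", "="]
LETTERS = string.ascii_letters   # assumption: IsLetter is ASCII-only
DIGITS = string.digits           # assumption: IsDigit is ASCII-only


class Scanner:
    def __init__(self, lines):
        self._lines = iter(lines)  # plays the role of the ScanChl box
        self._line = ""            # plays the role of the ScanLine box

    def _take_while(self, allowed):
        """Cut the longest prefix of the current line made of allowed chars."""
        end = 0
        while end < len(self._line) and self._line[end] in allowed:
            end += 1
        text, self._line = self._line[:end], self._line[end:]
        return text

    def read_token(self):
        """Return the next token as a (class, value) pair."""
        while True:
            if not self._line:
                # The rest of the line is empty: read the next line.
                next_line = next(self._lines, None)
                if next_line is None:
                    return ("Key", "Eof")        # end of file
                self._line = next_line
                continue
            ch = self._line[0]
            if ch in " \n\t":                    # IsBlank
                self._line = self._line[1:]
                continue
            if ch in LETTERS:                    # identifier or keyword
                word = self._take_while(LETTERS + DIGITS).upper()
                return ("Key" if word in KEYWORDS else "Name", word)
            if ch in DIGITS:                     # numeric constant
                return ("Value", int(self._take_while(DIGITS)))
            if ch in ONE_CHAR_TOKENS:            # IsOneCharToken
                self._line = self._line[1:]
                return ("Key", ch)
            for op in COMPOUND_TOKENS:           # CompoundToken
                if self._line.startswith(op):
                    self._line = self._line[len(op):]
                    return ("Key", op)
            self._line = self._line[1:]
            return ("Char", ch)                  # unidentified character
```

With this sketch, `Scanner(["x1 := 10"])` yields `("Name", "X1")`, `("Key", ":=")`, `("Value", 10)` on three successive calls of `read_token`, and `("Key", "Eof")` on every call after that. As in the Refal Plus module, the order of the compound alternatives matters: placing `"<"` before `"<="` would split `<=` into two tokens.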