PB-88, May 1978
Interpretation and Code Generation
Based on Intermediate Languages*
Peter Kornerup **,
Bent Bruun Kristensen,
Ole Lehrmann Madsen,
Computer Science Department
Aarhus University
Denmark
Abstract
The possibility of supporting high level languages through intermediate
languages to be used for direct interpretation and as intermediate forms in
compilers is investigated. An accomplished project in the construction of an
interpreter and a code generator using one common intermediate form is
evaluated. The subject is analyzed in general, and a proposal for an improved
design scheme is given.
*
This work has been supported by the Danish Natural Science Research
Council, Grants No. 511-1546 and No. 511-5524.
** Part of this work was performed while this author was on leave at the
University of Southwestern Louisiana, Lafayette, USA.
1. Introduction
The purpose of this paper is to investigate the possibility of designing suitable
languages to be used for interpretation and as intermediate forms in compilers,
partly by reporting the experience from a project involving the design of
a stack machine, the construction of an emulator on a microprogrammable
minicomputer system, and the construction of a code generator for a traditional
minicomputer, and partly by analyzing the subject in general.
Intermediate languages have mostly been defined in connection with
self-compiling compilers as an aid in bootstrapping such compilers onto
other host machines (e.g. with BCPL [16] and PASCAL [22]). Such
intermediate languages are then primarily designed for a straightforward
interpretation. By writing an interpreter in some language already available on
the new host system (usually a high level language), an interpretive (and slow)
language processor can be available while rewriting the code generation parts in
the compiler. Usually the code generator does not take the same intermediate
language as its input, but uses other intermediate forms, often because a
one-pass scheme is wanted in which the code generation is integrated into the
analytic part. However, for the implementation of such language processors on
minicomputer systems of a traditional architecture, some advantages may be
gained by using multipass schemes to reduce storage requirements.
Hence it would be advantageous to be able to use such intermediate languages
directly to give efficient code generation on minicomputers. This requires that
the intermediate language (the hypothetical machine) is designed not only
for interpretation, but also so that sufficient information for code generation
is available.
For microprogrammable processors an intermediate language may be used in a
production language processor system by more or less direct interpretation of
the language. By a suitable assembly process a compiled program in the
intermediate (symbolic) form may be brought into a compact internal
representation to be efficiently executed by a microprogrammed interpreter.
The construction of such an assembler and interpreter will most often turn out
to be much easier than that of a code generator for a conventional computer.
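As an illustration of this assemble-then-interpret scheme, the following sketch translates a symbolic postfix program into a compact numeric form and executes it. The mnemonics, encodings, and single-stack design are invented for the example and do not reflect the actual P-code formats.

```python
# Minimal sketch: assembling symbolic intermediate code into a compact
# internal form, then interpreting that form. Opcodes are hypothetical.

OPCODES = {"LOADC": 0, "ADD": 1, "MUL": 2, "HALT": 3}

def assemble(symbolic):
    """Translate symbolic (mnemonic, argument) pairs into a flat word list."""
    code = []
    for mnemonic, arg in symbolic:
        code.append(OPCODES[mnemonic])
        code.append(arg)
    return code

def interpret(code):
    """Execute the compact form with a single evaluation stack."""
    stack, pc = [], 0
    while True:
        op, arg = code[pc], code[pc + 1]
        pc += 2
        if op == OPCODES["LOADC"]:
            stack.append(arg)
        elif op == OPCODES["ADD"]:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OPCODES["MUL"]:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OPCODES["HALT"]:
            return stack.pop()

# (2 + 3) * 4 in postfix form; unused arguments are zero
program = [("LOADC", 2), ("LOADC", 3), ("ADD", 0),
           ("LOADC", 4), ("MUL", 0), ("HALT", 0)]
```

The assembler here is trivially table-driven, which is what makes this route so much cheaper than writing a code generator for a conventional target.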
It is worth noticing that the part of a compiler which produces such an
intermediate language is also the part which can be constructed by an
automated process based on a formal description of the source language,
given a suitable translator writing system. Hence it is most
likely not only that a high degree of portability is achieved, but also that more
correct language processors can be constructed, because the source language
as well as the intermediate language can be precisely defined.
The question of concern is not the old UNCOL problem, but to define a suitable
level for such intermediate languages such that various implementation
strategies can use the same interface to the particular source language
processor.
The question will be analyzed on the background of some practical projects,
described below:
During the last few years some microprogrammable processors, named RIKKE
[18] and MATHILDA [4], have been designed and built at the Computer
Science Department, Aarhus University, and it was decided to implement
PASCAL [22] on a configuration of these.
The first step was to design a pseudo-machine (P-code [11]) to support
PASCAL. The main goal of the design was that it should be possible to
implement a microprogrammed interpreter for P-code on the
RIKKE/MATHILDA system. Furthermore it should be easy to implement a
compiler from PASCAL to P-code. A final requirement was that it should be
possible to use P-code as an intermediate form in the process of implementing
PASCAL on traditional computers.
All three steps were realized, and this paper will try to draw some conclusions
on the experience gained, before analyzing the subject in general.
Finally some criteria for the design of such intermediate languages are
extracted, and their application to PASCAL is exemplified in an improved
intermediate form given in an appendix.
2. The P-code design and the compiler
The P-code machine is organized with three stacks: a runtime stack for the
allocation of data segments of procedures (activation records), an address stack
for the evaluation of addresses and integer arithmetic, and an evaluation stack
for expressions in reals and sets, and the manipulation of packed records.
Variables in the data segments are addressed by a block level number and an
ordinal number. Furthermore, there is an area for the code of the procedures,
organized in segments, one for each procedure.
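Addressing by block level and ordinal number can be illustrated as follows, assuming that each data segment carries a static link to the segment of its lexically enclosing block (as in the usual static-link schemes); the names and the static-link representation are assumptions of this sketch, not details taken from the P-code machine.

```python
# Sketch of variable addressing by (block level, ordinal number).
# Each data segment (activation record) holds a static link to the
# segment of the lexically enclosing procedure; names are illustrative.

class Segment:
    def __init__(self, level, static_link, size):
        self.level = level
        self.static_link = static_link   # segment of the enclosing block
        self.slots = [0] * size          # the variables of this block

def resolve(current, level, ordinal):
    """Follow static links from the current segment out to `level`,
    then select the variable by its ordinal number."""
    seg = current
    while seg.level > level:
        seg = seg.static_link
    return seg.slots, ordinal

globals_seg = Segment(level=0, static_link=None, size=2)
inner_seg = Segment(level=1, static_link=globals_seg, size=3)

globals_seg.slots[1] = 42
slots, i = resolve(inner_seg, level=0, ordinal=1)   # finds the global variable
```

A reference from inside the inner procedure to a level-0 variable thus walks one static link before indexing by the ordinal number.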
Straightforward stack code consists of instructions loading operands on the
stack, and operators working on the stack, i.e. a postfix Polish form. To avoid
some explicit loads of operands on the stacks some of the operators in P-code
also exist in a version with one of the operands being an argument of the
instruction. This is the case for all arithmetic and relational operators which may
have an integer constant as an argument. Store operations have the address to
be stored into as an argument of the instruction, avoiding an explicit load of the
address. Also the jump instructions have the jump address as an argument.
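The saving such variants give can be illustrated on a statement like x := x + 1: the pure postfix form needs an explicit load of the constant, while the variant form folds the constant operand and the store address into the instructions themselves. The mnemonics below are invented for the example.

```python
# Two hypothetical encodings of x := x + 1 on a small stack machine.

def run(code, mem):
    """Execute a straight-line instruction sequence; return the memory."""
    stack = []
    for op, arg in code:
        if op == "LOADV":                      # load a variable
            stack.append(mem[arg])
        elif op == "LOADC":                    # load a constant
            stack.append(arg)
        elif op == "ADD":                      # pure stack operator
            b = stack.pop()
            stack.append(stack.pop() + b)
        elif op == "ADDC":                     # variant: constant as argument
            stack.append(stack.pop() + arg)
        elif op == "STORE":                    # variant: address as argument
            mem[arg] = stack.pop()
    return mem

pure    = [("LOADV", "x"), ("LOADC", 1), ("ADD", None), ("STORE", "x")]
variant = [("LOADV", "x"), ("ADDC", 1), ("STORE", "x")]
```

Both sequences have the same effect, but the variant form is one instruction shorter and avoids one push/pop pair.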
P-code also contains operators for maintaining the runtime stack, i.e. mark,
enter and exit instructions. But it does not contain special instructions reflecting
control structures like repetition or conditionals, nor does it have instructions
for accessing components of structured data, except for instructions to pack
and unpack fields of a word and bit testing and insertion (for packed records
and powersets).
A compiler from PASCAL to P-code has been implemented by means of an
LALR(1) parser generator [12]. Basically a simple translation scheme is used;
however, for several reasons the compiler became more complicated than
anticipated.
To optimize in space and time, a quite complicated
pseudo-evaluation is performed, which includes handling of the variants of the
instructions, subexpressions in literals, and certain optimizations of boolean
expressions. During the design of P-code, and the construction of the
compiler, an interpreter of P-code written in PASCAL was used for testing
purposes.
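Pseudo-evaluation of this kind can be pictured as keeping operands symbolic during code emission, folding constant subexpressions at compile time, and selecting the constant-argument instruction variants where one operand is a literal. The following sketch is an invented illustration of that idea, not the compiler's actual scheme.

```python
# Sketch of pseudo-evaluation during code emission. An operand is either
# ('const', n), still foldable, or ('code', instrs), already emitted code.

def emit_add(left, right):
    """Emit code for an addition, folding constants and preferring the
    hypothetical constant-argument variant ADDC over a plain ADD."""
    if left[0] == "const" and right[0] == "const":
        return ("const", left[1] + right[1])              # fold at compile time
    if right[0] == "const":
        return ("code", left[1] + [("ADDC", right[1])])   # variant with argument
    if left[0] == "const":
        return ("code", right[1] + [("ADDC", left[1])])   # addition commutes
    return ("code", left[1] + right[1] + [("ADD", None)])

x = ("code", [("LOADV", "x")])
folded = emit_add(("const", 2), ("const", 3))   # stays a constant
mixed = emit_add(x, folded)                      # emits the ADDC variant
```

For x + 2 + 3 this produces two instructions instead of the five a naive postfix translation would emit.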
Except for files other than input and output, the compiler and P-code support
what may be known as PASCAL 1 [22].
3. The interpreter and the code generator
The next step in implementing PASCAL on the RIKKE/MATHILDA system was
to design and implement a microcoded interpreter for the P-code machine [4].
RIKKE and MATHILDA are two independent processors, with 16-bit and 64-bit
word sizes respectively, coupled together both directly and by means of
an additional 64 bit wide shared memory called WIDE STORE. Furthermore
RIKKE has its own 16 bit wide memory. The whole system constitutes the
RIKKE/MATHILDA system. The P-code machine is realized on this system in
the following way: The evaluation stack is kept in MATHILDA, the address
stack in RIKKE, the runtime stack in WIDE STORE, and the code segments in