ADTO9G.PDF

(466 KB) Pobierz
6
Abstract data types
This opened my mind, I started to grasp what it means to use the tool known as algebra. I’ll
be damned if anyone had ever told me before: over and again Mr. Dupuy
[the
mathematics
teacher] was making pompous sentences on the subject, but not once would he say this
simple word: it is a
division of labor,
which like any division of labor produces miracles,
and allows the mind to concentrate all of its forces on just one side of objects, on just one
of their qualities.
What a difference it would have made for us if Mr. Dupuy had told us: This cheese is soft
or it is hard; it is white, it is blue; it is old, it is young; it is yours, it is mine, it is light or it
is heavy. Of so many qualities let us consider only the weight. Whatever that weight may be,
let us call it A. Now, without thinking of the weight any more, let us apply to A everything
that we know of quantities.
Such a simple thing; yet no one was saying it to us in that faraway province…
Stendhal,
The Life of Henry Brulard,
1836.
For abstraction consists only in separating the perceptible qualities of bodies, either from
other qualities, or from the bodies to which they apply. Errors arise when this separation
is poorly done or wrongly applied: poorly done in philosophical questions, and wrongly
applied in physical and mathematical questions. An almost sure way to err in philosophy is
to fail to simplify enough the objects under study; and an infallible way to obtain defective
results in physics and mathematics is to view the objects as less composite than they are.
Denis Diderot,
A Letter on the Blind for the Benefit of Those Who Can See,
1749.
L
etting objects play the lead role in our software architectures requires that we describe
them adequately. This chapter shows how.
You are perhaps impatient to dive into the depths of object technology and explore
the details of multiple inheritance, dynamic binding and other joys; then you may at first
look at this chapter as an undue delay since it is mostly devoted to the study of some
mathematical concepts (although all the mathematics involved is elementary).
But in the same way that even the most gifted musician will benefit from learning a
little music theory, knowing about abstract data types will help you understand and enjoy
the practice of object-oriented analysis, design and programming, however attractive the
concepts might already appear without the help of the theory. Since abstract data types
122
A BSTR ACT D ATA TY PES §6.1
establish the theoretical basis for the entire method, the consequences of the ideas
introduced in this chapter will be felt throughout the rest of this book.
There is more. As we will see at chapter end, these consequences actually extend
beyond the study of software proper, yielding a few principles of intellectual investigation
which one may perhaps apply to other disciplines.
6.1 CRITERIA
To obtain proper descriptions of objects, we need a method satisfying three conditions:
• The descriptions should be precise and unambiguous.
• They should be complete — or at least as complete as we want them in each case (we
may decide to leave some details out).
• They should not be
overspecifying.
The last point is what makes the answer non-trivial. It is after all easy to be precise,
unambiguous and complete if we “spill the beans” by giving out all the details of the
objects’ representation. But this is usually
too much
information for the authors of
software elements that need to access the objects.
This observation is close to the comments that led to the notion of information
hiding. The concern there was that by providing a module’s source code (or, more
generally, implementation-related elements) as the primary source of information for the
authors of software elements that rely on that module, we may drown them in a flood of
details, prevent them from concentrating on their own job, and hamper prospects of
smooth evolution. Here the danger is the same if we let modules use a certain data
structure on the basis of information that pertains to the structure’s representation rather
than to its essential properties.
“Information Hid-
ing”, page 51.
6.2 IMPLEMENTATION VARIATIONS
To understand better why the need for abstract data descriptions is so crucial, let us
explore further the potential consequences of using physical representation as the basis for
describing objects.
A well-known and convenient example is the description of stack objects. A stack
object serves to pile up and retrieve other objects in a last-in, first-out (“LIFO”) manner,
the latest inserted element being the first one to be retrieved. The stack is a ubiquitous
structure in computing science and in many software systems; the typical compiler or
interpreter, for example, is peppered with stacks of many kinds.
Stacks, it must be said, are also ubiquitous in didactic presentations of abstract data types,
so much so that Edsger Dijkstra is said to have once quipped that “abstract data types are
a remarkable theory, whose purpose is to describe stacks”. Fair enough. But the notion of
abstract data type applies to so many more advanced cases in the rest of this book that I
do not feel ashamed of starting with this staple example. It is the simplest I know which
includes about every important idea about abstract data types.
§6.2 IMPLEMENTA TIO N VARIATIONS
123
Stack representations
Several possible physical representations exist for stacks:
Three possible
representations
for a stack
capacity
count
“Push” operation:
count
:=
count + 1
representation
[count] :=
x
(
ARRAY
_
UP
)
representation
1
capacity
“Push” operation:
representation
[free] :=
x
free
:=
free – 1
(
ARRAY
_
DOWN
)
representation
item
last
previous
item
previous
item
previous
item
free
1
“Push” operation:
new
(n)
n item
:=
x
n previous
:=
last
last
:=
n
q
q
(
LINKED
)
previous
The figure illustrates three of the most common representations. Each has been given
a name for ease of reference:
ARRAY
_
UP
:
represent a stack through an array
representation
and an integer
count
whose value ranges from 0 (for an empty stack) to
capacity,
the size of the array
representation;
stack elements are stored in the array at indices 1 up to
count.
like
ARRAY
_
UP
, but with elements stored from the end of the array
rather than from the beginning. Here the integer is called
free
(it is the index of the
highest free array position, or 0 if all positions are occupied) and ranges from
capacity
for an empty stack down to 0. The stack elements are stored in the array at
indices
capacity
down to
free + 1.
a linked representation which stores each stack element in a cell with two
fields:
item
representing the element, and
previous
containing a pointer to the cell
containing the previously pushed element. The representation also needs
last,
a
pointer to the cell representing the top.
LINKED
:
ARRAY
_
DOWN
:
124
A BSTR ACT D ATA TY PES §6.2
Next to each representation, the figure shows a program extract (in Pascal-like
notation) giving the corresponding implementation for a basic stack operation: pushing an
element
x
onto the top.
For the array representations,
ARRAY
_
UP
and
ARRAY
_
DOWN
, the instructions
increase or decrease the top indicator (count or
free)
and assign
x
to the corresponding
array element. Since these representations support stacks of at most
capacity
elements,
robust implementations should include guards of the respective forms
if
count
<
capacity
then
if
free
>
0
then
which the figure omits for simplicity.
For
LINKED
, the linked representation, pushing an element requires four operations:
create a new cell
n
(done here with Pascal’s
new
procedure, which allocates space for a
new object); assign
x
to the new cell’s
item
field; chain the new cell to the earlier stack top
by assigning to its
previous
field the current value of
last;
and update
last
so that it will
now be attached to the newly created cell.
Although these are the most frequently used stack representations, many others exist.
For example if you need
two
stacks of elements of the same type, and have only limited
space available, you may rely on a single array with two integer top markers,
count
as in
ARRAY
_
UP
and
free
as in
ARRAY
_
DOWN
; one of the stacks will grow up and the other will
grow down. The representation is full if and only if
count
=
free.
capacity
Stack 2
free
Head-to-head
representation
for two stacks
count
representation
1
The advantage, of course, is to lessen the risk of running out of space: with two
arrays of capacity
n
representing stacks under
ARRAY
_
UP
or
ARRAY
_
DOWN
, you exhaust
the available space whenever
either
stack reaches
n
elements; with a single array of size
2n
holding two head-to-head stacks, you run out when the
combined
size reaches
2n,
a less
likely occurrence if the two stacks grow independently. (For any variable values
p
and
q,
max
(
p + q)
max
(
p) + max
(q).)
Each of these and other possible representations is useful in some cases. Choosing
one of them as “the” definition of stacks would be a typical case of overspecification. Why
should we consider
ARRAY
_
UP
, for example, more representative than
LINKED
? The most
visible properties of
ARRAY
_
UP
— the array, the integer
count,
the upper bound — are
irrelevant to an understanding of the underlying structure.
Stack 1
§6.2 IMPLEMENTA TIO N VARIATIONS
125
The danger of overspecification
Why is it so bad to use a particular representation as specification?
“ABOUT SOFT-
WARE MAINTE-
NANCE”, 1.3, page
17.
The results of the Lientz and Swanson maintenance study, which you may recall,
give a hint. More than 17% of software costs was found to come from the need to take into
account changes of data formats. As was noted in the discussion, too many programs are
closely tied to the physical structure of the data they manipulate. A method relying on the
physical representation of data structures to guide analysis and design would not be likely
to yield flexible software.
So if we are to use objects or object types as the basis of our system architectures,
we should find a better description criterion than the physical representation.
How long is a middle initial?
Lest stacks make us forget that, beyond the examples favored by computer scientists, data
structures are ultimately connected with real-life objects, here is an amusing example,
taken from a posting on the Risks forum (comp.risks Usenet newsgroup) of the dangers of
a view of data that is too closely dependent on concrete properties:
Risks forum, 10.74,
3 Jan. 1993. Post-
ing by Darrell D.E.
Long: ``Dehuman-
ization by old
Cobol programs''.
Abbreviated.
See exercise
E6.5,
page 161.
My dear mother blessed
(or
perhaps cursed) all of her children with two middle initials,
in my case “D” and “E”. This has caused me a good deal of trouble.
It seems that TRW sells certain parts of your credit information, such as your name and
a demographic profile. I recently got a new credit card from Gottchalks and found to my
chagrin that my name had been truncated to “Darrell D. Long”. I went to the credit
manager and was assured that things would be fixed. Well, two things happened: I got a
new credit card, this time as “Darrell E. Long”, and TRW now has an annotation in my
file to the effect “File variation: middle initial is E”. Soon after this I start getting mail
for “Darrell E. Long”
(along
with the usual “Darrell Long” and “Darrell D. Long” and
the occasional “Darrell D. E. Long”).
I called up the credit bureau and it seems that the programmer who coded up the TRW
database decided that all good Americans are entitled to only one middle initial. As the
woman on the phone patiently told me “They only allocated enough megabytes
(sic)
in
the system for one middle initial, and it would probably be awfully hard to change”.
Aside from the typical example of technobabble justification (“megabytes”), the
lesson here is the need to avoid tying software to the exact physical properties of data.
TRW’s system seems similar to those programs, mentioned in an earlier discussion, which
“knew” that postal codes consist of exactly five digits.
See page
18.
The author of the message reproduced above was mainly concerned about junk mail,
an unpleasant but not life-threatening event; the archives of the Risks forum are full of
computer-originated name confusions with more serious consequences. The “millenium
problem”, mentioned in the discussion of software maintenance, is another example of the
dangers of accessing data based on physical representation, this one with hundreds of
millions of dollars’ worth of consequences.
Zgłoś jeśli naruszono regulamin