ADTO9G.PDF

(466 KB) Pobierz

Abstract data types

This opened my mind, I started to grasp what it means to use the tool known as algebra. I’ll

be damned if anyone had ever told me before: over and again Mr. Dupuy

[the

mathematics

teacher] was making pompous sentences on the subject, but not once would he say this

simple word: it is a

division of labor,

which like any division of labor produces miracles,

and allows the mind to concentrate all of its forces on just one side of objects, on just one

of their qualities.

What a difference it would have made for us if Mr. Dupuy had told us: This cheese is soft

or it is hard; it is white, it is blue; it is old, it is young; it is yours, it is mine, it is light or it

is heavy. Of so many qualities let us consider only the weight. Whatever that weight may be,

let us call it A. Now, without thinking of the weight any more, let us apply to A everything

that we know of quantities.

Such a simple thing; yet no one was saying it to us in that faraway province…

Stendhal,

The Life of Henry Brulard,

1836.

For abstraction consists only in separating the perceptible qualities of bodies, either from

other qualities, or from the bodies to which they apply. Errors arise when this separation

is poorly done or wrongly applied: poorly done in philosophical questions, and wrongly

applied in physical and mathematical questions. An almost sure way to err in philosophy is

to fail to simplify enough the objects under study; and an infallible way to obtain defective

results in physics and mathematics is to view the objects as less composite than they are.

Denis Diderot,

A Letter on the Blind for the Benefit of Those Who Can See,

1749.

etting objects play the lead role in our software architectures requires that we describe

them adequately. This chapter shows how.

You are perhaps impatient to dive into the depths of object technology and explore

the details of multiple inheritance, dynamic binding and other joys; then you may at first

look at this chapter as an undue delay since it is mostly devoted to the study of some

mathematical concepts (although all the mathematics involved is elementary).

But in the same way that even the most gifted musician will benefit from learning a

little music theory, knowing about abstract data types will help you understand and enjoy

the practice of object-oriented analysis, design and programming, however attractive the

concepts might already appear without the help of the theory. Since abstract data types

122

A BSTR ACT D ATA TY PES §6.1

establish the theoretical basis for the entire method, the consequences of the ideas

introduced in this chapter will be felt throughout the rest of this book.

There is more. As we will see at chapter end, these consequences actually extend

beyond the study of software proper, yielding a few principles of intellectual investigation

which one may perhaps apply to other disciplines.

6.1 CRITERIA

To obtain proper descriptions of objects, we need a method satisfying three conditions:

• The descriptions should be precise and unambiguous.

• They should be complete — or at least as complete as we want them in each case (we

may decide to leave some details out).

• They should not be

overspecifying.

The last point is what makes the answer non-trivial. It is after all easy to be precise,

unambiguous and complete if we “spill the beans” by giving out all the details of the

objects’ representation. But this is usually

too much

information for the authors of

software elements that need to access the objects.

This observation is close to the comments that led to the notion of information

hiding. The concern there was that by providing a module’s source code (or, more

generally, implementation-related elements) as the primary source of information for the

authors of software elements that rely on that module, we may drown them in a flood of

details, prevent them from concentrating on their own job, and hamper prospects of

smooth evolution. Here the danger is the same if we let modules use a certain data

structure on the basis of information that pertains to the structure’s representation rather

than to its essential properties.

“Information Hid-

ing”, page 51.

6.2 IMPLEMENTATION VARIATIONS

To understand better why the need for abstract data descriptions is so crucial, let us

explore further the potential consequences of using physical representation as the basis for

describing objects.

A well-known and convenient example is the description of stack objects. A stack

object serves to pile up and retrieve other objects in a last-in, first-out (“LIFO”) manner,

the latest inserted element being the first one to be retrieved. The stack is a ubiquitous

structure in computing science and in many software systems; the typical compiler or

interpreter, for example, is peppered with stacks of many kinds.

Stacks, it must be said, are also ubiquitous in didactic presentations of abstract data types,

so much so that Edsger Dijkstra is said to have once quipped that “abstract data types are

a remarkable theory, whose purpose is to describe stacks”. Fair enough. But the notion of

abstract data type applies to so many more advanced cases in the rest of this book that I

do not feel ashamed of starting with this staple example. It is the simplest I know which

includes about every important idea about abstract data types.

§6.2 IMPLEMENTA TIO N VARIATIONS

123

Stack representations

Several possible physical representations exist for stacks:

Three possible

representations

for a stack

capacity

count

“Push” operation:

count

count + 1

representation

[count] :=

(

ARRAY

)

representation

capacity

“Push” operation:

representation

[free] :=

free

free – 1

(

ARRAY

DOWN

)

representation

item

last

item

free

“Push” operation:

new

(n)

n item

n previous

last

(

LINKED

)

The figure illustrates three of the most common representations. Each has been given

a name for ease of reference:

•

ARRAY

represent a stack through an array

representation

and an integer

count

whose value ranges from 0 (for an empty stack) to

capacity,

the size of the array

representation;

stack elements are stored in the array at indices 1 up to

count.

ARRAY

, but with elements stored from the end of the array

rather than from the beginning. Here the integer is called

free

(it is the index of the

highest free array position, or 0 if all positions are occupied) and ranges from

capacity

for an empty stack down to 0. The stack elements are stored in the array at

indices

capacity

down to

free + 1.

a linked representation which stores each stack element in a cell with two

fields:

item

representing the element, and

containing a pointer to the cell

containing the previously pushed element. The representation also needs

last,

pointer to the cell representing the top.

LINKED

ARRAY

DOWN

•

124

A BSTR ACT D ATA TY PES §6.2

Next to each representation, the figure shows a program extract (in Pascal-like

notation) giving the corresponding implementation for a basic stack operation: pushing an

element

onto the top.

For the array representations,

ARRAY

and

ARRAY

DOWN

, the instructions

increase or decrease the top indicator (count or

free)

and assign

to the corresponding

array element. Since these representations support stacks of at most

capacity

elements,

robust implementations should include guards of the respective forms

count

capacity

then

…

free

then

…

which the figure omits for simplicity.

For

LINKED

, the linked representation, pushing an element requires four operations:

create a new cell

(done here with Pascal’s

new

procedure, which allocates space for a

new object); assign

to the new cell’s

item

field; chain the new cell to the earlier stack top

by assigning to its

field the current value of

last;

and update

last

so that it will

now be attached to the newly created cell.

Although these are the most frequently used stack representations, many others exist.

For example if you need

two

stacks of elements of the same type, and have only limited

space available, you may rely on a single array with two integer top markers,

count

as in

ARRAY

and

free

as in

ARRAY

DOWN

; one of the stacks will grow up and the other will

grow down. The representation is full if and only if

count

free.

capacity

Stack 2

free

Head-to-head

representation

for two stacks

count

representation

The advantage, of course, is to lessen the risk of running out of space: with two

arrays of capacity

representing stacks under

ARRAY

DOWN

, you exhaust

the available space whenever

either

stack reaches

elements; with a single array of size

holding two head-to-head stacks, you run out when the

combined

size reaches

2n,

a less

likely occurrence if the two stacks grow independently. (For any variable values

and

max

(

p + q)

≤

max

(

p) + max

(q).)

Each of these and other possible representations is useful in some cases. Choosing

one of them as “the” definition of stacks would be a typical case of overspecification. Why

should we consider

ARRAY

, for example, more representative than

LINKED

? The most

visible properties of

ARRAY

— the array, the integer

count,

the upper bound — are

irrelevant to an understanding of the underlying structure.

Stack 1

§6.2 IMPLEMENTA TIO N VARIATIONS

125

The danger of overspecification

Why is it so bad to use a particular representation as specification?

“ABOUT SOFT-

WARE MAINTE-

NANCE”, 1.3, page

17.

The results of the Lientz and Swanson maintenance study, which you may recall,

give a hint. More than 17% of software costs was found to come from the need to take into

account changes of data formats. As was noted in the discussion, too many programs are

closely tied to the physical structure of the data they manipulate. A method relying on the

physical representation of data structures to guide analysis and design would not be likely

to yield flexible software.

So if we are to use objects or object types as the basis of our system architectures,

we should find a better description criterion than the physical representation.

How long is a middle initial?

Lest stacks make us forget that, beyond the examples favored by computer scientists, data

structures are ultimately connected with real-life objects, here is an amusing example,

taken from a posting on the Risks forum (comp.risks Usenet newsgroup) of the dangers of

a view of data that is too closely dependent on concrete properties:

Risks forum, 10.74,

3 Jan. 1993. Post-

ing by Darrell D.E.

Long: ``Dehuman-

ization by old

Cobol programs''.

Abbreviated.

See exercise

E6.5,

page 161.

My dear mother blessed

(or

perhaps cursed) all of her children with two middle initials,

in my case “D” and “E”. This has caused me a good deal of trouble.

It seems that TRW sells certain parts of your credit information, such as your name and

a demographic profile. I recently got a new credit card from Gottchalks and found to my

chagrin that my name had been truncated to “Darrell D. Long”. I went to the credit

manager and was assured that things would be fixed. Well, two things happened: I got a

new credit card, this time as “Darrell E. Long”, and TRW now has an annotation in my

file to the effect “File variation: middle initial is E”. Soon after this I start getting mail

for “Darrell E. Long”

(along

with the usual “Darrell Long” and “Darrell D. Long” and

the occasional “Darrell D. E. Long”).

I called up the credit bureau and it seems that the programmer who coded up the TRW

database decided that all good Americans are entitled to only one middle initial. As the

woman on the phone patiently told me “They only allocated enough megabytes

(sic)

the system for one middle initial, and it would probably be awfully hard to change”.

Aside from the typical example of technobabble justification (“megabytes”), the

lesson here is the need to avoid tying software to the exact physical properties of data.

TRW’s system seems similar to those programs, mentioned in an earlier discussion, which

“knew” that postal codes consist of exactly five digits.

See page

18.

The author of the message reproduced above was mainly concerned about junk mail,

an unpleasant but not life-threatening event; the archives of the Risks forum are full of

computer-originated name confusions with more serious consequences. The “millenium

problem”, mentioned in the discussion of software maintenance, is another example of the

dangers of accessing data based on physical representation, this one with hundreds of

millions of dollars’ worth of consequences.

Plik z chomika:

musli_com

Inne pliki z tego folderu:

ADTO9G.PDF (466 KB)
CONCUE2K.PDF (933 KB)
CLASSUU1.PDF (571 KB)
ANALY5E5.PDF (229 KB)
ADAO8T.PDF (234 KB)

ADTO9G.PDF

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: