Abstraction, Encapsulation, and Information Hiding
By Edward V. Berard
The Object Agency
"A powerful agent is the right word. Whenever we come upon one
of those intensely right words in a book or a newspaper the
resulting effect is physical as well as spiritual, and electrically
prompt."
-- Mark Twain (Samuel Langhorne Clemens), Essay on
William Dean Howells, 1906
PROLOGUE
I recently read a magazine article that said, "Encapsulation is just a
fancy name for information hiding." Since the writer was non-technical,
I just assumed that he was attempting to show that he really did not
understand technical matters. However, the passage reminded me of
several situations in which other authors -- both technical and
non-technical -- had confused encapsulation and information hiding.
Information hiding is not only confused with encapsulation, it is also
often confused with abstraction. For example, in a class I was teaching
recently, one of the students remarked that my definition of
information hiding was remarkably close to my definition of
abstraction. Since I had taken both definitions from different sources at
different times, I had not thought of comparing them side-by-side. When
I did, I was startled at how close the definitions were.
This led to some fanciful speculation on my part. "If encapsulation
could be confused with information hiding," I reasoned, "and
information hiding could also be confused with abstraction, then could
someone argue that abstraction and encapsulation were the same thing?"
Of course, when I said it this way, the argument sounded absurd.
Still, I was curious. I decided to gather a number of different
definitions for abstraction, information hiding, and encapsulation, and
to compare them. This article details what I found.
ABSTRACTION
"A view of a problem that extracts the essential information
relevant to a particular purpose and ignores the remainder of
the information."
-- [IEEE, 1983]
"The essence of abstraction is to extract essential properties
while omitting inessential details."
-- [Ross et al, 1975]
"Abstraction is a process whereby we identify the important
aspects of a phenomenon and ignore its details."
-- [Ghezzi et al, 1991]
"Abstraction is generally defined as 'the process of
formulating generalised concepts by extracting common qualities
from specific examples.'"
-- [Blair et al, 1991]
"Abstraction is the selective examination of certain aspects of
a problem. The goal of abstraction is to isolate those aspects
that are important for some purpose and suppress those aspects
that are unimportant."
-- [Rumbaugh et al, 1991]
"The meaning [of abstraction] given by the Oxford English
Dictionary (OED) closest to the meaning intended here is 'The
act of separating in thought'. A better definition might be
'Representing the essential features of something without
including background or inessential detail.'"
-- [Graham, 1991]
"[A] simplified description, or specification, of a system that
emphasizes some of the system's details or properties while
suppressing others. A good abstraction is one that emphasizes
details that are significant to the reader or user and suppresses
details that are, at least for the moment, immaterial or
diversionary."
-- [Shaw, 1984]
"An abstraction denotes the essential characteristics of an
object that distinguish it from all other kinds of objects and thus
provide crisply defined conceptual boundaries, relative to the
perspective of the viewer."
-- [Booch, 1991]
One point of confusion regarding abstraction is its use as both a
process and an entity. Abstraction, as a process, denotes the
extracting of the essential details about an item, or a group of items,
while ignoring the inessential details. Abstraction, as an entity,
denotes a model, a view, or some other focused representation for an
actual item. Abstraction is most often used as a complexity-mastering
technique. For example, we often hear people say such things as: "just
give me the highlights" or "just the facts, please." What these people
are asking for are abstractions.
We can have varying degrees of abstraction, although these "degrees" are
more commonly referred to as "levels." As we move to higher levels of
abstraction, we focus on the larger and more important pieces of
information (using our chosen selection criteria). Another common
observation is that as we move to higher levels of abstraction, we tend
to concern ourselves with progressively smaller volumes of information,
and fewer overall items. As we move to lower levels of abstraction, we
reveal more detail, typically encounter more individual items, and
increase the volume of information with which we must deal.
[IEEE, 1983], [Ross et al, 1975], [Ghezzi et al, 1991], [Blair et al,
1991], [Rumbaugh et al, 1991], and [Graham, 1991] all appear to view
abstraction as a process. (Note that the [Blair et al, 1991] definition
is somewhat different from the others in that it suggests examining a
number of "specific examples" -- as opposed to examining a single
item.) [Shaw, 1984] and [Booch, 1991] describe abstraction as an
entity. Both views are equally valid, and, in fact, necessary.
We also note that there are many different types of abstraction, e.g.,
functional abstraction, data abstraction, process abstraction, and even
object abstraction. (See, for example, the following references:
[Alexandridis, 1986], [Guttag, 1977], [Liskov and Guttag, 1986], [Park,
1991], [Shaw, 1984], and [Zimmer, 1985].) Because each of the above
definitions is a general definition of abstraction, each correctly
avoids describing which specific categories of information are
emphasized or de-emphasized.
Usually, abstraction is not defined in terms of information hiding,
e.g., note the use of words such as "ignore" and "extracting." However,
we should also note the use of the words "suppress" and "suppressing"
in some of the above examples. In short, you might say that abstraction
dictates that some information is more important than other
information, but (correctly) does not specify a particular mechanism for
handling the unimportant information.
INFORMATION HIDING
"The second decomposition was made using 'information hiding'
... as a criterion. The modules no longer correspond to steps in
the processing. ... Every module in the second decomposition is
characterized by its knowledge of a design decision which it hides
from all others. Its interface or definition was chosen to reveal as
little as possible about its inner workings."
-- [Parnas, 1972b]
"... the purpose of hiding is to make inaccessible certain
details that should not affect other parts of a system."
-- [Ross et al, 1975]
"... [I]nformation hiding: a module is characterized by the
information it hides from other modules, which are called its
clients. The hidden information remains a secret to the client
modules."
-- [Ghezzi et al, 1991]
"[Information hiding is] the principle that users of a software
component (such as a class) need to know only the essential
details of how to initialize and access the component, and do not
need to know the details of the implementation."
-- [Budd, 1991]
"The technique of encapsulating software design decisions in
modules in such a way that the module's interfaces reveal as little
as possible about the module's inner workings; thus each module is
a 'black box' to the other modules in the system."
-- [IEEE, 1983]
"The process of hiding all the details of an object that do not
contribute to its essential characteristics; typically, the
structure of an object is hidden, as well as the implementation
of its methods. The terms information hiding and encapsulation
are usually interchangeable."
-- [Booch, 1991]
"The principle of information hiding is central. It says that
modules are used via their specifications, not their
implementations. All information about a module, whether
concerning data or function, is encapsulated with it and,
unless specifically declared public, hidden from other modules."
-- [Graham, 1991]
In his classic 1972 article ([Parnas, 1972b]), D.L. Parnas describes two
different implementation scenarios for a simple keyword-in-context
(KWIC) application. One is decomposed and modularized based on the
steps one might take in accomplishing the purpose of the application.
(Parnas speculates that this approach would be taken by someone who is
basing their design on a flowchart.)
The second (and better) scenario is modularized based on "design
decisions." Parnas observes, "We propose instead that one begins with a
list of difficult design decisions or design decisions which are likely
to change. Each module is then designed to hide such a decision from
the others." Like Dijkstra ([Dijkstra, 1968]), Parnas advocates that
the details of these difficult and likely-to-change decisions be hidden
from the rest of the system. Further, the rest of the system will have
access to these design decisions only through well-defined, and (to a
large degree) unchanging interfaces. (See also [Parnas, 1972a].)
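To make Parnas's criterion concrete, here is a minimal C++ sketch of a module that hides one such design decision. It assumes a KWIC-style "line storage" module; the class name `LineStorage` and its member functions are hypothetical, not taken from Parnas's paper. The secret is the storage layout: all words live in one packed character buffer indexed by offset tables, and clients can reach them only through `add_line`, `line_count`, `word_count`, and `word`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical KWIC "line storage" module. Its secret (the design decision
// it hides) is that words are packed into one buffer with offset tables.
class LineStorage {
public:
    // Splits a line on whitespace and stores its words.
    void add_line(const std::string& text) {
        std::istringstream in(text);
        std::string word;
        line_starts_.push_back(word_starts_.size());
        while (in >> word) {
            word_starts_.push_back(buffer_.size());
            buffer_ += word;
        }
    }

    std::size_t line_count() const { return line_starts_.size(); }

    std::size_t word_count(std::size_t line) const {
        assert(line < line_count());
        std::size_t end = (line + 1 < line_count()) ? line_starts_[line + 1]
                                                     : word_starts_.size();
        return end - line_starts_[line];
    }

    std::string word(std::size_t line, std::size_t w) const {
        assert(w < word_count(line));
        std::size_t index = line_starts_[line] + w;  // global word index
        std::size_t begin = word_starts_[index];
        std::size_t end = (index + 1 < word_starts_.size()) ? word_starts_[index + 1]
                                                             : buffer_.size();
        return buffer_.substr(begin, end - begin);
    }

private:
    std::string buffer_;                    // all words, concatenated
    std::vector<std::size_t> word_starts_;  // offset of each word in buffer_
    std::vector<std::size_t> line_starts_;  // index of each line's first word
};
```

If the packed buffer later proves to be the wrong choice, it could be replaced (say, by a `std::vector<std::vector<std::string>>`) without changing any client code, which is the kind of change this style of decomposition is meant to absorb.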
In truth, both of the scenarios presented by Parnas involve "information
hiding." In his first scenario, the hidden information involves the
details of the procedural steps necessary to accomplish the
application. (By 1971, when Parnas first published his work in a
university technical report, programmers had known for almost 20 years
of the usefulness of subroutines in mastering complexity.) The second,
and (very arguably) superior, scenario requires that the hidden
information be the details of difficult and/or likely-to-change design
decisions.
"Hiding information," in and of itself, was not new. For that matter,
the isolation of difficult and/or likely-to-change design decisions in
modules was also not new. (Dijkstra had done this earlier in his
implementation of the "THE"-Multiprogramming System.) The significance
of Parnas's 1972 article on decomposing systems into modules
([Parnas, 1972b]) lay in two areas:
- His advocacy and specification of the (then innovative)
technique of basing system modularization on design
decisions. (One could say that the article presented a
significantly different view of Dijkstra's "levels of
abstraction" approach.)
- His use of the term "information hiding." Virtually every
article which mentions the topic traces its origin to
[Parnas, 1972b].
Obviously, Parnas did not say all information hiding is good, nor did he
say that all information hiding techniques are equally useful. He was
identifying a particularly pragmatic approach to information hiding.
Just as with abstraction, there are degrees of information hiding. For
example, at the programming language level, C++ provides for public,
private, and protected members ([Ellis and Stroustrup, 1990]), and Ada
has both private and limited private types ([ARM, 1983]).
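The C++ side of this can be sketched as follows. The `Account` and `FeeAccount` classes are hypothetical, chosen only to show the three access levels: public members are visible to every client, protected members are visible to derived classes but hidden from ordinary clients, and private members are visible only within the class itself (and its friends). The commented-out line marks an access the compiler would reject.

```cpp
#include <cassert>

// Hypothetical class illustrating C++'s three access levels.
class Account {
public:       // visible to every client
    explicit Account(long opening_cents) : balance_cents_(opening_cents) {}
    long balance() const { return balance_cents_; }
    void deposit(long cents) {
        assert(cents > 0);  // deposits must be positive
        apply(cents);
    }

protected:    // visible to derived classes, hidden from ordinary clients
    void apply(long delta_cents) {
        balance_cents_ += delta_cents;
        ++changes_;
    }
    int changes() const { return changes_; }

private:      // visible only inside Account (and its friends)
    long balance_cents_;
    int changes_ = 0;
};

// Hypothetical derived class: it may use protected members, not private ones.
class FeeAccount : public Account {
public:
    using Account::Account;  // inherit the constructor

    void charge_fee(long cents) {
        apply(-cents);                // OK: apply() is protected
        // balance_cents_ -= cents;   // would not compile: private to Account
    }
    int transactions() const { return changes(); }  // OK: changes() is protected
};
```

A client holding a `FeeAccount` can call `deposit`, `charge_fee`, `balance`, and `transactions`, but a call such as `a.apply(5)` from outside the class hierarchy would be rejected because `apply` is protected.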
We can now identify some of the sources of confusion about the
differences between information hiding and abstraction, i.e.:
- Abstraction can be (and often is) used as a technique for
identifying which information should be hidden. For example,
in functional abstraction we might say that it is important
to be able to add items to a list, but the details of how that
is accomplished are not of interest and should be hidden. Using
data abstraction, we would say that a list is a place where we
can store information, but how the list is actually implemented
(e.g., as an array or as a series of linked locations) is
unimportant and should be hidden.
Confusion can occur when people fail to distinguish between the
hiding of information, and a technique (e.g., abstraction)
that is used to help identify which information is to be
hidden.
- Some of the definitions for abstraction can also be sources
of confusion. For example, words like "ignore," "omit," "extract,"
and "without including" are rather passive, and would not
necessarily imply the deliberate hiding of any information, e.g.,
"the information is there, and accessible, but we just ignore it."
However, words like "suppress" and "suppressing" present a somewhat
different image -- quite possibly the active and deliberate hiding
of information.
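The list example can be sketched in C++ under stated assumptions: the abstraction is a hypothetical `StringList` interface offering only `add`, `size`, and `at`, and two hypothetical implementations hide the representation, one as an array (`std::vector`) and one as a series of linked locations (`std::list`). The client function `join` is written purely against the abstraction.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// The abstraction: what a client may do with a list of strings.
class StringList {
public:
    virtual ~StringList() = default;
    virtual void add(const std::string& item) = 0;
    virtual std::size_t size() const = 0;
    virtual std::string at(std::size_t i) const = 0;
};

// Hidden representation #1: contiguous storage (an array).
class ArrayStringList : public StringList {
public:
    void add(const std::string& item) override { items_.push_back(item); }
    std::size_t size() const override { return items_.size(); }
    std::string at(std::size_t i) const override {
        assert(i < items_.size());
        return items_[i];
    }
private:
    std::vector<std::string> items_;
};

// Hidden representation #2: a series of linked nodes.
class LinkedStringList : public StringList {
public:
    void add(const std::string& item) override { items_.push_back(item); }
    std::size_t size() const override { return items_.size(); }
    std::string at(std::size_t i) const override {
        assert(i < items_.size());
        auto it = items_.begin();
        std::advance(it, static_cast<std::ptrdiff_t>(i));  // walk the links
        return *it;
    }
private:
    std::list<std::string> items_;
};

// Client code: depends only on the abstraction, never on a representation.
std::string join(const StringList& list, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < list.size(); ++i) {
        if (i > 0) out += sep;
        out += list.at(i);
    }
    return out;
}
```

Because `join` never names `std::vector` or `std::list`, swapping one representation for the other is invisible to it: abstraction decided what to show, and each class's private section does the hiding.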
Now, let's look at the other definitions for information hiding:
- The [Ross et al, 1975] definition somewhat generalizes
Parnas's definition, but still stipulates that the information
that should be hidden consists of those "details that should not
affect other parts of a system."
- The [Ghezzi et al, 1991] definition also presents a somewhat
generalized view of Parnas's view on information hiding.
- The [Budd, 1991] and [Booch, 1991] definitions are
specialized to an object-oriented view of the world.
- Note the use of the words "encapsulating" and "encapsulated"
in [IEEE, 1983] and [Graham, 1991] respectively. As we shall see
in the next section, there is a significant difference
between information hiding and encapsulation. However, some
people might incorrectly infer from the [IEEE, 1983] and [Graham, 1991]
definitions of information hiding that encapsulation and information
hiding are the same thing.
ENCAPSULATION
"1. to enclose in or as if in a capsule"
-- [Mish, 1988]
"The concept of encapsulation as used in an object-oriented
context is not essentially different from its dictionary
definition. It still refers to building a capsule, in this case a
conceptual barrier, around some collection of things."
-- [Wirfs-Brock et al, 1990]
"It is a simple, yet reasonably effective, system-building
tool. It allows suppliers to present cleanly specified
interfaces around the services they provide. A consumer has full
visibility to the procedures offered by an object, and no visibility
to its data. From a consumer's point of view, an object is a
seamless capsule that offers a number of services, with no
visibility as to how these services are implemented ... The
technical term for this is encapsulation."
-- [Cox, 1986]
"Encapsulation or equivalently information hiding refers to the
practice of including within an object everything it needs, and
furthermore doing this in such a way that no other object need ever
be aware of this internal structure."
-- [Graham, 1991]
"We say that the changeable, hidden information becomes the
secret of the module; also, according to a widely used jargon, we
say that such information is encapsulated within the implementation."
-- [Ghezzi et al, 1991]
"Data hiding is sometimes called encapsulation because the data
and its code are put together in a package or 'capsule.'"
-- [Smith, 1991]
"Encapsulation is used as a generic term for techniques which
realize data abstraction. Encapsulation therefore implies the
provision of mechanisms to support both modularity and information
hiding. There is therefore a one to one correspondence in this
case between the technique of encapsulation and the principle of
data abstraction."
-- [Blair et al, 1991]
"Encapsulation (also information hiding) consists of separating
the external aspects of an object which are accessible to other
objects, from the internal implementation details of the object,
which are hidden from other objects."
-- [Rumbaugh et al, 1991]
"[E]ncapsulation -- also known as information hiding --
prevents clients from seeing its inside view, where the behavior
of the abstraction is implemented."
-- [Booch, 1991]
Like abstraction, the word "encapsulation" can be used to describe
either a process or an entity. As a process, encapsulation means the
act of enclosing one or more items within a (physical or logical)
container. Encapsulation, as an entity, refers to a package or an
enclosure that holds (contains, encloses) one or more items. It is
extremely important to note that nothing is said about "the walls of
the enclosure." Specifically, they may be "transparent," "translucent,"
or even "opaque."
Programming languages have long supported encapsulation. For example,
subprograms (e.g., procedures, functions, and subroutines), arrays, and
record structures are common examples of encapsulation mechanisms
supported by most programming languages. Newer programming languages
support larger encapsulation mechanisms, e.g., "classes" in Simula
([Birtwistle et al, 1973]), Smalltalk ([Goldberg and Robson, 1983]),
and C++ ([Ellis and Stroustrup, 1990]), "modules" in Modula-2
([Wirth, 1983]), and "packages" in Ada ([ARM, 1983]).
If encapsulation were "the same thing as information hiding," then one
might make the argument that "everything that was encapsulated was also
hidden." This is obviously not true. For example, even though
information may be encapsulated within record structures and arrays,
this information is usually not hidden (unless hidden via some other
mechanism).
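The difference can be sketched in C++. In the hypothetical example below, `DateRecord` encapsulates three fields in one unit, yet any client can read or overwrite them, including with nonsense values. `Date` packages the same data but hides it, so the only access is through operations that can enforce a validity check (deliberately minimal here, an assumption made for brevity).

```cpp
#include <cassert>

// A record: the fields are encapsulated (gathered into one unit) but not hidden.
struct DateRecord {
    int year;
    int month;  // meant to be 1..12, but nothing stops a client from writing 13
    int day;
};

// A class: the same data, encapsulated and hidden behind operations.
class Date {
public:
    Date(int year, int month, int day) : year_(year), month_(month), day_(day) {
        // Minimal range check for illustration only (no per-month day limits).
        assert(month >= 1 && month <= 12);
        assert(day >= 1 && day <= 31);
    }
    int year() const { return year_; }
    int month() const { return month_; }
    int day() const { return day_; }

private:
    int year_;
    int month_;
    int day_;
};
```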
Another example of encapsulated, but not hidden, information is the
(highly undesirable) "block of global information" technique
reminiscent of FORTRAN's common blocks. Unfortunately, it is quite easy
in some object-oriented languages to create blocks of global data in
the form of classes. Specifically, it is possible to create classes
with nothing but constants and variables in their public interfaces,
i.e., there are no operations in the interface. (For reasons why this
is undesirable, see discussions of "module coupling," e.g., [Myers,
1978] and [Yourdon and Constantine, 1979].)
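A minimal C++ illustration of this anti-pattern, assuming a hypothetical `Config` class that has nothing but public static data and no operations. It is an encapsulation (the variables share one named container) but it hides nothing, so two otherwise unrelated functions, also hypothetical, become coupled through shared global state, much as they would through a FORTRAN common block.

```cpp
#include <cassert>

// Anti-pattern: a class used as a block of global data.
// Every member is public and static; the class offers no operations.
class Config {
public:
    static int buffer_size;  // bytes per chunk
    static bool verbose;
};
int Config::buffer_size = 256;
bool Config::verbose = false;

// Hypothetical function in one part of the system...
void enable_debugging() {
    Config::verbose = true;
    Config::buffer_size = 4096;  // silently changes behavior elsewhere
}

// ...and a hypothetical function in another part that reads the same data.
int chunks_needed(int bytes) {
    assert(bytes >= 0);
    return (bytes + Config::buffer_size - 1) / Config::buffer_size;  // round up
}
```

Nothing in `chunks_needed` reveals that its result depends on whether `enable_debugging` has run; that hidden dependency is the kind of coupling the cited discussions of module coupling warn against.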
It is indeed true that encapsulation mechanisms such as classes allow
some information to be hidden. However, these same encapsulation
mechanisms also allow some information to be visible. Some even allow
varying degrees of visibility, e.g., C++'s public, protected, and
private members.
Even arguing that encapsulation is necessary for information hiding is
not as simple as one might suspect. Of course, one could very loosely
define encapsulation such that any hidden information is (logically or
physically) encapsulated in something.
Examining the cited definitions for encapsulation above, we make the
following observations:
- [Wirfs-Brock et al, 1990] comes closest to a simple,
straightforward definition for encapsulation.
- Brad Cox's definition ([Cox, 1986]) allows for encapsulation
to reveal some information ("full visibility to the procedures
offered by an object"), while hiding other information ("no
visibility to its data").
- Although not as clean as it could be, the definition supplied
by [Blair et al, 1991] presents an accurate view of the
relationship among abstraction, information hiding, and
encapsulation.
- [Ghezzi et al, 1991] at least acknowledges the confusion
associated with information hiding and encapsulation, i.e.,
"widely used jargon."
- [Booch, 1991], [Graham, 1991], [Rumbaugh et al, 1991], and
[Smith, 1991] make no (or very little) distinction between
"information hiding" and "encapsulation."
CONCLUSIONS
Abstraction, information hiding, and encapsulation are very different,
but highly related, concepts. One could argue that abstraction is a
technique that helps us identify which specific information should be
visible, and which information should be hidden. Encapsulation is then
the technique for packaging the information in such a way as to hide
what should be hidden, and make visible what is intended to be visible.
It is not hard to see how abstraction, information hiding, and
encapsulation became confused with one another. Further, one could
argue that, regardless of their "dictionary definitions," these terms
have evolved new meanings in the context of software engineering, e.g.,
in much the same way as "paradigm" has. (See, e.g., [Kuhn, 1962].)
However, a stronger argument can be made for keeping the concepts, and
thus the terms, distinct.
BIBLIOGRAPHY
[Alexandridis, 1986]. N.A. Alexandridis, "Adaptable Software and
Hardware: Problems and Solutions," Computer, Vol. 19, No. 2, February
1986, pp. 29-39.
[ARM, 1983]. Reference Manual for the Ada Programming Language,
ANSI/MIL-STD 1815A (1983), United States Department of Defense,
February 1983.
[Birtwistle et al, 1973]. G. Birtwistle, O. Dahl, B. Myhrhaug, and K.
Nygaard, Simula Begin, Auerbach Press, Philadelphia, 1973.
[Blair et al, 1991]. G. Blair, J. Gallagher, D. Hutchison, and D.
Shepherd, Object-Oriented Languages, Systems and Applications, Halsted
Press, New York, New York, 1991.
[Booch, 1991]. G. Booch, Object-Oriented Design With Applications,
Benjamin/Cummings, Menlo Park, California, 1991.
[Budd, 1991]. T. Budd, An Introduction to Object-Oriented Programming,
Addison-Wesley, Reading, Massachusetts, 1991.
[Cox, 1986]. B.J. Cox, Object Oriented Programming: An Evolutionary
Approach, Addison-Wesley, Reading, Massachusetts, 1986.
[Dijkstra, 1968]. E.W. Dijkstra, "Structure of the
'THE'-Multiprogramming System," Communications of the ACM, Vol. 11, No.
5, May 1968, pp. 341-346.
[Ellis and Stroustrup, 1990]. M.A. Ellis and B. Stroustrup, The
Annotated C++ Reference Manual, Addison-Wesley, Reading, Massachusetts,
1990.
[Ghezzi et al, 1991]. C. Ghezzi, M. Jazayeri, and D. Mandrioli,
Fundamentals of Software Engineering, Prentice-Hall, Englewood Cliffs,
New Jersey, 1991.
[Goldberg and Robson, 1983]. A. Goldberg and D. Robson, Smalltalk-80:
The Language and Its Implementation, Addison-Wesley, Reading,
Massachusetts, 1983.
[Graham, 1991]. I. Graham, Object-Oriented Methods, Addison-Wesley,
Reading, Massachusetts, 1991.
[Guttag, 1977]. J. Guttag, "Abstract Data Types and the Development of
Data Structures," Communications of the ACM, Vol. 20, No. 6, June 1977,
pp. 396-404.
[IEEE, 1983]. IEEE, IEEE Standard Glossary of Software Engineering
Terminology, The Institute of Electrical and Electronics Engineers, New
York, New York, 1983.
[Kuhn, 1962]. T.S. Kuhn, The Structure of Scientific Revolutions,
University of Chicago Press, Chicago, Illinois, 1962.
[Liskov and Guttag, 1986]. B. Liskov and J. Guttag, Abstraction and
Specification in Program Development, The MIT Press, Cambridge,
Massachusetts, 1986.
[Mish, 1988]. F.C. Mish, Editor in Chief, Webster's Ninth New Collegiate
Dictionary, Merriam-Webster Inc., Springfield, Massachusetts, 1988.
[Myers, 1978]. G.J. Myers, Composite/Structured Design, Van Nostrand
Reinhold, New York, New York, 1978.
[Park, 1991]. H.-S. Park, "Abstract Object Types = Abstract Knowledge
Types + Abstract Data Types + Abstract Connector Types," Journal of
Object-Oriented Programming, Vol. 4, No. 3, June 1991, pp. 37-39, 42-44,
46-48, 51-52.
[Parnas, 1971]. D.L. Parnas, Information Distribution Aspects of Design
Methodology, Technical Report, Department of Computer Science,
Carnegie-Mellon University, February 1971.
[Parnas, 1972a]. D.L. Parnas, "A Technique for Software Module
Specification With Examples," Communications of the ACM, Vol. 15, No.
5, May 1972, pp. 330-336.
[Parnas, 1972b]. D.L. Parnas, "On the Criteria To Be Used in Decomposing
Systems Into Modules," Communications of the ACM, Vol. 15, No. 12,
December 1972, pp. 1053-1058.
[Ross et al, 1975]. D.T. Ross, J.B. Goodenough, and C.A. Irvine,
"Software Engineering: Process, Principles, and Goals," IEEE Computer,
Vol. 8, No. 5, May 1975, pp. 17-27.
[Rumbaugh et al, 1991]. J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy,
and W. Lorensen, Object-Oriented Modeling and Design, Prentice-Hall,
Englewood Cliffs, New Jersey, 1991.
[Shaw, 1984]. M. Shaw, "Abstraction Techniques in Modern Programming
Languages," IEEE Software, Vol. 1, No. 4, October 1984, pp. 10-26.
[Smith, 1991]. D.N. Smith, Concepts of Object-Oriented Programming,
McGraw-Hill, New York, New York, 1991.
[Wirfs-Brock et al, 1990]. R. Wirfs-Brock, B. Wilkerson, and L. Wiener,
Designing Object-Oriented Software, Prentice-Hall, Englewood Cliffs,
New Jersey, 1990.
[Wirth, 1983]. N. Wirth, Programming In Modula-2, Second Edition,
Springer-Verlag, New York, New York, 1983.
[Yourdon and Constantine, 1979]. E. Yourdon and L.L. Constantine,
Structured Design: Fundamentals of a Discipline of Computer Program and
Systems Design, Prentice-Hall, Englewood Cliffs, New Jersey, 1979.
[Zimmer, 1985]. J.A. Zimmer, Abstraction for Programmers, McGraw-Hill,
New York, New York, 1985.