






This document introduces the concept of abstraction and its importance in computer science, specifically in the context of abstract data types. Abstraction is the process of focusing on essential features while ignoring irrelevant details. The text describes various abstract data types, such as stacks, queues, and maps, along with their real-life equivalents. It also explains the relationship between abstraction and encapsulation, and provides examples and exercises.







Abstraction is the process of trying to identify the most important or inherent qualities of an object or model, and ignoring or omitting the unimportant aspects. It brings to the forefront or highlights certain features, and hides other elements. In computer science we term this process information hiding.

Abstraction is used in all sorts of human endeavors. Think of an atlas. If you open an atlas you will often first see a map of the world. This map will show only the most significant features. For example, it may show the various mountain ranges, the ocean currents, and other extremely large structures. But small features will almost certainly be omitted. A subsequent map will cover a smaller geographical region, and will typically possess more detail. For example, a map of a single continent (such as South America) may now include political boundaries, and perhaps the major cities. A map over an even smaller region, such as a country, might include towns as well as cities, and smaller geographical features, such as the names of individual mountains. A map of an individual large city might include the most important roads leading into and out of the city. Maps of smaller regions might even represent individual buildings.

Notice how, at each level, certain information has been included, and certain information has been purposely omitted. There is simply no way to represent all the details when an artifact is viewed at a higher level of abstraction. And even if all the detail could be described (using tiny writing, for example) there is no way that people could assimilate or process such a large amount of information. Hence details are simply left out. Abstraction is an important means of controlling complexity. When something is viewed at an abstract level only the most important features are being emphasized. The details that are omitted need not be remembered or even recognized.
Another term that we often use in computer science for this process is encapsulation. An encapsulation is a packaging, placing items into a unit, or capsule. The key consequence of this process is that the encapsulation can be viewed in two ways, from the inside and from the outside. The outside view is often a description of the task being performed, while the inside view includes the implementation of the task.

An example of the benefits of abstraction can be seen by imagining calling the function used to compute the square root of a double precision number. The only information you typically need to know is the name of the function (say, sqrt), the argument types, and perhaps what it will do in exceptional conditions (say, if you pass it a negative number).

    double sqrt (double n) {
        double result = n/2;
        while (… ) { … }
        return result;
    }

The computation of the square root is actually a nontrivial process. As we described in Chapter 2, the function will probably use some sort of approximation technique, such as Newton's iterative method. But the details of how the result is produced have been abstracted away, or encapsulated within the function boundary, leaving you only the need to understand the description of the desired result.

Programming languages have various techniques for encapsulation. The previous paragraph described how functions can be viewed as one approach. The function cleanly separates the outside, which is concerned with the "what" (what is the task to be performed), from the inside, the "how" (how the function produces its result). But there are many other mechanisms that serve similar purposes. Some languages (but not C) include the concept of an interface. An interface is typically a collection of functions that are united in serving a common purpose. Once again, the interface shows only the function names and argument types (this is termed the function signature), and not the bodies, or implementation, of these actions. In fact, there might be more than one implementation for a single interface. At a higher level, some languages include features such as modules, or packages. Here, too, the intent is to provide an encapsulation mechanism, so that code that is outside the package need only know very limited details from the internal code that implements the package.

Interface Files

The C language, which we use in this book, has an older and more primitive facility. Programs are typically divided into two types of files. Interface files, which traditionally end with a .h file extension, contain only function prototypes, interface descriptions for individual files. Each is matched with an implementation file, which traditionally ends with a .c file extension.
Implementation files contain, as the name suggests, implementations of the functions described in the interface files, as well as any supporting functions that are required, but are not part of the public interface. Interface files are also used to describe standard libraries. More details on the standard C libraries are found in Appendix A.
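The split between interface and implementation can be sketched in a single listing. The code below is an illustration under assumptions, not the standard library's sqrt: the function name newtonSqrt, the helper absoluteValue, the file names mentioned in the comments, and the stopping tolerance are all hypothetical choices made for this example. The prototype is what a caller would see in an interface file; everything after it would be hidden in the implementation file.

```c
#include <assert.h>

/* ---- Interface: what a hypothetical newton_sqrt.h would contain ---- */

/* Returns an approximation of the square root of n. n must not be negative. */
double newtonSqrt(double n);

/* ---- Implementation: what a hypothetical newton_sqrt.c would contain ---- */

/* Supporting function. It is declared static because it is not part of
   the public interface. */
static double absoluteValue(double x) {
    return (x < 0) ? -x : x;
}

double newtonSqrt(double n) {
    double result;
    assert(n >= 0);              /* exceptional condition: negative argument */
    if (n == 0)
        return 0;
    result = n / 2;              /* initial guess, as in the fragment in the text */
    while (1) {
        /* Newton's step for x*x = n: average the guess with n / guess */
        double next = (result + n / result) / 2;
        if (absoluteValue(next - result) <= 1e-12 * next)
            return next;         /* successive guesses agree closely enough */
        result = next;
    }
}
```

A caller needs only the prototype and its comment, for example `double side = newtonSqrt(area);`. The loop, the tolerance, and the helper could all be replaced without the caller noticing.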
The study of data structures is concerned largely with the need to maintain collections of values. These are sometimes termed containers. Even without discussing how these collections can be implemented, a number of different types of containers can be identified purely by their purpose or behavior. This type of description is termed an abstract data type. As an illustration, a stack can be described by the following interface, written in Java because C has no interface construct:

    public interface Stack {
        public void push (Object a);
        public Object top ();
        public void pop ();
        public boolean isEmpty ();
    };
A queue, on the other hand, removes values in exactly the same order that they were inserted. This is termed FIFO order (first-in, first-out). A queue of people waiting in line to enter a theater is a useful metaphor.

The deque combines features of the stack and queue. Elements can be inserted at either end, and removed from either end, but only from the ends. A good mental image of a deque might be placing peas in a straw. They can be inserted at either end, or removed from either end, but it is not possible to access the peas in the middle without first removing values from the end.

A priority queue maintains values in order of importance. A metaphor for a priority queue is a to-do list of tasks waiting to be performed, or a list of patients waiting for an operating room in a hospital. The key feature is that you want to be able to quickly find the most important item, the value with highest priority.

A map, or dictionary, maintains pairs of elements. Each key is matched to a corresponding value. The keys must be unique. A good metaphor is a dictionary of word/definition pairs.

Each of these abstractions will be explored in subsequent chapters, and you will develop several implementations for all of them.
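To make the idea of an abstract data type concrete, the four operations named in the Stack interface (push, top, pop, isEmpty) can be sketched in C. This is only one possible realization and rests on assumptions: the struct layout, the function names, the fixed limit STACK_CAPACITY, and the choice of double as the element type are all hypothetical. In contrast to the queue's FIFO order, a stack returns the most recently inserted value first.

```c
#include <assert.h>

#define STACK_CAPACITY 100      /* hypothetical fixed limit for this sketch */

struct Stack {
    double data[STACK_CAPACITY];
    int count;                  /* number of values currently held */
};

void stackInit(struct Stack *s) {
    s->count = 0;
}

int stackIsEmpty(const struct Stack *s) {
    return s->count == 0;
}

void stackPush(struct Stack *s, double value) {
    assert(s->count < STACK_CAPACITY);   /* no room left is an error */
    s->data[s->count++] = value;
}

double stackTop(const struct Stack *s) {
    assert(!stackIsEmpty(s));            /* top of an empty stack is an error */
    return s->data[s->count - 1];
}

void stackPop(struct Stack *s) {
    assert(!stackIsEmpty(s));            /* pop of an empty stack is an error */
    s->count--;
}
```

Code that uses the stack calls only these five functions. The array inside the struct could later be replaced by another storage technique without changing any caller.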
Before a container can be used in a running program it must be matched by an implementation. The majority of this book will be devoted to explaining different implementation techniques for the most common data abstractions. Just as there are only a few classic abstract data types, with many small variations on a common theme, there are only a handful of classic implementation techniques, again with many small variations. The most basic way to store a collection of values is an array. An array is nothing more than a fixed size block of memory, with adjacent cells in memory holding each element in the collection:

    | element 0 | element 1 | element 2 | element 3 | element 4 |

A disadvantage of the array is the fixed size, which typically cannot be changed during the lifetime of the container. To overcome this we can place one level of indirection between the user and the storage. A dynamic array stores the size and capacity of a container, and a pointer to an array in which the actual elements are stored. If necessary, the internal array can be increased during the course of execution to allow more elements to be stored. This increase can occur without knowledge of the user. Dynamic arrays are introduced in Worksheet 14, and used in many subsequent worksheets.

The fact that elements in both the array and the dynamic array are stored in a single block is both an advantage and a disadvantage. When collections remain roughly the same size during their lifetime the array uses only a small amount of memory. However, if a collection changes size dramatically then the block can end up being largely unused. An alternative is a linked list. In a linked list each element refers to (points to) the next in sequence, and the elements are not necessarily stored in adjacent memory locations.

Both the array and the linked list suffer from the fact that they are linear organizations. To search for an element, for example, you examine each value one after another. This can be very slow. One way to speed things up is to use a tree, specifically a binary tree. A search in a binary tree can be performed by moving from the top (the root) to a leaf (the bottom) and can be much faster than looking at each element in turn.
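The dynamic array described above can be sketched as follows. This is a minimal illustration under assumptions, not the version developed in the worksheets: the names DynArr, dynArrInit, dynArrAdd, dynArrGet, and dynArrFree are hypothetical, the element type is fixed as double, and doubling the capacity is one common growth policy that the text itself does not specify.

```c
#include <assert.h>
#include <stdlib.h>

struct DynArr {
    double *data;     /* pointer to the block holding the elements */
    int size;         /* number of elements currently stored */
    int capacity;     /* number of elements the block can hold */
};

void dynArrInit(struct DynArr *da, int capacity) {
    assert(capacity > 0);
    da->data = (double *) malloc(capacity * sizeof(double));
    assert(da->data != 0);
    da->size = 0;
    da->capacity = capacity;
}

void dynArrAdd(struct DynArr *da, double value) {
    if (da->size >= da->capacity) {
        /* Block is full: allocate a larger one (assumed policy: double it),
           copy the elements across, and release the old block. */
        int i;
        int newCapacity = 2 * da->capacity;
        double *newData = (double *) malloc(newCapacity * sizeof(double));
        assert(newData != 0);
        for (i = 0; i < da->size; i++)
            newData[i] = da->data[i];
        free(da->data);
        da->data = newData;
        da->capacity = newCapacity;
    }
    da->data[da->size++] = value;
}

double dynArrGet(const struct DynArr *da, int index) {
    assert(index >= 0 && index < da->size);
    return da->data[index];
}

void dynArrFree(struct DynArr *da) {
    free(da->data);
    da->data = 0;
    da->size = 0;
    da->capacity = 0;
}
```

The caller only ever calls dynArrAdd. Whether a particular call triggered a reallocation is invisible from the outside, which is the level of indirection the text refers to.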
Many more variations on these themes, such as skip lists (a randomized tree structure imposed on a simple linked list), or heaps (a binary tree organized as a priority queue) will be presented as we explore this topic.
In the first part of this book we have made little reference to the type of values being held in a collection. Where we have used a data structure, such as the array used in sorting algorithms, the element type has normally been a simple floating-point number (a double). In the development that follows we want to generalize our containers so that they can maintain values of many different types. Unfortunately, C provides only very primitive facilities for doing so. A major tool we will use is symbolic name replacement provided by the C preprocessor. This facility allows us to define a name and value pair. Prior to compilation, the C preprocessor will systematically replace each occurrence of the name with the value. For example, we will define our element type as follows:

    #define EleType double
Following this definition, we can use the name EleType to represent the type of value our container will hold. This way, the user need only change the one definition in order to modify the type of value a collection can maintain. Another feature of the preprocessor allows us to make this even easier. The statement #ifndef informs the preprocessor that any text between the statement and a matching #endif statement should only be included if the argument to the ifndef is not already defined. The definition of EleType in the interface file will be written as follows:

    #ifndef EleType
    #define EleType double
    #endif
These statements tell the preprocessor to only define the name EleType if it has not already been defined. In effect, it makes double into our default value, but allows the user to provide an alternative, by preceding the definition with an alternative definition. If the user wants to define the element type as an integer, for example, they simply precede the above with a line

    #define EleType int
A second feature of C that we make extensive use of in the following chapters is the equivalence between arrays and pointers. In particular, when an array must be allocated dynamically (that is, at run time), it is accessed through a pointer variable. The function used to allocate memory is termed malloc. The malloc function takes as argument an integer representing the number of bytes to allocate. The computation of this quantity is made easier by the sizeof operator, which computes the size in bytes of its argument type.
You saw an example of the use of malloc and sizeof in the merge sort algorithm described in Chapter 4. There the malloc was used to create a temporary array. Here is another example. The following bit of code takes an integer value stored in the variable n, and allocates an array that can hold n elements of whatever type is represented by EleType. Because the malloc function can return a null pointer (zero) if there is not enough memory for the request, the result should always be checked. Because malloc returns a generic pointer (void *), the result is converted to the correct pointer type; the cast shown here makes that conversion explicit.

    int n;
    EleType * data;
    …
    n = 42; /* n is given some value */
    …
    data = (EleType *) malloc(n * sizeof(EleType)); /* array of size n is allocated */
    assert (data != 0); /* check that allocation worked */
    …
    free (data);

Dynamically allocated memory must be returned to the memory manager using the free operation. You should always make sure that any memory that is allocated is eventually freed. This is an idiom you will see repeatedly starting in Worksheet 14. We will be making extensive use of pointers, but treating them as if they were arrays. Pointers in C can be indexed, exactly as if they were arrays.
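The pieces discussed in this section (the EleType definition, malloc with sizeof, the allocation check, and indexing through a pointer) can be combined in one short sketch. The function name makeSequence is hypothetical, and the override to int is chosen only to show that the default of double is skipped when EleType is already defined.

```c
#include <assert.h>
#include <stdlib.h>

#define EleType int          /* user's alternative, placed before the default */

#ifndef EleType
#define EleType double       /* default; skipped here because EleType is defined */
#endif

/* Allocates an array of n elements holding 0, 1, ..., n-1.
   The caller is responsible for calling free on the result. */
EleType *makeSequence(int n) {
    int i;
    EleType *data;
    assert(n > 0);
    data = (EleType *) malloc(n * sizeof(EleType));
    assert(data != 0);       /* check that allocation worked */
    for (i = 0; i < n; i++)
        data[i] = (EleType) i;   /* the pointer is indexed exactly like an array */
    return data;
}
```

A typical use is `EleType *data = makeSequence(42);` followed later by `free(data);`. Removing the first #define would switch every element back to double without any other change to the code.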
For each of the classic abstractions, describe one or more sequences of actions that should always produce an error.
Wikipedia (http://en.wikipedia.org/wiki/Main_Page) has a good explanation of the concept of abstract data types. Links from that page explore most of the common ADTs. Another definition of ADT, as well as definitions of various forms of ADTs, can be found on DADS (http://www.nist.gov/dads/), the Dictionary of Algorithms and Data Structures maintained by the National Institute of Standards and Technology. Wikipedia has entries for many common C functions, such as malloc. There are many on-line tutorials for the C programming language. A very complete tutorial has been written by Brian Brown, and is mirrored at many sites. You can find this by googling the terms "Brian Brown C programming".